Committed by irodkin
Commit 45b57e0 · verified · 1 Parent(s): dec028f

Training checkpoint at step 2000

Files changed (1)
  1. trainer_state.json +1886 -6
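
The entries this commit adds to trainer_state.json are either evaluation records (with "eval_loss") or training records (with "loss", "grad_norm", "learning_rate"). A minimal sketch of how such a file could be summarized is given here. It assumes the records live under a top-level "log_history" key, which is where the Hugging Face Trainer normally writes them but which is not visible in this diff. The function name summarize_eval and the inline sample are hypothetical, and the sample holds only a few values copied from the diff below.

```python
import json


def summarize_eval(state: dict) -> dict:
    """Collect (step, eval_loss) pairs and report the step with the lowest loss."""
    # Assumption: records are stored under "log_history"; training-only
    # records (no "eval_loss" key) are skipped.
    evals = [
        (entry["step"], entry["eval_loss"])
        for entry in state.get("log_history", [])
        if "eval_loss" in entry
    ]
    if not evals:
        return {"num_evals": 0, "best_step": None, "best_eval_loss": None}
    best_step, best_loss = min(evals, key=lambda pair: pair[1])
    return {"num_evals": len(evals), "best_step": best_step, "best_eval_loss": best_loss}


# Hypothetical, heavily truncated sample built from a few records in this diff.
sample_state = json.loads("""
{
  "log_history": [
    {"epoch": 0.0201, "eval_loss": 2.4881434440612793, "step": 1005},
    {"epoch": 0.0202, "eval_loss": 2.4879722595214844, "step": 1010},
    {"epoch": 0.0205, "grad_norm": 0.04204734602618554,
     "learning_rate": 2.048e-06, "loss": 2.4708, "step": 1025},
    {"epoch": 0.0205, "eval_loss": 2.48695707321167, "step": 1025}
  ]
}
""")

summary = summarize_eval(sample_state)
print(summary)
```

For a real checkpoint, the same function could be applied to json.load(open(path)) for whichever trainer_state.json is of interest; the result can then be compared against the file's own "best_global_step" and "best_metric" fields.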
trainer_state.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "best_global_step": 1000,
3
- "best_metric": 2.488457202911377,
4
- "best_model_checkpoint": "../runs/karpathy/fineweb-edu-100b-shuffle/meta-llama/Llama-3.2-1B/linear_adamw_wd1e-03_7x1024_mem32_bs64_hf_armt_dmem64/run_20/checkpoint-1000",
5
- "epoch": 0.02,
6
  "eval_steps": 5,
7
- "global_step": 1000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
@@ -1888,6 +1888,1886 @@
1888
  "eval_samples_per_second": 3.464,
1889
  "eval_steps_per_second": 1.747,
1890
  "step": 1000
1891
  }
1892
  ],
1893
  "logging_steps": 25,
@@ -1907,7 +3787,7 @@
1907
  "attributes": {}
1908
  }
1909
  },
1910
- "total_flos": 2.785301757633233e+18,
1911
  "train_batch_size": 1,
1912
  "trial_name": null,
1913
  "trial_params": null
 
1
  {
2
+ "best_global_step": 2000,
3
+ "best_metric": 2.449084520339966,
4
+ "best_model_checkpoint": "../runs/karpathy/fineweb-edu-100b-shuffle/meta-llama/Llama-3.2-1B/linear_adamw_wd1e-03_7x1024_mem32_bs64_hf_armt_dmem64/run_20/checkpoint-2000",
5
+ "epoch": 0.04,
6
  "eval_steps": 5,
7
+ "global_step": 2000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
 
1888
  "eval_samples_per_second": 3.464,
1889
  "eval_steps_per_second": 1.747,
1890
  "step": 1000
1891
+ },
1892
+ {
1893
+ "epoch": 0.0201,
1894
+ "eval_loss": 2.4881434440612793,
1895
+ "eval_runtime": 33.6922,
1896
+ "eval_samples_per_second": 3.473,
1897
+ "eval_steps_per_second": 1.751,
1898
+ "step": 1005
1899
+ },
1900
+ {
1901
+ "epoch": 0.0202,
1902
+ "eval_loss": 2.4879722595214844,
1903
+ "eval_runtime": 33.6857,
1904
+ "eval_samples_per_second": 3.473,
1905
+ "eval_steps_per_second": 1.751,
1906
+ "step": 1010
1907
+ },
1908
+ {
1909
+ "epoch": 0.0203,
1910
+ "eval_loss": 2.4876134395599365,
1911
+ "eval_runtime": 33.7945,
1912
+ "eval_samples_per_second": 3.462,
1913
+ "eval_steps_per_second": 1.746,
1914
+ "step": 1015
1915
+ },
1916
+ {
1917
+ "epoch": 0.0204,
1918
+ "eval_loss": 2.4872164726257324,
1919
+ "eval_runtime": 33.7811,
1920
+ "eval_samples_per_second": 3.463,
1921
+ "eval_steps_per_second": 1.747,
1922
+ "step": 1020
1923
+ },
1924
+ {
1925
+ "epoch": 0.0205,
1926
+ "grad_norm": 0.04204734602618554,
1927
+ "learning_rate": 2.048e-06,
1928
+ "loss": 2.4708,
1929
+ "step": 1025
1930
+ },
1931
+ {
1932
+ "epoch": 0.0205,
1933
+ "eval_loss": 2.48695707321167,
1934
+ "eval_runtime": 33.821,
1935
+ "eval_samples_per_second": 3.459,
1936
+ "eval_steps_per_second": 1.744,
1937
+ "step": 1025
1938
+ },
1939
+ {
1940
+ "epoch": 0.0206,
1941
+ "eval_loss": 2.486564874649048,
1942
+ "eval_runtime": 33.82,
1943
+ "eval_samples_per_second": 3.459,
1944
+ "eval_steps_per_second": 1.745,
1945
+ "step": 1030
1946
+ },
1947
+ {
1948
+ "epoch": 0.0207,
1949
+ "eval_loss": 2.486281633377075,
1950
+ "eval_runtime": 33.927,
1951
+ "eval_samples_per_second": 3.449,
1952
+ "eval_steps_per_second": 1.739,
1953
+ "step": 1035
1954
+ },
1955
+ {
1956
+ "epoch": 0.0208,
1957
+ "eval_loss": 2.4860103130340576,
1958
+ "eval_runtime": 33.9697,
1959
+ "eval_samples_per_second": 3.444,
1960
+ "eval_steps_per_second": 1.737,
1961
+ "step": 1040
1962
+ },
1963
+ {
1964
+ "epoch": 0.0209,
1965
+ "eval_loss": 2.4855759143829346,
1966
+ "eval_runtime": 33.9097,
1967
+ "eval_samples_per_second": 3.45,
1968
+ "eval_steps_per_second": 1.74,
1969
+ "step": 1045
1970
+ },
1971
+ {
1972
+ "epoch": 0.021,
1973
+ "grad_norm": 0.03813289834436041,
1974
+ "learning_rate": 2.098e-06,
1975
+ "loss": 2.4799,
1976
+ "step": 1050
1977
+ },
1978
+ {
1979
+ "epoch": 0.021,
1980
+ "eval_loss": 2.485349416732788,
1981
+ "eval_runtime": 34.0131,
1982
+ "eval_samples_per_second": 3.44,
1983
+ "eval_steps_per_second": 1.735,
1984
+ "step": 1050
1985
+ },
1986
+ {
1987
+ "epoch": 0.0211,
1988
+ "eval_loss": 2.48506498336792,
1989
+ "eval_runtime": 34.036,
1990
+ "eval_samples_per_second": 3.438,
1991
+ "eval_steps_per_second": 1.733,
1992
+ "step": 1055
1993
+ },
1994
+ {
1995
+ "epoch": 0.0212,
1996
+ "eval_loss": 2.484771966934204,
1997
+ "eval_runtime": 34.0842,
1998
+ "eval_samples_per_second": 3.433,
1999
+ "eval_steps_per_second": 1.731,
2000
+ "step": 1060
2001
+ },
2002
+ {
2003
+ "epoch": 0.0213,
2004
+ "eval_loss": 2.4846508502960205,
2005
+ "eval_runtime": 34.0289,
2006
+ "eval_samples_per_second": 3.438,
2007
+ "eval_steps_per_second": 1.734,
2008
+ "step": 1065
2009
+ },
2010
+ {
2011
+ "epoch": 0.0214,
2012
+ "eval_loss": 2.484158992767334,
2013
+ "eval_runtime": 34.0038,
2014
+ "eval_samples_per_second": 3.441,
2015
+ "eval_steps_per_second": 1.735,
2016
+ "step": 1070
2017
+ },
2018
+ {
2019
+ "epoch": 0.0215,
2020
+ "grad_norm": 0.04289680570208033,
2021
+ "learning_rate": 2.148e-06,
2022
+ "loss": 2.4822,
2023
+ "step": 1075
2024
+ },
2025
+ {
2026
+ "epoch": 0.0215,
2027
+ "eval_loss": 2.483947992324829,
2028
+ "eval_runtime": 33.9604,
2029
+ "eval_samples_per_second": 3.445,
2030
+ "eval_steps_per_second": 1.737,
2031
+ "step": 1075
2032
+ },
2033
+ {
2034
+ "epoch": 0.0216,
2035
+ "eval_loss": 2.4836008548736572,
2036
+ "eval_runtime": 33.9465,
2037
+ "eval_samples_per_second": 3.447,
2038
+ "eval_steps_per_second": 1.738,
2039
+ "step": 1080
2040
+ },
2041
+ {
2042
+ "epoch": 0.0217,
2043
+ "eval_loss": 2.483187675476074,
2044
+ "eval_runtime": 34.1344,
2045
+ "eval_samples_per_second": 3.428,
2046
+ "eval_steps_per_second": 1.728,
2047
+ "step": 1085
2048
+ },
2049
+ {
2050
+ "epoch": 0.0218,
2051
+ "eval_loss": 2.4829964637756348,
2052
+ "eval_runtime": 34.0915,
2053
+ "eval_samples_per_second": 3.432,
2054
+ "eval_steps_per_second": 1.731,
2055
+ "step": 1090
2056
+ },
2057
+ {
2058
+ "epoch": 0.0219,
2059
+ "eval_loss": 2.482805013656616,
2060
+ "eval_runtime": 33.9291,
2061
+ "eval_samples_per_second": 3.448,
2062
+ "eval_steps_per_second": 1.739,
2063
+ "step": 1095
2064
+ },
2065
+ {
2066
+ "epoch": 0.022,
2067
+ "grad_norm": 0.03972633299982532,
2068
+ "learning_rate": 2.198e-06,
2069
+ "loss": 2.4871,
2070
+ "step": 1100
2071
+ },
2072
+ {
2073
+ "epoch": 0.022,
2074
+ "eval_loss": 2.482428550720215,
2075
+ "eval_runtime": 33.7324,
2076
+ "eval_samples_per_second": 3.468,
2077
+ "eval_steps_per_second": 1.749,
2078
+ "step": 1100
2079
+ },
2080
+ {
2081
+ "epoch": 0.0221,
2082
+ "eval_loss": 2.4822213649749756,
2083
+ "eval_runtime": 33.7954,
2084
+ "eval_samples_per_second": 3.462,
2085
+ "eval_steps_per_second": 1.746,
2086
+ "step": 1105
2087
+ },
2088
+ {
2089
+ "epoch": 0.0222,
2090
+ "eval_loss": 2.481689214706421,
2091
+ "eval_runtime": 33.7787,
2092
+ "eval_samples_per_second": 3.464,
2093
+ "eval_steps_per_second": 1.747,
2094
+ "step": 1110
2095
+ },
2096
+ {
2097
+ "epoch": 0.0223,
2098
+ "eval_loss": 2.481731414794922,
2099
+ "eval_runtime": 33.6129,
2100
+ "eval_samples_per_second": 3.481,
2101
+ "eval_steps_per_second": 1.755,
2102
+ "step": 1115
2103
+ },
2104
+ {
2105
+ "epoch": 0.0224,
2106
+ "eval_loss": 2.4812448024749756,
2107
+ "eval_runtime": 33.511,
2108
+ "eval_samples_per_second": 3.491,
2109
+ "eval_steps_per_second": 1.761,
2110
+ "step": 1120
2111
+ },
2112
+ {
2113
+ "epoch": 0.0225,
2114
+ "grad_norm": 0.041792864961431496,
2115
+ "learning_rate": 2.2480000000000003e-06,
2116
+ "loss": 2.4766,
2117
+ "step": 1125
2118
+ },
2119
+ {
2120
+ "epoch": 0.0225,
2121
+ "eval_loss": 2.4809837341308594,
2122
+ "eval_runtime": 33.7009,
2123
+ "eval_samples_per_second": 3.472,
2124
+ "eval_steps_per_second": 1.751,
2125
+ "step": 1125
2126
+ },
2127
+ {
2128
+ "epoch": 0.0226,
2129
+ "eval_loss": 2.480768918991089,
2130
+ "eval_runtime": 33.6615,
2131
+ "eval_samples_per_second": 3.476,
2132
+ "eval_steps_per_second": 1.753,
2133
+ "step": 1130
2134
+ },
2135
+ {
2136
+ "epoch": 0.0227,
2137
+ "eval_loss": 2.480337381362915,
2138
+ "eval_runtime": 33.6203,
2139
+ "eval_samples_per_second": 3.48,
2140
+ "eval_steps_per_second": 1.755,
2141
+ "step": 1135
2142
+ },
2143
+ {
2144
+ "epoch": 0.0228,
2145
+ "eval_loss": 2.4803271293640137,
2146
+ "eval_runtime": 33.6559,
2147
+ "eval_samples_per_second": 3.476,
2148
+ "eval_steps_per_second": 1.753,
2149
+ "step": 1140
2150
+ },
2151
+ {
2152
+ "epoch": 0.0229,
2153
+ "eval_loss": 2.4799482822418213,
2154
+ "eval_runtime": 33.5023,
2155
+ "eval_samples_per_second": 3.492,
2156
+ "eval_steps_per_second": 1.761,
2157
+ "step": 1145
2158
+ },
2159
+ {
2160
+ "epoch": 0.023,
2161
+ "grad_norm": 0.035383899567194975,
2162
+ "learning_rate": 2.2980000000000003e-06,
2163
+ "loss": 2.4749,
2164
+ "step": 1150
2165
+ },
2166
+ {
2167
+ "epoch": 0.023,
2168
+ "eval_loss": 2.479668140411377,
2169
+ "eval_runtime": 33.4615,
2170
+ "eval_samples_per_second": 3.497,
2171
+ "eval_steps_per_second": 1.763,
2172
+ "step": 1150
2173
+ },
2174
+ {
2175
+ "epoch": 0.0231,
2176
+ "eval_loss": 2.4794092178344727,
2177
+ "eval_runtime": 33.4264,
2178
+ "eval_samples_per_second": 3.5,
2179
+ "eval_steps_per_second": 1.765,
2180
+ "step": 1155
2181
+ },
2182
+ {
2183
+ "epoch": 0.0232,
2184
+ "eval_loss": 2.4790964126586914,
2185
+ "eval_runtime": 33.4165,
2186
+ "eval_samples_per_second": 3.501,
2187
+ "eval_steps_per_second": 1.766,
2188
+ "step": 1160
2189
+ },
2190
+ {
2191
+ "epoch": 0.0233,
2192
+ "eval_loss": 2.4789323806762695,
2193
+ "eval_runtime": 33.2576,
2194
+ "eval_samples_per_second": 3.518,
2195
+ "eval_steps_per_second": 1.774,
2196
+ "step": 1165
2197
+ },
2198
+ {
2199
+ "epoch": 0.0234,
2200
+ "eval_loss": 2.4786429405212402,
2201
+ "eval_runtime": 33.3028,
2202
+ "eval_samples_per_second": 3.513,
2203
+ "eval_steps_per_second": 1.772,
2204
+ "step": 1170
2205
+ },
2206
+ {
2207
+ "epoch": 0.0235,
2208
+ "grad_norm": 0.034819138532107045,
2209
+ "learning_rate": 2.3480000000000002e-06,
2210
+ "loss": 2.4874,
2211
+ "step": 1175
2212
+ },
2213
+ {
2214
+ "epoch": 0.0235,
2215
+ "eval_loss": 2.4784486293792725,
2216
+ "eval_runtime": 33.3374,
2217
+ "eval_samples_per_second": 3.51,
2218
+ "eval_steps_per_second": 1.77,
2219
+ "step": 1175
2220
+ },
2221
+ {
2222
+ "epoch": 0.0236,
2223
+ "eval_loss": 2.478088855743408,
2224
+ "eval_runtime": 33.2864,
2225
+ "eval_samples_per_second": 3.515,
2226
+ "eval_steps_per_second": 1.772,
2227
+ "step": 1180
2228
+ },
2229
+ {
2230
+ "epoch": 0.0237,
2231
+ "eval_loss": 2.477979898452759,
2232
+ "eval_runtime": 33.4245,
2233
+ "eval_samples_per_second": 3.5,
2234
+ "eval_steps_per_second": 1.765,
2235
+ "step": 1185
2236
+ },
2237
+ {
2238
+ "epoch": 0.0238,
2239
+ "eval_loss": 2.4778709411621094,
2240
+ "eval_runtime": 33.2611,
2241
+ "eval_samples_per_second": 3.518,
2242
+ "eval_steps_per_second": 1.774,
2243
+ "step": 1190
2244
+ },
2245
+ {
2246
+ "epoch": 0.0239,
2247
+ "eval_loss": 2.477571487426758,
2248
+ "eval_runtime": 33.3418,
2249
+ "eval_samples_per_second": 3.509,
2250
+ "eval_steps_per_second": 1.77,
2251
+ "step": 1195
2252
+ },
2253
+ {
2254
+ "epoch": 0.024,
2255
+ "grad_norm": 0.037748109041694296,
2256
+ "learning_rate": 2.398e-06,
2257
+ "loss": 2.4666,
2258
+ "step": 1200
2259
+ },
2260
+ {
2261
+ "epoch": 0.024,
2262
+ "eval_loss": 2.4772226810455322,
2263
+ "eval_runtime": 33.3603,
2264
+ "eval_samples_per_second": 3.507,
2265
+ "eval_steps_per_second": 1.769,
2266
+ "step": 1200
2267
+ },
2268
+ {
2269
+ "epoch": 0.0241,
2270
+ "eval_loss": 2.4769959449768066,
2271
+ "eval_runtime": 33.21,
2272
+ "eval_samples_per_second": 3.523,
2273
+ "eval_steps_per_second": 1.777,
2274
+ "step": 1205
2275
+ },
2276
+ {
2277
+ "epoch": 0.0242,
2278
+ "eval_loss": 2.4768526554107666,
2279
+ "eval_runtime": 33.4359,
2280
+ "eval_samples_per_second": 3.499,
2281
+ "eval_steps_per_second": 1.765,
2282
+ "step": 1210
2283
+ },
2284
+ {
2285
+ "epoch": 0.0243,
2286
+ "eval_loss": 2.476616382598877,
2287
+ "eval_runtime": 33.3341,
2288
+ "eval_samples_per_second": 3.51,
2289
+ "eval_steps_per_second": 1.77,
2290
+ "step": 1215
2291
+ },
2292
+ {
2293
+ "epoch": 0.0244,
2294
+ "eval_loss": 2.476250171661377,
2295
+ "eval_runtime": 33.3422,
2296
+ "eval_samples_per_second": 3.509,
2297
+ "eval_steps_per_second": 1.77,
2298
+ "step": 1220
2299
+ },
2300
+ {
2301
+ "epoch": 0.0245,
2302
+ "grad_norm": 0.042904100843004035,
2303
+ "learning_rate": 2.448e-06,
2304
+ "loss": 2.4698,
2305
+ "step": 1225
2306
+ },
2307
+ {
2308
+ "epoch": 0.0245,
2309
+ "eval_loss": 2.475933790206909,
2310
+ "eval_runtime": 33.3238,
2311
+ "eval_samples_per_second": 3.511,
2312
+ "eval_steps_per_second": 1.771,
2313
+ "step": 1225
2314
+ },
2315
+ {
2316
+ "epoch": 0.0246,
2317
+ "eval_loss": 2.475733995437622,
2318
+ "eval_runtime": 33.337,
2319
+ "eval_samples_per_second": 3.51,
2320
+ "eval_steps_per_second": 1.77,
2321
+ "step": 1230
2322
+ },
2323
+ {
2324
+ "epoch": 0.0247,
2325
+ "eval_loss": 2.4756155014038086,
2326
+ "eval_runtime": 33.3642,
2327
+ "eval_samples_per_second": 3.507,
2328
+ "eval_steps_per_second": 1.768,
2329
+ "step": 1235
2330
+ },
2331
+ {
2332
+ "epoch": 0.0248,
2333
+ "eval_loss": 2.475208044052124,
2334
+ "eval_runtime": 33.3567,
2335
+ "eval_samples_per_second": 3.508,
2336
+ "eval_steps_per_second": 1.769,
2337
+ "step": 1240
2338
+ },
2339
+ {
2340
+ "epoch": 0.0249,
2341
+ "eval_loss": 2.4751882553100586,
2342
+ "eval_runtime": 33.2409,
2343
+ "eval_samples_per_second": 3.52,
2344
+ "eval_steps_per_second": 1.775,
2345
+ "step": 1245
2346
+ },
2347
+ {
2348
+ "epoch": 0.025,
2349
+ "grad_norm": 0.04198064762114288,
2350
+ "learning_rate": 2.498e-06,
2351
+ "loss": 2.4544,
2352
+ "step": 1250
2353
+ },
2354
+ {
2355
+ "epoch": 0.025,
2356
+ "eval_loss": 2.4749433994293213,
2357
+ "eval_runtime": 33.219,
2358
+ "eval_samples_per_second": 3.522,
2359
+ "eval_steps_per_second": 1.776,
2360
+ "step": 1250
2361
+ },
2362
+ {
2363
+ "epoch": 0.0251,
2364
+ "eval_loss": 2.475109577178955,
2365
+ "eval_runtime": 33.293,
2366
+ "eval_samples_per_second": 3.514,
2367
+ "eval_steps_per_second": 1.772,
2368
+ "step": 1255
2369
+ },
2370
+ {
2371
+ "epoch": 0.0252,
2372
+ "eval_loss": 2.474750280380249,
2373
+ "eval_runtime": 33.5388,
2374
+ "eval_samples_per_second": 3.488,
2375
+ "eval_steps_per_second": 1.759,
2376
+ "step": 1260
2377
+ },
2378
+ {
2379
+ "epoch": 0.0253,
2380
+ "eval_loss": 2.4743547439575195,
2381
+ "eval_runtime": 33.3597,
2382
+ "eval_samples_per_second": 3.507,
2383
+ "eval_steps_per_second": 1.769,
2384
+ "step": 1265
2385
+ },
2386
+ {
2387
+ "epoch": 0.0254,
2388
+ "eval_loss": 2.4740777015686035,
2389
+ "eval_runtime": 33.3283,
2390
+ "eval_samples_per_second": 3.511,
2391
+ "eval_steps_per_second": 1.77,
2392
+ "step": 1270
2393
+ },
2394
+ {
2395
+ "epoch": 0.0255,
2396
+ "grad_norm": 0.03252077443949688,
2397
+ "learning_rate": 2.5480000000000004e-06,
2398
+ "loss": 2.4647,
2399
+ "step": 1275
2400
+ },
2401
+ {
2402
+ "epoch": 0.0255,
2403
+ "eval_loss": 2.473674774169922,
2404
+ "eval_runtime": 33.2492,
2405
+ "eval_samples_per_second": 3.519,
2406
+ "eval_steps_per_second": 1.774,
2407
+ "step": 1275
2408
+ },
2409
+ {
2410
+ "epoch": 0.0256,
2411
+ "eval_loss": 2.4734930992126465,
2412
+ "eval_runtime": 33.2934,
2413
+ "eval_samples_per_second": 3.514,
2414
+ "eval_steps_per_second": 1.772,
2415
+ "step": 1280
2416
+ },
2417
+ {
2418
+ "epoch": 0.0257,
2419
+ "eval_loss": 2.4735071659088135,
2420
+ "eval_runtime": 33.466,
2421
+ "eval_samples_per_second": 3.496,
2422
+ "eval_steps_per_second": 1.763,
2423
+ "step": 1285
2424
+ },
2425
+ {
2426
+ "epoch": 0.0258,
2427
+ "eval_loss": 2.4733572006225586,
2428
+ "eval_runtime": 33.248,
2429
+ "eval_samples_per_second": 3.519,
2430
+ "eval_steps_per_second": 1.775,
2431
+ "step": 1290
2432
+ },
2433
+ {
2434
+ "epoch": 0.0259,
2435
+ "eval_loss": 2.4730312824249268,
2436
+ "eval_runtime": 33.3551,
2437
+ "eval_samples_per_second": 3.508,
2438
+ "eval_steps_per_second": 1.769,
2439
+ "step": 1295
2440
+ },
2441
+ {
2442
+ "epoch": 0.026,
2443
+ "grad_norm": 0.034740776600877266,
2444
+ "learning_rate": 2.598e-06,
2445
+ "loss": 2.4625,
2446
+ "step": 1300
2447
+ },
2448
+ {
2449
+ "epoch": 0.026,
2450
+ "eval_loss": 2.4726204872131348,
2451
+ "eval_runtime": 33.3147,
2452
+ "eval_samples_per_second": 3.512,
2453
+ "eval_steps_per_second": 1.771,
2454
+ "step": 1300
2455
+ },
2456
+ {
2457
+ "epoch": 0.0261,
2458
+ "eval_loss": 2.4729621410369873,
2459
+ "eval_runtime": 33.3118,
2460
+ "eval_samples_per_second": 3.512,
2461
+ "eval_steps_per_second": 1.771,
2462
+ "step": 1305
2463
+ },
2464
+ {
2465
+ "epoch": 0.0262,
2466
+ "eval_loss": 2.4726085662841797,
2467
+ "eval_runtime": 33.4111,
2468
+ "eval_samples_per_second": 3.502,
2469
+ "eval_steps_per_second": 1.766,
2470
+ "step": 1310
2471
+ },
2472
+ {
2473
+ "epoch": 0.0263,
2474
+ "eval_loss": 2.4724133014678955,
2475
+ "eval_runtime": 33.3144,
2476
+ "eval_samples_per_second": 3.512,
2477
+ "eval_steps_per_second": 1.771,
2478
+ "step": 1315
2479
+ },
2480
+ {
2481
+ "epoch": 0.0264,
2482
+ "eval_loss": 2.471963405609131,
2483
+ "eval_runtime": 33.3272,
2484
+ "eval_samples_per_second": 3.511,
2485
+ "eval_steps_per_second": 1.77,
2486
+ "step": 1320
2487
+ },
2488
+ {
2489
+ "epoch": 0.0265,
2490
+ "grad_norm": 0.039738232523319775,
2491
+ "learning_rate": 2.648e-06,
2492
+ "loss": 2.4734,
2493
+ "step": 1325
2494
+ },
2495
+ {
2496
+ "epoch": 0.0265,
2497
+ "eval_loss": 2.4717814922332764,
2498
+ "eval_runtime": 33.2395,
2499
+ "eval_samples_per_second": 3.52,
2500
+ "eval_steps_per_second": 1.775,
2501
+ "step": 1325
2502
+ },
2503
+ {
2504
+ "epoch": 0.0266,
2505
+ "eval_loss": 2.471389055252075,
2506
+ "eval_runtime": 33.2159,
2507
+ "eval_samples_per_second": 3.522,
2508
+ "eval_steps_per_second": 1.776,
2509
+ "step": 1330
2510
+ },
2511
+ {
2512
+ "epoch": 0.0267,
2513
+ "eval_loss": 2.4711251258850098,
2514
+ "eval_runtime": 33.4193,
2515
+ "eval_samples_per_second": 3.501,
2516
+ "eval_steps_per_second": 1.765,
2517
+ "step": 1335
2518
+ },
2519
+ {
2520
+ "epoch": 0.0268,
2521
+ "eval_loss": 2.470979928970337,
2522
+ "eval_runtime": 33.2748,
2523
+ "eval_samples_per_second": 3.516,
2524
+ "eval_steps_per_second": 1.773,
2525
+ "step": 1340
2526
+ },
2527
+ {
2528
+ "epoch": 0.0269,
2529
+ "eval_loss": 2.4706759452819824,
2530
+ "eval_runtime": 33.3367,
2531
+ "eval_samples_per_second": 3.51,
2532
+ "eval_steps_per_second": 1.77,
2533
+ "step": 1345
2534
+ },
2535
+ {
2536
+ "epoch": 0.027,
2537
+ "grad_norm": 0.036968596903604725,
2538
+ "learning_rate": 2.6980000000000003e-06,
2539
+ "loss": 2.4642,
2540
+ "step": 1350
2541
+ },
2542
+ {
2543
+ "epoch": 0.027,
2544
+ "eval_loss": 2.470658302307129,
2545
+ "eval_runtime": 33.3288,
2546
+ "eval_samples_per_second": 3.51,
2547
+ "eval_steps_per_second": 1.77,
2548
+ "step": 1350
2549
+ },
2550
+ {
2551
+ "epoch": 0.0271,
2552
+ "eval_loss": 2.4704952239990234,
2553
+ "eval_runtime": 33.3162,
2554
+ "eval_samples_per_second": 3.512,
2555
+ "eval_steps_per_second": 1.771,
2556
+ "step": 1355
2557
+ },
2558
+ {
2559
+ "epoch": 0.0272,
2560
+ "eval_loss": 2.470270872116089,
2561
+ "eval_runtime": 33.35,
2562
+ "eval_samples_per_second": 3.508,
2563
+ "eval_steps_per_second": 1.769,
2564
+ "step": 1360
2565
+ },
2566
+ {
2567
+ "epoch": 0.0273,
2568
+ "eval_loss": 2.4699764251708984,
2569
+ "eval_runtime": 33.3696,
2570
+ "eval_samples_per_second": 3.506,
2571
+ "eval_steps_per_second": 1.768,
2572
+ "step": 1365
2573
+ },
2574
+ {
2575
+ "epoch": 0.0274,
2576
+ "eval_loss": 2.469688653945923,
2577
+ "eval_runtime": 33.4143,
2578
+ "eval_samples_per_second": 3.501,
2579
+ "eval_steps_per_second": 1.766,
2580
+ "step": 1370
2581
+ },
2582
+ {
2583
+ "epoch": 0.0275,
2584
+ "grad_norm": 0.03899590922475157,
2585
+ "learning_rate": 2.748e-06,
2586
+ "loss": 2.4579,
2587
+ "step": 1375
2588
+ },
2589
+ {
2590
+ "epoch": 0.0275,
2591
+ "eval_loss": 2.469435691833496,
2592
+ "eval_runtime": 33.34,
2593
+ "eval_samples_per_second": 3.509,
2594
+ "eval_steps_per_second": 1.77,
2595
+ "step": 1375
2596
+ },
2597
+ {
2598
+ "epoch": 0.0276,
2599
+ "eval_loss": 2.469395160675049,
2600
+ "eval_runtime": 33.2655,
2601
+ "eval_samples_per_second": 3.517,
2602
+ "eval_steps_per_second": 1.774,
2603
+ "step": 1380
2604
+ },
2605
+ {
2606
+ "epoch": 0.0277,
2607
+ "eval_loss": 2.46889328956604,
2608
+ "eval_runtime": 33.3344,
2609
+ "eval_samples_per_second": 3.51,
2610
+ "eval_steps_per_second": 1.77,
2611
+ "step": 1385
2612
+ },
2613
+ {
2614
+ "epoch": 0.0278,
2615
+ "eval_loss": 2.468695640563965,
2616
+ "eval_runtime": 33.4003,
2617
+ "eval_samples_per_second": 3.503,
2618
+ "eval_steps_per_second": 1.766,
2619
+ "step": 1390
2620
+ },
2621
+ {
2622
+ "epoch": 0.0279,
2623
+ "eval_loss": 2.4685797691345215,
2624
+ "eval_runtime": 33.252,
2625
+ "eval_samples_per_second": 3.519,
2626
+ "eval_steps_per_second": 1.774,
2627
+ "step": 1395
2628
+ },
2629
+ {
2630
+ "epoch": 0.028,
2631
+ "grad_norm": 0.03498385470366268,
2632
+ "learning_rate": 2.798e-06,
2633
+ "loss": 2.472,
2634
+ "step": 1400
2635
+ },
2636
+ {
2637
+ "epoch": 0.028,
2638
+ "eval_loss": 2.468594789505005,
2639
+ "eval_runtime": 33.5555,
2640
+ "eval_samples_per_second": 3.487,
2641
+ "eval_steps_per_second": 1.758,
2642
+ "step": 1400
2643
+ },
2644
+ {
2645
+ "epoch": 0.0281,
2646
+ "eval_loss": 2.4685287475585938,
2647
+ "eval_runtime": 33.3147,
2648
+ "eval_samples_per_second": 3.512,
2649
+ "eval_steps_per_second": 1.771,
2650
+ "step": 1405
2651
+ },
2652
+ {
2653
+ "epoch": 0.0282,
2654
+ "eval_loss": 2.467956304550171,
2655
+ "eval_runtime": 33.3679,
2656
+ "eval_samples_per_second": 3.506,
2657
+ "eval_steps_per_second": 1.768,
2658
+ "step": 1410
2659
+ },
2660
+ {
2661
+ "epoch": 0.0283,
2662
+ "eval_loss": 2.467761993408203,
2663
+ "eval_runtime": 33.3242,
2664
+ "eval_samples_per_second": 3.511,
2665
+ "eval_steps_per_second": 1.77,
2666
+ "step": 1415
2667
+ },
2668
+ {
2669
+ "epoch": 0.0284,
2670
+ "eval_loss": 2.467660903930664,
2671
+ "eval_runtime": 33.3677,
2672
+ "eval_samples_per_second": 3.506,
2673
+ "eval_steps_per_second": 1.768,
2674
+ "step": 1420
2675
+ },
2676
+ {
2677
+ "epoch": 0.0285,
2678
+ "grad_norm": 0.03333480906358989,
2679
+ "learning_rate": 2.848e-06,
2680
+ "loss": 2.4676,
2681
+ "step": 1425
2682
+ },
2683
+ {
2684
+ "epoch": 0.0285,
2685
+ "eval_loss": 2.4673027992248535,
2686
+ "eval_runtime": 33.3388,
2687
+ "eval_samples_per_second": 3.509,
2688
+ "eval_steps_per_second": 1.77,
2689
+ "step": 1425
2690
+ },
2691
+ {
2692
+ "epoch": 0.0286,
2693
+ "eval_loss": 2.467072010040283,
2694
+ "eval_runtime": 33.3596,
2695
+ "eval_samples_per_second": 3.507,
2696
+ "eval_steps_per_second": 1.769,
2697
+ "step": 1430
2698
+ },
2699
+ {
2700
+ "epoch": 0.0287,
2701
+ "eval_loss": 2.4668517112731934,
2702
+ "eval_runtime": 33.5136,
2703
+ "eval_samples_per_second": 3.491,
2704
+ "eval_steps_per_second": 1.76,
2705
+ "step": 1435
2706
+ },
2707
+ {
2708
+ "epoch": 0.0288,
2709
+ "eval_loss": 2.4666786193847656,
2710
+ "eval_runtime": 33.3405,
2711
+ "eval_samples_per_second": 3.509,
2712
+ "eval_steps_per_second": 1.77,
2713
+ "step": 1440
2714
+ },
2715
+ {
2716
+ "epoch": 0.0289,
2717
+ "eval_loss": 2.4667794704437256,
2718
+ "eval_runtime": 33.3333,
2719
+ "eval_samples_per_second": 3.51,
2720
+ "eval_steps_per_second": 1.77,
2721
+ "step": 1445
2722
+ },
2723
+ {
2724
+ "epoch": 0.029,
2725
+ "grad_norm": 0.03480548121480933,
2726
+ "learning_rate": 2.8980000000000005e-06,
2727
+ "loss": 2.4524,
2728
+ "step": 1450
2729
+ },
2730
+ {
2731
+ "epoch": 0.029,
2732
+ "eval_loss": 2.466280460357666,
2733
+ "eval_runtime": 33.4727,
2734
+ "eval_samples_per_second": 3.495,
2735
+ "eval_steps_per_second": 1.763,
2736
+ "step": 1450
2737
+ },
2738
+ {
2739
+ "epoch": 0.0291,
2740
+ "eval_loss": 2.4659922122955322,
2741
+ "eval_runtime": 33.3309,
2742
+ "eval_samples_per_second": 3.51,
2743
+ "eval_steps_per_second": 1.77,
2744
+ "step": 1455
2745
+ },
2746
+ {
2747
+ "epoch": 0.0292,
2748
+ "eval_loss": 2.4657278060913086,
2749
+ "eval_runtime": 33.326,
2750
+ "eval_samples_per_second": 3.511,
2751
+ "eval_steps_per_second": 1.77,
2752
+ "step": 1460
2753
+ },
2754
+ {
2755
+ "epoch": 0.0293,
2756
+ "eval_loss": 2.4654440879821777,
2757
+ "eval_runtime": 33.3457,
2758
+ "eval_samples_per_second": 3.509,
2759
+ "eval_steps_per_second": 1.769,
2760
+ "step": 1465
2761
+ },
2762
+ {
2763
+ "epoch": 0.0294,
2764
+ "eval_loss": 2.465367317199707,
2765
+ "eval_runtime": 33.2824,
2766
+ "eval_samples_per_second": 3.515,
2767
+ "eval_steps_per_second": 1.773,
2768
+ "step": 1470
2769
+ },
2770
+ {
2771
+ "epoch": 0.0295,
2772
+ "grad_norm": 0.03652712436191979,
2773
+ "learning_rate": 2.9480000000000004e-06,
2774
+ "loss": 2.466,
2775
+ "step": 1475
2776
+ },
2777
+ {
2778
+ "epoch": 0.0295,
2779
+ "eval_loss": 2.465318202972412,
2780
+ "eval_runtime": 33.3264,
2781
+ "eval_samples_per_second": 3.511,
2782
+ "eval_steps_per_second": 1.77,
2783
+ "step": 1475
2784
+ },
2785
+ {
2786
+ "epoch": 0.0296,
2787
+ "eval_loss": 2.465156316757202,
2788
+ "eval_runtime": 33.2661,
2789
+ "eval_samples_per_second": 3.517,
2790
+ "eval_steps_per_second": 1.774,
2791
+ "step": 1480
2792
+ },
2793
+ {
2794
+ "epoch": 0.0297,
2795
+ "eval_loss": 2.4648799896240234,
2796
+ "eval_runtime": 33.4782,
2797
+ "eval_samples_per_second": 3.495,
2798
+ "eval_steps_per_second": 1.762,
2799
+ "step": 1485
2800
+ },
2801
+ {
2802
+ "epoch": 0.0298,
2803
+ "eval_loss": 2.4646074771881104,
2804
+ "eval_runtime": 33.3194,
2805
+ "eval_samples_per_second": 3.511,
2806
+ "eval_steps_per_second": 1.771,
2807
+ "step": 1490
2808
+ },
2809
+ {
2810
+ "epoch": 0.0299,
2811
+ "eval_loss": 2.464465856552124,
2812
+ "eval_runtime": 33.3466,
2813
+ "eval_samples_per_second": 3.509,
2814
+ "eval_steps_per_second": 1.769,
2815
+ "step": 1495
2816
+ },
2817
+ {
2818
+ "epoch": 0.03,
2819
+ "grad_norm": 0.03778721361564108,
2820
+ "learning_rate": 2.9980000000000003e-06,
2821
+ "loss": 2.4684,
2822
+ "step": 1500
2823
+ },
2824
+ {
2825
+ "epoch": 0.03,
2826
+ "eval_loss": 2.464305877685547,
2827
+ "eval_runtime": 33.25,
2828
+ "eval_samples_per_second": 3.519,
2829
+ "eval_steps_per_second": 1.774,
2830
+ "step": 1500
2831
+ },
2832
+ {
2833
+ "epoch": 0.0301,
2834
+ "eval_loss": 2.464261531829834,
2835
+ "eval_runtime": 33.3761,
2836
+ "eval_samples_per_second": 3.505,
2837
+ "eval_steps_per_second": 1.768,
2838
+ "step": 1505
2839
+ },
2840
+ {
2841
+ "epoch": 0.0302,
2842
+ "eval_loss": 2.464185953140259,
2843
+ "eval_runtime": 33.4957,
2844
+ "eval_samples_per_second": 3.493,
2845
+ "eval_steps_per_second": 1.761,
2846
+ "step": 1510
2847
+ },
2848
+ {
2849
+ "epoch": 0.0303,
2850
+ "eval_loss": 2.4639229774475098,
2851
+ "eval_runtime": 33.2475,
2852
+ "eval_samples_per_second": 3.519,
2853
+ "eval_steps_per_second": 1.775,
2854
+ "step": 1515
2855
+ },
2856
+ {
2857
+ "epoch": 0.0304,
2858
+ "eval_loss": 2.4636595249176025,
2859
+ "eval_runtime": 33.3124,
2860
+ "eval_samples_per_second": 3.512,
2861
+ "eval_steps_per_second": 1.771,
2862
+ "step": 1520
2863
+ },
2864
+ {
2865
+ "epoch": 0.0305,
2866
+ "grad_norm": 0.035809836530372154,
2867
+ "learning_rate": 3.0480000000000003e-06,
2868
+ "loss": 2.4631,
2869
+ "step": 1525
2870
+ },
2871
+ {
2872
+ "epoch": 0.0305,
2873
+ "eval_loss": 2.46356201171875,
2874
+ "eval_runtime": 33.3423,
2875
+ "eval_samples_per_second": 3.509,
2876
+ "eval_steps_per_second": 1.77,
2877
+ "step": 1525
2878
+ },
2879
+ {
2880
+ "epoch": 0.0306,
2881
+ "eval_loss": 2.463318347930908,
2882
+ "eval_runtime": 33.3917,
2883
+ "eval_samples_per_second": 3.504,
2884
+ "eval_steps_per_second": 1.767,
2885
+ "step": 1530
2886
+ },
2887
+ {
2888
+ "epoch": 0.0307,
2889
+ "eval_loss": 2.4631264209747314,
2890
+ "eval_runtime": 33.4053,
2891
+ "eval_samples_per_second": 3.502,
2892
+ "eval_steps_per_second": 1.766,
2893
+ "step": 1535
2894
+ },
2895
+ {
2896
+ "epoch": 0.0308,
2897
+ "eval_loss": 2.462981700897217,
2898
+ "eval_runtime": 33.2608,
2899
+ "eval_samples_per_second": 3.518,
2900
+ "eval_steps_per_second": 1.774,
2901
+ "step": 1540
2902
+ },
2903
+ {
2904
+ "epoch": 0.0309,
2905
+ "eval_loss": 2.462719202041626,
2906
+ "eval_runtime": 33.3259,
2907
+ "eval_samples_per_second": 3.511,
2908
+ "eval_steps_per_second": 1.77,
2909
+ "step": 1545
2910
+ },
2911
+ {
2912
+ "epoch": 0.031,
2913
+ "grad_norm": 0.05979367258550731,
2914
+ "learning_rate": 3.0980000000000007e-06,
2915
+ "loss": 2.46,
2916
+ "step": 1550
2917
+ },
2918
+ {
2919
+ "epoch": 0.031,
2920
+ "eval_loss": 2.462733268737793,
2921
+ "eval_runtime": 33.3195,
2922
+ "eval_samples_per_second": 3.511,
2923
+ "eval_steps_per_second": 1.771,
2924
+ "step": 1550
2925
+ },
2926
+ {
2927
+ "epoch": 0.0311,
2928
+ "eval_loss": 2.4625959396362305,
2929
+ "eval_runtime": 33.3704,
2930
+ "eval_samples_per_second": 3.506,
2931
+ "eval_steps_per_second": 1.768,
2932
+ "step": 1555
2933
+ },
2934
+ {
2935
+ "epoch": 0.0312,
2936
+ "eval_loss": 2.462366819381714,
2937
+ "eval_runtime": 33.4047,
2938
+ "eval_samples_per_second": 3.503,
2939
+ "eval_steps_per_second": 1.766,
2940
+ "step": 1560
2941
+ },
2942
+ {
2943
+ "epoch": 0.0313,
2944
+ "eval_loss": 2.4618427753448486,
2945
+ "eval_runtime": 33.3896,
2946
+ "eval_samples_per_second": 3.504,
2947
+ "eval_steps_per_second": 1.767,
2948
+ "step": 1565
2949
+ },
2950
+ {
2951
+ "epoch": 0.0314,
2952
+ "eval_loss": 2.4616317749023438,
2953
+ "eval_runtime": 33.3414,
2954
+ "eval_samples_per_second": 3.509,
2955
+ "eval_steps_per_second": 1.77,
2956
+ "step": 1570
2957
+ },
2958
+ {
2959
+ "epoch": 0.0315,
2960
+ "grad_norm": 0.031804244667956116,
2961
+ "learning_rate": 3.1480000000000006e-06,
2962
+ "loss": 2.4477,
2963
+ "step": 1575
2964
+ },
2965
+ {
2966
+ "epoch": 0.0315,
2967
+ "eval_loss": 2.4615368843078613,
2968
+ "eval_runtime": 33.3548,
2969
+ "eval_samples_per_second": 3.508,
2970
+ "eval_steps_per_second": 1.769,
2971
+ "step": 1575
2972
+ },
2973
+ {
2974
+ "epoch": 0.0316,
2975
+ "eval_loss": 2.461198091506958,
2976
+ "eval_runtime": 33.2416,
2977
+ "eval_samples_per_second": 3.52,
2978
+ "eval_steps_per_second": 1.775,
2979
+ "step": 1580
2980
+ },
2981
+ {
2982
+ "epoch": 0.0317,
2983
+ "eval_loss": 2.4611523151397705,
2984
+ "eval_runtime": 33.3445,
2985
+ "eval_samples_per_second": 3.509,
2986
+ "eval_steps_per_second": 1.769,
2987
+ "step": 1585
2988
+ },
2989
+ {
2990
+ "epoch": 0.0318,
2991
+ "eval_loss": 2.4609127044677734,
2992
+ "eval_runtime": 33.3175,
2993
+ "eval_samples_per_second": 3.512,
2994
+ "eval_steps_per_second": 1.771,
2995
+ "step": 1590
2996
+ },
2997
+ {
2998
+ "epoch": 0.0319,
2999
+ "eval_loss": 2.4608800411224365,
3000
+ "eval_runtime": 33.3052,
3001
+ "eval_samples_per_second": 3.513,
3002
+ "eval_steps_per_second": 1.771,
3003
+ "step": 1595
3004
+ },
3005
+ {
3006
+ "epoch": 0.032,
3007
+ "grad_norm": 0.03365841309984822,
3008
+ "learning_rate": 3.198e-06,
3009
+ "loss": 2.4523,
3010
+ "step": 1600
3011
+ },
3012
+ {
3013
+ "epoch": 0.032,
3014
+ "eval_loss": 2.460757255554199,
3015
+ "eval_runtime": 33.2636,
3016
+ "eval_samples_per_second": 3.517,
3017
+ "eval_steps_per_second": 1.774,
3018
+ "step": 1600
3019
+ },
3020
+ {
3021
+ "epoch": 0.0321,
3022
+ "eval_loss": 2.4605917930603027,
3023
+ "eval_runtime": 33.4595,
3024
+ "eval_samples_per_second": 3.497,
3025
+ "eval_steps_per_second": 1.763,
3026
+ "step": 1605
3027
+ },
3028
+ {
3029
+ "epoch": 0.0322,
3030
+ "eval_loss": 2.4604575634002686,
3031
+ "eval_runtime": 33.2706,
3032
+ "eval_samples_per_second": 3.517,
3033
+ "eval_steps_per_second": 1.773,
3034
+ "step": 1610
3035
+ },
3036
+ {
3037
+ "epoch": 0.0323,
3038
+ "eval_loss": 2.4603111743927,
3039
+ "eval_runtime": 33.405,
3040
+ "eval_samples_per_second": 3.502,
3041
+ "eval_steps_per_second": 1.766,
3042
+ "step": 1615
3043
+ },
3044
+ {
3045
+ "epoch": 0.0324,
3046
+ "eval_loss": 2.460045337677002,
3047
+ "eval_runtime": 33.2598,
3048
+ "eval_samples_per_second": 3.518,
3049
+ "eval_steps_per_second": 1.774,
3050
+ "step": 1620
3051
+ },
3052
+ {
3053
+ "epoch": 0.0325,
3054
+ "grad_norm": 0.03534600587541967,
3055
+ "learning_rate": 3.248e-06,
3056
+ "loss": 2.45,
3057
+ "step": 1625
3058
+ },
3059
+ {
3060
+ "epoch": 0.0325,
3061
+ "eval_loss": 2.460045099258423,
3062
+ "eval_runtime": 33.2663,
3063
+ "eval_samples_per_second": 3.517,
3064
+ "eval_steps_per_second": 1.774,
3065
+ "step": 1625
3066
+ },
3067
+ {
3068
+ "epoch": 0.0326,
3069
+ "eval_loss": 2.4599287509918213,
3070
+ "eval_runtime": 33.2545,
3071
+ "eval_samples_per_second": 3.518,
3072
+ "eval_steps_per_second": 1.774,
3073
+ "step": 1630
3074
+ },
3075
+ {
3076
+ "epoch": 0.0327,
3077
+ "eval_loss": 2.459611654281616,
3078
+ "eval_runtime": 33.4189,
3079
+ "eval_samples_per_second": 3.501,
3080
+ "eval_steps_per_second": 1.765,
3081
+ "step": 1635
3082
+ },
3083
+ {
3084
+ "epoch": 0.0328,
3085
+ "eval_loss": 2.4594151973724365,
3086
+ "eval_runtime": 33.284,
3087
+ "eval_samples_per_second": 3.515,
3088
+ "eval_steps_per_second": 1.773,
3089
+ "step": 1640
3090
+ },
3091
+ {
3092
+ "epoch": 0.0329,
3093
+ "eval_loss": 2.4589221477508545,
3094
+ "eval_runtime": 33.4033,
3095
+ "eval_samples_per_second": 3.503,
3096
+ "eval_steps_per_second": 1.766,
3097
+ "step": 1645
3098
+ },
3099
+ {
3100
+ "epoch": 0.033,
3101
+ "grad_norm": 0.032596527761614855,
3102
+ "learning_rate": 3.298e-06,
3103
+ "loss": 2.4422,
3104
+ "step": 1650
3105
+ },
3106
+ {
3107
+ "epoch": 0.033,
3108
+ "eval_loss": 2.4589502811431885,
3109
+ "eval_runtime": 33.2986,
3110
+ "eval_samples_per_second": 3.514,
3111
+ "eval_steps_per_second": 1.772,
3112
+ "step": 1650
3113
+ },
3114
+ {
3115
+ "epoch": 0.0331,
3116
+ "eval_loss": 2.4588239192962646,
3117
+ "eval_runtime": 33.4046,
3118
+ "eval_samples_per_second": 3.503,
3119
+ "eval_steps_per_second": 1.766,
3120
+ "step": 1655
3121
+ },
3122
+ {
3123
+ "epoch": 0.0332,
3124
+ "eval_loss": 2.458603620529175,
3125
+ "eval_runtime": 33.3448,
3126
+ "eval_samples_per_second": 3.509,
3127
+ "eval_steps_per_second": 1.769,
3128
+ "step": 1660
3129
+ },
3130
+ {
3131
+ "epoch": 0.0333,
3132
+ "eval_loss": 2.458559513092041,
3133
+ "eval_runtime": 33.368,
3134
+ "eval_samples_per_second": 3.506,
3135
+ "eval_steps_per_second": 1.768,
3136
+ "step": 1665
3137
+ },
3138
+ {
3139
+ "epoch": 0.0334,
3140
+ "eval_loss": 2.458500862121582,
3141
+ "eval_runtime": 33.2335,
3142
+ "eval_samples_per_second": 3.521,
3143
+ "eval_steps_per_second": 1.775,
3144
+ "step": 1670
3145
+ },
3146
+ {
3147
+ "epoch": 0.0335,
3148
+ "grad_norm": 0.03339611698643194,
3149
+ "learning_rate": 3.348e-06,
3150
+ "loss": 2.447,
3151
+ "step": 1675
3152
+ },
3153
+ {
3154
+ "epoch": 0.0335,
3155
+ "eval_loss": 2.458252191543579,
3156
+ "eval_runtime": 33.3623,
3157
+ "eval_samples_per_second": 3.507,
3158
+ "eval_steps_per_second": 1.768,
3159
+ "step": 1675
3160
+ },
3161
+ {
3162
+ "epoch": 0.0336,
3163
+ "eval_loss": 2.4580931663513184,
3164
+ "eval_runtime": 33.2532,
3165
+ "eval_samples_per_second": 3.518,
3166
+ "eval_steps_per_second": 1.774,
3167
+ "step": 1680
3168
+ },
3169
+ {
3170
+ "epoch": 0.0337,
3171
+ "eval_loss": 2.4578795433044434,
3172
+ "eval_runtime": 33.3214,
3173
+ "eval_samples_per_second": 3.511,
3174
+ "eval_steps_per_second": 1.771,
3175
+ "step": 1685
3176
+ },
3177
+ {
3178
+ "epoch": 0.0338,
3179
+ "eval_loss": 2.4576218128204346,
3180
+ "eval_runtime": 33.248,
3181
+ "eval_samples_per_second": 3.519,
3182
+ "eval_steps_per_second": 1.775,
3183
+ "step": 1690
3184
+ },
3185
+ {
3186
+ "epoch": 0.0339,
3187
+ "eval_loss": 2.4576828479766846,
3188
+ "eval_runtime": 33.3499,
3189
+ "eval_samples_per_second": 3.508,
3190
+ "eval_steps_per_second": 1.769,
3191
+ "step": 1695
3192
+ },
3193
+ {
3194
+ "epoch": 0.034,
3195
+ "grad_norm": 0.03028181865357742,
3196
+ "learning_rate": 3.3980000000000003e-06,
3197
+ "loss": 2.4582,
3198
+ "step": 1700
3199
+ },
3200
+ {
3201
+ "epoch": 0.034,
3202
+ "eval_loss": 2.457383155822754,
3203
+ "eval_runtime": 33.2574,
3204
+ "eval_samples_per_second": 3.518,
3205
+ "eval_steps_per_second": 1.774,
3206
+ "step": 1700
3207
+ },
3208
+ {
3209
+ "epoch": 0.0341,
3210
+ "eval_loss": 2.4572579860687256,
3211
+ "eval_runtime": 33.2947,
3212
+ "eval_samples_per_second": 3.514,
3213
+ "eval_steps_per_second": 1.772,
3214
+ "step": 1705
3215
+ },
3216
+ {
3217
+ "epoch": 0.0342,
3218
+ "eval_loss": 2.4584450721740723,
3219
+ "eval_runtime": 33.3296,
3220
+ "eval_samples_per_second": 3.51,
3221
+ "eval_steps_per_second": 1.77,
3222
+ "step": 1710
3223
+ },
3224
+ {
3225
+ "epoch": 0.0343,
3226
+ "eval_loss": 2.458603858947754,
3227
+ "eval_runtime": 33.3017,
3228
+ "eval_samples_per_second": 3.513,
3229
+ "eval_steps_per_second": 1.772,
3230
+ "step": 1715
3231
+ },
3232
+ {
3233
+ "epoch": 0.0344,
3234
+ "eval_loss": 2.4579555988311768,
3235
+ "eval_runtime": 33.292,
3236
+ "eval_samples_per_second": 3.514,
3237
+ "eval_steps_per_second": 1.772,
3238
+ "step": 1720
3239
+ },
3240
+ {
3241
+ "epoch": 0.0345,
3242
+ "grad_norm": 0.03734241446236971,
3243
+ "learning_rate": 3.4480000000000003e-06,
3244
+ "loss": 2.4501,
3245
+ "step": 1725
3246
+ },
3247
+ {
3248
+ "epoch": 0.0345,
3249
+ "eval_loss": 2.4574153423309326,
3250
+ "eval_runtime": 33.4313,
3251
+ "eval_samples_per_second": 3.5,
3252
+ "eval_steps_per_second": 1.765,
3253
+ "step": 1725
3254
+ },
3255
+ {
3256
+ "epoch": 0.0346,
3257
+ "eval_loss": 2.456867218017578,
3258
+ "eval_runtime": 33.2833,
3259
+ "eval_samples_per_second": 3.515,
3260
+ "eval_steps_per_second": 1.773,
3261
+ "step": 1730
3262
+ },
3263
+ {
3264
+ "epoch": 0.0347,
3265
+ "eval_loss": 2.4567270278930664,
3266
+ "eval_runtime": 33.3694,
3267
+ "eval_samples_per_second": 3.506,
3268
+ "eval_steps_per_second": 1.768,
3269
+ "step": 1735
3270
+ },
3271
+ {
3272
+ "epoch": 0.0348,
3273
+ "eval_loss": 2.456348180770874,
3274
+ "eval_runtime": 33.3416,
3275
+ "eval_samples_per_second": 3.509,
3276
+ "eval_steps_per_second": 1.77,
3277
+ "step": 1740
3278
+ },
3279
+ {
3280
+ "epoch": 0.0349,
3281
+ "eval_loss": 2.4563136100769043,
3282
+ "eval_runtime": 33.3531,
3283
+ "eval_samples_per_second": 3.508,
3284
+ "eval_steps_per_second": 1.769,
3285
+ "step": 1745
3286
+ },
3287
+ {
3288
+ "epoch": 0.035,
3289
+ "grad_norm": 0.030782538004837847,
3290
+ "learning_rate": 3.4980000000000002e-06,
3291
+ "loss": 2.4509,
3292
+ "step": 1750
3293
+ },
3294
+ {
3295
+ "epoch": 0.035,
3296
+ "eval_loss": 2.455827236175537,
3297
+ "eval_runtime": 33.3143,
3298
+ "eval_samples_per_second": 3.512,
3299
+ "eval_steps_per_second": 1.771,
3300
+ "step": 1750
3301
+ },
3302
+ {
3303
+ "epoch": 0.0351,
3304
+ "eval_loss": 2.4558639526367188,
3305
+ "eval_runtime": 33.3716,
3306
+ "eval_samples_per_second": 3.506,
3307
+ "eval_steps_per_second": 1.768,
3308
+ "step": 1755
3309
+ },
3310
+ {
3311
+ "epoch": 0.0352,
3312
+ "eval_loss": 2.4555938243865967,
3313
+ "eval_runtime": 33.2966,
3314
+ "eval_samples_per_second": 3.514,
3315
+ "eval_steps_per_second": 1.772,
3316
+ "step": 1760
3317
+ },
3318
+ {
3319
+ "epoch": 0.0353,
3320
+ "eval_loss": 2.4551546573638916,
3321
+ "eval_runtime": 33.3145,
3322
+ "eval_samples_per_second": 3.512,
3323
+ "eval_steps_per_second": 1.771,
3324
+ "step": 1765
3325
+ },
3326
+ {
3327
+ "epoch": 0.0354,
3328
+ "eval_loss": 2.454957962036133,
3329
+ "eval_runtime": 33.3201,
3330
+ "eval_samples_per_second": 3.511,
3331
+ "eval_steps_per_second": 1.771,
3332
+ "step": 1770
3333
+ },
3334
+ {
3335
+ "epoch": 0.0355,
3336
+ "grad_norm": 0.03281862515471333,
3337
+ "learning_rate": 3.548e-06,
3338
+ "loss": 2.4439,
3339
+ "step": 1775
3340
+ },
3341
+ {
3342
+ "epoch": 0.0355,
3343
+ "eval_loss": 2.455031394958496,
3344
+ "eval_runtime": 33.264,
3345
+ "eval_samples_per_second": 3.517,
3346
+ "eval_steps_per_second": 1.774,
3347
+ "step": 1775
3348
+ },
3349
+ {
3350
+ "epoch": 0.0356,
3351
+ "eval_loss": 2.4550724029541016,
3352
+ "eval_runtime": 33.3734,
3353
+ "eval_samples_per_second": 3.506,
3354
+ "eval_steps_per_second": 1.768,
3355
+ "step": 1780
3356
+ },
3357
+ {
3358
+ "epoch": 0.0357,
3359
+ "eval_loss": 2.454719305038452,
3360
+ "eval_runtime": 33.3267,
3361
+ "eval_samples_per_second": 3.511,
3362
+ "eval_steps_per_second": 1.77,
3363
+ "step": 1785
3364
+ },
3365
+ {
3366
+ "epoch": 0.0358,
3367
+ "eval_loss": 2.4547033309936523,
3368
+ "eval_runtime": 33.2651,
3369
+ "eval_samples_per_second": 3.517,
3370
+ "eval_steps_per_second": 1.774,
3371
+ "step": 1790
3372
+ },
3373
+ {
3374
+ "epoch": 0.0359,
3375
+ "eval_loss": 2.454416275024414,
3376
+ "eval_runtime": 33.3612,
3377
+ "eval_samples_per_second": 3.507,
3378
+ "eval_steps_per_second": 1.769,
3379
+ "step": 1795
3380
+ },
3381
+ {
3382
+ "epoch": 0.036,
3383
+ "grad_norm": 0.031756006482001914,
3384
+ "learning_rate": 3.5980000000000005e-06,
3385
+ "loss": 2.4493,
3386
+ "step": 1800
3387
+ },
3388
+ {
3389
+ "epoch": 0.036,
3390
+ "eval_loss": 2.454286813735962,
3391
+ "eval_runtime": 33.326,
3392
+ "eval_samples_per_second": 3.511,
3393
+ "eval_steps_per_second": 1.77,
3394
+ "step": 1800
3395
+ },
3396
+ {
3397
+ "epoch": 0.0361,
3398
+ "eval_loss": 2.4541101455688477,
3399
+ "eval_runtime": 33.2597,
3400
+ "eval_samples_per_second": 3.518,
3401
+ "eval_steps_per_second": 1.774,
3402
+ "step": 1805
3403
+ },
3404
+ {
3405
+ "epoch": 0.0362,
3406
+ "eval_loss": 2.4541351795196533,
3407
+ "eval_runtime": 33.2421,
3408
+ "eval_samples_per_second": 3.52,
3409
+ "eval_steps_per_second": 1.775,
3410
+ "step": 1810
3411
+ },
3412
+ {
3413
+ "epoch": 0.0363,
3414
+ "eval_loss": 2.4537973403930664,
3415
+ "eval_runtime": 33.3201,
3416
+ "eval_samples_per_second": 3.511,
3417
+ "eval_steps_per_second": 1.771,
3418
+ "step": 1815
3419
+ },
3420
+ {
3421
+ "epoch": 0.0364,
3422
+ "eval_loss": 2.4534847736358643,
3423
+ "eval_runtime": 33.2973,
3424
+ "eval_samples_per_second": 3.514,
3425
+ "eval_steps_per_second": 1.772,
3426
+ "step": 1820
3427
+ },
3428
+ {
3429
+ "epoch": 0.0365,
3430
+ "grad_norm": 0.03128096989289917,
3431
+ "learning_rate": 3.6480000000000005e-06,
3432
+ "loss": 2.4526,
3433
+ "step": 1825
3434
+ },
3435
+ {
3436
+ "epoch": 0.0365,
3437
+ "eval_loss": 2.453655481338501,
3438
+ "eval_runtime": 33.3755,
3439
+ "eval_samples_per_second": 3.506,
3440
+ "eval_steps_per_second": 1.768,
3441
+ "step": 1825
3442
+ },
3443
+ {
3444
+ "epoch": 0.0366,
3445
+ "eval_loss": 2.4534049034118652,
3446
+ "eval_runtime": 33.332,
3447
+ "eval_samples_per_second": 3.51,
3448
+ "eval_steps_per_second": 1.77,
3449
+ "step": 1830
3450
+ },
3451
+ {
3452
+ "epoch": 0.0367,
3453
+ "eval_loss": 2.4529781341552734,
3454
+ "eval_runtime": 33.3325,
3455
+ "eval_samples_per_second": 3.51,
3456
+ "eval_steps_per_second": 1.77,
3457
+ "step": 1835
3458
+ },
3459
+ {
3460
+ "epoch": 0.0368,
3461
+ "eval_loss": 2.454005241394043,
3462
+ "eval_runtime": 33.3975,
3463
+ "eval_samples_per_second": 3.503,
3464
+ "eval_steps_per_second": 1.767,
3465
+ "step": 1840
3466
+ },
3467
+ {
3468
+ "epoch": 0.0369,
3469
+ "eval_loss": 2.4538745880126953,
3470
+ "eval_runtime": 33.3,
3471
+ "eval_samples_per_second": 3.514,
3472
+ "eval_steps_per_second": 1.772,
3473
+ "step": 1845
3474
+ },
3475
+ {
3476
+ "epoch": 0.037,
3477
+ "grad_norm": 0.02999582338402207,
3478
+ "learning_rate": 3.6980000000000004e-06,
3479
+ "loss": 2.4309,
3480
+ "step": 1850
3481
+ },
3482
+ {
3483
+ "epoch": 0.037,
3484
+ "eval_loss": 2.4534404277801514,
3485
+ "eval_runtime": 33.2825,
3486
+ "eval_samples_per_second": 3.515,
3487
+ "eval_steps_per_second": 1.773,
3488
+ "step": 1850
3489
+ },
3490
+ {
3491
+ "epoch": 0.0371,
3492
+ "eval_loss": 2.4529800415039062,
3493
+ "eval_runtime": 33.513,
3494
+ "eval_samples_per_second": 3.491,
3495
+ "eval_steps_per_second": 1.761,
3496
+ "step": 1855
3497
+ },
3498
+ {
3499
+ "epoch": 0.0372,
3500
+ "eval_loss": 2.453007221221924,
3501
+ "eval_runtime": 33.3414,
3502
+ "eval_samples_per_second": 3.509,
3503
+ "eval_steps_per_second": 1.77,
3504
+ "step": 1860
3505
+ },
3506
+ {
3507
+ "epoch": 0.0373,
3508
+ "eval_loss": 2.452350616455078,
3509
+ "eval_runtime": 33.3625,
3510
+ "eval_samples_per_second": 3.507,
3511
+ "eval_steps_per_second": 1.768,
3512
+ "step": 1865
3513
+ },
3514
+ {
3515
+ "epoch": 0.0374,
3516
+ "eval_loss": 2.4522666931152344,
3517
+ "eval_runtime": 33.3116,
3518
+ "eval_samples_per_second": 3.512,
3519
+ "eval_steps_per_second": 1.771,
3520
+ "step": 1870
3521
+ },
3522
+ {
3523
+ "epoch": 0.0375,
3524
+ "grad_norm": 0.0409025592520596,
3525
+ "learning_rate": 3.7480000000000004e-06,
3526
+ "loss": 2.442,
3527
+ "step": 1875
3528
+ },
3529
+ {
3530
+ "epoch": 0.0375,
3531
+ "eval_loss": 2.4521546363830566,
3532
+ "eval_runtime": 33.3782,
3533
+ "eval_samples_per_second": 3.505,
3534
+ "eval_steps_per_second": 1.768,
3535
+ "step": 1875
3536
+ },
3537
+ {
3538
+ "epoch": 0.0376,
3539
+ "eval_loss": 2.4520437717437744,
3540
+ "eval_runtime": 33.2887,
3541
+ "eval_samples_per_second": 3.515,
3542
+ "eval_steps_per_second": 1.772,
3543
+ "step": 1880
3544
+ },
3545
+ {
3546
+ "epoch": 0.0377,
3547
+ "eval_loss": 2.4519331455230713,
3548
+ "eval_runtime": 33.3746,
3549
+ "eval_samples_per_second": 3.506,
3550
+ "eval_steps_per_second": 1.768,
3551
+ "step": 1885
3552
+ },
3553
+ {
3554
+ "epoch": 0.0378,
3555
+ "eval_loss": 2.451744556427002,
3556
+ "eval_runtime": 33.3214,
3557
+ "eval_samples_per_second": 3.511,
3558
+ "eval_steps_per_second": 1.771,
3559
+ "step": 1890
3560
+ },
3561
+ {
3562
+ "epoch": 0.0379,
3563
+ "eval_loss": 2.451737642288208,
3564
+ "eval_runtime": 33.3457,
3565
+ "eval_samples_per_second": 3.509,
3566
+ "eval_steps_per_second": 1.769,
3567
+ "step": 1895
3568
+ },
3569
+ {
3570
+ "epoch": 0.038,
3571
+ "grad_norm": 0.03431980647954774,
3572
+ "learning_rate": 3.7980000000000007e-06,
3573
+ "loss": 2.4477,
3574
+ "step": 1900
3575
+ },
3576
+ {
3577
+ "epoch": 0.038,
3578
+ "eval_loss": 2.4515624046325684,
3579
+ "eval_runtime": 33.312,
3580
+ "eval_samples_per_second": 3.512,
3581
+ "eval_steps_per_second": 1.771,
3582
+ "step": 1900
3583
+ },
3584
+ {
3585
+ "epoch": 0.0381,
3586
+ "eval_loss": 2.4512295722961426,
3587
+ "eval_runtime": 33.3607,
3588
+ "eval_samples_per_second": 3.507,
3589
+ "eval_steps_per_second": 1.769,
3590
+ "step": 1905
3591
+ },
3592
+ {
3593
+ "epoch": 0.0382,
3594
+ "eval_loss": 2.4510445594787598,
3595
+ "eval_runtime": 33.339,
3596
+ "eval_samples_per_second": 3.509,
3597
+ "eval_steps_per_second": 1.77,
3598
+ "step": 1910
3599
+ },
3600
+ {
3601
+ "epoch": 0.0383,
3602
+ "eval_loss": 2.4508397579193115,
3603
+ "eval_runtime": 33.3996,
3604
+ "eval_samples_per_second": 3.503,
3605
+ "eval_steps_per_second": 1.766,
3606
+ "step": 1915
3607
+ },
3608
+ {
3609
+ "epoch": 0.0384,
3610
+ "eval_loss": 2.4510440826416016,
3611
+ "eval_runtime": 33.2905,
3612
+ "eval_samples_per_second": 3.515,
3613
+ "eval_steps_per_second": 1.772,
3614
+ "step": 1920
3615
+ },
3616
+ {
3617
+ "epoch": 0.0385,
3618
+ "grad_norm": 0.03587224652231601,
3619
+ "learning_rate": 3.848e-06,
3620
+ "loss": 2.4433,
3621
+ "step": 1925
3622
+ },
3623
+ {
3624
+ "epoch": 0.0385,
3625
+ "eval_loss": 2.450984239578247,
3626
+ "eval_runtime": 33.3263,
3627
+ "eval_samples_per_second": 3.511,
3628
+ "eval_steps_per_second": 1.77,
3629
+ "step": 1925
3630
+ },
3631
+ {
3632
+ "epoch": 0.0386,
3633
+ "eval_loss": 2.45090651512146,
3634
+ "eval_runtime": 33.3244,
3635
+ "eval_samples_per_second": 3.511,
3636
+ "eval_steps_per_second": 1.77,
3637
+ "step": 1930
3638
+ },
3639
+ {
3640
+ "epoch": 0.0387,
3641
+ "eval_loss": 2.450443983078003,
3642
+ "eval_runtime": 33.3023,
3643
+ "eval_samples_per_second": 3.513,
3644
+ "eval_steps_per_second": 1.772,
3645
+ "step": 1935
3646
+ },
3647
+ {
3648
+ "epoch": 0.0388,
3649
+ "eval_loss": 2.450309991836548,
3650
+ "eval_runtime": 33.4354,
3651
+ "eval_samples_per_second": 3.499,
3652
+ "eval_steps_per_second": 1.765,
3653
+ "step": 1940
3654
+ },
3655
+ {
3656
+ "epoch": 0.0389,
3657
+ "eval_loss": 2.4500510692596436,
3658
+ "eval_runtime": 33.3238,
3659
+ "eval_samples_per_second": 3.511,
3660
+ "eval_steps_per_second": 1.771,
3661
+ "step": 1945
3662
+ },
3663
+ {
3664
+ "epoch": 0.039,
3665
+ "grad_norm": 0.027239293031380653,
3666
+ "learning_rate": 3.898e-06,
3667
+ "loss": 2.4347,
3668
+ "step": 1950
3669
+ },
3670
+ {
3671
+ "epoch": 0.039,
3672
+ "eval_loss": 2.4498231410980225,
3673
+ "eval_runtime": 33.3306,
3674
+ "eval_samples_per_second": 3.51,
3675
+ "eval_steps_per_second": 1.77,
3676
+ "step": 1950
3677
+ },
3678
+ {
3679
+ "epoch": 0.0391,
3680
+ "eval_loss": 2.449704170227051,
3681
+ "eval_runtime": 33.3865,
3682
+ "eval_samples_per_second": 3.504,
3683
+ "eval_steps_per_second": 1.767,
3684
+ "step": 1955
3685
+ },
3686
+ {
3687
+ "epoch": 0.0392,
3688
+ "eval_loss": 2.44974684715271,
3689
+ "eval_runtime": 33.419,
3690
+ "eval_samples_per_second": 3.501,
3691
+ "eval_steps_per_second": 1.765,
3692
+ "step": 1960
3693
+ },
3694
+ {
3695
+ "epoch": 0.0393,
3696
+ "eval_loss": 2.450090169906616,
3697
+ "eval_runtime": 33.5315,
3698
+ "eval_samples_per_second": 3.489,
3699
+ "eval_steps_per_second": 1.76,
3700
+ "step": 1965
3701
+ },
3702
+ {
3703
+ "epoch": 0.0394,
3704
+ "eval_loss": 2.4494845867156982,
3705
+ "eval_runtime": 33.4607,
3706
+ "eval_samples_per_second": 3.497,
3707
+ "eval_steps_per_second": 1.763,
3708
+ "step": 1970
3709
+ },
3710
+ {
3711
+ "epoch": 0.0395,
3712
+ "grad_norm": 0.031553482039351585,
3713
+ "learning_rate": 3.948e-06,
3714
+ "loss": 2.4466,
3715
+ "step": 1975
3716
+ },
3717
+ {
3718
+ "epoch": 0.0395,
3719
+ "eval_loss": 2.449598550796509,
3720
+ "eval_runtime": 33.4853,
3721
+ "eval_samples_per_second": 3.494,
3722
+ "eval_steps_per_second": 1.762,
3723
+ "step": 1975
3724
+ },
3725
+ {
3726
+ "epoch": 0.0396,
3727
+ "eval_loss": 2.449420213699341,
3728
+ "eval_runtime": 33.4626,
3729
+ "eval_samples_per_second": 3.496,
3730
+ "eval_steps_per_second": 1.763,
3731
+ "step": 1980
3732
+ },
3733
+ {
3734
+ "epoch": 0.0397,
3735
+ "eval_loss": 2.449462890625,
3736
+ "eval_runtime": 33.4049,
3737
+ "eval_samples_per_second": 3.502,
3738
+ "eval_steps_per_second": 1.766,
3739
+ "step": 1985
3740
+ },
3741
+ {
3742
+ "epoch": 0.0398,
3743
+ "eval_loss": 2.449423313140869,
3744
+ "eval_runtime": 33.5823,
3745
+ "eval_samples_per_second": 3.484,
3746
+ "eval_steps_per_second": 1.757,
3747
+ "step": 1990
3748
+ },
3749
+ {
3750
+ "epoch": 0.0399,
3751
+ "eval_loss": 2.4491324424743652,
3752
+ "eval_runtime": 33.662,
3753
+ "eval_samples_per_second": 3.476,
3754
+ "eval_steps_per_second": 1.753,
3755
+ "step": 1995
3756
+ },
3757
+ {
3758
+ "epoch": 0.04,
3759
+ "grad_norm": 0.03314009226524554,
3760
+ "learning_rate": 3.9980000000000005e-06,
3761
+ "loss": 2.4391,
3762
+ "step": 2000
3763
+ },
3764
+ {
3765
+ "epoch": 0.04,
3766
+ "eval_loss": 2.449084520339966,
3767
+ "eval_runtime": 33.5872,
3768
+ "eval_samples_per_second": 3.483,
3769
+ "eval_steps_per_second": 1.757,
3770
+ "step": 2000
3771
  }
3772
  ],
3773
  "logging_steps": 25,
 
3787
  "attributes": {}
3788
  }
3789
  },
3790
+ "total_flos": 5.570603510971498e+18,
3791
  "train_batch_size": 1,
3792
  "trial_name": null,
3793
  "trial_params": null