irodkin committed on
Commit 1f45be6 · verified · 1 Parent(s): 7c7691f

Training checkpoint at step 3000
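The `trainer_state.json` diffed here is the state file the Hugging Face `Trainer` writes into a checkpoint. Its `log_history` list interleaves training records (`loss`, `grad_norm`, `learning_rate`, logged every `logging_steps` = 25 steps) with evaluation records (`eval_loss`, `eval_runtime`, throughput, logged every `eval_steps` = 5 steps). The diff moves `best_global_step` from 2000 to 2985 and lowers `best_metric`, but leaves `best_model_checkpoint` pointing at `checkpoint-2000`, so the recorded best step and the saved best checkpoint can differ. The following is a minimal sketch for summarizing such a file, assuming the best-model metric is `eval_loss` with lower being better (the diff does not show `metric_for_best_model`); the function names are illustrative, not part of any library.

```python
import json
from pathlib import Path


def load_state(path: str) -> dict:
    """Read a trainer_state.json file into a dict."""
    return json.loads(Path(path).read_text())


def summarize_log_history(state: dict) -> dict:
    """Split log_history into training and evaluation records and find the lowest eval_loss.

    Assumes lower eval_loss is better (metric_for_best_model is not shown in the diff).
    """
    history = state.get("log_history", [])
    # Training records carry a "loss" key; evaluation records carry "eval_loss".
    train = [(entry["step"], entry["loss"]) for entry in history if "loss" in entry]
    evals = [(entry["step"], entry["eval_loss"]) for entry in history if "eval_loss" in entry]
    # min() keeps the earliest step when two evaluations tie.
    best_step, best_loss = min(evals, key=lambda item: item[1]) if evals else (None, None)
    return {
        "global_step": state.get("global_step"),
        "num_train_records": len(train),
        "num_eval_records": len(evals),
        "lowest_eval_step": best_step,
        "lowest_eval_loss": best_loss,
        # Values the Trainer itself recorded, kept for comparison.
        "recorded_best_step": state.get("best_global_step"),
        "recorded_best_metric": state.get("best_metric"),
        "recorded_best_checkpoint": state.get("best_model_checkpoint"),
    }
```

For example, `summarize_log_history(load_state("checkpoint-3000/trainer_state.json"))` (a hypothetical path) should report a `lowest_eval_step` equal to `recorded_best_step` if the metric assumption holds, while `recorded_best_checkpoint` may still name an earlier checkpoint.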

Files changed (1)
  1. trainer_state.json +1885 -5
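In the added `log_history` training records, `learning_rate` rises by 5e-08 every 25 steps (4.048e-06 at step 2025, 4.098e-06 at step 2050, 4.148000000000001e-06 at step 2075, 4.198e-06 at step 2100), i.e. about 2e-09 per optimizer step, or roughly lr ≈ 2e-09 × (step − 1). That pattern looks like the warmup phase of a linear schedule, but this is an assumption: the scheduler type, warmup length and peak learning rate are not recorded in this diff (only the `linear_adamw` tag in the run directory name hints at it). A small least-squares sketch for checking the slope from logged (step, learning_rate) pairs, with a hypothetical helper name:

```python
def fit_lr_line(points: list[tuple[int, float]]) -> tuple[float, float]:
    """Ordinary least-squares fit of learning_rate ~ slope * step + intercept.

    `points` are (step, learning_rate) pairs taken from training records in log_history.
    """
    if len(points) < 2:
        raise ValueError("need at least two (step, learning_rate) points")
    n = len(points)
    mean_step = sum(step for step, _ in points) / n
    mean_lr = sum(lr for _, lr in points) / n
    cov = sum((step - mean_step) * (lr - mean_lr) for step, lr in points)
    var = sum((step - mean_step) ** 2 for step, _ in points)
    if var == 0:
        raise ValueError("steps must not all be equal")
    slope = cov / var
    intercept = mean_lr - slope * mean_step
    return slope, intercept


# Four training records copied from this commit's log_history.
logged = [(2025, 4.048e-06), (2050, 4.098e-06), (2075, 4.148000000000001e-06), (2100, 4.198e-06)]
slope, intercept = fit_lr_line(logged)
print(f"slope per step: {slope:.3e}, intercept: {intercept:.3e}")
```

Extrapolating this fit to step 2500 gives 4.998e-06, which matches the value logged there; it says nothing about where the warmup ends.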
trainer_state.json CHANGED
@@ -1,10 +1,10 @@
1
  {
2
- "best_global_step": 2000,
3
- "best_metric": 2.449084520339966,
4
  "best_model_checkpoint": "../runs/karpathy/fineweb-edu-100b-shuffle/meta-llama/Llama-3.2-1B/linear_adamw_wd1e-03_7x1024_mem32_bs64_hf_armt_dmem64/run_20/checkpoint-2000",
5
- "epoch": 0.04,
6
  "eval_steps": 5,
7
- "global_step": 2000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
@@ -3768,6 +3768,1886 @@
3768
  "eval_samples_per_second": 3.483,
3769
  "eval_steps_per_second": 1.757,
3770
  "step": 2000
3771
  }
3772
  ],
3773
  "logging_steps": 25,
@@ -3787,7 +5667,7 @@
3787
  "attributes": {}
3788
  }
3789
  },
3790
- "total_flos": 5.570603510971498e+18,
3791
  "train_batch_size": 1,
3792
  "trial_name": null,
3793
  "trial_params": null
 
1
  {
2
+ "best_global_step": 2985,
3
+ "best_metric": 2.4361066818237305,
4
  "best_model_checkpoint": "../runs/karpathy/fineweb-edu-100b-shuffle/meta-llama/Llama-3.2-1B/linear_adamw_wd1e-03_7x1024_mem32_bs64_hf_armt_dmem64/run_20/checkpoint-2000",
5
+ "epoch": 0.06,
6
  "eval_steps": 5,
7
+ "global_step": 3000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
 
3768
  "eval_samples_per_second": 3.483,
3769
  "eval_steps_per_second": 1.757,
3770
  "step": 2000
3771
+ },
3772
+ {
3773
+ "epoch": 0.0401,
3774
+ "eval_loss": 2.449021577835083,
3775
+ "eval_runtime": 33.5048,
3776
+ "eval_samples_per_second": 3.492,
3777
+ "eval_steps_per_second": 1.761,
3778
+ "step": 2005
3779
+ },
3780
+ {
3781
+ "epoch": 0.0402,
3782
+ "eval_loss": 2.449159622192383,
3783
+ "eval_runtime": 33.4845,
3784
+ "eval_samples_per_second": 3.494,
3785
+ "eval_steps_per_second": 1.762,
3786
+ "step": 2010
3787
+ },
3788
+ {
3789
+ "epoch": 0.0403,
3790
+ "eval_loss": 2.448726177215576,
3791
+ "eval_runtime": 33.9926,
3792
+ "eval_samples_per_second": 3.442,
3793
+ "eval_steps_per_second": 1.736,
3794
+ "step": 2015
3795
+ },
3796
+ {
3797
+ "epoch": 0.0404,
3798
+ "eval_loss": 2.4484922885894775,
3799
+ "eval_runtime": 33.6594,
3800
+ "eval_samples_per_second": 3.476,
3801
+ "eval_steps_per_second": 1.753,
3802
+ "step": 2020
3803
+ },
3804
+ {
3805
+ "epoch": 0.0405,
3806
+ "grad_norm": 0.029877786947315705,
3807
+ "learning_rate": 4.048e-06,
3808
+ "loss": 2.438,
3809
+ "step": 2025
3810
+ },
3811
+ {
3812
+ "epoch": 0.0405,
3813
+ "eval_loss": 2.4485254287719727,
3814
+ "eval_runtime": 33.6812,
3815
+ "eval_samples_per_second": 3.474,
3816
+ "eval_steps_per_second": 1.752,
3817
+ "step": 2025
3818
+ },
3819
+ {
3820
+ "epoch": 0.0406,
3821
+ "eval_loss": 2.448495388031006,
3822
+ "eval_runtime": 33.9733,
3823
+ "eval_samples_per_second": 3.444,
3824
+ "eval_steps_per_second": 1.737,
3825
+ "step": 2030
3826
+ },
3827
+ {
3828
+ "epoch": 0.0407,
3829
+ "eval_loss": 2.4482643604278564,
3830
+ "eval_runtime": 33.9957,
3831
+ "eval_samples_per_second": 3.442,
3832
+ "eval_steps_per_second": 1.736,
3833
+ "step": 2035
3834
+ },
3835
+ {
3836
+ "epoch": 0.0408,
3837
+ "eval_loss": 2.4481942653656006,
3838
+ "eval_runtime": 34.3014,
3839
+ "eval_samples_per_second": 3.411,
3840
+ "eval_steps_per_second": 1.72,
3841
+ "step": 2040
3842
+ },
3843
+ {
3844
+ "epoch": 0.0409,
3845
+ "eval_loss": 2.448082208633423,
3846
+ "eval_runtime": 34.0411,
3847
+ "eval_samples_per_second": 3.437,
3848
+ "eval_steps_per_second": 1.733,
3849
+ "step": 2045
3850
+ },
3851
+ {
3852
+ "epoch": 0.041,
3853
+ "grad_norm": 0.031175983773220776,
3854
+ "learning_rate": 4.098e-06,
3855
+ "loss": 2.4332,
3856
+ "step": 2050
3857
+ },
3858
+ {
3859
+ "epoch": 0.041,
3860
+ "eval_loss": 2.4478490352630615,
3861
+ "eval_runtime": 33.9245,
3862
+ "eval_samples_per_second": 3.449,
3863
+ "eval_steps_per_second": 1.739,
3864
+ "step": 2050
3865
+ },
3866
+ {
3867
+ "epoch": 0.0411,
3868
+ "eval_loss": 2.4480035305023193,
3869
+ "eval_runtime": 34.0079,
3870
+ "eval_samples_per_second": 3.44,
3871
+ "eval_steps_per_second": 1.735,
3872
+ "step": 2055
3873
+ },
3874
+ {
3875
+ "epoch": 0.0412,
3876
+ "eval_loss": 2.447685718536377,
3877
+ "eval_runtime": 33.999,
3878
+ "eval_samples_per_second": 3.441,
3879
+ "eval_steps_per_second": 1.735,
3880
+ "step": 2060
3881
+ },
3882
+ {
3883
+ "epoch": 0.0413,
3884
+ "eval_loss": 2.447507619857788,
3885
+ "eval_runtime": 34.1446,
3886
+ "eval_samples_per_second": 3.427,
3887
+ "eval_steps_per_second": 1.728,
3888
+ "step": 2065
3889
+ },
3890
+ {
3891
+ "epoch": 0.0414,
3892
+ "eval_loss": 2.447322130203247,
3893
+ "eval_runtime": 33.7479,
3894
+ "eval_samples_per_second": 3.467,
3895
+ "eval_steps_per_second": 1.748,
3896
+ "step": 2070
3897
+ },
3898
+ {
3899
+ "epoch": 0.0415,
3900
+ "grad_norm": 0.02904850084773878,
3901
+ "learning_rate": 4.148000000000001e-06,
3902
+ "loss": 2.4481,
3903
+ "step": 2075
3904
+ },
3905
+ {
3906
+ "epoch": 0.0415,
3907
+ "eval_loss": 2.4471347332000732,
3908
+ "eval_runtime": 33.917,
3909
+ "eval_samples_per_second": 3.45,
3910
+ "eval_steps_per_second": 1.74,
3911
+ "step": 2075
3912
+ },
3913
+ {
3914
+ "epoch": 0.0416,
3915
+ "eval_loss": 2.447152853012085,
3916
+ "eval_runtime": 33.8287,
3917
+ "eval_samples_per_second": 3.459,
3918
+ "eval_steps_per_second": 1.744,
3919
+ "step": 2080
3920
+ },
3921
+ {
3922
+ "epoch": 0.0417,
3923
+ "eval_loss": 2.4469242095947266,
3924
+ "eval_runtime": 33.7591,
3925
+ "eval_samples_per_second": 3.466,
3926
+ "eval_steps_per_second": 1.748,
3927
+ "step": 2085
3928
+ },
3929
+ {
3930
+ "epoch": 0.0418,
3931
+ "eval_loss": 2.4471774101257324,
3932
+ "eval_runtime": 33.7879,
3933
+ "eval_samples_per_second": 3.463,
3934
+ "eval_steps_per_second": 1.746,
3935
+ "step": 2090
3936
+ },
3937
+ {
3938
+ "epoch": 0.0419,
3939
+ "eval_loss": 2.447988986968994,
3940
+ "eval_runtime": 33.6878,
3941
+ "eval_samples_per_second": 3.473,
3942
+ "eval_steps_per_second": 1.751,
3943
+ "step": 2095
3944
+ },
3945
+ {
3946
+ "epoch": 0.042,
3947
+ "grad_norm": 0.033838990669225626,
3948
+ "learning_rate": 4.198e-06,
3949
+ "loss": 2.4386,
3950
+ "step": 2100
3951
+ },
3952
+ {
3953
+ "epoch": 0.042,
3954
+ "eval_loss": 2.4477100372314453,
3955
+ "eval_runtime": 33.6345,
3956
+ "eval_samples_per_second": 3.479,
3957
+ "eval_steps_per_second": 1.754,
3958
+ "step": 2100
3959
+ },
3960
+ {
3961
+ "epoch": 0.0421,
3962
+ "eval_loss": 2.447394847869873,
3963
+ "eval_runtime": 33.6221,
3964
+ "eval_samples_per_second": 3.48,
3965
+ "eval_steps_per_second": 1.755,
3966
+ "step": 2105
3967
+ },
3968
+ {
3969
+ "epoch": 0.0422,
3970
+ "eval_loss": 2.4470951557159424,
3971
+ "eval_runtime": 33.6689,
3972
+ "eval_samples_per_second": 3.475,
3973
+ "eval_steps_per_second": 1.752,
3974
+ "step": 2110
3975
+ },
3976
+ {
3977
+ "epoch": 0.0423,
3978
+ "eval_loss": 2.4467623233795166,
3979
+ "eval_runtime": 33.6979,
3980
+ "eval_samples_per_second": 3.472,
3981
+ "eval_steps_per_second": 1.751,
3982
+ "step": 2115
3983
+ },
3984
+ {
3985
+ "epoch": 0.0424,
3986
+ "eval_loss": 2.4469833374023438,
3987
+ "eval_runtime": 33.8632,
3988
+ "eval_samples_per_second": 3.455,
3989
+ "eval_steps_per_second": 1.742,
3990
+ "step": 2120
3991
+ },
3992
+ {
3993
+ "epoch": 0.0425,
3994
+ "grad_norm": 0.0382703849144026,
3995
+ "learning_rate": 4.248000000000001e-06,
3996
+ "loss": 2.4313,
3997
+ "step": 2125
3998
+ },
3999
+ {
4000
+ "epoch": 0.0425,
4001
+ "eval_loss": 2.447753667831421,
4002
+ "eval_runtime": 33.7269,
4003
+ "eval_samples_per_second": 3.469,
4004
+ "eval_steps_per_second": 1.749,
4005
+ "step": 2125
4006
+ },
4007
+ {
4008
+ "epoch": 0.0426,
4009
+ "eval_loss": 2.447281837463379,
4010
+ "eval_runtime": 33.7037,
4011
+ "eval_samples_per_second": 3.471,
4012
+ "eval_steps_per_second": 1.751,
4013
+ "step": 2130
4014
+ },
4015
+ {
4016
+ "epoch": 0.0427,
4017
+ "eval_loss": 2.4472267627716064,
4018
+ "eval_runtime": 33.6873,
4019
+ "eval_samples_per_second": 3.473,
4020
+ "eval_steps_per_second": 1.751,
4021
+ "step": 2135
4022
+ },
4023
+ {
4024
+ "epoch": 0.0428,
4025
+ "eval_loss": 2.446859836578369,
4026
+ "eval_runtime": 33.6738,
4027
+ "eval_samples_per_second": 3.475,
4028
+ "eval_steps_per_second": 1.752,
4029
+ "step": 2140
4030
+ },
4031
+ {
4032
+ "epoch": 0.0429,
4033
+ "eval_loss": 2.446655035018921,
4034
+ "eval_runtime": 33.6536,
4035
+ "eval_samples_per_second": 3.477,
4036
+ "eval_steps_per_second": 1.753,
4037
+ "step": 2145
4038
+ },
4039
+ {
4040
+ "epoch": 0.043,
4041
+ "grad_norm": 0.027126678960545086,
4042
+ "learning_rate": 4.298e-06,
4043
+ "loss": 2.4298,
4044
+ "step": 2150
4045
+ },
4046
+ {
4047
+ "epoch": 0.043,
4048
+ "eval_loss": 2.4463651180267334,
4049
+ "eval_runtime": 33.6454,
4050
+ "eval_samples_per_second": 3.477,
4051
+ "eval_steps_per_second": 1.754,
4052
+ "step": 2150
4053
+ },
4054
+ {
4055
+ "epoch": 0.0431,
4056
+ "eval_loss": 2.4461581707000732,
4057
+ "eval_runtime": 33.6166,
4058
+ "eval_samples_per_second": 3.48,
4059
+ "eval_steps_per_second": 1.755,
4060
+ "step": 2155
4061
+ },
4062
+ {
4063
+ "epoch": 0.0432,
4064
+ "eval_loss": 2.4461660385131836,
4065
+ "eval_runtime": 33.5484,
4066
+ "eval_samples_per_second": 3.488,
4067
+ "eval_steps_per_second": 1.759,
4068
+ "step": 2160
4069
+ },
4070
+ {
4071
+ "epoch": 0.0433,
4072
+ "eval_loss": 2.4458513259887695,
4073
+ "eval_runtime": 33.6579,
4074
+ "eval_samples_per_second": 3.476,
4075
+ "eval_steps_per_second": 1.753,
4076
+ "step": 2165
4077
+ },
4078
+ {
4079
+ "epoch": 0.0434,
4080
+ "eval_loss": 2.4454855918884277,
4081
+ "eval_runtime": 33.5647,
4082
+ "eval_samples_per_second": 3.486,
4083
+ "eval_steps_per_second": 1.758,
4084
+ "step": 2170
4085
+ },
4086
+ {
4087
+ "epoch": 0.0435,
4088
+ "grad_norm": 0.030565328679921875,
4089
+ "learning_rate": 4.3480000000000006e-06,
4090
+ "loss": 2.4387,
4091
+ "step": 2175
4092
+ },
4093
+ {
4094
+ "epoch": 0.0435,
4095
+ "eval_loss": 2.445688009262085,
4096
+ "eval_runtime": 33.5164,
4097
+ "eval_samples_per_second": 3.491,
4098
+ "eval_steps_per_second": 1.76,
4099
+ "step": 2175
4100
+ },
4101
+ {
4102
+ "epoch": 0.0436,
4103
+ "eval_loss": 2.4456729888916016,
4104
+ "eval_runtime": 33.4724,
4105
+ "eval_samples_per_second": 3.495,
4106
+ "eval_steps_per_second": 1.763,
4107
+ "step": 2180
4108
+ },
4109
+ {
4110
+ "epoch": 0.0437,
4111
+ "eval_loss": 2.4460015296936035,
4112
+ "eval_runtime": 33.3984,
4113
+ "eval_samples_per_second": 3.503,
4114
+ "eval_steps_per_second": 1.767,
4115
+ "step": 2185
4116
+ },
4117
+ {
4118
+ "epoch": 0.0438,
4119
+ "eval_loss": 2.4460256099700928,
4120
+ "eval_runtime": 33.4582,
4121
+ "eval_samples_per_second": 3.497,
4122
+ "eval_steps_per_second": 1.763,
4123
+ "step": 2190
4124
+ },
4125
+ {
4126
+ "epoch": 0.0439,
4127
+ "eval_loss": 2.4456872940063477,
4128
+ "eval_runtime": 33.444,
4129
+ "eval_samples_per_second": 3.498,
4130
+ "eval_steps_per_second": 1.764,
4131
+ "step": 2195
4132
+ },
4133
+ {
4134
+ "epoch": 0.044,
4135
+ "grad_norm": 0.03864046787827566,
4136
+ "learning_rate": 4.398000000000001e-06,
4137
+ "loss": 2.445,
4138
+ "step": 2200
4139
+ },
4140
+ {
4141
+ "epoch": 0.044,
4142
+ "eval_loss": 2.4454870223999023,
4143
+ "eval_runtime": 33.4474,
4144
+ "eval_samples_per_second": 3.498,
4145
+ "eval_steps_per_second": 1.764,
4146
+ "step": 2200
4147
+ },
4148
+ {
4149
+ "epoch": 0.0441,
4150
+ "eval_loss": 2.4453113079071045,
4151
+ "eval_runtime": 33.4062,
4152
+ "eval_samples_per_second": 3.502,
4153
+ "eval_steps_per_second": 1.766,
4154
+ "step": 2205
4155
+ },
4156
+ {
4157
+ "epoch": 0.0442,
4158
+ "eval_loss": 2.4448771476745605,
4159
+ "eval_runtime": 33.3542,
4160
+ "eval_samples_per_second": 3.508,
4161
+ "eval_steps_per_second": 1.769,
4162
+ "step": 2210
4163
+ },
4164
+ {
4165
+ "epoch": 0.0443,
4166
+ "eval_loss": 2.444946765899658,
4167
+ "eval_runtime": 33.3997,
4168
+ "eval_samples_per_second": 3.503,
4169
+ "eval_steps_per_second": 1.766,
4170
+ "step": 2215
4171
+ },
4172
+ {
4173
+ "epoch": 0.0444,
4174
+ "eval_loss": 2.445194959640503,
4175
+ "eval_runtime": 33.3669,
4176
+ "eval_samples_per_second": 3.506,
4177
+ "eval_steps_per_second": 1.768,
4178
+ "step": 2220
4179
+ },
4180
+ {
4181
+ "epoch": 0.0445,
4182
+ "grad_norm": 0.026792091668494698,
4183
+ "learning_rate": 4.4480000000000004e-06,
4184
+ "loss": 2.4339,
4185
+ "step": 2225
4186
+ },
4187
+ {
4188
+ "epoch": 0.0445,
4189
+ "eval_loss": 2.445009469985962,
4190
+ "eval_runtime": 33.4467,
4191
+ "eval_samples_per_second": 3.498,
4192
+ "eval_steps_per_second": 1.764,
4193
+ "step": 2225
4194
+ },
4195
+ {
4196
+ "epoch": 0.0446,
4197
+ "eval_loss": 2.4450981616973877,
4198
+ "eval_runtime": 33.4513,
4199
+ "eval_samples_per_second": 3.498,
4200
+ "eval_steps_per_second": 1.764,
4201
+ "step": 2230
4202
+ },
4203
+ {
4204
+ "epoch": 0.0447,
4205
+ "eval_loss": 2.444899082183838,
4206
+ "eval_runtime": 33.3869,
4207
+ "eval_samples_per_second": 3.504,
4208
+ "eval_steps_per_second": 1.767,
4209
+ "step": 2235
4210
+ },
4211
+ {
4212
+ "epoch": 0.0448,
4213
+ "eval_loss": 2.4448494911193848,
4214
+ "eval_runtime": 33.486,
4215
+ "eval_samples_per_second": 3.494,
4216
+ "eval_steps_per_second": 1.762,
4217
+ "step": 2240
4218
+ },
4219
+ {
4220
+ "epoch": 0.0449,
4221
+ "eval_loss": 2.444640636444092,
4222
+ "eval_runtime": 33.4202,
4223
+ "eval_samples_per_second": 3.501,
4224
+ "eval_steps_per_second": 1.765,
4225
+ "step": 2245
4226
+ },
4227
+ {
4228
+ "epoch": 0.045,
4229
+ "grad_norm": 0.027104711228224686,
4230
+ "learning_rate": 4.498e-06,
4231
+ "loss": 2.4326,
4232
+ "step": 2250
4233
+ },
4234
+ {
4235
+ "epoch": 0.045,
4236
+ "eval_loss": 2.444633722305298,
4237
+ "eval_runtime": 33.4154,
4238
+ "eval_samples_per_second": 3.501,
4239
+ "eval_steps_per_second": 1.766,
4240
+ "step": 2250
4241
+ },
4242
+ {
4243
+ "epoch": 0.0451,
4244
+ "eval_loss": 2.44467830657959,
4245
+ "eval_runtime": 33.4237,
4246
+ "eval_samples_per_second": 3.501,
4247
+ "eval_steps_per_second": 1.765,
4248
+ "step": 2255
4249
+ },
4250
+ {
4251
+ "epoch": 0.0452,
4252
+ "eval_loss": 2.444413900375366,
4253
+ "eval_runtime": 33.3694,
4254
+ "eval_samples_per_second": 3.506,
4255
+ "eval_steps_per_second": 1.768,
4256
+ "step": 2260
4257
+ },
4258
+ {
4259
+ "epoch": 0.0453,
4260
+ "eval_loss": 2.444222927093506,
4261
+ "eval_runtime": 33.3585,
4262
+ "eval_samples_per_second": 3.507,
4263
+ "eval_steps_per_second": 1.769,
4264
+ "step": 2265
4265
+ },
4266
+ {
4267
+ "epoch": 0.0454,
4268
+ "eval_loss": 2.444108724594116,
4269
+ "eval_runtime": 33.3346,
4270
+ "eval_samples_per_second": 3.51,
4271
+ "eval_steps_per_second": 1.77,
4272
+ "step": 2270
4273
+ },
4274
+ {
4275
+ "epoch": 0.0455,
4276
+ "grad_norm": 0.033569645173308425,
4277
+ "learning_rate": 4.548e-06,
4278
+ "loss": 2.4342,
4279
+ "step": 2275
4280
+ },
4281
+ {
4282
+ "epoch": 0.0455,
4283
+ "eval_loss": 2.443859577178955,
4284
+ "eval_runtime": 33.3636,
4285
+ "eval_samples_per_second": 3.507,
4286
+ "eval_steps_per_second": 1.768,
4287
+ "step": 2275
4288
+ },
4289
+ {
4290
+ "epoch": 0.0456,
4291
+ "eval_loss": 2.4441120624542236,
4292
+ "eval_runtime": 33.2442,
4293
+ "eval_samples_per_second": 3.519,
4294
+ "eval_steps_per_second": 1.775,
4295
+ "step": 2280
4296
+ },
4297
+ {
4298
+ "epoch": 0.0457,
4299
+ "eval_loss": 2.4439260959625244,
4300
+ "eval_runtime": 33.2924,
4301
+ "eval_samples_per_second": 3.514,
4302
+ "eval_steps_per_second": 1.772,
4303
+ "step": 2285
4304
+ },
4305
+ {
4306
+ "epoch": 0.0458,
4307
+ "eval_loss": 2.4439032077789307,
4308
+ "eval_runtime": 33.4004,
4309
+ "eval_samples_per_second": 3.503,
4310
+ "eval_steps_per_second": 1.766,
4311
+ "step": 2290
4312
+ },
4313
+ {
4314
+ "epoch": 0.0459,
4315
+ "eval_loss": 2.443621873855591,
4316
+ "eval_runtime": 33.3314,
4317
+ "eval_samples_per_second": 3.51,
4318
+ "eval_steps_per_second": 1.77,
4319
+ "step": 2295
4320
+ },
4321
+ {
4322
+ "epoch": 0.046,
4323
+ "grad_norm": 0.02648413187023774,
4324
+ "learning_rate": 4.598e-06,
4325
+ "loss": 2.4368,
4326
+ "step": 2300
4327
+ },
4328
+ {
4329
+ "epoch": 0.046,
4330
+ "eval_loss": 2.4436306953430176,
4331
+ "eval_runtime": 33.372,
4332
+ "eval_samples_per_second": 3.506,
4333
+ "eval_steps_per_second": 1.768,
4334
+ "step": 2300
4335
+ },
4336
+ {
4337
+ "epoch": 0.0461,
4338
+ "eval_loss": 2.4436404705047607,
4339
+ "eval_runtime": 33.3039,
4340
+ "eval_samples_per_second": 3.513,
4341
+ "eval_steps_per_second": 1.772,
4342
+ "step": 2305
4343
+ },
4344
+ {
4345
+ "epoch": 0.0462,
4346
+ "eval_loss": 2.44333815574646,
4347
+ "eval_runtime": 33.3059,
4348
+ "eval_samples_per_second": 3.513,
4349
+ "eval_steps_per_second": 1.771,
4350
+ "step": 2310
4351
+ },
4352
+ {
4353
+ "epoch": 0.0463,
4354
+ "eval_loss": 2.443415880203247,
4355
+ "eval_runtime": 33.4065,
4356
+ "eval_samples_per_second": 3.502,
4357
+ "eval_steps_per_second": 1.766,
4358
+ "step": 2315
4359
+ },
4360
+ {
4361
+ "epoch": 0.0464,
4362
+ "eval_loss": 2.443068742752075,
4363
+ "eval_runtime": 33.2818,
4364
+ "eval_samples_per_second": 3.515,
4365
+ "eval_steps_per_second": 1.773,
4366
+ "step": 2320
4367
+ },
4368
+ {
4369
+ "epoch": 0.0465,
4370
+ "grad_norm": 0.0351440602227012,
4371
+ "learning_rate": 4.648e-06,
4372
+ "loss": 2.4381,
4373
+ "step": 2325
4374
+ },
4375
+ {
4376
+ "epoch": 0.0465,
4377
+ "eval_loss": 2.443199634552002,
4378
+ "eval_runtime": 33.3538,
4379
+ "eval_samples_per_second": 3.508,
4380
+ "eval_steps_per_second": 1.769,
4381
+ "step": 2325
4382
+ },
4383
+ {
4384
+ "epoch": 0.0466,
4385
+ "eval_loss": 2.4433047771453857,
4386
+ "eval_runtime": 33.4816,
4387
+ "eval_samples_per_second": 3.494,
4388
+ "eval_steps_per_second": 1.762,
4389
+ "step": 2330
4390
+ },
4391
+ {
4392
+ "epoch": 0.0467,
4393
+ "eval_loss": 2.443272113800049,
4394
+ "eval_runtime": 33.5015,
4395
+ "eval_samples_per_second": 3.492,
4396
+ "eval_steps_per_second": 1.761,
4397
+ "step": 2335
4398
+ },
4399
+ {
4400
+ "epoch": 0.0468,
4401
+ "eval_loss": 2.443246603012085,
4402
+ "eval_runtime": 33.5753,
4403
+ "eval_samples_per_second": 3.485,
4404
+ "eval_steps_per_second": 1.757,
4405
+ "step": 2340
4406
+ },
4407
+ {
4408
+ "epoch": 0.0469,
4409
+ "eval_loss": 2.4432363510131836,
4410
+ "eval_runtime": 33.2869,
4411
+ "eval_samples_per_second": 3.515,
4412
+ "eval_steps_per_second": 1.772,
4413
+ "step": 2345
4414
+ },
4415
+ {
4416
+ "epoch": 0.047,
4417
+ "grad_norm": 0.02695670446644145,
4418
+ "learning_rate": 4.698000000000001e-06,
4419
+ "loss": 2.4303,
4420
+ "step": 2350
4421
+ },
4422
+ {
4423
+ "epoch": 0.047,
4424
+ "eval_loss": 2.4429421424865723,
4425
+ "eval_runtime": 33.3556,
4426
+ "eval_samples_per_second": 3.508,
4427
+ "eval_steps_per_second": 1.769,
4428
+ "step": 2350
4429
+ },
4430
+ {
4431
+ "epoch": 0.0471,
4432
+ "eval_loss": 2.4427566528320312,
4433
+ "eval_runtime": 33.3612,
4434
+ "eval_samples_per_second": 3.507,
4435
+ "eval_steps_per_second": 1.769,
4436
+ "step": 2355
4437
+ },
4438
+ {
4439
+ "epoch": 0.0472,
4440
+ "eval_loss": 2.4425995349884033,
4441
+ "eval_runtime": 33.353,
4442
+ "eval_samples_per_second": 3.508,
4443
+ "eval_steps_per_second": 1.769,
4444
+ "step": 2360
4445
+ },
4446
+ {
4447
+ "epoch": 0.0473,
4448
+ "eval_loss": 2.4426395893096924,
4449
+ "eval_runtime": 33.4669,
4450
+ "eval_samples_per_second": 3.496,
4451
+ "eval_steps_per_second": 1.763,
4452
+ "step": 2365
4453
+ },
4454
+ {
4455
+ "epoch": 0.0474,
4456
+ "eval_loss": 2.4425301551818848,
4457
+ "eval_runtime": 33.3803,
4458
+ "eval_samples_per_second": 3.505,
4459
+ "eval_steps_per_second": 1.768,
4460
+ "step": 2370
4461
+ },
4462
+ {
4463
+ "epoch": 0.0475,
4464
+ "grad_norm": 0.031232764672567994,
4465
+ "learning_rate": 4.748e-06,
4466
+ "loss": 2.4284,
4467
+ "step": 2375
4468
+ },
4469
+ {
4470
+ "epoch": 0.0475,
4471
+ "eval_loss": 2.4426214694976807,
4472
+ "eval_runtime": 33.3013,
4473
+ "eval_samples_per_second": 3.513,
4474
+ "eval_steps_per_second": 1.772,
4475
+ "step": 2375
4476
+ },
4477
+ {
4478
+ "epoch": 0.0476,
4479
+ "eval_loss": 2.442599296569824,
4480
+ "eval_runtime": 33.3419,
4481
+ "eval_samples_per_second": 3.509,
4482
+ "eval_steps_per_second": 1.77,
4483
+ "step": 2380
4484
+ },
4485
+ {
4486
+ "epoch": 0.0477,
4487
+ "eval_loss": 2.442364454269409,
4488
+ "eval_runtime": 33.3677,
4489
+ "eval_samples_per_second": 3.506,
4490
+ "eval_steps_per_second": 1.768,
4491
+ "step": 2385
4492
+ },
4493
+ {
4494
+ "epoch": 0.0478,
4495
+ "eval_loss": 2.4425458908081055,
4496
+ "eval_runtime": 33.3892,
4497
+ "eval_samples_per_second": 3.504,
4498
+ "eval_steps_per_second": 1.767,
4499
+ "step": 2390
4500
+ },
4501
+ {
4502
+ "epoch": 0.0479,
4503
+ "eval_loss": 2.4425549507141113,
4504
+ "eval_runtime": 33.4202,
4505
+ "eval_samples_per_second": 3.501,
4506
+ "eval_steps_per_second": 1.765,
4507
+ "step": 2395
4508
+ },
4509
+ {
4510
+ "epoch": 0.048,
4511
+ "grad_norm": 0.027127721086561404,
4512
+ "learning_rate": 4.7980000000000005e-06,
4513
+ "loss": 2.4291,
4514
+ "step": 2400
4515
+ },
4516
+ {
4517
+ "epoch": 0.048,
4518
+ "eval_loss": 2.4425251483917236,
4519
+ "eval_runtime": 33.3802,
4520
+ "eval_samples_per_second": 3.505,
4521
+ "eval_steps_per_second": 1.768,
4522
+ "step": 2400
4523
+ },
4524
+ {
4525
+ "epoch": 0.0481,
4526
+ "eval_loss": 2.4424123764038086,
4527
+ "eval_runtime": 33.3283,
4528
+ "eval_samples_per_second": 3.511,
4529
+ "eval_steps_per_second": 1.77,
4530
+ "step": 2405
4531
+ },
4532
+ {
4533
+ "epoch": 0.0482,
4534
+ "eval_loss": 2.4421849250793457,
4535
+ "eval_runtime": 33.4172,
4536
+ "eval_samples_per_second": 3.501,
4537
+ "eval_steps_per_second": 1.766,
4538
+ "step": 2410
4539
+ },
4540
+ {
4541
+ "epoch": 0.0483,
4542
+ "eval_loss": 2.4419970512390137,
4543
+ "eval_runtime": 33.4642,
4544
+ "eval_samples_per_second": 3.496,
4545
+ "eval_steps_per_second": 1.763,
4546
+ "step": 2415
4547
+ },
4548
+ {
4549
+ "epoch": 0.0484,
4550
+ "eval_loss": 2.4419567584991455,
4551
+ "eval_runtime": 33.3663,
4552
+ "eval_samples_per_second": 3.507,
4553
+ "eval_steps_per_second": 1.768,
4554
+ "step": 2420
4555
+ },
4556
+ {
4557
+ "epoch": 0.0485,
4558
+ "grad_norm": 0.026032952013136927,
4559
+ "learning_rate": 4.848000000000001e-06,
4560
+ "loss": 2.4256,
4561
+ "step": 2425
4562
+ },
4563
+ {
4564
+ "epoch": 0.0485,
4565
+ "eval_loss": 2.441688299179077,
4566
+ "eval_runtime": 33.3169,
4567
+ "eval_samples_per_second": 3.512,
4568
+ "eval_steps_per_second": 1.771,
4569
+ "step": 2425
4570
+ },
4571
+ {
4572
+ "epoch": 0.0486,
4573
+ "eval_loss": 2.4417548179626465,
4574
+ "eval_runtime": 33.3476,
4575
+ "eval_samples_per_second": 3.508,
4576
+ "eval_steps_per_second": 1.769,
4577
+ "step": 2430
4578
+ },
4579
+ {
4580
+ "epoch": 0.0487,
4581
+ "eval_loss": 2.441769599914551,
4582
+ "eval_runtime": 33.4488,
4583
+ "eval_samples_per_second": 3.498,
4584
+ "eval_steps_per_second": 1.764,
4585
+ "step": 2435
4586
+ },
4587
+ {
4588
+ "epoch": 0.0488,
4589
+ "eval_loss": 2.4415283203125,
4590
+ "eval_runtime": 33.4555,
4591
+ "eval_samples_per_second": 3.497,
4592
+ "eval_steps_per_second": 1.764,
4593
+ "step": 2440
4594
+ },
4595
+ {
4596
+ "epoch": 0.0489,
4597
+ "eval_loss": 2.4416847229003906,
4598
+ "eval_runtime": 33.2459,
4599
+ "eval_samples_per_second": 3.519,
4600
+ "eval_steps_per_second": 1.775,
4601
+ "step": 2445
4602
+ },
4603
+ {
4604
+ "epoch": 0.049,
4605
+ "grad_norm": 0.02804626155591942,
4606
+ "learning_rate": 4.898e-06,
4607
+ "loss": 2.4334,
4608
+ "step": 2450
4609
+ },
4610
+ {
4611
+ "epoch": 0.049,
4612
+ "eval_loss": 2.4414188861846924,
4613
+ "eval_runtime": 33.2989,
4614
+ "eval_samples_per_second": 3.514,
4615
+ "eval_steps_per_second": 1.772,
4616
+ "step": 2450
4617
+ },
4618
+ {
4619
+ "epoch": 0.0491,
4620
+ "eval_loss": 2.4416472911834717,
4621
+ "eval_runtime": 33.3676,
4622
+ "eval_samples_per_second": 3.506,
4623
+ "eval_steps_per_second": 1.768,
4624
+ "step": 2455
4625
+ },
4626
+ {
4627
+ "epoch": 0.0492,
4628
+ "eval_loss": 2.4414844512939453,
4629
+ "eval_runtime": 33.4116,
4630
+ "eval_samples_per_second": 3.502,
4631
+ "eval_steps_per_second": 1.766,
4632
+ "step": 2460
4633
+ },
4634
+ {
4635
+ "epoch": 0.0493,
4636
+ "eval_loss": 2.441408395767212,
4637
+ "eval_runtime": 33.6104,
4638
+ "eval_samples_per_second": 3.481,
4639
+ "eval_steps_per_second": 1.755,
4640
+ "step": 2465
4641
+ },
4642
+ {
4643
+ "epoch": 0.0494,
4644
+ "eval_loss": 2.4413650035858154,
4645
+ "eval_runtime": 33.3838,
4646
+ "eval_samples_per_second": 3.505,
4647
+ "eval_steps_per_second": 1.767,
4648
+ "step": 2470
4649
+ },
4650
+ {
4651
+ "epoch": 0.0495,
4652
+ "grad_norm": 0.025351866385684634,
4653
+ "learning_rate": 4.948000000000001e-06,
4654
+ "loss": 2.4356,
4655
+ "step": 2475
4656
+ },
4657
+ {
4658
+ "epoch": 0.0495,
4659
+ "eval_loss": 2.4411768913269043,
4660
+ "eval_runtime": 33.3857,
4661
+ "eval_samples_per_second": 3.504,
4662
+ "eval_steps_per_second": 1.767,
4663
+ "step": 2475
4664
+ },
4665
+ {
4666
+ "epoch": 0.0496,
4667
+ "eval_loss": 2.441201686859131,
4668
+ "eval_runtime": 33.4117,
4669
+ "eval_samples_per_second": 3.502,
4670
+ "eval_steps_per_second": 1.766,
4671
+ "step": 2480
4672
+ },
4673
+ {
4674
+ "epoch": 0.0497,
4675
+ "eval_loss": 2.4408698081970215,
4676
+ "eval_runtime": 33.3015,
4677
+ "eval_samples_per_second": 3.513,
4678
+ "eval_steps_per_second": 1.772,
4679
+ "step": 2485
4680
+ },
4681
+ {
4682
+ "epoch": 0.0498,
4683
+ "eval_loss": 2.440950393676758,
4684
+ "eval_runtime": 33.379,
4685
+ "eval_samples_per_second": 3.505,
4686
+ "eval_steps_per_second": 1.768,
4687
+ "step": 2490
4688
+ },
4689
+ {
4690
+ "epoch": 0.0499,
4691
+ "eval_loss": 2.4407267570495605,
4692
+ "eval_runtime": 33.2561,
4693
+ "eval_samples_per_second": 3.518,
4694
+ "eval_steps_per_second": 1.774,
4695
+ "step": 2495
4696
+ },
4697
+ {
4698
+ "epoch": 0.05,
4699
+ "grad_norm": 0.029743600833546286,
4700
+ "learning_rate": 4.998e-06,
4701
+ "loss": 2.4369,
4702
+ "step": 2500
4703
+ },
4704
+ {
4705
+ "epoch": 0.05,
4706
+ "eval_loss": 2.4408068656921387,
4707
+ "eval_runtime": 33.3807,
4708
+ "eval_samples_per_second": 3.505,
4709
+ "eval_steps_per_second": 1.767,
4710
+ "step": 2500
4711
+ },
4712
+ {
4713
+ "epoch": 0.0501,
4714
+ "eval_loss": 2.4407401084899902,
4715
+ "eval_runtime": 33.2295,
4716
+ "eval_samples_per_second": 3.521,
4717
+ "eval_steps_per_second": 1.776,
4718
+ "step": 2505
4719
+ },
4720
+ {
4721
+ "epoch": 0.0502,
4722
+ "eval_loss": 2.4409286975860596,
4723
+ "eval_runtime": 33.3925,
4724
+ "eval_samples_per_second": 3.504,
4725
+ "eval_steps_per_second": 1.767,
4726
+ "step": 2510
4727
+ },
4728
+ {
4729
+ "epoch": 0.0503,
4730
+ "eval_loss": 2.4407782554626465,
4731
+ "eval_runtime": 33.4498,
4732
+ "eval_samples_per_second": 3.498,
4733
+ "eval_steps_per_second": 1.764,
4734
+ "step": 2515
4735
+ },
4736
+ {
4737
+ "epoch": 0.0504,
4738
+ "eval_loss": 2.4407856464385986,
4739
+ "eval_runtime": 33.4899,
4740
+ "eval_samples_per_second": 3.494,
4741
+ "eval_steps_per_second": 1.762,
4742
+ "step": 2520
4743
+ },
4744
+ {
4745
+ "epoch": 0.0505,
4746
+ "grad_norm": 0.027292319342276494,
4747
+ "learning_rate": 5.048000000000001e-06,
4748
+ "loss": 2.4263,
4749
+ "step": 2525
4750
+ },
4751
+ {
4752
+ "epoch": 0.0505,
4753
+ "eval_loss": 2.440830945968628,
4754
+ "eval_runtime": 33.3428,
4755
+ "eval_samples_per_second": 3.509,
4756
+ "eval_steps_per_second": 1.769,
4757
+ "step": 2525
4758
+ },
4759
+ {
4760
+ "epoch": 0.0506,
4761
+ "eval_loss": 2.44069504737854,
4762
+ "eval_runtime": 33.2895,
4763
+ "eval_samples_per_second": 3.515,
4764
+ "eval_steps_per_second": 1.772,
4765
+ "step": 2530
4766
+ },
4767
+ {
4768
+ "epoch": 0.0507,
4769
+ "eval_loss": 2.4408159255981445,
4770
+ "eval_runtime": 33.3488,
4771
+ "eval_samples_per_second": 3.508,
4772
+ "eval_steps_per_second": 1.769,
4773
+ "step": 2535
4774
+ },
4775
+ {
4776
+ "epoch": 0.0508,
4777
+ "eval_loss": 2.440523386001587,
4778
+ "eval_runtime": 33.3582,
4779
+ "eval_samples_per_second": 3.507,
4780
+ "eval_steps_per_second": 1.769,
4781
+ "step": 2540
4782
+ },
4783
+ {
4784
+ "epoch": 0.0509,
+ "eval_loss": 2.4403724670410156,
+ "eval_runtime": 33.5287,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2545
+ },
+ {
+ "epoch": 0.051,
+ "grad_norm": 0.02495087994166461,
+ "learning_rate": 5.098000000000001e-06,
+ "loss": 2.428,
+ "step": 2550
+ },
+ {
+ "epoch": 0.051,
+ "eval_loss": 2.440495252609253,
+ "eval_runtime": 34.4575,
+ "eval_samples_per_second": 3.395,
+ "eval_steps_per_second": 1.712,
+ "step": 2550
+ },
+ {
+ "epoch": 0.0511,
+ "eval_loss": 2.440384864807129,
+ "eval_runtime": 34.0144,
+ "eval_samples_per_second": 3.44,
+ "eval_steps_per_second": 1.735,
+ "step": 2555
+ },
+ {
+ "epoch": 0.0512,
+ "eval_loss": 2.4405176639556885,
+ "eval_runtime": 34.5852,
+ "eval_samples_per_second": 3.383,
+ "eval_steps_per_second": 1.706,
+ "step": 2560
+ },
+ {
+ "epoch": 0.0513,
+ "eval_loss": 2.4402472972869873,
+ "eval_runtime": 34.2689,
+ "eval_samples_per_second": 3.414,
+ "eval_steps_per_second": 1.722,
+ "step": 2565
+ },
+ {
+ "epoch": 0.0514,
+ "eval_loss": 2.440459966659546,
+ "eval_runtime": 33.3821,
+ "eval_samples_per_second": 3.505,
+ "eval_steps_per_second": 1.767,
+ "step": 2570
+ },
+ {
+ "epoch": 0.0515,
+ "grad_norm": 0.029728034222700407,
+ "learning_rate": 5.1480000000000005e-06,
+ "loss": 2.439,
+ "step": 2575
+ },
+ {
+ "epoch": 0.0515,
+ "eval_loss": 2.440525531768799,
+ "eval_runtime": 34.3072,
+ "eval_samples_per_second": 3.41,
+ "eval_steps_per_second": 1.72,
+ "step": 2575
+ },
+ {
+ "epoch": 0.0516,
+ "eval_loss": 2.440373420715332,
+ "eval_runtime": 33.5748,
+ "eval_samples_per_second": 3.485,
+ "eval_steps_per_second": 1.757,
+ "step": 2580
+ },
+ {
+ "epoch": 0.0517,
+ "eval_loss": 2.4405770301818848,
+ "eval_runtime": 35.2655,
+ "eval_samples_per_second": 3.318,
+ "eval_steps_per_second": 1.673,
+ "step": 2585
+ },
+ {
+ "epoch": 0.0518,
+ "eval_loss": 2.4402198791503906,
+ "eval_runtime": 34.9918,
+ "eval_samples_per_second": 3.344,
+ "eval_steps_per_second": 1.686,
+ "step": 2590
+ },
+ {
+ "epoch": 0.0519,
+ "eval_loss": 2.440136194229126,
+ "eval_runtime": 33.4873,
+ "eval_samples_per_second": 3.494,
+ "eval_steps_per_second": 1.762,
+ "step": 2595
+ },
+ {
+ "epoch": 0.052,
+ "grad_norm": 0.02473354917836018,
+ "learning_rate": 5.198000000000001e-06,
+ "loss": 2.427,
+ "step": 2600
+ },
+ {
+ "epoch": 0.052,
+ "eval_loss": 2.440282106399536,
+ "eval_runtime": 33.4628,
+ "eval_samples_per_second": 3.496,
+ "eval_steps_per_second": 1.763,
+ "step": 2600
+ },
+ {
+ "epoch": 0.0521,
+ "eval_loss": 2.440448045730591,
+ "eval_runtime": 33.4191,
+ "eval_samples_per_second": 3.501,
+ "eval_steps_per_second": 1.765,
+ "step": 2605
+ },
+ {
+ "epoch": 0.0522,
+ "eval_loss": 2.440248966217041,
+ "eval_runtime": 33.4911,
+ "eval_samples_per_second": 3.493,
+ "eval_steps_per_second": 1.762,
+ "step": 2610
+ },
+ {
+ "epoch": 0.0523,
+ "eval_loss": 2.440030336380005,
+ "eval_runtime": 33.4921,
+ "eval_samples_per_second": 3.493,
+ "eval_steps_per_second": 1.762,
+ "step": 2615
+ },
+ {
+ "epoch": 0.0524,
+ "eval_loss": 2.4397685527801514,
+ "eval_runtime": 33.4491,
+ "eval_samples_per_second": 3.498,
+ "eval_steps_per_second": 1.764,
+ "step": 2620
+ },
+ {
+ "epoch": 0.0525,
+ "grad_norm": 0.026533778128592735,
+ "learning_rate": 5.248000000000001e-06,
+ "loss": 2.4214,
+ "step": 2625
+ },
+ {
+ "epoch": 0.0525,
+ "eval_loss": 2.43971848487854,
+ "eval_runtime": 33.3975,
+ "eval_samples_per_second": 3.503,
+ "eval_steps_per_second": 1.767,
+ "step": 2625
+ },
+ {
+ "epoch": 0.0526,
+ "eval_loss": 2.4398951530456543,
+ "eval_runtime": 33.4912,
+ "eval_samples_per_second": 3.493,
+ "eval_steps_per_second": 1.762,
+ "step": 2630
+ },
+ {
+ "epoch": 0.0527,
+ "eval_loss": 2.43975830078125,
+ "eval_runtime": 33.4071,
+ "eval_samples_per_second": 3.502,
+ "eval_steps_per_second": 1.766,
+ "step": 2635
+ },
+ {
+ "epoch": 0.0528,
+ "eval_loss": 2.439666271209717,
+ "eval_runtime": 33.4208,
+ "eval_samples_per_second": 3.501,
+ "eval_steps_per_second": 1.765,
+ "step": 2640
+ },
+ {
+ "epoch": 0.0529,
+ "eval_loss": 2.439816951751709,
+ "eval_runtime": 33.5111,
+ "eval_samples_per_second": 3.491,
+ "eval_steps_per_second": 1.761,
+ "step": 2645
+ },
+ {
+ "epoch": 0.053,
+ "grad_norm": 0.024723120971366967,
+ "learning_rate": 5.298000000000001e-06,
+ "loss": 2.4241,
+ "step": 2650
+ },
+ {
+ "epoch": 0.053,
+ "eval_loss": 2.4398183822631836,
+ "eval_runtime": 33.506,
+ "eval_samples_per_second": 3.492,
+ "eval_steps_per_second": 1.761,
+ "step": 2650
+ },
+ {
+ "epoch": 0.0531,
+ "eval_loss": 2.4402668476104736,
+ "eval_runtime": 34.1298,
+ "eval_samples_per_second": 3.428,
+ "eval_steps_per_second": 1.729,
+ "step": 2655
+ },
+ {
+ "epoch": 0.0532,
+ "eval_loss": 2.4400885105133057,
+ "eval_runtime": 33.436,
+ "eval_samples_per_second": 3.499,
+ "eval_steps_per_second": 1.765,
+ "step": 2660
+ },
+ {
+ "epoch": 0.0533,
+ "eval_loss": 2.439871311187744,
+ "eval_runtime": 33.3874,
+ "eval_samples_per_second": 3.504,
+ "eval_steps_per_second": 1.767,
+ "step": 2665
+ },
+ {
+ "epoch": 0.0534,
+ "eval_loss": 2.4393365383148193,
+ "eval_runtime": 33.5258,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2670
+ },
+ {
+ "epoch": 0.0535,
+ "grad_norm": 0.02173239513971497,
+ "learning_rate": 5.348000000000001e-06,
+ "loss": 2.4295,
+ "step": 2675
+ },
+ {
+ "epoch": 0.0535,
+ "eval_loss": 2.439133405685425,
+ "eval_runtime": 33.4962,
+ "eval_samples_per_second": 3.493,
+ "eval_steps_per_second": 1.761,
+ "step": 2675
+ },
+ {
+ "epoch": 0.0536,
+ "eval_loss": 2.439093589782715,
+ "eval_runtime": 33.4708,
+ "eval_samples_per_second": 3.496,
+ "eval_steps_per_second": 1.763,
+ "step": 2680
+ },
+ {
+ "epoch": 0.0537,
+ "eval_loss": 2.439096212387085,
+ "eval_runtime": 33.4284,
+ "eval_samples_per_second": 3.5,
+ "eval_steps_per_second": 1.765,
+ "step": 2685
+ },
+ {
+ "epoch": 0.0538,
+ "eval_loss": 2.4389584064483643,
+ "eval_runtime": 33.4749,
+ "eval_samples_per_second": 3.495,
+ "eval_steps_per_second": 1.763,
+ "step": 2690
+ },
+ {
+ "epoch": 0.0539,
+ "eval_loss": 2.438805103302002,
+ "eval_runtime": 33.478,
+ "eval_samples_per_second": 3.495,
+ "eval_steps_per_second": 1.762,
+ "step": 2695
+ },
+ {
+ "epoch": 0.054,
+ "grad_norm": 0.023851331909406925,
+ "learning_rate": 5.398e-06,
+ "loss": 2.4302,
+ "step": 2700
+ },
+ {
+ "epoch": 0.054,
+ "eval_loss": 2.4386403560638428,
+ "eval_runtime": 33.4276,
+ "eval_samples_per_second": 3.5,
+ "eval_steps_per_second": 1.765,
+ "step": 2700
+ },
+ {
+ "epoch": 0.0541,
+ "eval_loss": 2.438568115234375,
+ "eval_runtime": 33.528,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2705
+ },
+ {
+ "epoch": 0.0542,
+ "eval_loss": 2.438894510269165,
+ "eval_runtime": 33.5228,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2710
+ },
+ {
+ "epoch": 0.0543,
+ "eval_loss": 2.4387168884277344,
+ "eval_runtime": 33.4663,
+ "eval_samples_per_second": 3.496,
+ "eval_steps_per_second": 1.763,
+ "step": 2715
+ },
+ {
+ "epoch": 0.0544,
+ "eval_loss": 2.4385879039764404,
+ "eval_runtime": 33.513,
+ "eval_samples_per_second": 3.491,
+ "eval_steps_per_second": 1.761,
+ "step": 2720
+ },
+ {
+ "epoch": 0.0545,
+ "grad_norm": 0.02728082451264937,
+ "learning_rate": 5.448e-06,
+ "loss": 2.4308,
+ "step": 2725
+ },
+ {
+ "epoch": 0.0545,
+ "eval_loss": 2.4388349056243896,
+ "eval_runtime": 33.4525,
+ "eval_samples_per_second": 3.497,
+ "eval_steps_per_second": 1.764,
+ "step": 2725
+ },
+ {
+ "epoch": 0.0546,
+ "eval_loss": 2.438887357711792,
+ "eval_runtime": 33.428,
+ "eval_samples_per_second": 3.5,
+ "eval_steps_per_second": 1.765,
+ "step": 2730
+ },
+ {
+ "epoch": 0.0547,
+ "eval_loss": 2.438713312149048,
+ "eval_runtime": 33.5229,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2735
+ },
+ {
+ "epoch": 0.0548,
+ "eval_loss": 2.438657283782959,
+ "eval_runtime": 33.4169,
+ "eval_samples_per_second": 3.501,
+ "eval_steps_per_second": 1.766,
+ "step": 2740
+ },
+ {
+ "epoch": 0.0549,
+ "eval_loss": 2.438544988632202,
+ "eval_runtime": 33.4944,
+ "eval_samples_per_second": 3.493,
+ "eval_steps_per_second": 1.761,
+ "step": 2745
+ },
+ {
+ "epoch": 0.055,
+ "grad_norm": 0.025461121075693184,
+ "learning_rate": 5.498e-06,
+ "loss": 2.4379,
+ "step": 2750
+ },
+ {
+ "epoch": 0.055,
+ "eval_loss": 2.4386098384857178,
+ "eval_runtime": 33.6782,
+ "eval_samples_per_second": 3.474,
+ "eval_steps_per_second": 1.752,
+ "step": 2750
+ },
+ {
+ "epoch": 0.0551,
+ "eval_loss": 2.438521146774292,
+ "eval_runtime": 33.5161,
+ "eval_samples_per_second": 3.491,
+ "eval_steps_per_second": 1.76,
+ "step": 2755
+ },
+ {
+ "epoch": 0.0552,
+ "eval_loss": 2.438474178314209,
+ "eval_runtime": 33.4773,
+ "eval_samples_per_second": 3.495,
+ "eval_steps_per_second": 1.762,
+ "step": 2760
+ },
+ {
+ "epoch": 0.0553,
+ "eval_loss": 2.4382379055023193,
+ "eval_runtime": 33.4869,
+ "eval_samples_per_second": 3.494,
+ "eval_steps_per_second": 1.762,
+ "step": 2765
+ },
+ {
+ "epoch": 0.0554,
+ "eval_loss": 2.438157796859741,
+ "eval_runtime": 33.543,
+ "eval_samples_per_second": 3.488,
+ "eval_steps_per_second": 1.759,
+ "step": 2770
+ },
+ {
+ "epoch": 0.0555,
+ "grad_norm": 0.0234055445054481,
+ "learning_rate": 5.548e-06,
+ "loss": 2.4326,
+ "step": 2775
+ },
+ {
+ "epoch": 0.0555,
+ "eval_loss": 2.438048839569092,
+ "eval_runtime": 33.5073,
+ "eval_samples_per_second": 3.492,
+ "eval_steps_per_second": 1.761,
+ "step": 2775
+ },
+ {
+ "epoch": 0.0556,
+ "eval_loss": 2.4379706382751465,
+ "eval_runtime": 33.4567,
+ "eval_samples_per_second": 3.497,
+ "eval_steps_per_second": 1.763,
+ "step": 2780
+ },
+ {
+ "epoch": 0.0557,
+ "eval_loss": 2.4379332065582275,
+ "eval_runtime": 33.5172,
+ "eval_samples_per_second": 3.491,
+ "eval_steps_per_second": 1.76,
+ "step": 2785
+ },
+ {
+ "epoch": 0.0558,
+ "eval_loss": 2.4380111694335938,
+ "eval_runtime": 33.5913,
+ "eval_samples_per_second": 3.483,
+ "eval_steps_per_second": 1.756,
+ "step": 2790
+ },
+ {
+ "epoch": 0.0559,
+ "eval_loss": 2.4379403591156006,
+ "eval_runtime": 33.5223,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2795
+ },
+ {
+ "epoch": 0.056,
+ "grad_norm": 0.024691045411267393,
+ "learning_rate": 5.5980000000000004e-06,
+ "loss": 2.4297,
+ "step": 2800
+ },
+ {
+ "epoch": 0.056,
+ "eval_loss": 2.43778657913208,
+ "eval_runtime": 33.524,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2800
+ },
+ {
+ "epoch": 0.0561,
+ "eval_loss": 2.4376559257507324,
+ "eval_runtime": 33.58,
+ "eval_samples_per_second": 3.484,
+ "eval_steps_per_second": 1.757,
+ "step": 2805
+ },
+ {
+ "epoch": 0.0562,
+ "eval_loss": 2.437596559524536,
+ "eval_runtime": 33.5756,
+ "eval_samples_per_second": 3.485,
+ "eval_steps_per_second": 1.757,
+ "step": 2810
+ },
+ {
+ "epoch": 0.0563,
+ "eval_loss": 2.437690496444702,
+ "eval_runtime": 33.5056,
+ "eval_samples_per_second": 3.492,
+ "eval_steps_per_second": 1.761,
+ "step": 2815
+ },
+ {
+ "epoch": 0.0564,
+ "eval_loss": 2.437558174133301,
+ "eval_runtime": 33.4948,
+ "eval_samples_per_second": 3.493,
+ "eval_steps_per_second": 1.761,
+ "step": 2820
+ },
+ {
+ "epoch": 0.0565,
+ "grad_norm": 0.02500330428035899,
+ "learning_rate": 5.648e-06,
+ "loss": 2.4281,
+ "step": 2825
+ },
+ {
+ "epoch": 0.0565,
+ "eval_loss": 2.437875747680664,
+ "eval_runtime": 33.4492,
+ "eval_samples_per_second": 3.498,
+ "eval_steps_per_second": 1.764,
+ "step": 2825
+ },
+ {
+ "epoch": 0.0566,
+ "eval_loss": 2.438183546066284,
+ "eval_runtime": 33.5208,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2830
+ },
+ {
+ "epoch": 0.0567,
+ "eval_loss": 2.4375228881835938,
+ "eval_runtime": 33.5319,
+ "eval_samples_per_second": 3.489,
+ "eval_steps_per_second": 1.76,
+ "step": 2835
+ },
+ {
+ "epoch": 0.0568,
+ "eval_loss": 2.437365770339966,
+ "eval_runtime": 33.4734,
+ "eval_samples_per_second": 3.495,
+ "eval_steps_per_second": 1.763,
+ "step": 2840
+ },
+ {
+ "epoch": 0.0569,
+ "eval_loss": 2.4376399517059326,
+ "eval_runtime": 33.4578,
+ "eval_samples_per_second": 3.497,
+ "eval_steps_per_second": 1.763,
+ "step": 2845
+ },
+ {
+ "epoch": 0.057,
+ "grad_norm": 0.023953363978697285,
+ "learning_rate": 5.698e-06,
+ "loss": 2.4341,
+ "step": 2850
+ },
+ {
+ "epoch": 0.057,
+ "eval_loss": 2.437318801879883,
+ "eval_runtime": 33.4551,
+ "eval_samples_per_second": 3.497,
+ "eval_steps_per_second": 1.764,
+ "step": 2850
+ },
+ {
+ "epoch": 0.0571,
+ "eval_loss": 2.437349319458008,
+ "eval_runtime": 33.4482,
+ "eval_samples_per_second": 3.498,
+ "eval_steps_per_second": 1.764,
+ "step": 2855
+ },
+ {
+ "epoch": 0.0572,
+ "eval_loss": 2.437500476837158,
+ "eval_runtime": 33.5179,
+ "eval_samples_per_second": 3.491,
+ "eval_steps_per_second": 1.76,
+ "step": 2860
+ },
+ {
+ "epoch": 0.0573,
+ "eval_loss": 2.4371414184570312,
+ "eval_runtime": 33.4246,
+ "eval_samples_per_second": 3.5,
+ "eval_steps_per_second": 1.765,
+ "step": 2865
+ },
+ {
+ "epoch": 0.0574,
+ "eval_loss": 2.4371588230133057,
+ "eval_runtime": 33.5686,
+ "eval_samples_per_second": 3.485,
+ "eval_steps_per_second": 1.758,
+ "step": 2870
+ },
+ {
+ "epoch": 0.0575,
+ "grad_norm": 0.023037224733864405,
+ "learning_rate": 5.748e-06,
+ "loss": 2.4201,
+ "step": 2875
+ },
+ {
+ "epoch": 0.0575,
+ "eval_loss": 2.4373178482055664,
+ "eval_runtime": 33.4813,
+ "eval_samples_per_second": 3.494,
+ "eval_steps_per_second": 1.762,
+ "step": 2875
+ },
+ {
+ "epoch": 0.0576,
+ "eval_loss": 2.4371204376220703,
+ "eval_runtime": 33.5096,
+ "eval_samples_per_second": 3.492,
+ "eval_steps_per_second": 1.761,
+ "step": 2880
+ },
+ {
+ "epoch": 0.0577,
+ "eval_loss": 2.43719482421875,
+ "eval_runtime": 33.4709,
+ "eval_samples_per_second": 3.496,
+ "eval_steps_per_second": 1.763,
+ "step": 2885
+ },
+ {
+ "epoch": 0.0578,
+ "eval_loss": 2.4369635581970215,
+ "eval_runtime": 33.5125,
+ "eval_samples_per_second": 3.491,
+ "eval_steps_per_second": 1.761,
+ "step": 2890
+ },
+ {
+ "epoch": 0.0579,
+ "eval_loss": 2.4367122650146484,
+ "eval_runtime": 33.5349,
+ "eval_samples_per_second": 3.489,
+ "eval_steps_per_second": 1.759,
+ "step": 2895
+ },
+ {
+ "epoch": 0.058,
+ "grad_norm": 0.023843041578218274,
+ "learning_rate": 5.798e-06,
+ "loss": 2.4322,
+ "step": 2900
+ },
+ {
+ "epoch": 0.058,
+ "eval_loss": 2.436885118484497,
+ "eval_runtime": 33.5038,
+ "eval_samples_per_second": 3.492,
+ "eval_steps_per_second": 1.761,
+ "step": 2900
+ },
+ {
+ "epoch": 0.0581,
+ "eval_loss": 2.4368388652801514,
+ "eval_runtime": 33.4337,
+ "eval_samples_per_second": 3.499,
+ "eval_steps_per_second": 1.765,
+ "step": 2905
+ },
+ {
+ "epoch": 0.0582,
+ "eval_loss": 2.436776638031006,
+ "eval_runtime": 33.5783,
+ "eval_samples_per_second": 3.484,
+ "eval_steps_per_second": 1.757,
+ "step": 2910
+ },
+ {
+ "epoch": 0.0583,
+ "eval_loss": 2.4369046688079834,
+ "eval_runtime": 33.5764,
+ "eval_samples_per_second": 3.485,
+ "eval_steps_per_second": 1.757,
+ "step": 2915
+ },
+ {
+ "epoch": 0.0584,
+ "eval_loss": 2.4369351863861084,
+ "eval_runtime": 33.5715,
+ "eval_samples_per_second": 3.485,
+ "eval_steps_per_second": 1.757,
+ "step": 2920
+ },
+ {
+ "epoch": 0.0585,
+ "grad_norm": 0.030212978437899864,
+ "learning_rate": 5.848000000000001e-06,
+ "loss": 2.4318,
+ "step": 2925
+ },
+ {
+ "epoch": 0.0585,
+ "eval_loss": 2.4367170333862305,
+ "eval_runtime": 33.455,
+ "eval_samples_per_second": 3.497,
+ "eval_steps_per_second": 1.764,
+ "step": 2925
+ },
+ {
+ "epoch": 0.0586,
+ "eval_loss": 2.4367101192474365,
+ "eval_runtime": 33.3973,
+ "eval_samples_per_second": 3.503,
+ "eval_steps_per_second": 1.767,
+ "step": 2930
+ },
+ {
+ "epoch": 0.0587,
+ "eval_loss": 2.436723470687866,
+ "eval_runtime": 33.4183,
+ "eval_samples_per_second": 3.501,
+ "eval_steps_per_second": 1.766,
+ "step": 2935
+ },
+ {
+ "epoch": 0.0588,
+ "eval_loss": 2.4368371963500977,
+ "eval_runtime": 33.5269,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2940
+ },
+ {
+ "epoch": 0.0589,
+ "eval_loss": 2.436763286590576,
+ "eval_runtime": 33.4623,
+ "eval_samples_per_second": 3.496,
+ "eval_steps_per_second": 1.763,
+ "step": 2945
+ },
+ {
+ "epoch": 0.059,
+ "grad_norm": 0.024293450378328845,
+ "learning_rate": 5.898e-06,
+ "loss": 2.4221,
+ "step": 2950
+ },
+ {
+ "epoch": 0.059,
+ "eval_loss": 2.436692714691162,
+ "eval_runtime": 33.523,
+ "eval_samples_per_second": 3.49,
+ "eval_steps_per_second": 1.76,
+ "step": 2950
+ },
+ {
+ "epoch": 0.0591,
+ "eval_loss": 2.436657667160034,
+ "eval_runtime": 34.902,
+ "eval_samples_per_second": 3.352,
+ "eval_steps_per_second": 1.69,
+ "step": 2955
+ },
+ {
+ "epoch": 0.0592,
+ "eval_loss": 2.436432123184204,
+ "eval_runtime": 33.4808,
+ "eval_samples_per_second": 3.495,
+ "eval_steps_per_second": 1.762,
+ "step": 2960
+ },
+ {
+ "epoch": 0.0593,
+ "eval_loss": 2.436782121658325,
+ "eval_runtime": 34.5166,
+ "eval_samples_per_second": 3.39,
+ "eval_steps_per_second": 1.709,
+ "step": 2965
+ },
+ {
+ "epoch": 0.0594,
+ "eval_loss": 2.4366602897644043,
+ "eval_runtime": 33.7416,
+ "eval_samples_per_second": 3.468,
+ "eval_steps_per_second": 1.749,
+ "step": 2970
+ },
+ {
+ "epoch": 0.0595,
+ "grad_norm": 0.028294127858427973,
+ "learning_rate": 5.9480000000000005e-06,
+ "loss": 2.4196,
+ "step": 2975
+ },
+ {
+ "epoch": 0.0595,
+ "eval_loss": 2.436668872833252,
+ "eval_runtime": 35.1904,
+ "eval_samples_per_second": 3.325,
+ "eval_steps_per_second": 1.677,
+ "step": 2975
+ },
+ {
+ "epoch": 0.0596,
+ "eval_loss": 2.436310052871704,
+ "eval_runtime": 33.583,
+ "eval_samples_per_second": 3.484,
+ "eval_steps_per_second": 1.757,
+ "step": 2980
+ },
+ {
+ "epoch": 0.0597,
+ "eval_loss": 2.4361066818237305,
+ "eval_runtime": 34.1148,
+ "eval_samples_per_second": 3.43,
+ "eval_steps_per_second": 1.729,
+ "step": 2985
+ },
+ {
+ "epoch": 0.0598,
+ "eval_loss": 2.436128854751587,
+ "eval_runtime": 33.7895,
+ "eval_samples_per_second": 3.463,
+ "eval_steps_per_second": 1.746,
+ "step": 2990
+ },
+ {
+ "epoch": 0.0599,
+ "eval_loss": 2.436457872390747,
+ "eval_runtime": 34.0525,
+ "eval_samples_per_second": 3.436,
+ "eval_steps_per_second": 1.733,
+ "step": 2995
+ },
+ {
+ "epoch": 0.06,
+ "grad_norm": 0.02242795270420928,
+ "learning_rate": 5.998000000000001e-06,
+ "loss": 2.4245,
+ "step": 3000
+ },
+ {
+ "epoch": 0.06,
+ "eval_loss": 2.436203718185425,
+ "eval_runtime": 33.6471,
+ "eval_samples_per_second": 3.477,
+ "eval_steps_per_second": 1.753,
+ "step": 3000
  }
  ],
  "logging_steps": 25,
 
  "attributes": {}
  }
  },
+ "total_flos": 8.355905264309764e+18,
  "train_batch_size": 1,
  "trial_name": null,
  "trial_params": null