# train_hellaswag_101112_1768397614
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:
- Loss: 0.0949
- Num Input Tokens Seen: 99679008
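PEFT is listed under the framework versions below, so the published weights are presumably a PEFT adapter rather than a full model checkpoint. The following is a minimal loading sketch, assuming the adapter lives in this card's repository (rbelanec/train_hellaswag_101112_1768397614) and that you have access to the gated Llama 3 base weights:

```python
# Minimal sketch: attach this card's PEFT adapter to the Llama 3 base model.
# Assumptions: the repo holds a standard PEFT adapter, and bfloat16 inference
# on an available GPU is acceptable; adjust dtype/device_map as needed.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_101112_1768397614"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # loads the adapter weights
model.eval()
```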
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
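The summary above states that training used the hellaswag dataset. A minimal loading sketch follows; it assumes the Rowan/hellaswag copy on the Hub, since the exact split and preprocessing used for this run are not documented:

```python
# Minimal sketch: load HellaSwag from the Hub. Rowan/hellaswag is an assumption;
# the split/preprocessing actually used for this fine-tune is not recorded here.
from datasets import load_dataset

hellaswag = load_dataset("Rowan/hellaswag")
print(hellaswag)                     # train / validation / test splits
print(hellaswag["train"][0]["ctx"])  # context; candidate endings are in "endings"
```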
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent `TrainingArguments` setup follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 101112
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
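As referenced above, here is a hedged sketch of an equivalent `TrainingArguments` configuration. The `output_dir` is an assumption, and the original run may have used additional options (e.g. gradient accumulation) that this card does not record:

```python
# Hedged sketch: a TrainingArguments setup matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_101112_1768397614",  # assumed output directory
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```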
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.149 | 0.5000 | 8979 | 0.1659 | 5003904 |
| 0.0163 | 1.0001 | 17958 | 0.1352 | 9966400 |
| 0.0988 | 1.5001 | 26937 | 0.1139 | 14965776 |
| 0.1067 | 2.0001 | 35916 | 0.1075 | 19943248 |
| 0.008 | 2.5001 | 44895 | 0.0981 | 24937840 |
| 0.4668 | 3.0002 | 53874 | 0.0949 | 29909872 |
| 0.044 | 3.5002 | 62853 | 0.1158 | 34899056 |
| 0.0025 | 4.0002 | 71832 | 0.1086 | 39882864 |
| 0.1509 | 4.5003 | 80811 | 0.1226 | 44862576 |
| 0.0003 | 5.0003 | 89790 | 0.1126 | 49857616 |
| 0.0006 | 5.5003 | 98769 | 0.1238 | 54837408 |
| 0.0003 | 6.0003 | 107748 | 0.1152 | 59816976 |
| 0.1426 | 6.5004 | 116727 | 0.1367 | 64800688 |
| 0.0002 | 7.0004 | 125706 | 0.1442 | 69781152 |
| 0.0006 | 7.5004 | 134685 | 0.1508 | 74768336 |
| 0.0001 | 8.0004 | 143664 | 0.1471 | 79748128 |
| 0.0 | 8.5005 | 152643 | 0.1587 | 84740496 |
| 0.1157 | 9.0005 | 161622 | 0.1552 | 89717136 |
| 0.0002 | 9.5005 | 170601 | 0.1593 | 94705056 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
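To reproduce the results, it may help to confirm that the local environment matches the versions above; a small sketch:

```python
# Quick environment check against the versions listed above.
import datasets
import peft
import tokenizers
import torch
import transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(mod.__name__, mod.__version__)
```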