train_hellaswag_101112_1768397614

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0949
  • Num Input Tokens Seen: 99679008

Model description

More information needed
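
The framework versions listed below indicate this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct rather than a full model. As a minimal loading sketch (not taken from the card; the dtype and device settings are assumptions), it can be attached to the base model roughly like this:

```python
# Hedged sketch: load the base model and attach this adapter with peft.
# Assumes access to the gated meta-llama checkpoint and a GPU with enough memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_101112_1768397614"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumed dtype, not specified in the card
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach fine-tuned adapter
model.eval()
```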

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
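
The hyperparameters above map onto Hugging Face `TrainingArguments` roughly as sketched below. This is an assumption-laden reconstruction, not the original training script; the output directory and any settings not listed in the card are placeholders.

```python
# Illustrative reconstruction of the listed hyperparameters (not the actual script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_101112_1768397614",  # placeholder path
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```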

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|---------------|--------|--------|-----------------|-------------------|
| 0.149         | 0.5000 | 8979   | 0.1659          | 5003904           |
| 0.0163        | 1.0001 | 17958  | 0.1352          | 9966400           |
| 0.0988        | 1.5001 | 26937  | 0.1139          | 14965776          |
| 0.1067        | 2.0001 | 35916  | 0.1075          | 19943248          |
| 0.008         | 2.5001 | 44895  | 0.0981          | 24937840          |
| 0.4668        | 3.0002 | 53874  | 0.0949          | 29909872          |
| 0.044         | 3.5002 | 62853  | 0.1158          | 34899056          |
| 0.0025        | 4.0002 | 71832  | 0.1086          | 39882864          |
| 0.1509        | 4.5003 | 80811  | 0.1226          | 44862576          |
| 0.0003        | 5.0003 | 89790  | 0.1126          | 49857616          |
| 0.0006        | 5.5003 | 98769  | 0.1238          | 54837408          |
| 0.0003        | 6.0003 | 107748 | 0.1152          | 59816976          |
| 0.1426        | 6.5004 | 116727 | 0.1367          | 64800688          |
| 0.0002        | 7.0004 | 125706 | 0.1442          | 69781152          |
| 0.0006        | 7.5004 | 134685 | 0.1508          | 74768336          |
| 0.0001        | 8.0004 | 143664 | 0.1471          | 79748128          |
| 0.0           | 8.5005 | 152643 | 0.1587          | 84740496          |
| 0.1157        | 9.0005 | 161622 | 0.1552          | 89717136          |
| 0.0002        | 9.5005 | 170601 | 0.1593          | 94705056          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4