train_mmlu_42_1767887021

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained as a PEFT adapter on the mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2390
  • Num Input Tokens Seen: 432909080
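Because this checkpoint is a PEFT adapter rather than a full model, it is loaded on top of the base model. Below is a minimal sketch, assuming the adapter is hosted at rbelanec/train_mmlu_42_1767887021 (per the model tree) and that you have access to the gated Llama 3 base weights; the prompt is purely illustrative.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_mmlu_42_1767887021"  # assumed repo id from the model tree

# AutoPeftModelForCausalLM reads the adapter config to locate the base model
# (meta-llama/Meta-Llama-3-8B-Instruct) and attaches the adapter weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```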

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
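
For reference, the hyperparameters above map onto transformers TrainingArguments roughly as follows. This is a hedged sketch, not the exact training script: the dataset pipeline, Trainer setup, and PEFT configuration are not documented in this card and are omitted.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above; output_dir is an assumption.
training_args = TrainingArguments(
    output_dir="train_mmlu_42_1767887021",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```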

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|---------------|--------|--------|-----------------|-------------------|
| 1.1453        | 0.5000 | 22465  | 0.2841          | 21668048          |
| 0.0164        | 1.0000 | 44930  | 0.2601          | 43271544          |
| 0.1201        | 1.5000 | 67395  | 0.2583          | 64929544          |
| 0.1747        | 2.0000 | 89860  | 0.2406          | 86578136          |
| 0.2354        | 2.5001 | 112325 | 0.2390          | 108237656         |
| 0.4333        | 3.0001 | 134790 | 0.2455          | 129873976         |
| 0.3186        | 3.5001 | 157255 | 0.2611          | 151501864         |
| 0.3711        | 4.0001 | 179720 | 0.2410          | 173113432         |
| 0.0576        | 4.5001 | 202185 | 0.2694          | 194767064         |
| 0.0025        | 5.0001 | 224650 | 0.2571          | 216425648         |
| 0.2605        | 5.5001 | 247115 | 0.2725          | 238134160         |
| 0.002         | 6.0001 | 269580 | 0.2650          | 259721160         |
| 0.2716        | 6.5001 | 292045 | 0.2784          | 281435688         |
| 0.0012        | 7.0002 | 314510 | 0.2829          | 303030584         |
| 0.0016        | 7.5002 | 336975 | 0.2884          | 324674808         |
| 0.1663        | 8.0002 | 359440 | 0.2924          | 346309560         |
| 0.0017        | 8.5002 | 381905 | 0.3036          | 367987752         |
| 0.0021        | 9.0002 | 404370 | 0.3040          | 389607696         |
| 0.0005        | 9.5002 | 426835 | 0.3022          | 411289104         |
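
Validation loss bottoms out at epoch 2.5001 and drifts upward afterwards, which matches the reported evaluation loss of 0.2390 and suggests the best checkpoint was retained. A small illustrative check over the logged pairs from the table above:

```python
# (epoch, validation loss) pairs copied from the training results table.
eval_log = [
    (0.5000, 0.2841), (1.0000, 0.2601), (1.5000, 0.2583), (2.0000, 0.2406),
    (2.5001, 0.2390), (3.0001, 0.2455), (3.5001, 0.2611), (4.0001, 0.2410),
    (4.5001, 0.2694), (5.0001, 0.2571), (5.5001, 0.2725), (6.0001, 0.2650),
    (6.5001, 0.2784), (7.0002, 0.2829), (7.5002, 0.2884), (8.0002, 0.2924),
    (8.5002, 0.3036), (9.0002, 0.3040), (9.5002, 0.3022),
]

# Pick the checkpoint with the lowest validation loss.
best_epoch, best_loss = min(eval_log, key=lambda pair: pair[1])
print(f"best checkpoint: epoch {best_epoch}, validation loss {best_loss:.4f}")
# -> best checkpoint: epoch 2.5001, validation loss 0.2390
```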

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_mmlu_42_1767887021

This model is an adapter for meta-llama/Meta-Llama-3-8B-Instruct.