# train_mmlu_42_1767887021
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the mmlu dataset. It achieves the following results on the evaluation set:
- Loss: 0.2390
- Num Input Tokens Seen: 432909080
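Since PEFT appears in the framework versions below, this repo presumably hosts an adapter rather than full model weights. A minimal loading sketch under that assumption (the example prompt is illustrative, not from the training data):

```python
# Load the base model, then attach this repo's adapter with PEFT.
# Assumes the checkpoint is a PEFT adapter for Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mmlu_42_1767887021"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach fine-tuned adapter

messages = [{"role": "user", "content": "Which organelle carries out photosynthesis?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```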
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` reconstruction follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
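The exact training script is not included in this card; as a sketch, the settings above map onto Transformers `TrainingArguments` roughly as follows (`output_dir` is illustrative):

```python
# Rough reconstruction of the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_mmlu_42_1767887021",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```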
### Training results

Validation loss reaches its minimum of 0.2390 at epoch 2.5 (the value reported above) and trends upward afterwards, so the mid-training checkpoint performs best on the evaluation set.
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.1453 | 0.5000 | 22465 | 0.2841 | 21668048 |
| 0.0164 | 1.0000 | 44930 | 0.2601 | 43271544 |
| 0.1201 | 1.5000 | 67395 | 0.2583 | 64929544 |
| 0.1747 | 2.0000 | 89860 | 0.2406 | 86578136 |
| 0.2354 | 2.5001 | 112325 | 0.2390 | 108237656 |
| 0.4333 | 3.0001 | 134790 | 0.2455 | 129873976 |
| 0.3186 | 3.5001 | 157255 | 0.2611 | 151501864 |
| 0.3711 | 4.0001 | 179720 | 0.2410 | 173113432 |
| 0.0576 | 4.5001 | 202185 | 0.2694 | 194767064 |
| 0.0025 | 5.0001 | 224650 | 0.2571 | 216425648 |
| 0.2605 | 5.5001 | 247115 | 0.2725 | 238134160 |
| 0.002 | 6.0001 | 269580 | 0.2650 | 259721160 |
| 0.2716 | 6.5001 | 292045 | 0.2784 | 281435688 |
| 0.0012 | 7.0002 | 314510 | 0.2829 | 303030584 |
| 0.0016 | 7.5002 | 336975 | 0.2884 | 324674808 |
| 0.1663 | 8.0002 | 359440 | 0.2924 | 346309560 |
| 0.0017 | 8.5002 | 381905 | 0.3036 | 367987752 |
| 0.0021 | 9.0002 | 404370 | 0.3040 | 389607696 |
| 0.0005 | 9.5002 | 426835 | 0.3022 | 411289104 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
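For reproducibility it may help to pin these versions; a quick runtime check, assuming the packages are installed in the current environment:

```python
# Compare the local environment against the versions listed above.
import peft, transformers, torch, datasets, tokenizers

print(peft.__version__)          # expected 0.17.1
print(transformers.__version__)  # expected 4.51.3
print(torch.__version__)         # expected 2.9.1+cu128
print(datasets.__version__)      # expected 4.0.0
print(tokenizers.__version__)    # expected 0.21.4
```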