babylm-rta10m-gpt2

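This model is a fine-tuned version of an unspecified base model on an unknown dataset. At the final logged evaluation (step 18000) it reaches a validation loss of 3.0038 and an accuracy of 0.4590 on the evaluation set.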

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 180
training_steps: 18000
mixed_precision_training: Native AMP
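
The learning-rate settings above can be read as a schedule over the 18000 training steps. The sketch below is illustrative only: it assumes the "linear" scheduler ramps from 0 to the peak rate over the 180 warmup steps and then decays linearly to 0 at the last step, which is how the linear-with-warmup schedule in Transformers is usually described. The function name `linear_lr` is hypothetical and is not taken from the training code, which this card does not show.

```python
def linear_lr(step, peak_lr=5e-05, warmup_steps=180, total_steps=18000):
    """Learning rate at an optimizer step, assuming linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale the peak rate by the fraction of post-warmup steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Print the assumed learning rate at a few points in training.
for step in (0, 90, 180, 9000, 18000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 180 and is roughly half the peak near the midpoint of training.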

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.6599	0.0512	200	4.7438	0.3347
4.8062	0.1025	400	4.3328	0.3386
4.5833	0.1537	600	4.1507	0.3710
4.4542	0.2049	800	4.0696	0.3587
4.396	0.2561	1000	4.0014	0.3666
4.2547	0.3074	1200	3.9399	0.3692
4.1936	0.3586	1400	3.8861	0.3749
4.1394	0.4098	1600	3.8499	0.3743
4.033	0.4611	1800	3.7972	0.3793
3.9368	0.5123	2000	3.7842	0.3722
3.1581	1.0246	4000	3.4519	0.4068
2.829	1.5369	6000	3.2556	0.4356
2.6352	2.0492	8000	3.1564	0.4474
2.5791	2.5615	10000	3.1035	0.4487
2.4904	3.0738	12000	3.0579	0.4521
2.4417	3.5861	14000	3.0320	0.4565
2.4018	4.0984	16000	3.0134	0.4573
2.3989	4.6107	18000	3.0038	0.4590
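
Two quantities can be derived from the table above, sketched here under stated assumptions. Perplexity is computed as exp(loss), which assumes the validation loss is a mean per-token cross-entropy in nats (the card does not say so explicitly). Steps per epoch is estimated by dividing the step count by the logged epoch value. The helper names are hypothetical, and only three rows are copied for brevity.

```python
import math

# (step, epoch, validation loss) rows copied from the table above.
rows = [
    (200, 0.0512, 4.7438),
    (4000, 1.0246, 3.4519),
    (18000, 4.6107, 3.0038),
]


def perplexity(loss):
    """Perplexity, assuming `loss` is mean per-token cross-entropy in nats."""
    return math.exp(loss)


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


for step, epoch, loss in rows:
    print(step, round(perplexity(loss), 2), round(steps_per_epoch(step, epoch)))
```

With a train batch size of 16, the steps-per-epoch estimate multiplied by 16 approximates the number of training sequences per epoch, assuming a single device and no gradient accumulation, neither of which the card states.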

Model size: 98.4M params (Safetensors)
Tensor type: F32