---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: gpt2_model_card_distily_test
  results: []
---

gpt2_model_card_distily_test

This student model is distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 18151.8379
  • eval_frwikippl: 38363.0352
  • eval_zhwikippl: 56660.7266
  • eval_loss: 0.0004
  • eval_runtime: 0.0556
  • eval_samples_per_second: 17.976
  • eval_steps_per_second: 17.976
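The `*ppl` metrics above are per-language perplexities (lower is better, measured here on English, French, and Chinese Wikipedia text). A minimal sketch of the standard definition, assuming perplexity is computed as the exponential of the mean negative log-likelihood per token:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity ~4.
print(perplexity([math.log(0.25)] * 10))  # ≈ 4.0
```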

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_strategy: logits_activations
  • loss_fn: reverse_kl
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
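The `loss_fn` above, `reverse_kl`, is KL(student ‖ teacher): the expectation is taken under the student's distribution, which tends to make the student mode-seeking rather than covering the teacher's full distribution. A minimal pure-Python sketch over a single logit vector (Distily's actual implementation operates on batched PyTorch tensors, so this is illustrative only):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher): expectation under the *student* distribution."""
    p = softmax(student_logits)  # student probabilities
    q = softmax(teacher_logits)  # teacher probabilities
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical distributions give zero loss; any mismatch gives a positive value.
print(reverse_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
print(reverse_kl([0.0, 0.0], [0.0, 1.0]) > 0)        # → True
```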

Resource Usage

Peak GPU Memory: 1.2477 GB

Model Results

eval_ metrics were logged at each checkpoint (columns: enwikippl, frwikippl, loss, runtime, samples_per_second, steps_per_second, zhwikippl). Recorded checkpoints:

  • teacher eval
  • epoch 0, step 0
  • epoch 0.3030, step 30
  • epoch 0.6061, step 60
  • epoch 0.9091, step 90
  • epoch 1.0, step 99

Framework versions

  • Distily 0.1.0
  • Transformers 4.43.3
  • Pytorch 2.3.0
  • Datasets 2.20.0