---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: gpt2_model_card_distily_test
  results: []
---

gpt2_model_card_distily_test

This student model is distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 18151.8379
  • eval_frwikippl: 38363.0352
  • eval_zhwikippl: 56660.7266
  • eval_loss: 0.0004
  • eval_runtime: 0.0556
  • eval_samples_per_second: 17.976
  • eval_steps_per_second: 17.976
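The `*ppl` metrics above are per-language perplexities (lower is better, measured here on English, French, and Chinese Wikipedia text). A minimal sketch of the standard definition, assuming perplexity is computed as the exponential of the mean negative log-likelihood per token:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity ~4.
print(perplexity([math.log(0.25)] * 10))  # ≈ 4.0
```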

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_strategy: logits_activations
  • loss_fn: reverse_kl
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
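The `loss_fn` above, `reverse_kl`, is KL(student ‖ teacher): the expectation is taken under the student's distribution, which tends to make the student mode-seeking rather than covering the teacher's full distribution. A minimal pure-Python sketch over a single logit vector (Distily's actual implementation operates on batched PyTorch tensors, so this is illustrative only):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher): expectation under the *student* distribution."""
    p = softmax(student_logits)  # student probabilities
    q = softmax(teacher_logits)  # teacher probabilities
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical distributions give zero loss; any mismatch gives a positive value.
print(reverse_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
print(reverse_kl([0.0, 0.0], [0.0, 1.0]) > 0)        # → True
```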

Resource Usage

Peak GPU Memory: 1.2477 GB

Model Results

eval_ metrics were logged at each checkpoint (columns: enwikippl, frwikippl, loss, runtime, samples_per_second, steps_per_second, zhwikippl). Recorded checkpoints:

  • teacher eval
  • epoch 0, step 0
  • epoch 0.3030, step 30
  • epoch 0.6061, step 60
  • epoch 0.9091, step 90
  • epoch 1.0, step 99

Framework versions

  • Distily 0.1.0
  • Transformers 4.43.3
  • Pytorch 2.3.0
  • Datasets 2.20.0