---
library_name: transformers
language:
- ar
license: apache-2.0
base_model: openai/whisper-base
tags:
- generated_from_trainer
datasets:
- google/fleurs
- fixie-ai/common_voice_17_0
- UBC-NLP/Casablanca
- ymoslem/MediaSpeech
- deepdml/Tunisian_MSA
metrics:
- wer
model-index:
- name: Whisper Base ar
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 17.0
type: fixie-ai/common_voice_17_0
metrics:
- name: Wer
type: wer
value: 40.550118433374344
---

# Whisper Base ar
This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) for Arabic, trained on a mix of datasets including Common Voice 17.0. It achieves the following results on the evaluation set:
- Loss: 0.5179
- Wer: 40.5501
- Cer: 13.2382
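The card does not include a usage snippet; a minimal sketch with the `transformers` ASR pipeline would look like the following (the audio path is a placeholder, and passing `language` avoids Whisper's language auto-detection):

```python
# Hypothetical usage sketch (not part of the original card): transcribe an
# Arabic audio file with the fine-tuned checkpoint via the ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="deepdml/whisper-base-ar-mix-norm",
)

# "audio.mp3" is a placeholder path to a local Arabic speech recording.
result = asr("audio.mp3", generate_kwargs={"language": "arabic"})
print(result["text"])
```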
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.04
- training_steps: 18000
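The list above maps onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script: `output_dir` is an assumption, and with `warmup_ratio=0.04` over 18000 steps the warmup lasts 720 steps.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir is a placeholder; the remaining values mirror
# the hyperparameter list in this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-ar",  # assumed, not stated in the card
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.04,
    max_steps=18000,
)
```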
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|---|---|---|---|---|---|
| 0.7397 | 0.0556 | 1000 | 0.6305 | 54.8668 | 18.9365 |
| 0.3962 | 0.1111 | 2000 | 0.5805 | 50.5793 | 16.9481 |
| 0.1913 | 0.1667 | 3000 | 0.5593 | 48.8019 | 16.2853 |
| 0.1031 | 0.2222 | 4000 | 0.5390 | 46.7766 | 15.6262 |
| 0.0743 | 0.2778 | 5000 | 0.5193 | 46.1321 | 15.5048 |
| 0.0463 | 0.3333 | 6000 | 0.5074 | 44.1857 | 14.5137 |
| 0.0296 | 1.0197 | 7000 | 0.5135 | 43.6074 | 14.0715 |
| 0.0288 | 1.0752 | 8000 | 0.5119 | 43.6514 | 14.6808 |
| 0.0232 | 1.1308 | 9000 | 0.4999 | 41.8538 | 13.6624 |
| 0.022 | 1.1863 | 10000 | 0.4930 | 41.8813 | 13.6632 |
| 0.0226 | 1.2419 | 11000 | 0.4779 | 41.8208 | 13.8859 |
| 0.0213 | 1.2974 | 12000 | 0.4795 | 41.0569 | 13.3648 |
| 0.0194 | 1.353 | 13000 | 0.4831 | 41.0881 | 13.3223 |
| 0.0148 | 2.0393 | 14000 | 0.5064 | 41.2644 | 13.4050 |
| 0.0131 | 2.0949 | 15000 | 0.5116 | 41.2570 | 13.5709 |
| 0.0116 | 2.1504 | 16000 | 0.5102 | 40.6860 | 13.2589 |
| 0.0088 | 2.206 | 17000 | 0.5196 | 40.4859 | 13.2482 |
| 0.0129 | 2.2616 | 18000 | 0.5179 | 40.5501 | 13.2382 |
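The Wer column above is the word error rate: the word-level edit (Levenshtein) distance between the reference transcript and the hypothesis, divided by the number of reference words, scaled to a percentage. A minimal self-contained sketch of the computation (production evaluations typically use a library such as `evaluate` or `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# One insertion against a 3-word reference -> 33.33...%
print(wer("the cat sat", "the cat sat on"))
```

CER is computed the same way at the character level instead of the word level.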
### Framework versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu121
- Datasets 3.6.0
- Tokenizers 0.21.0
## Citation
Please cite the model using the following BibTeX entry:
```bibtex
@misc{deepdml/whisper-base-ar-mix-norm,
  title={Fine-tuned Whisper base ASR model for speech recognition in Arabic},
  author={Jimenez, David},
  howpublished={\url{https://huggingface.co/deepdml/whisper-base-ar-mix-norm}},
  year={2026}
}
```