
🗣️ Whisper Large Lao Fine-tuned

Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large for Lao (ພາສາລາວ) automatic speech recognition (ASR). It has been fine-tuned on the Phonepadith/laos-speech-dataset, a curated dataset containing Lao speech samples and transcriptions.

🧠 Model Details

| Property | Description |
| --- | --- |
| Base model | openai/whisper-large-v3 |
| Fine-tuned by | @Phonepadith |
| Language | Lao (lo) |
| Task | Automatic Speech Recognition (ASR) |
| Framework | 🤗 Transformers, PyTorch |
| Dataset | Phonepadith/laos-speech-dataset |
| Sampling rate | 16 kHz |
| License | MIT (same as base model unless otherwise stated) |

📊 Training Details

  • Fine-tuned on: Lao speech dataset with 7k+ samples
  • Input: 16kHz mono audio
  • Output: Lao text transcription
  • Epochs: 6
  • Batch size: 2
  • Learning rate: 1e-5
  • Optimizer: AdamW
  • Evaluation metric: Word Error Rate (WER)
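
Word Error Rate is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The sketch below is a minimal pure-Python version for illustration; the function names and the whitespace tokenization are assumptions, not the evaluation script used for this model. Lao is normally written without spaces between words, so WER depends heavily on how the text is segmented, and a character-level variant (CER) is included for comparison.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER with whitespace tokenization (an assumption; see the note on Lao)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER computed over characters, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Both functions return a fraction, so multiply by 100 for a percentage. Values above 1.0 are possible when the hypothesis contains many insertions.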

🚀 Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

# Load model and processor
model_id = "Phonepadith/whisper-3-large-lao-finetuned-v1"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

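# Load an audio file, downmix to mono, and resample to 16 kHz
speech_array, sampling_rate = torchaudio.load("example.wav")
speech_array = speech_array.mean(dim=0, keepdim=True)  # average channels -> shape (1, samples)
speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)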

# Preprocess and generate transcription
input_features = processor(
    speech_array.squeeze().numpy(), 
    sampling_rate=16000, 
    return_tensors="pt"
).input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)
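
Whisper's feature extractor pads or truncates each input to a 30-second window, so the snippet above transcribes at most the first 30 seconds of a recording. One way to handle longer audio is to split the waveform into windows and transcribe each in turn. The helper below is a hypothetical sketch (`chunk_bounds` and its parameters are not part of Transformers); it only computes sample index ranges, with an optional overlap to reduce the chance of cutting a word at a boundary.

```python
def chunk_bounds(num_samples, sampling_rate=16000, chunk_s=30.0, overlap_s=0.0):
    """Return (start, end) sample indices that cover a waveform in fixed windows."""
    if overlap_s < 0 or overlap_s >= chunk_s:
        raise ValueError("overlap_s must be in [0, chunk_s)")
    chunk = int(chunk_s * sampling_rate)
    step = chunk - int(overlap_s * sampling_rate)
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


# Example: window boundaries, in seconds, for a 70-second recording at 16 kHz
for start, end in chunk_bounds(70 * 16000):
    print(start / 16000, end / 16000)
```

Each range can then be sliced from the mono waveform and passed through `processor` and `model.generate` as in the usage example; how the partial transcripts are joined is left to the caller. The Transformers `pipeline("automatic-speech-recognition", ...)` API also accepts a `chunk_length_s` argument that may serve the same purpose.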

📈 Evaluation Results

Training Metrics

[5000/5000 2:42:10, Epoch 5/5]

| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 500 | 0.124200 | 0.109733 |
| 1000 | 0.051000 | 0.055065 |
| 1500 | 0.034800 | 0.040616 |
| 2000 | 0.023100 | 0.033179 |
| 2500 | 0.016200 | 0.027788 |
| 3000 | 0.007700 | 0.026611 |
| 3500 | 0.007100 | 0.023043 |
| 4000 | 0.003000 | 0.021656 |
| 4500 | 0.002200 | 0.020975 |
| 5000 | 0.001100 | 0.020395 |
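
To make the trend in this log easier to read, the sketch below computes the relative drop in validation loss between consecutive evaluations. The numbers are copied from the log above; the variable and function names are illustrative only.

```python
steps = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
train_loss = [0.1242, 0.0510, 0.0348, 0.0231, 0.0162,
              0.0077, 0.0071, 0.0030, 0.0022, 0.0011]
val_loss = [0.109733, 0.055065, 0.040616, 0.033179, 0.027788,
            0.026611, 0.023043, 0.021656, 0.020975, 0.020395]


def relative_drops(values):
    """Fractional decrease from each value to the next one."""
    return [(a - b) / a for a, b in zip(values, values[1:])]


for step, drop in zip(steps[1:], relative_drops(val_loss)):
    print(f"step {step}: validation loss fell by {drop:.1%}")

# Checkpoint with the lowest validation loss
best_step = steps[val_loss.index(min(val_loss))]
print("lowest validation loss at step", best_step)
```

Validation loss is still decreasing at the final step, though by only a few percent per evaluation, while training loss ends well below it. Loss alone does not indicate transcription accuracy; that would require WER measured on held-out audio.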

🧩 Intended Use

This model is designed for speech-to-text transcription in Lao, such as:

  • Voice command systems
  • Lao language learning apps
  • Accessibility tools (subtitles, transcripts)
  • Cultural and linguistic research

⚠️ Limitations

  • May struggle with code-switching (mix of Lao and English)
  • Background noise or strong dialectal accents may reduce accuracy
  • Whisper's built-in tokenizer may occasionally normalize Lao text (tone marks or spacing)
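
Differences in spacing, or in the order in which combining marks are encoded, can be counted as errors even when a transcript reads correctly. When comparing output against reference text, it can help to normalize both sides the same way first. The function below is a hypothetical sketch using only the Python standard library; the choice of NFC and the stripping of zero-width characters are assumptions, not the procedure used for this model.

```python
import re
import unicodedata

# Zero-width space, zero-width non-joiner, zero-width joiner, byte order mark
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def normalize_text(text, keep_spaces=True):
    """Normalize a transcript before comparing it with a reference."""
    text = unicodedata.normalize("NFC", text)  # canonical composition and mark ordering
    text = _ZERO_WIDTH.sub("", text)           # drop invisible break characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    if not keep_spaces:
        text = text.replace(" ", "")           # compare without any segmentation
    return text
```

Setting `keep_spaces=False` removes spacing from the comparison altogether, which suits character-level scoring.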

🪪 Citation

If you use this model in your research, please cite:

@misc{phonepadith2025whisperlao,
  title = {Whisper Large Fine-tuned for Lao ASR},
  author = {Phonepadith Phoummavong},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Phonepadith/whisper-3-large-lao-finetuned-v1}},
}

💬 Contact

For questions, collaboration, or dataset contributions:


Note: This model is part of ongoing efforts to improve ASR capabilities for low-resource languages like Lao. Contributions and feedback are welcome!
