🗣️ Whisper Large Lao Fine-tuned
Model Overview
This model is a fine-tuned version of OpenAI's Whisper Large v3 for Lao (ພາສາລາວ) automatic speech recognition (ASR). It was fine-tuned on Phonepadith/laos-speech-dataset, a curated dataset of Lao speech samples with transcriptions.
🧠 Model Details
| Property | Description |
|---|---|
| Base model | openai/whisper-large-v3 |
| Fine-tuned by | @Phonepadith |
| Language | Lao (lo) |
| Task | Automatic Speech Recognition (ASR) |
| Framework | 🤗 Transformers, PyTorch |
| Dataset | Phonepadith/laos-speech-dataset |
| Sampling rate | 16 kHz |
| License | MIT (same as base model unless otherwise stated) |
Training Details
- Fine-tuned on: Phonepadith/laos-speech-dataset (7k+ samples)
- Input: 16 kHz mono audio
- Output: Lao text transcription
- Epochs: 6
- Batch size: 2
- Learning rate: 1e-5
- Optimizer: AdamW
- Evaluation metric: Word Error Rate (WER)
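Word Error Rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration, not the evaluation script used for this model; the function names `edit_distance`, `wer`, and `cer` are hypothetical. Because Lao is normally written without spaces between words, WER depends on how the text is segmented, so a character error rate is included here as an assumption about what may be a useful companion metric.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, splitting on whitespace (an assumption for Lao text)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)
```

Both functions return a fraction, where 0.0 means the hypothesis matches the reference exactly; values above 1.0 are possible when the hypothesis contains many insertions.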
Usage Example
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

# Load model and processor
model_id = "Phonepadith/whisper-3-large-lao-finetuned-v1"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load an audio file; torchaudio returns a tensor of shape (channels, samples)
speech_array, sampling_rate = torchaudio.load("example.wav")
# Downmix to mono by averaging channels, then resample to 16 kHz
speech_array = speech_array.mean(dim=0)
speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)

# Preprocess and generate transcription
input_features = processor(
    speech_array.numpy(),
    sampling_rate=16000,
    return_tensors="pt",
).input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
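Since the model expects 16 kHz mono input, it can help to check a file's format before transcribing it. The following is a minimal sketch using only Python's standard `wave` module, assuming an uncompressed PCM WAV file; the helper name `check_wav_format` is hypothetical and is not part of this model or of Transformers.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    `ok` is True only when the file already matches the expected
    sampling rate and channel count, so no resampling or downmixing
    would be needed before feature extraction.
    """
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    ok = sample_rate == expected_rate and channels == expected_channels
    return ok, sample_rate, channels
```

If `ok` is false, the file needs to be resampled or downmixed before it is passed to the processor.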
Evaluation Results
Training Metrics
Final trainer progress: step 5000/5000, elapsed time 2:42:10, epoch 5/5.
| Step | Training Loss | Validation Loss |
|---|---|---|
| 500 | 0.124200 | 0.109733 |
| 1000 | 0.051000 | 0.055065 |
| 1500 | 0.034800 | 0.040616 |
| 2000 | 0.023100 | 0.033179 |
| 2500 | 0.016200 | 0.027788 |
| 3000 | 0.007700 | 0.026611 |
| 3500 | 0.007100 | 0.023043 |
| 4000 | 0.003000 | 0.021656 |
| 4500 | 0.002200 | 0.020975 |
| 5000 | 0.001100 | 0.020395 |
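The table can be summarized with a few lines of arithmetic. This sketch only re-reads the logged losses above; it says nothing about WER, and the variable and function names are illustrative.

```python
# Values copied from the training metrics table above.
steps = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
train_loss = [0.1242, 0.0510, 0.0348, 0.0231, 0.0162,
              0.0077, 0.0071, 0.0030, 0.0022, 0.0011]
val_loss = [0.109733, 0.055065, 0.040616, 0.033179, 0.027788,
            0.026611, 0.023043, 0.021656, 0.020975, 0.020395]


def relative_reduction(values):
    """Fractional drop from the first logged value to the last."""
    return (values[0] - values[-1]) / values[0]


val_reduction = relative_reduction(val_loss)
# True when every validation checkpoint improves on the previous one.
is_monotonic = all(later < earlier for earlier, later in zip(val_loss, val_loss[1:]))
# Gap between validation and training loss at the final step.
final_gap = val_loss[-1] - train_loss[-1]

print(f"Validation loss reduction: {val_reduction:.1%}")
print(f"Validation loss strictly decreasing: {is_monotonic}")
print(f"Final validation/training gap: {final_gap:.6f}")
```

For these values, validation loss falls by roughly 81% between step 500 and step 5000 and decreases at every checkpoint, while training loss ends well below validation loss.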
🧩 Intended Use
This model is designed for Lao speech-to-text transcription in applications such as:
- Voice command systems
- Lao language learning apps
- Accessibility tools (subtitles, transcripts)
- Cultural and linguistic research
⚠️ Limitations
- May struggle with code-switching (mix of Lao and English)
- Background noise or strong dialectal accents may reduce accuracy
- Whisper's built-in tokenizer may occasionally normalize Lao text (tone marks or spacing)
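Because mark ordering and spacing can differ between a reference transcript and the model output, it may help to normalize both sides the same way before comparing them. This is a minimal sketch using Python's standard `unicodedata` module; the helper name `normalize_transcript` and the choice to strip zero-width spaces are assumptions made for illustration, not part of the model's pipeline.

```python
import unicodedata

# Invisible characters removed before comparison (an assumption:
# zero-width spaces are sometimes used as word-break hints in Lao text).
INVISIBLE_CHARS = {"\u200b", "\ufeff"}


def normalize_transcript(text):
    """Apply NFC normalization, drop invisible characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(ch for ch in text if ch not in INVISIBLE_CHARS)
    return " ".join(text.split())
```

Applying the same function to references and hypotheses keeps differences in encoding or spacing from being counted as recognition errors.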
🪪 Citation
If you use this model in your research, please cite:
@misc{phonepadith2025whisperlao,
  title        = {Whisper Large Fine-tuned for Lao ASR},
  author       = {Phonepadith Phoummavong},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Phonepadith/whisper-3-large-lao-finetuned-v1}},
}
Contact
For questions, collaboration, or dataset contributions:
- 📧 Email: [email protected]
- 🤗 Hugging Face Profile
Note: This model is part of ongoing efforts to improve ASR capabilities for low-resource languages like Lao. Contributions and feedback are welcome!