🗣️ Whisper Large Lao Fine-tuned

Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large v3 (`openai/whisper-large-v3`) for Lao (ພາສາລາວ) automatic speech recognition (ASR). It was trained on `Phonepadith/laos-speech-dataset`, a curated dataset of Lao speech samples paired with transcriptions.

🧠 Model Details

Property	Description
Base model	`openai/whisper-large-v3`
Fine-tuned by	@Phonepadith
Language	Lao (lo)
Task	Automatic Speech Recognition (ASR)
Framework	🤗 Transformers, PyTorch
Dataset	`Phonepadith/laos-speech-dataset`
Sampling rate	16 kHz
License	MIT (same as base model unless otherwise stated)

📊 Training Details

Fine-tuned on: Lao speech dataset with 7k+ samples
Input: 16 kHz mono audio
Output: Lao text transcription
Epochs: 5
Batch size: 2
Learning rate: 1e-5
Optimizer: AdamW
Evaluation metric: Word Error Rate (WER)

🚀 Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

# Load model and processor
model_id = "Phonepadith/whisper-3-large-lao-finetuned-v1"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

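# Load an audio file, downmix to mono, and resample to 16 kHz
speech_array, sampling_rate = torchaudio.load("example.wav")
speech_array = speech_array.mean(dim=0, keepdim=True)  # (channels, time) -> (1, time)
speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)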

# Preprocess and generate transcription
input_features = processor(
    speech_array.squeeze().numpy(), 
    sampling_rate=16000, 
    return_tensors="pt"
).input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)
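
Whisper operates on 30-second windows, and the processor by default pads or truncates its input to that length, so the example above only covers the start of longer recordings. The sketch below shows one simple way to split a longer waveform into consecutive windows before transcription. The helper name `chunk_bounds` and its parameters are illustrative assumptions, not part of this model or of Transformers.

```python
def chunk_bounds(num_samples, sampling_rate=16000, chunk_seconds=30.0):
    """Return (start, end) sample indices that cover the audio in consecutive windows.

    Hypothetical helper: windows do not overlap, and the last one may be shorter.
    """
    if num_samples < 0 or sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and chunk length must be > 0")
    chunk_size = int(sampling_rate * chunk_seconds)
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# A 70-second recording at 16 kHz yields two full windows and one 10-second tail.
bounds = chunk_bounds(70 * 16000)
print(bounds)
```

Each `(start, end)` pair can be used to slice the waveform (for example `speech_array[..., start:end]`), run the slice through the processor and `model.generate`, and join the resulting texts. Non-overlapping windows can cut a word at a boundary; the Transformers ASR `pipeline` with its `chunk_length_s` argument is an alternative that handles overlapping chunks.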

📈 Evaluation Results

Training Metrics

Training ran for 5,000 steps over 5 epochs in 2:42:10 (h:mm:ss).

Step	Training Loss	Validation Loss
500	0.124200	0.109733
1000	0.051000	0.055065
1500	0.034800	0.040616
2000	0.023100	0.033179
2500	0.016200	0.027788
3000	0.007700	0.026611
3500	0.007100	0.023043
4000	0.003000	0.021656
4500	0.002200	0.020975
5000	0.001100	0.020395
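
The table above reports loss only. Word Error Rate, the metric named under Training Details, is the word-level edit distance between reference and hypothesis divided by the number of reference words. The following is a minimal, dependency-free sketch of that calculation; the function names are illustrative, and it is not necessarily the evaluation code used for this model. Because Lao is usually written without spaces between words, a whitespace-based WER depends on how the transcripts are segmented, so a character-level variant is included as well.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER over whitespace-separated tokens."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def char_error_rate(reference, hypothesis):
    """CER over characters, ignoring whitespace."""
    ref = [c for c in reference if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    hyp = [c for c in hypothesis if not c.isspace()]
    return edit_distance(ref, hyp) / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.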

🧩 Intended Use

This model is designed for Lao speech-to-text transcription in applications such as:

Voice command systems
Lao language learning apps
Accessibility tools (subtitles, transcripts)
Cultural and linguistic research

⚠️ Limitations

May struggle with code-switching (mix of Lao and English)
Background noise or strong dialectal accents may reduce accuracy
Whisper's built-in tokenizer may occasionally normalize Lao text (tone marks or spacing)
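
When comparing model output against reference transcripts, differences in spacing or in the order of combining marks can be reduced by normalizing both sides first. The sketch below is one possible post-processing step using only the Python standard library. It is an assumption about what is useful here, not a description of what Whisper's tokenizer does internally, and `normalize_transcript` is a hypothetical helper name.

```python
import re
import unicodedata

# Zero-width space and byte-order mark, which sometimes appear in Lao text
# as invisible word-break hints; mapped to None so str.translate removes them.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\ufeff"))


def normalize_transcript(text):
    """Apply Unicode NFC, drop invisible characters, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)  # canonical order for combining marks
    text = text.translate(INVISIBLE)
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to both the reference and the predicted text before scoring keeps purely cosmetic differences from counting as errors.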

🪪 Citation

If you use this model in your research, please cite:

@misc{phonepadith2025whisperlao,
  title = {Whisper Large Fine-tuned for Lao ASR},
  author = {Phonepadith Phoummavong},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Phonepadith/whisper-3-large-lao-finetuned-v1}},
}

💬 Contact

For questions, collaboration, or dataset contributions:

🤗 Hugging Face Profile: https://huggingface.co/Phonepadith

Note: This model is part of ongoing efforts to improve ASR capabilities for low-resource languages like Lao. Contributions and feedback are welcome!

Model size: 2B params (Safetensors, tensor type F32)
