---
base_model:
- scb10x/typhoon-asr-realtime
language:
- th
library_name: nemo
license: cc-by-4.0
tags:
- pytorch
- NeMo
arxiv: 2601.13044
pipeline_tag: automatic-speech-recognition
---
# Typhoon-isan-asr-realtime
**Typhoon Isan ASR Realtime** is a specialized, fine-tuned version of the Typhoon ASR Realtime model, optimized specifically for the Isan dialect of the Thai language. Presented in [Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition](https://huggingface.co/papers/2601.13044), it is built for real-world streaming applications, delivering fast and accurate transcriptions of Isan speech while running efficiently on standard CPUs.
The model is based on [NVIDIA's FastConformer Transducer model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer), which is optimized for low-latency, real-time performance.
**Code / Examples available on [Github](https://github.com/scb-10x/typhoon-asr)**
**Project Page available on [OpenTyphoon](https://opentyphoon.ai/model/typhoon-asr-realtime)**
**Release Blog available on [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-asr-realtime-release)**
***
## Usage
You can use the `typhoon-asr` package for easy inference:
```bash
pip install typhoon-asr
```
```python
from typhoon_asr import transcribe
# Basic transcription using the Isan model
result = transcribe("path/to/your_audio.wav", model="scb10x/typhoon-isan-asr-realtime")
print(result['text'])
# With timestamps
result = transcribe("path/to/your_audio.wav", model="scb10x/typhoon-isan-asr-realtime", with_timestamps=True)
print(result['text'])
```
### Performance
**Note on Baseline:** The `scb10x/whisper-medium-slscu-nectec` model included in the comparison is one we fine-tuned specifically for this benchmark using existing dialect data from NECTEC and SLSCU. It serves as a representative baseline built from publicly available data, distinct from the [SLSCU_korat_model](https://huggingface.co/SLSCU/thai-dialect_korat_model) (the most prominent prior work on Isan dialect ASR). This makes explicit the gap between what previously available resources can achieve and the new Typhoon Isan ASR models.
### Key Findings
* **Comparable to State-of-the-Art Proprietary Models:** The **Typhoon Isan ASR family** (both the Whisper-based and Realtime variants) demonstrates performance highly competitive with **Gemini-2.5-pro**. This validates that specialized open models can match or exceed the capabilities of large-scale proprietary multimodal systems for dialectal speech recognition.
* **Leading Performance:** The `typhoon-whisper-medium-isan-asr` model achieved the lowest error rate in the benchmark (**0.0885**), outperforming Gemini-2.5-pro (**0.1020**) by a clear margin.
* **Consistency Across Architectures:** The `typhoon-isan-asr-realtime` model follows closely with a CER of **0.1065**. The gap between this model and Gemini-2.5-pro is negligible (less than 0.5 percentage points of CER), indicating that users can rely on the Typhoon suite for both high-accuracy offline transcription and latency-sensitive realtime applications without compromising on quality compared to commercial APIs.
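The error rates above are character error rates (CER): the character-level Levenshtein edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of the metric (illustrative only, not the benchmark's actual scoring script):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # One row of the dynamic-programming edit-distance table at a time.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hc in enumerate(h, 1):
            cur[j] = min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (rc != hc),    # substitution (0 if chars match)
            )
        prev = cur
    return prev[-1] / len(r)
```

For example, a single substituted character in a four-character reference yields a CER of 0.25. Published toolkits such as `jiwer` compute the same quantity and are preferable for reproducible evaluation.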
## Citation
If you use this model in your research or application, please cite our technical report:
```bibtex
@misc{warit2026typhoonasr,
title={Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition},
author={Warit Sirichotedumrong and Adisai Na-Thalang and Potsawee Manakul and Pittawat Taveekitworachai and Sittipong Sripaisarnmongkol and Kunat Pipatanakul},
year={2026},
eprint={2601.13044},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.13044},
}
```
## **Follow us**
**https://twitter.com/opentyphoon**
## **Support**
**https://discord.gg/us5gAYmrxw**