---
base_model:
- scb10x/typhoon-asr-realtime
language:
- th
library_name: nemo
license: cc-by-4.0
tags:
- pytorch
- NeMo
arxiv: 2601.13044
pipeline_tag: automatic-speech-recognition
---
# Typhoon-isan-asr-realtime
**Typhoon Isan ASR Realtime** is a specialized, fine-tuned version of the Typhoon ASR Realtime model, optimized specifically for the Isan dialect of the Thai language. Presented in [Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition](https://huggingface.co/papers/2601.13044), it is built for real-world streaming applications, delivering fast and accurate transcriptions of Isan speech while running efficiently on standard CPUs.
The model is based on [NVIDIA's FastConformer Transducer model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer), which is optimized for low-latency, real-time performance.
**Code / Examples available on [Github](https://github.com/scb-10x/typhoon-asr)**
**Project Page available on [OpenTyphoon](https://opentyphoon.ai/model/typhoon-asr-realtime)**
**Release Blog available on [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-asr-realtime-release)**
***
## Usage
You can use the `typhoon-asr` package for easy inference:
```bash
pip install typhoon-asr
```
```python
from typhoon_asr import transcribe
# Basic transcription using the Isan model
result = transcribe("path/to/your_audio.wav", model="scb10x/typhoon-isan-asr-realtime")
print(result['text'])
# With timestamps
result = transcribe("path/to/your_audio.wav", model="scb10x/typhoon-isan-asr-realtime", with_timestamps=True)
print(result['text'])
```
### Performance
**Note on Baseline:** The `scb10x/whisper-medium-slscu-nectec` model included in the comparison is one we fine-tuned specifically for this benchmark using existing dialect data from NECTEC and SLSCU. It serves as a representative baseline built from publicly available data, distinct from the [SLSCU_korat_model](https://huggingface.co/SLSCU/thai-dialect_korat_model) (the most prominent prior work on Isan dialect ASR). This makes explicit the gap between what previously available resources can achieve and the new Typhoon Isan ASR models.
### Key Findings
* **Comparable to State-of-the-Art Proprietary Models:** The **Typhoon Isan ASR family** (both the Whisper-based and Realtime variants) demonstrates performance highly competitive with **Gemini-2.5-pro**. This validates that specialized open models can match or exceed the capabilities of large-scale proprietary multimodal systems for dialectal speech recognition.
* **Leading Performance:** The `typhoon-whisper-medium-isan-asr` model achieved the lowest error rate in the benchmark (**0.0885**), outperforming Gemini-2.5-pro (**0.1020**) by a clear margin.
* **Consistency Across Architectures:** The `typhoon-isan-asr-realtime` model follows closely with a CER of **0.1065**. The gap between this model and Gemini-2.5-pro is negligible (less than 0.5 percentage points of CER), indicating that users can rely on the Typhoon suite for both high-accuracy offline transcription and latency-sensitive realtime applications without compromising on quality compared to commercial APIs.
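The error rates above are character error rates (CER): the character-level Levenshtein edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of the metric (illustrative only, not the benchmark's actual scoring script):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # One row of the dynamic-programming edit-distance table at a time.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hc in enumerate(h, 1):
            cur[j] = min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (rc != hc),    # substitution (0 if chars match)
            )
        prev = cur
    return prev[-1] / len(r)
```

For example, a single substituted character in a four-character reference yields a CER of 0.25. Published toolkits such as `jiwer` compute the same quantity and are preferable for reproducible evaluation.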
## Citation
If you use this model in your research or application, please cite our technical report:
```bibtex
@misc{warit2026typhoonasr,
title={Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition},
author={Warit Sirichotedumrong and Adisai Na-Thalang and Potsawee Manakul and Pittawat Taveekitworachai and Sittipong Sripaisarnmongkol and Kunat Pipatanakul},
year={2026},
eprint={2601.13044},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.13044},
}
```
## **Follow us**
**https://twitter.com/opentyphoon**
## **Support**
**https://discord.gg/us5gAYmrxw**