---
library_name: transformers
language:
- en
- th
base_model:
- Qwen/Qwen2.5-VL-3B
tags:
- OCR
- vision-language
- document-understanding
- multilingual
- QAT
license: apache-2.0
---

# Typhoon-OCR-1.5-3B-QAT

A quantization-aware trained (QAT) version of [**Typhoon OCR v1.5**](https://huggingface.co/scb10x/typhoon-ocr1.5-2b), designed for robust and efficient on-device vision-language OCR in English and Thai. This release maintains strong accuracy while significantly improving performance under low-bit quantization (e.g., 4-bit), making it well suited to lightweight environments.

This model is released in **bfloat16** and is intended to be used as the **pre-quantization base** before converting to low-bit formats. For the 4-bit model, please use the Ollama build here: **https://ollama.com/scb10x/typhoon-ocr1.5-3b**

QAT is applied on top of **Qwen2.5-VL-3B**, improving stability and reducing degradation when the model is deployed below 16-bit precision.

**4-bit Ollama version:** https://ollama.com/scb10x/typhoon-ocr1.5-3b

**Base FP16 model:** https://huggingface.co/scb10x/typhoon-ocr1.5-2b

**Try the demo: [Demo](https://ocr.opentyphoon.ai/)**

**Code / Examples: [GitHub](https://github.com/scb-10x/typhoon-ocr)**

**Release Blog: [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-ocr-release)**

---

## Highlights

- **Quantization-Aware Training (QAT):** Maintains strong OCR accuracy even under aggressive quantization.
- **Optimized for On-Device Inference:** Faster and more consistent performance on low-resource hardware.
- **Enhanced Handwriting & Form Parsing:** Retains the v1.5 improvements in handling handwritten notes, forms, irregular layouts, and structured documents.
- **Supports Text-Rich & Image-Rich Documents:** Effective on tables, diagrams, annotated pages, charts, receipts, and dense reports.
- **Thai + English Multilingual OCR:** Trained for reliable extraction across bilingual real-world documents.

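
To make the low-bit quantization mentioned above concrete, here is a minimal, generic sketch of the quantize-then-dequantize ("fake quantization") rounding that QAT methods typically simulate during training so that the weights learn to tolerate it. This is an illustration only and not the Typhoon training recipe: the helper `fake_quantize`, the symmetric per-tensor scheme, and the sample weights are all assumptions made for the example, and the quantization scheme actually used for this model is not stated in this card. It compares the rounding error at 4 bits and 8 bits for the same values.

```python
def fake_quantize(weights, num_bits=4):
    """Symmetric per-tensor quantize-then-dequantize.

    Hypothetical helper for illustration; not part of any Typhoon or
    transformers API.
    """
    qmax = 2 ** (num_bits - 1) - 1            # largest signed level, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)                  # snap to the nearest integer level
        q = max(-qmax - 1, min(qmax, q))      # clamp to the representable range
        dequantized.append(q * scale)         # map back to floating point
    return dequantized


def max_abs_error(original, approximated):
    """Largest absolute difference between two equally sized lists."""
    return max(abs(a - b) for a, b in zip(original, approximated))


# Made-up example weights (assumption, not taken from the model).
weights = [0.70, -0.35, 0.12, 0.031, -0.002]

deq4 = fake_quantize(weights, num_bits=4)
deq8 = fake_quantize(weights, num_bits=8)

err4 = max_abs_error(weights, deq4)
err8 = max_abs_error(weights, deq8)

print("4-bit values:", deq4)
print("4-bit max error:", err4)
print("8-bit max error:", err8)
```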
---

## Intended Use

This is a **task-specific OCR model** and is intended to be used **only with the provided prompt format**. It does **not** include general VQA or safety guardrails. Some hallucination may still occur, and users should validate outputs for production scenarios.

---

## Quick Links

- **Demo:** https://ocr.opentyphoon.ai
- **Code / Examples:** https://github.com/scb-10x/typhoon-ocr
- **Release Blog:** https://opentyphoon.ai/blog/en/typhoon-ocr-release

---

## Prompting

```python
prompt = """Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using