Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models
Abstract
A minimal post-training approach using supervised fine-tuning, on-policy distillation, and small-scale reinforcement fine-tuning enables the development of high-quality sovereign language models with reduced resource requirements.
Large language models (LLMs) have progressed rapidly; however, most state-of-the-art models are trained and evaluated primarily in high-resource languages such as English and Chinese, and they are often developed by a small number of organizations with access to large-scale compute and data. This gatekeeping creates a practical barrier for sovereign settings in which a regional- or national-scale institution or domain owner must retain control and understanding of model weights, training data, and deployment while operating under limited resources and strict transparency constraints. To this end, we identify two core requirements: (1) adoptability, the ability to transform a base model into a general-purpose assistant, and (2) sovereign capability, the ability to perform high-stakes, region-specific tasks (e.g., legal reasoning in local languages and cultural knowledge). We investigate whether these requirements can be achieved without scaling massive general-purpose instruction corpora or relying on complex preference-tuning pipelines and large-scale reinforcement fine-tuning (RFT). We present Typhoon S, a minimal and open post-training recipe that combines supervised fine-tuning, on-policy distillation, and small-scale RFT stages. Using Thai as a representative case study, we demonstrate that our approach addresses adoptability by transforming both sovereign-adapted and general-purpose base models into instruction-tuned models with strong general performance. We further show that small-scale RFT with InK-GRPO, an extension of GRPO that augments the GRPO loss with a next-word prediction loss, enables sovereign capability by improving Thai legal reasoning and Thai-specific knowledge while preserving general capabilities. Our results suggest that a carefully designed post-training strategy can reduce the required scale of instruction data and computation, providing a practical path toward high-quality sovereign LLMs under academic-scale resources (approximately two days of 8-GPU training for an 8B model for adoptability, and one day of 4-GPU training for sovereign capability).
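As a concrete illustration of the InK-GRPO objective described above, a minimal sketch is the GRPO loss augmented with a next-word prediction (cross-entropy) term. The mixing weight \lambda and the corpus \mathcal{D} used for the next-word term are illustrative assumptions, not values reported here:

\mathcal{L}_{\text{InK-GRPO}}(\theta) = \mathcal{L}_{\text{GRPO}}(\theta) + \lambda\,\mathcal{L}_{\text{NWP}}(\theta), \qquad \mathcal{L}_{\text{NWP}}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}}\Big[\sum_{t} \log \pi_\theta(x_t \mid x_{<t})\Big],

where \mathcal{L}_{\text{GRPO}} is the standard group-relative clipped policy-gradient surrogate (a group of responses is sampled per prompt and advantages are normalized within the group), and the next-word term is intended to inject or preserve domain knowledge during reinforcement fine-tuning. The exact GRPO variant and the weighting used in the paper may differ from this sketch.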