qwen3-0.6b-translation-synthetic-reasoning-1
A fine-tuned Qwen3-0.6B model that provides step-by-step reasoning for few-shot translation tasks, particularly focused on low-resource and biblical language pairs.
Model Details
Model Description
This model extends Qwen/Qwen3-0.6B with the ability to perform detailed reasoning during translation. Given a query and several few-shot examples, it explains its translation choices step-by-step, making the process transparent and educational.
- Developed by: dadu
- Model type: Causal Language Model (Fine-tuned for few-shot translation reasoning)
- Language(s): Multi-lingual (specialized in biblical/low-resource language pairs)
- License: Apache 2.0 (following base model)
- Finetuned from model: Qwen/Qwen3-0.6B
Uses
Direct Use
This model is designed for translation tasks where you need:
- Step-by-step reasoning explanations
- Fragment-by-fragment translation analysis
- Reference to linguistic patterns from few-shot examples
- Educational translation methodology for low-resource languages
Out-of-Scope Use
- General conversation (may be overly verbose)
- Real-time translation (generates long explanations)
- Zero-shot translation (performs best with few-shot examples)
- Languages significantly different from training data
How to Get Started with the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "dadu/qwen3-0.6b-translation-synthetic-reasoning-1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Few-shot examples provide context for the translation style
few_shot_prompt = """
Examples:
source: Aŋkɛ bímbɔ áwúlégé, ɛkiɛ́nné Ɛsɔwɔ ɛ́kwɔ́ Josɛf ushu né gejya, ɛ́jɔɔ́ ne ji ɛké...
target: Jalla nekztanaqui nii magonacaz̈ ojktan tsjii Yooz Jilirz̈ anjilaqui wiiquin Josez̈quiz parisisquichic̈ha...
source: Josɛf ápégé, asɛ maá yimbɔ ne mmá wuú áfɛ́ né mme Isrɛli.
target: Jalla nuz̈ cjen Josequi z̈aaz̈cu Israel yokquin nii uztan maatan chjitchic̈ha.
Query: ɛké “Josɛf, kwilé ka ɔ́kpá maá yina ne mma wuú, ɛnyú dékéré meso né mme Isrɛli. Bɔɔ́ abi ákɛlege manwá ji ágboó.”
"""
messages = [
{"role": "system", "content": "You are a helpful Bible translation assistant. Given examples of language pairs and a query, you will write a high quality translation with reasoning."},
{"role": "user", "content": few_shot_prompt}
]
# Build the chat prompt and generate the reasoned translation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
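For larger example sets, the few-shot block can be assembled programmatically. The helper below is a minimal sketch (the function name and inputs are illustrative, not part of the model's API); it simply reproduces the source/target/Query layout used above.

def build_few_shot_prompt(example_pairs, query):
    # example_pairs: list of (source, target) sentence pairs for the language pair of interest
    lines = ["Examples:"]
    for src, tgt in example_pairs:
        lines.append(f"source: {src}")
        lines.append(f"target: {tgt}")
    lines.append(f"Query: {query}")
    return "\n".join(lines)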
Training Details
Training Data
- Dataset: dadu/translation-synthetic-reasoning-1
- Size: ~980 translation examples with detailed reasoning
- Source: Synthetically generated using LLMs with gold standard translations
- Format: Each example includes source text, target translation, and step-by-step reasoning
- Languages: Primarily biblical and low-resource language pairs
- Quality: Filtered to remove corrupted examples
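For orientation, a record roughly follows the shape sketched below; the field names are illustrative, so consult the dataset card for the exact schema.

# Illustrative record shape only -- the actual column names may differ
example = {
    "source": "<source-language sentence>",
    "target": "<gold-standard translation>",
    "reasoning": "<step-by-step explanation of the translation choices>",
}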
Training Procedure
Training Hyperparameters
- Training regime: Full fine-tuning (not LoRA)
- Context Length: 16,384 tokens
- Epochs: 2
- Effective Batch Size: 8 (1 per device × 8 gradient accumulation steps)
- Learning Rate: 1.5e-5 with cosine scheduling
- Optimizer: AdamW with gradient clipping (max_grad_norm=1.0)
- Precision: BF16 mixed precision
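The hyperparameters above map onto a TRL SFT setup roughly as follows. This is a hedged reconstruction, not the original training script; the dataset loading call and column handling are assumptions.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset call; see the dataset card for the actual split and columns
dataset = load_dataset("dadu/translation-synthetic-reasoning-1", split="train")

config = SFTConfig(
    output_dir="qwen3-0.6b-translation-synthetic-reasoning-1",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=1.5e-5,
    lr_scheduler_type="cosine",
    max_grad_norm=1.0,
    bf16=True,
    max_seq_length=16384,
)

# Full fine-tuning of the base model (no LoRA/PEFT adapters)
trainer = SFTTrainer(model="Qwen/Qwen3-0.6B", args=config, train_dataset=dataset)
trainer.train()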
Methodology
The model was trained on few-shot prompts to:
- Analyze source text fragment by fragment
- Reference similar patterns from the provided few-shot examples
- Explain lexical and grammatical choices based on context
- Provide systematic reasoning before the final translation
Limitations
- Specialized domain: Optimized for biblical/low-resource language translation
- Verbose output: Generates detailed explanations for all translations
- Training data scope: Performance may vary on language pairs not represented in training data
- Few-shot dependency: Works best when provided with relevant few-shot examples
Technical Specifications
Model Architecture
- Base: Qwen/Qwen3-0.6B (transformer decoder)
- Parameters: ~600M
- Context Window: 16,384 tokens
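The parameter count can be checked directly from the checkpoint (an optional sanity check, not required for inference):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("dadu/qwen3-0.6b-translation-synthetic-reasoning-1")
print(f"{model.num_parameters():,} parameters")  # roughly 0.6B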
Compute Infrastructure
Hardware
- Training: Google Colab Pro (A100 GPU)
- Memory: High memory configuration for 16K context training
Software
- Framework: Transformers, TRL
- Precision: BF16 mixed precision
- Environment: Google Colab