qwen3-0.6b-translation-synthetic-reasoning-1
A fine-tuned Qwen3-0.6B model that provides step-by-step reasoning for few-shot translation tasks, particularly focused on low-resource and biblical language pairs.
Model Details
Model Description
This model extends Qwen/Qwen3-0.6B with the ability to perform detailed reasoning during translation. Given a query and several few-shot examples, it explains its translation choices step-by-step, making the process transparent and educational.
- Developed by: dadu
- Model type: Causal Language Model (Fine-tuned for few-shot translation reasoning)
- Language(s): Multi-lingual (specialized in biblical/low-resource language pairs)
- License: Apache 2.0 (following base model)
- Finetuned from model: Qwen/Qwen3-0.6B
Uses
Direct Use
This model is designed for translation tasks where you need:
- Step-by-step reasoning explanations
- Fragment-by-fragment translation analysis
- Reference to linguistic patterns from few-shot examples
- Educational translation methodology for low-resource languages
Out-of-Scope Use
- General conversation (may be overly verbose)
- Real-time translation (generates long explanations)
- Zero-shot translation (performs best with few-shot examples)
- Languages significantly different from training data
How to Get Started with the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "dadu/qwen3-0.6b-translation-synthetic-reasoning-1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Few-shot examples provide context for the translation style
few_shot_prompt = """
Examples:
source: Aŋkɛ bímbɔ áwúlégé, ɛkiɛ́nné Ɛsɔwɔ ɛ́kwɔ́ Josɛf ushu né gejya, ɛ́jɔɔ́ ne ji ɛké...
target: Jalla nekztanaqui nii magonacaz̈ ojktan tsjii Yooz Jilirz̈ anjilaqui wiiquin Josez̈quiz parisisquichic̈ha...
source: Josɛf ápégé, asɛ maá yimbɔ ne mmá wuú áfɛ́ né mme Isrɛli.
target: Jalla nuz̈ cjen Josequi z̈aaz̈cu Israel yokquin nii uztan maatan chjitchic̈ha.
Query: ɛké “Josɛf, kwilé ka ɔ́kpá maá yina ne mma wuú, ɛnyú dékéré meso né mme Isrɛli. Bɔɔ́ abi ákɛlege manwá ji ágboó.”
"""
messages = [
{"role": "system", "content": "You are a helpful Bible translation assistant. Given examples of language pairs and a query, you will write a high quality translation with reasoning."},
{"role": "user", "content": few_shot_prompt}
]
# Build the chat prompt and generate the reasoned translation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
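For larger example sets, the few-shot block can be assembled programmatically. The helper below is a minimal sketch (the function name and inputs are illustrative, not part of the model's API); it simply reproduces the source/target/Query layout used above.

def build_few_shot_prompt(example_pairs, query):
    # example_pairs: list of (source, target) sentence pairs for the language pair of interest
    lines = ["Examples:"]
    for src, tgt in example_pairs:
        lines.append(f"source: {src}")
        lines.append(f"target: {tgt}")
    lines.append(f"Query: {query}")
    return "\n".join(lines)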
Training Details
Training Data
- Dataset: dadu/translation-synthetic-reasoning-1
- Size: ~980 translation examples with detailed reasoning
- Source: Synthetically generated using LLMs with gold standard translations
- Format: Each example includes source text, target translation, and step-by-step reasoning
- Languages: Primarily biblical and low-resource language pairs
- Quality: Filtered to remove corrupted examples
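For orientation, a record roughly follows the shape sketched below; the field names are illustrative, so consult the dataset card for the exact schema.

# Illustrative record shape only -- the actual column names may differ
example = {
    "source": "<source-language sentence>",
    "target": "<gold-standard translation>",
    "reasoning": "<step-by-step explanation of the translation choices>",
}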
Training Procedure
Training Hyperparameters
- Training regime: Full fine-tuning (not LoRA)
- Context Length: 16,384 tokens
- Epochs: 2
- Effective Batch Size: 8 (1 per device × 8 gradient accumulation steps)
- Learning Rate: 1.5e-5 with cosine scheduling
- Optimizer: AdamW with gradient clipping (max_grad_norm=1.0)
- Precision: BF16 mixed precision
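The hyperparameters above map onto a TRL SFT setup roughly as follows. This is a hedged reconstruction, not the original training script; the dataset loading call and column handling are assumptions.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset call; see the dataset card for the actual split and columns
dataset = load_dataset("dadu/translation-synthetic-reasoning-1", split="train")

config = SFTConfig(
    output_dir="qwen3-0.6b-translation-synthetic-reasoning-1",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=1.5e-5,
    lr_scheduler_type="cosine",
    max_grad_norm=1.0,
    bf16=True,
    max_seq_length=16384,
)

# Full fine-tuning of the base model (no LoRA/PEFT adapters)
trainer = SFTTrainer(model="Qwen/Qwen3-0.6B", args=config, train_dataset=dataset)
trainer.train()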
Methodology
The model was trained on few-shot prompts to:
- Analyze source text fragment by fragment
- Reference similar patterns from the provided few-shot examples
- Explain lexical and grammatical choices based on context
- Provide systematic reasoning before the final translation
Limitations
- Specialized domain: Optimized for biblical/low-resource language translation
- Verbose output: Generates detailed explanations for all translations
- Training data scope: Performance may vary on language pairs not represented in training data
- Few-shot dependency: Works best when provided with relevant few-shot examples
Technical Specifications
Model Architecture
- Base: Qwen/Qwen3-0.6B (transformer decoder)
- Parameters: ~600M
- Context Window: 16,384 tokens
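The parameter count can be checked directly from the checkpoint (an optional sanity check, not required for inference):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("dadu/qwen3-0.6b-translation-synthetic-reasoning-1")
print(f"{model.num_parameters():,} parameters")  # roughly 0.6B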
Compute Infrastructure
Hardware
- Training: Google Colab Pro (A100 GPU)
- Memory: High memory configuration for 16K context training
Software
- Framework: Transformers, TRL
- Precision: BF16 mixed precision
- Environment: Google Colab