qwen3-0.6b-translation-synthetic-reasoning-1

A fine-tuned Qwen3-0.6B model that provides step-by-step reasoning for few-shot translation tasks, particularly focused on low-resource and biblical language pairs.

Model Details

Model Description

This model extends Qwen/Qwen3-0.6B with the ability to perform detailed reasoning during translation. Given a query and several few-shot examples, it explains its translation choices step-by-step, making the process transparent and educational.

  • Developed by: dadu
  • Model type: Causal Language Model (Fine-tuned for few-shot translation reasoning)
  • Language(s): Multilingual (specialized in biblical/low-resource language pairs)
  • License: Apache 2.0 (following the base model)
  • Finetuned from model: Qwen/Qwen3-0.6B

Uses

Direct Use

This model is designed for translation tasks where you need:

  • Step-by-step reasoning explanations
  • Fragment-by-fragment translation analysis
  • Reference to linguistic patterns from few-shot examples
  • Educational translation methodology for low-resource languages

Out-of-Scope Use

  • General conversation (may be overly verbose)
  • Real-time translation (generates long explanations)
  • Zero-shot translation (performs best with few-shot examples)
  • Language pairs that differ significantly from the training data

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dadu/qwen3-0.6b-translation-synthetic-reasoning-1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Few-shot examples provide context for the translation style
few_shot_prompt = """
Examples:

source: Aŋkɛ bímbɔ áwúlégé, ɛkiɛ́nné Ɛsɔwɔ ɛ́kwɔ́ Josɛf ushu né gejya, ɛ́jɔɔ́ ne ji ɛké...
target: Jalla nekztanaqui nii magonacaz̈ ojktan tsjii Yooz Jilirz̈ anjilaqui wiiquin Josez̈quiz parisisquichic̈ha...

source: Josɛf ápégé, asɛ maá yimbɔ ne mmá wuú áfɛ́ né mme Isrɛli.
target: Jalla nuz̈ cjen Josequi z̈aaz̈cu Israel yokquin nii uztan maatan chjitchic̈ha.

Query: ɛké “Josɛf, kwilé ka ɔ́kpá maá yina ne mma wuú, ɛnyú dékéré meso né mme Isrɛli. Bɔɔ́ abi ákɛlege manwá ji ágboó.”
"""

messages = [
    {"role": "system", "content": "You are a helpful Bible translation assistant. Given examples of language pairs and a query, you will write a high quality translation with reasoning."},
    {"role": "user", "content": few_shot_prompt}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# do_sample=True is needed for the temperature setting to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens (the reasoning and the translation)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
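
The snippet above runs on CPU. On a machine with a GPU, the model and the tokenized inputs can optionally be moved to the device before calling generate (this assumes a CUDA-capable PyTorch install):

import torch

# Optional: run generation on GPU when available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}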

Training Details

Training Data

  • Dataset: dadu/translation-synthetic-reasoning-1
  • Size: ~980 translation examples with detailed reasoning
  • Source: Synthetically generated using LLMs with gold standard translations
  • Format: Each example includes source text, target translation, and step-by-step reasoning
  • Languages: Primarily biblical and low-resource language pairs
  • Quality: Filtered to remove corrupted examples
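
For illustration, a single record might look like the sketch below; the field names are hypothetical and should be checked against the dataset card.

# Hypothetical record layout (field names are an assumption, not the dataset schema)
example = {
    "source": "Josɛf ápégé, asɛ maá yimbɔ ne mmá wuú áfɛ́ né mme Isrɛli.",
    "target": "Jalla nuz̈ cjen Josequi z̈aaz̈cu Israel yokquin nii uztan maatan chjitchic̈ha.",
    "reasoning": "Fragment-by-fragment explanation of the translation choices...",
}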

Training Procedure

Training Hyperparameters

  • Training regime: Full fine-tuning (not LoRA)
  • Context Length: 16,384 tokens
  • Epochs: 2
  • Effective Batch Size: 8 (1 per device × 8 gradient accumulation steps)
  • Learning Rate: 1.5e-5 with cosine scheduling
  • Optimizer: AdamW with gradient clipping (max_grad_norm=1.0)
  • Precision: BF16 mixed precision
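
A minimal sketch of how these hyperparameters could be wired together with TRL's SFTTrainer is shown below. It is illustrative rather than the exact training script; the output path is an assumption, and it presumes the dataset is already in a format SFTTrainer accepts (e.g. a "messages" column).

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("dadu/translation-synthetic-reasoning-1", split="train")

config = SFTConfig(
    output_dir="qwen3-0.6b-translation-synthetic-reasoning-1",  # assumed output path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=1.5e-5,
    lr_scheduler_type="cosine",
    max_grad_norm=1.0,               # gradient clipping
    bf16=True,                       # BF16 mixed precision (AdamW is the default optimizer)
    max_seq_length=16384,            # 16K context; newer TRL releases call this max_length
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # full fine-tuning of the base model (no LoRA)
    args=config,
    train_dataset=dataset,
)
trainer.train()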

Methodology

The model was trained on few-shot prompts to:

  1. Analyze source text fragment by fragment
  2. Reference similar patterns from the provided few-shot examples
  3. Explain lexical and grammatical choices based on context
  4. Provide systematic reasoning before the final translation
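
As a rough illustration, a training sample could be assembled into chat messages as follows; the exact prompt formatting used during training is not documented here, so treat this helper as an assumption.

# Hypothetical helper: assembles one training sample in the chat format used at inference time
def build_messages(few_shot_examples, query, reasoning_and_translation):
    examples_block = "\n\n".join(
        f"source: {ex['source']}\ntarget: {ex['target']}" for ex in few_shot_examples
    )
    prompt = f"Examples:\n\n{examples_block}\n\nQuery: {query}"
    return [
        {"role": "system", "content": (
            "You are a helpful Bible translation assistant. Given examples of language "
            "pairs and a query, you will write a high quality translation with reasoning."
        )},
        {"role": "user", "content": prompt},
        # Target completion: fragment-by-fragment reasoning followed by the final translation
        {"role": "assistant", "content": reasoning_and_translation},
    ]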

Limitations

  • Specialized domain: Optimized for biblical/low-resource language translation
  • Verbose output: Generates detailed explanations for all translations
  • Training data scope: Performance may vary on language pairs not represented in training data
  • Few-shot dependency: Works best when provided with relevant few-shot examples

Technical Specifications

Model Architecture

  • Base: Qwen/Qwen3-0.6B (transformer decoder)
  • Parameters: ~600M
  • Context Window: 16,384 tokens
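
As a quick sanity check (not part of the original card), the parameter count can be verified by loading the model and summing its parameters:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("dadu/qwen3-0.6b-translation-synthetic-reasoning-1")
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # roughly 0.6B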

Compute Infrastructure

Hardware

  • Training: Google Colab Pro (A100 GPU)
  • Memory: High memory configuration for 16K context training

Software

  • Framework: Transformers, TRL
  • Precision: BF16 mixed precision
  • Environment: Google Colab