# vLLM-Compatible TranslateGemma 12B
This is a modified version of google/translategemma-12b-it optimized for deployment with vLLM.
No retraining was performed. Only configuration files and the chat template were modified. Model weights are identical to the original.
## Why This Exists
As of 2025-01-29, vLLM does not natively support TranslateGemma's custom structured input format; see vllm-project/vllm#32446 for the upstream tracking issue. Until native support lands upstream, this repo provides a workaround: the configuration files are modified so that TranslateGemma works with vLLM's standard chat API.
## Acknowledgements
This conversion is based entirely on the work done by Infomaniak-AI/vllm-translategemma-4b-it. The same conversion approach was applied to the 12B model. Full credit goes to the Infomaniak AI team for figuring out the necessary changes.
## Modified Files
The following files were modified from the original google/translategemma-12b-it for vLLM compatibility. All other files (model weights, tokenizer, etc.) are unchanged.
- `config.json`: RoPE parameters simplified for vLLM
- `generation_config.json`: added `temperature`, `bos_token_id`, and `max_length`; adjusted `eos_token_id` ordering
- `chat_template.jinja`: replaced the structured content format with a delimiter-based string format
## Changes from Original Model
### 1. Chat Template
The original TranslateGemma model requires a structured payload with dedicated source_lang_code and target_lang_code fields:
```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "source_lang_code": "cs",
      "target_lang_code": "de-DE",
      "text": "V nejhorším případě i k prasknutí čočky."
    }
  ]
}
```
However, vLLM does not support these custom content parameters. To maintain compatibility, the chat template has been modified to encode language codes directly in the message content using a delimiter-based format:
```json
{
  "model": "model",
  "messages": [
    {
      "role": "user",
      "content": "<<<source>>>cs<<<target>>>de-DE<<<text>>>V nejhorším případě i k prasknutí čočky."
    }
  ]
}
```
Translation format: `<<<source>>>{source_lang}<<<target>>>{target_lang}<<<text>>>{text_to_translate}`

To send a free-form prompt instead of a language pair, use the custom format: `<<<custom>>>{text}`
### 2. Model Configuration (RoPE)
The original model uses the new Transformers RoPE configuration format with separate attention type settings:
"rope_parameters": {
"full_attention": {
"factor": 8.0,
"rope_type": "linear"
},
"sliding_attention": {
"rope_type": "default"
}
}
This has been simplified for vLLM compatibility:
"rope_parameters": {
"factor": 8.0,
"rope_type": "linear"
}
### 3. Generation Configuration
Added temperature, bos_token_id, and max_length fields. The EOS token ordering was adjusted to [106, 1] for proper sequence termination.
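For reference, the resulting `generation_config.json` has roughly this shape. The `eos_token_id` ordering `[106, 1]` is the change described above; the `temperature`, `bos_token_id`, and `max_length` values shown here are illustrative placeholders (Gemma-style defaults), not copied from the shipped file, so check the repository for the exact contents:

```json
{
  "bos_token_id": 2,
  "eos_token_id": [106, 1],
  "temperature": 1.0,
  "max_length": 4096
}
```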
## Usage with vLLM
### Start the server
```shell
vllm serve chbae624/vllm-translategemma-12b-it --max-model-len 4096
```
### Send a translation request
```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chbae624/vllm-translategemma-12b-it",
    "messages": [
      {
        "role": "user",
        "content": "<<<source>>>en<<<target>>>ko<<<text>>>The weather is beautiful today."
      }
    ],
    "temperature": 0.15,
    "max_tokens": 256
  }'
```
### Custom prompt mode
```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chbae624/vllm-translategemma-12b-it",
    "messages": [
      {
        "role": "user",
        "content": "<<<custom>>>Translate the following Japanese text to English: 今日はいい天気ですね。"
      }
    ],
    "temperature": 0.15,
    "max_tokens": 256
  }'
```
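The same requests can be sent from Python. This sketch uses only the standard library and assumes the server started in the previous section is listening on localhost:8000; the helper names are chosen here for illustration.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "chbae624/vllm-translategemma-12b-it"

def build_payload(source: str, target: str, text: str) -> dict:
    """Assemble the chat-completions request body in the delimiter format."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": f"<<<source>>>{source}<<<target>>>{target}<<<text>>>{text}",
        }],
        "temperature": 0.15,
        "max_tokens": 256,
    }

def translate(source: str, target: str, text: str) -> str:
    """POST one request to the local vLLM server and return the translation."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(source, target, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the server above to be running):
# print(translate("en", "ko", "The weather is beautiful today."))
```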
## Supported Languages
TranslateGemma supports translation across 55 languages. See the original model card for the full list.
## Notes
- Input/Output length: The model was fine-tuned with ~2K token sequences. Inputs significantly exceeding 2048 tokens may produce degraded translation quality.
- Model weights: Identical to google/translategemma-12b-it. No retraining was performed.
- Conversion method: Follows the exact same approach as Infomaniak-AI/vllm-translategemma-4b-it.
## License
This model is subject to the Gemma Terms of Use. See the NOTICE file included in this repository.
## Citation
```bibtex
@article{gemmatranslate2026,
  title={{TranslateGemma Technical Report}},
  url={https://arxiv.org/pdf/2601.09012},
  publisher={Google DeepMind},
  author={{Google Translate Research Team}},
  year={2026}
}
```