vLLM-Compatible TranslateGemma 12B

This is a modified version of google/translategemma-12b-it optimized for deployment with vLLM.

No retraining was performed. Only configuration files and the chat template were modified. Model weights are identical to the original.

Why This Exists

As of 2025-01-29, vLLM does not natively support TranslateGemma's custom structured input format. See vllm-project/vllm#32446 for the upstream tracking issue. Until upstream support lands, this repo provides a workaround: the configuration files and chat template are modified so that TranslateGemma works with vLLM's standard chat API.

Acknowledgements

This conversion is based entirely on the work done by Infomaniak-AI/vllm-translategemma-4b-it. The same conversion approach was applied to the 12B model. Full credit goes to the Infomaniak AI team for figuring out the necessary changes.

Modified Files

The following files were modified from the original google/translategemma-12b-it for vLLM compatibility. All other files (model weights, tokenizer, etc.) are unchanged.

  • config.json — RoPE parameters simplified for vLLM
  • generation_config.json — Added temperature, bos_token_id, max_length; adjusted eos_token_id ordering
  • chat_template.jinja — Replaced structured content format with delimiter-based string format

Changes from Original Model

1. Chat Template

The original TranslateGemma model requires a structured payload with dedicated source_lang_code and target_lang_code fields:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "source_lang_code": "cs",
      "target_lang_code": "de-DE",
      "text": "V nejhorším případě i k prasknutí čočky."
    }
  ]
}

However, vLLM does not support these custom content parameters. To maintain compatibility, the chat template has been modified to encode language codes directly in the message content using a delimiter-based format:

{
  "model": "model",
  "messages": [
    {
      "role": "user",
      "content": "<<<source>>>cs<<<target>>>de-DE<<<text>>>V nejhorším případě i k prasknutí čočky."
    }
  ]
}

Format: <<<source>>>{source_lang}<<<target>>>{target_lang}<<<text>>>{text_to_translate}

If you need to provide a custom prompt input:

Format: <<<custom>>>{text}
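The two formats above are easy to get subtly wrong by hand, so a small helper is useful on the client side. This is an illustrative sketch (the function names are not part of the model repo); it only builds the delimiter strings the modified chat template expects:

```python
# Helpers that build the delimiter-based content strings expected by the
# modified chat template. Function names are illustrative, not part of the repo.

def translation_prompt(text: str, source_lang: str, target_lang: str) -> str:
    """Encode source/target language codes and the text to translate."""
    return f"<<<source>>>{source_lang}<<<target>>>{target_lang}<<<text>>>{text}"

def custom_prompt(text: str) -> str:
    """Custom prompt mode: the free-form instruction is passed through as-is."""
    return f"<<<custom>>>{text}"
```

For example, translation_prompt("Hello", "en", "ko") yields the exact content string used in the usage examples below.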

2. Model Configuration (RoPE)

The original model uses the new Transformers RoPE configuration format with separate attention type settings:

"rope_parameters": {
  "full_attention": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_attention": {
    "rope_type": "default"
  }
}

This has been simplified for vLLM compatibility:

"rope_parameters": {
  "factor": 8.0,
  "rope_type": "linear"
}

3. Generation Configuration

Added temperature, bos_token_id, and max_length fields. The EOS token ordering was adjusted to [106, 1] for proper sequence termination.

Usage with vLLM

Start the server

vllm serve chbae624/vllm-translategemma-12b-it --max-model-len 4096

Send a translation request

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chbae624/vllm-translategemma-12b-it",
    "messages": [
      {
        "role": "user",
        "content": "<<<source>>>en<<<target>>>ko<<<text>>>The weather is beautiful today."
      }
    ],
    "temperature": 0.15,
    "max_tokens": 256
  }'
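The same request can be sent from Python without extra dependencies. This is a minimal sketch using the standard-library urllib; build_request and translate are illustrative helper names, and the payload mirrors the curl call above exactly:

```python
import json
import urllib.request

def build_request(text: str, source_lang: str, target_lang: str) -> dict:
    """Assemble the chat-completions payload with delimiter-based content."""
    return {
        "model": "chbae624/vllm-translategemma-12b-it",
        "messages": [{
            "role": "user",
            "content": f"<<<source>>>{source_lang}<<<target>>>{target_lang}<<<text>>>{text}",
        }],
        "temperature": 0.15,
        "max_tokens": 256,
    }

def translate(text: str, source_lang: str, target_lang: str,
              base_url: str = "http://localhost:8000") -> str:
    """POST the request to the vLLM server and return the translated text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(text, source_lang, target_lang)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client works the same way, since vLLM exposes the standard /v1/chat/completions endpoint.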

Custom prompt mode

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chbae624/vllm-translategemma-12b-it",
    "messages": [
      {
        "role": "user",
        "content": "<<<custom>>>Translate the following Japanese text to English: 今日はいい天気ですね。"
      }
    ],
    "temperature": 0.15,
    "max_tokens": 256
  }'
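Custom prompt mode from Python follows the same pattern; only the content string changes. custom_request is an illustrative helper name, and the payload mirrors the curl call above:

```python
def custom_request(instruction: str) -> dict:
    """Build a chat-completions payload using the <<<custom>>> marker."""
    return {
        "model": "chbae624/vllm-translategemma-12b-it",
        "messages": [
            {"role": "user", "content": f"<<<custom>>>{instruction}"},
        ],
        "temperature": 0.15,
        "max_tokens": 256,
    }
```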

Supported Languages

TranslateGemma supports translation across 55 languages. See the original model card for the full list.

Notes

  • Input/Output length: The model was fine-tuned with ~2K token sequences. Inputs significantly exceeding 2048 tokens may produce degraded translation quality.
  • Model weights: Identical to google/translategemma-12b-it. No retraining was performed.
  • Conversion method: Follows the exact same approach as Infomaniak-AI/vllm-translategemma-4b-it.

License

This model is subject to the Gemma Terms of Use. See the NOTICE file included in this repository.

Citation

@article{gemmatranslate2026,
    title={{TranslateGemma Technical Report}},
    url={https://arxiv.org/pdf/2601.09012},
    publisher={Google DeepMind},
    author={{Google Translate Research Team}},
    year={2026}
}