Multilingual MoE Transformer

A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.

Model Details

  • Architecture: Encoder-Decoder Transformer with MoE routing
  • Languages: English, French, Hindi, Bengali
  • Vocabulary Size: 32,000 tokens
  • Model Dimension: 512
  • Number of Experts: 4
  • Number of Layers: 6
  • Attention Heads: 8
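
The hyperparameters above describe a router choosing among 4 feed-forward experts inside each transformer block. Below is a minimal sketch of such an MoE layer, assuming top-1 (switch-style) routing and a 2048-dimensional expert hidden size; the class name, routing strategy, and expert width are illustrative assumptions, not necessarily what this checkpoint uses.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    # Illustrative top-1 MoE feed-forward block (d_model=512, 4 experts)
    def __init__(self, d_model=512, d_ff=2048, num_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (batch, seq, d_model)
        gate = F.softmax(self.router(x), dim=-1)  # routing probabilities per token
        top_prob, top_idx = gate.max(dim=-1)      # pick the single best expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                   # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask]) * top_prob[mask].unsqueeze(-1)
        return out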

Training

  • Stage: Self-supervised pre-training (Stage 1)
  • Task: Next-token prediction (language modeling)
  • Dataset: Wikipedia text in all four languages
  • Final Loss: 2.0218
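
Stage 1 is standard teacher-forced language modeling, so the reported loss is a token-level cross-entropy over the 32,000-token vocabulary (2.0218 corresponds to a perplexity of roughly exp(2.0218) ≈ 7.6). Below is a minimal sketch of that objective, assuming labels are the inputs shifted by one position and a padding id of 0 (both assumptions):

import torch.nn.functional as F

def next_token_loss(logits, token_ids, pad_id=0):
    # logits: (batch, seq_len, 32000) from the decoder; token_ids: (batch, seq_len)
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))  # predictions for position t+1
    shift_labels = token_ids[:, 1:].reshape(-1)                    # ... taken from positions <= t
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=pad_id)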

Usage

import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator",
    filename="pytorch_model.pt",
)
checkpoint = torch.load(model_path, map_location="cpu")

# Load the weights (you'll need to define and instantiate the architecture first)
model.load_state_dict(checkpoint['model_state_dict'])
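
Because only a state dict is stored, the model object above has to come from your own architecture definition. A minimal sketch of that missing step, assuming a hypothetical MultilingualMoETransformer class whose constructor mirrors the hyperparameters under Model Details:

# Hypothetical class name; define it to match the training code exactly
model = MultilingualMoETransformer(
    vocab_size=32_000,
    d_model=512,
    num_layers=6,
    num_heads=8,
    num_experts=4,
)

After load_state_dict succeeds, call model.eval() before running inference.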

Next Steps

This model is ready for Stage 2: fine-tuning on parallel translation data.
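
A minimal sketch of what a Stage 2 step could look like, assuming sentence-aligned (source, target) batches, teacher forcing, and the same cross-entropy objective; the parallel_loader, the model's (src, tgt) call signature, and the optimizer settings are illustrative assumptions, and padding handling is omitted:

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for src_ids, tgt_ids in parallel_loader:       # hypothetical dataloader of parallel pairs
    logits = model(src_ids, tgt_ids[:, :-1])   # feed the target prefix to the decoder
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt_ids[:, 1:].reshape(-1),            # predict the next target token
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()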

Citation

If you use this model, please cite:

@misc{moe-multilingual-translator,
  author = {arka7},
  title = {Multilingual MoE Transformer},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/moe-multilingual-translator}
}