# Multilingual MoE Transformer
A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.
## Model Details
- Architecture: Encoder-Decoder Transformer with MoE routing
- Languages: English, French, Hindi, Bengali
- Vocabulary Size: 32,000 tokens
- Model Dimension: 512
- Number of Experts: 4
- Number of Layers: 6
- Attention Heads: 8
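These hyperparameters can be collected into a small configuration object for your own implementation. The sketch below is illustrative only; the `MoEConfig` name and field names are assumptions and are not shipped with the checkpoint.

```python
from dataclasses import dataclass

@dataclass
class MoEConfig:
    # Hyperparameters listed above; this class is a hypothetical
    # convenience wrapper, not part of the released repository.
    vocab_size: int = 32_000
    d_model: int = 512
    num_experts: int = 4
    num_layers: int = 6
    num_heads: int = 8
    languages: tuple = ("en", "fr", "hi", "bn")

config = MoEConfig()
```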
## Training
- Stage: Self-supervised pre-training (Stage 1)
- Task: Next-token prediction (language modeling)
- Dataset: Wikipedia text in all four languages
- Final Loss: 2.0218
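Stage 1 optimizes a standard next-token prediction loss. A minimal sketch of that objective, assuming the model returns logits of shape `(batch, seq_len, vocab_size)` (the forward signature is an assumption; the released code may differ):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the prediction at position t and the token at t+1.

    logits:    (batch, seq_len, vocab_size) model outputs
    token_ids: (batch, seq_len) input token ids
    """
    # Shift so each position predicts the following token.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = token_ids[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```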
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(repo_id="arka7/moe-multilingual-translator", filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location="cpu")

# The architecture is not bundled with the checkpoint; define the MoE
# encoder-decoder yourself (see the hyperparameters above) as `model`, then:
model.load_state_dict(checkpoint["model_state_dict"])
```
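Continuing from the snippet above, the model class is a placeholder for your own implementation. A minimal sketch, assuming a hypothetical `MultilingualMoETransformer` class that accepts the hyperparameters listed under Model Details:

```python
# Hypothetical: replace MultilingualMoETransformer with your own implementation
# of the encoder-decoder MoE architecture described in Model Details.
model = MultilingualMoETransformer(
    vocab_size=32_000,
    d_model=512,
    num_experts=4,
    num_layers=6,
    num_heads=8,
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```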
## Next Steps
This model is ready for Stage 2: fine-tuning on parallel translation data.
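Stage 2 would train on paired source/target sentences with a standard sequence-to-sequence objective. A minimal sketch of one fine-tuning step, assuming the model exposes an encoder-decoder forward pass `model(src_ids, tgt_ids)` that returns decoder logits (this signature is an assumption):

```python
import torch
import torch.nn.functional as F

def finetune_step(model, optimizer, src_ids, tgt_ids):
    """One training step on a batch of parallel sentences.

    src_ids: (batch, src_len) source-language token ids
    tgt_ids: (batch, tgt_len) target-language token ids
    """
    model.train()
    optimizer.zero_grad()
    # Teacher forcing: the decoder sees tgt_ids[:, :-1] and predicts tgt_ids[:, 1:].
    logits = model(src_ids, tgt_ids[:, :-1])  # (batch, tgt_len - 1, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt_ids[:, 1:].reshape(-1),
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```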
## Citation
If you use this model, please cite:
```bibtex
@misc{moe-multilingual-translator,
  author    = {arka7},
  title     = {Multilingual MoE Transformer},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```