# BART Emoji Translator
A fine-tuned BART model that translates English text to emoji sequences using curriculum learning and LoRA.
## Model Description
This model converts natural language text into appropriate emoji representations. It was trained using a 6-stage curriculum learning approach with custom data retention strategies.
- Base Model: facebook/bart-large
- Training Method: LoRA (Low-Rank Adaptation)
## Usage
```python
from transformers import BartTokenizer, BartForConditionalGeneration
from peft import PeftModel

# Load the base model and apply the LoRA adapter
base_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
model = PeftModel.from_pretrained(base_model, "mohamedmostafa259/bart-emoji-translator")
tokenizer = BartTokenizer.from_pretrained("mohamedmostafa259/bart-emoji-translator")

# Translate text to emojis
def translate(
    text: str,
    max_length: int = 32,
    num_beams: int = 1,
    do_sample: bool = True,
    temperature: float = 1.0,
    top_p: float = 0.4,
    top_k: int = 50,
) -> str:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=num_beams,
        do_sample=do_sample,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Examples
print(translate('I am happy.'))                          # ππ€
print(translate('I feel misunderstood.'))                # π€¬π¬
print(translate('My parents want to have a new baby.'))  # πΆπ€°πͺ
print(translate('I eat dinner with my family.'))         # π₯ͺπ₯πͺ
```
## Output Variability
This model uses sampling-based decoding (temperature and nucleus sampling) rather than deterministic beam search. As a result, the same input may occasionally produce slightly different emoji sequences across runs. This behavior is expected and reflects the model choosing between multiple valid emoji interpretations for a given sentence.
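For reproducible outputs, the sampling parameters of the `translate` helper above can be overridden at call time. A minimal sketch (the specific settings below are illustrative choices, not values prescribed by this card):

```python
# Deterministic decoding: disable sampling and use beam search so repeated
# calls on the same input return the same emoji sequence.
print(translate("I am happy.", do_sample=False, num_beams=4))

# Lower-temperature nucleus sampling keeps some variety while concentrating
# probability mass on the most likely emoji interpretations.
print(translate("I am happy.", do_sample=True, temperature=0.7, top_p=0.4))
```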
## Training Details
### Training Hyperparameters
- Learning Rate: 0.0001
- Batch Size: 8
- Gradient Accumulation Steps: 4
- Max Epochs per Phase: 20
- Early Stopping Patience: 3
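For reference, these values map onto a Hugging Face `Seq2SeqTrainingArguments` object roughly as follows. This is a sketch only: `output_dir`, the evaluation/saving strategy, and the checkpoint-selection fields are assumptions, not taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the listed hyperparameters onto training arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="bart-emoji-phase",     # hypothetical output path
    learning_rate=1e-4,                # Learning Rate: 0.0001
    per_device_train_batch_size=8,     # Batch Size: 8
    gradient_accumulation_steps=4,     # effective batch of 8 * 4 = 32 examples
    num_train_epochs=20,               # Max Epochs per Phase: 20
    eval_strategy="epoch",             # evaluate after every epoch (assumed)
    save_strategy="epoch",
    load_best_model_at_end=True,       # keep the lowest-validation-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```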
### LoRA Configuration
- Rank (r): 128
- Alpha: 256
- Dropout: 0.1
- Target Modules: q_proj, v_proj, k_proj, out_proj
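An adapter matching this configuration can be set up with `peft` along these lines (a sketch assuming a sequence-to-sequence task type; the original training script is not included on this card):

```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import BartForConditionalGeneration

# LoRA adapter mirroring the configuration listed above.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj"],
    task_type=TaskType.SEQ_2_SEQ_LM,  # assumption: seq2seq generation with BART
)

base_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```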
### Curriculum Learning Strategy
The model was trained using a 6-phase curriculum learning approach with strategic data retention:
#### Phase Composition
| Phase | Current Stage | Previous Stages Included |
|---|---|---|
| Phase 1 (bootstrap) | Stage 1 (100%) | None |
| Phase 2 | Stage 2 (100%) | None (Stage 1 dropped) |
| Phase 3 | Stage 3 (100%) | Stage 2 (33%) |
| Phase 4 | Stage 4 (100%) | Stage 3 (50%), Stage 2 (33%) |
| Phase 5 | Stage 5 (100%) | Stage 4 (50%), Stage 3 (50%), Stage 2 (33%) |
| Phase 6 | Stage 6 (100%) | Stage 5 (50%), Stage 4 (50%), Stage 3 (50%), Stage 2 (33%) |
#### Stage Retention Strategy
- Stage 1 (Bootstrap): Used only in Phase 1, then completely dropped to prevent the model from over-relying on basic patterns
- Stage 2 (Foundation): Retained at 33% in all subsequent phases (3-6) to maintain core vocabulary
- Stages 3-6 (Progressive Complexity): Each retained at 50% in subsequent phases to balance learning new patterns while preventing catastrophic forgetting
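A minimal sketch of how this retention schedule could be assembled per phase (the stage datasets, helper name, and sampling seed are hypothetical; the original data pipeline is not part of this card):

```python
import random

# Fraction of each earlier stage carried forward into later phases.
# Stage 1 is used only in Phase 1 and then dropped entirely.
RETENTION = {2: 0.33, 3: 0.5, 4: 0.5, 5: 0.5}

def build_phase_mix(phase: int, stage_data: dict, seed: int = 42) -> list:
    """Combine 100% of the current stage with retained samples from earlier
    stages. `stage_data` maps stage number -> list of (text, emoji) pairs."""
    rng = random.Random(seed)
    mix = list(stage_data[phase])      # current stage, in full
    for stage in range(2, phase):      # earlier stages; Stage 1 is excluded
        keep = int(len(stage_data[stage]) * RETENTION[stage])
        mix.extend(rng.sample(stage_data[stage], keep))
    rng.shuffle(mix)
    return mix
```

For Phase 4, for example, this yields all of Stage 4 plus 50% of Stage 3 and 33% of Stage 2, matching the phase composition table above.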
#### Complexity Progression
The curriculum is structured around emoji count, progressively increasing output complexity:
Phase 1 (Single Emoji Foundation): Simple phrases mapping to one emoji
- Example: "I feel very happy" → π
- Example: "You pour some wine" → π·

Phase 2 (Two Emojis): Basic two-concept expressions
- Example: "They move to the rhythm" → π΅ π
- Example: "I like to drink wine" → β€οΈ π·

Phase 3 (Three Emojis): Short sentences with three distinct concepts
- Example: "He makes money selling his Islamic art" → π€ βͺοΈ π¨
- Example: "We looked for the eagle and llama on the map" → π¦ π¦ πΊοΈ

Phase 4 (Four Emojis): Longer phrases with multiple related concepts
- Example: "We took the car to see the nature scenery" → π β°οΈ π² ποΈ
- Example: "He made breakfast with eggs and toast" → π³ π π₯ π

Phase 5 (Five Emojis): Complex sentences requiring sequential emoji representations
- Example: "The breakfast included eggs cheese bread and fresh milk" → π₯ π§ π π₯ π
- Example: "The bar served wine champagne beer and whiskey all night" → π· π₯ πΎ πΊ π₯

Phase 6 (Six+ Emojis): Complex narratives with action sequences and multiple events
- Example: "Board plane pack luggage reach beach swim drink coconut milk nap" → βοΈ π§³ ποΈ π π₯₯ π΄
- Example: "Man in suit and woman in dress drink wine eat noodles listen to music get engaged" → π€΅ π π· π π΅ π
This emoji-count-based curriculum allows the model to:
- Master single-concept mappings before handling multiple concepts
- Gradually increase sequence length and complexity (progressive difficulty without overwhelming the model)
- Avoid catastrophic forgetting through strategic data retention (Stage 2 at 33%, Stages 3-6 at 50%)
- Learn compositional patterns (how emojis combine to represent complex ideas)
### Training Dynamics
- Each phase uses stage-specific validation sets to prevent metric contamination across difficulty levels
- Early stopping with patience=3 halts a phase once validation loss stops improving, limiting overfitting
- The model automatically loads the best checkpoint (lowest validation loss) from each phase
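Put together, one phase of this loop might look roughly like the following. This is a hedged sketch using `Seq2SeqTrainer`; it assumes training arguments with per-epoch evaluation and `load_best_model_at_end=True` (as sketched earlier), and the dataset arguments are placeholders.

```python
from transformers import Seq2SeqTrainer, EarlyStoppingCallback

def train_phase(model, training_args, train_dataset, eval_dataset):
    """Train a single curriculum phase and return the best checkpoint.

    `train_dataset` is the phase mix (current stage plus retained samples);
    `eval_dataset` is that phase's stage-specific validation split.
    """
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        # Stop the phase after 3 evaluations without validation-loss improvement.
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )
    trainer.train()
    # With load_best_model_at_end=True, the lowest-validation-loss weights
    # are restored here before the next phase begins.
    return trainer.model
```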
## Limitations
- Works best with English input text
- May not generate very rare or recently introduced emojis
- Performance varies with text complexity and length
- Optimal for text under 32 tokens
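A quick way to check the last point before calling `translate` (a small illustrative helper, not part of the released code):

```python
def within_token_budget(text: str, limit: int = 32) -> bool:
    # Count BART subword tokens (special tokens included) for the input text.
    return len(tokenizer(text)["input_ids"]) <= limit

print(within_token_budget("I eat dinner with my family."))  # True for short inputs
```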
## Training Infrastructure
- Experiment: bart-large_custom_curriculum_lr0.0001_r128_20251121_115133
- Date: 2025-11-21
- Framework: Transformers + PEFT
- Hardware: Kaggle, 2× NVIDIA T4 GPUs
## Citation
If you use this model, please cite:

```bibtex
@misc{bart-emoji-translator,
  author       = {mohamedmostafa259},
  title        = {BART Emoji Translator},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/mohamedmostafa259/bart-emoji-translator}}
}
```