---
language: en
tags:
- language-model
- custom-architecture
- jarvisx50m
license: mit
---

# JarvisX50M

**JarvisX50M** is a 50M-parameter language model built from scratch on the custom **JarvisXCore** architecture, designed to be lean, fast, and factual. Trained on WikiText-2, it aims to rival GPT-2 on factual Q&A (~85-95% accuracy) while running roughly 5x faster with a roughly 4x smaller footprint. India's first custom AI, crafted for budget devices! 🇮🇳

## Model Details

- **Parameters**: ~50M
- **Architecture**: JarvisXCore (custom multi-head attention, GELU activations, optimized feed-forward networks); a configuration sketch follows this list
- **Training Data**: WikiText-2 (~2M tokens)
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)
- **Context Length**: 256 tokens
- **Training**: 3 epochs, ~2,800 steps/epoch, on CPU or GPU
- **Final Loss**: ~0.0010
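For orientation, here is a minimal sketch of a configuration object consistent with the numbers above. Every field name and the depth/width values (`n_layer`, `n_head`, `d_model`, `dropout`) are assumptions for illustration; the actual `Config` class lives in `model.py` and may use different names and values.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Hypothetical field names; the real Config is defined in model.py.
    vocab_size: int = 50_257  # GPT-2 tokenizer vocabulary (from Model Details)
    block_size: int = 256     # context length in tokens (from Model Details)
    n_layer: int = 8          # assumed depth
    n_head: int = 8           # assumed number of attention heads
    d_model: int = 512        # assumed hidden size
    dropout: float = 0.1      # assumed regularization setting
```

At these assumed sizes, the token embeddings (50,257 × 512 ≈ 25.7M weights) plus eight transformer blocks (~3.1M weights each) land near the stated ~50M parameters.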
## Try It Out!

Chat with JarvisX50M in the Gradio demo embedded on this model page.

## Usage

```python
import torch
from transformers import AutoTokenizer

from model import JarvisX50M, Config

# Build the model and load the released weights.
config = Config()
model = JarvisX50M(config)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
```
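The snippet above stops after loading. Continuing from it, below is a minimal greedy-decoding sketch of generation; it assumes that calling the model on a tensor of token IDs returns logits of shape `(batch, seq_len, vocab_size)`, which the actual forward method in `model.py` may not match exactly.

```python
prompt = "Tell me about Rome"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):
        # Trim to the last 256 tokens, matching the model's context length.
        # Assumes model(ids) returns logits of shape (batch, seq_len, vocab_size).
        logits = model(input_ids[:, -256:])
        # Greedy decoding: take the most likely next token.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```

Temperature or top-k sampling usually reads better than pure greedy decoding; see `chat_jarvisx50m.py` for how the model is actually driven.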
## Chat

Run the chat script:

```bash
python chat_jarvisx50m.py
```

## Train

Retrain with:

```bash
python train_jarvisx50m.py
```
## Example

**Prompt**: "Tell me about Rome"

**Output**: "Rome's empire shaped law, architecture, and culture for centuries."

## Note

Casual prompts (e.g., "What's up?") may produce less coherent replies, since WikiText-2 is encyclopedic rather than conversational; fine-tuning on dialogue data would help. Try factual questions for best results!

## Author

Created by vihaan134354. Aiming to put India on the AI map! 🚀

---