---
language: en
tags:
- language-model
- custom-architecture
- jarvisx50m
license: mit
---

# JarvisX50M

**JarvisX50M** is a 50M-parameter language model built from scratch on the custom **JarvisXCore** architecture, designed to be lean, fast, and factual. Trained on WikiText-2, it aims to rival GPT-2 on factual Q&A (~85-95% accuracy) while running roughly 5x faster with a roughly 4x smaller footprint. India's first custom AI, crafted for budget devices! 🇮🇳

## Model Details

- **Parameters**: ~50M
- **Architecture**: JarvisXCore (custom multi-head attention, GELU activations, optimized feed-forward networks); a configuration sketch follows this list
- **Training Data**: WikiText-2 (~2M tokens)
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)
- **Context Length**: 256 tokens
- **Training**: 3 epochs, ~2,800 steps/epoch, on CPU or GPU
- **Final Loss**: ~0.0010
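For orientation, here is a minimal sketch of a configuration object consistent with the numbers above. Every field name and the depth/width values (`n_layer`, `n_head`, `d_model`, `dropout`) are assumptions for illustration; the actual `Config` class lives in `model.py` and may use different names and values.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Hypothetical field names; the real Config is defined in model.py.
    vocab_size: int = 50_257  # GPT-2 tokenizer vocabulary (from Model Details)
    block_size: int = 256     # context length in tokens (from Model Details)
    n_layer: int = 8          # assumed depth
    n_head: int = 8           # assumed number of attention heads
    d_model: int = 512        # assumed hidden size
    dropout: float = 0.1      # assumed regularization setting
```

At these assumed sizes, the token embeddings (50,257 × 512 ≈ 25.7M weights) plus eight transformer blocks (~3.1M weights each) land near the stated ~50M parameters.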
## Try It Out!

Chat with JarvisX50M in the Gradio demo embedded on this model page.

## Usage

```python
import torch
from transformers import AutoTokenizer

from model import JarvisX50M, Config

# Build the model and load the released weights.
config = Config()
model = JarvisX50M(config)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
```
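The snippet above stops after loading. Continuing from it, below is a minimal greedy-decoding sketch of generation; it assumes that calling the model on a tensor of token IDs returns logits of shape `(batch, seq_len, vocab_size)`, which the actual forward method in `model.py` may not match exactly.

```python
prompt = "Tell me about Rome"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):
        # Trim to the last 256 tokens, matching the model's context length.
        # Assumes model(ids) returns logits of shape (batch, seq_len, vocab_size).
        logits = model(input_ids[:, -256:])
        # Greedy decoding: take the most likely next token.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```

Temperature or top-k sampling usually reads better than pure greedy decoding; see `chat_jarvisx50m.py` for how the model is actually driven.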
## Chat

Run the chat script:

```bash
python chat_jarvisx50m.py
```

## Train

Retrain with:

```bash
python train_jarvisx50m.py
```
## Example

**Prompt**: "Tell me about Rome"

**Output**: "Rome's empire shaped law, architecture, and culture for centuries."

## Note

Casual prompts (e.g., "What's up?") may produce less coherent replies, since WikiText-2 is encyclopedic rather than conversational; fine-tuning on dialogue data would help. Try factual questions for best results!

## Author

Created by vihaan134354. Aiming to put India on the AI map! 🚀

---