---
license: mit
datasets:
- shivendrra/consolidated-datasets
language:
- en
metrics:
- perplexity
tags:
- Basemodel
- text-generation
- nlp
- custom_code
- causal-lm
library_name: transformers
---

# TinyWay-1.2.0

**TinyWay-1.2.0** is a lightweight GPT-style causal language model (~110M parameters) trained from scratch on a mixed streaming corpus (web text, stories, and code). The model is designed for research, experimentation, and educational purposes, with an emphasis on transparent architecture and reproducible training.

> ⚡ Trained end-to-end using a custom PyTorch pipeline with mixed precision, gradient accumulation, and streaming datasets.

---

## Model Overview

| Property | Value |
| ----------------- | ------------------------------------ |
| Model type | Decoder-only Transformer (GPT-style) |
| Parameters | **~109.6M** |
| Layers | 10 |
| Hidden size | 768 |
| Attention heads | 12 |
| Context length | 256 tokens |
| Activation | GELU |
| Dropout | 0.1 |
| Precision | fp16 / bf16 |
| Weight tying | Token embedding tied with LM head |
| Position encoding | Learned absolute embeddings |

---

## Training Details

### Dataset

The model was trained on **streaming data** covering:

* 🌍 Web text
* 📚 Stories
* 💻 Code

via the HuggingFace dataset:

```
shivendrra/consolidated-datasets
```

Streaming avoids large local storage and allows continuous sampling directly from HuggingFace.

---

### Tokenization

* Tokenizer: **GPT2TokenizerFast**
* Vocabulary size: **50,257**
* Special tokens:
  * `bos_token_id = eos_token_id = pad_token_id = 50256`

---

### Training Configuration

| Setting | Value |
| --------------------- | ---------------------------- |
| Sequence length | 256 |
| Effective batch size | 64 sequences |
| Optimizer | AdamW |
| Learning rate | 3e-4 (cosine decay + warmup) |
| Betas | (0.9, 0.95) |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
| Mixed precision | AMP (fp16 / bf16) |
| Gradient accumulation | Yes |
| Training steps | ~60k |
| Total tokens | ~1B (approx) |

Final training loss ≈ **3.0**, corresponding to a final perplexity of ≈ **20** (perplexity = exp(loss)). A sketch of how to measure perplexity yourself appears after the Limitations section.

---

## Usage

### Load with Transformers (Custom Code Required)

This repository uses a custom model definition (`modeling_tinyway.py`), so loading through `AutoModelForCausalLM` requires `trust_remote_code=True` (or having the file available locally in your environment).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "NNEngine/TinyWay-1.2.0",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```

---

### Text Generation Example

```python
import torch

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Example Generations

The model demonstrates:

* ✅ Coherent sentence structure
* ✅ Narrative flow in stories
* ✅ Reasonable grammar and punctuation
* ⚠️ Occasional repetition and topic drift (expected at this scale)

This is a research-grade small LLM and is not instruction-aligned by default.

---

## Limitations

* ❌ Not instruction-tuned
* ❌ Limited reasoning depth compared to large LLMs
* ❌ Context length limited to 256 tokens
* ⚠️ May hallucinate or generate inconsistent facts
* ⚠️ Training data may contain noise from web sources

Use responsibly.
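
---

## Evaluating Perplexity (Sketch)

The reported perplexity (≈20) follows directly from the cross-entropy training loss via `perplexity = exp(loss)`. The snippet below is a minimal sketch of how you could check perplexity on your own held-out text with this checkpoint. It assumes the custom model class follows the standard Transformers causal-LM API (returning a `.loss` when `labels` are passed); the evaluation texts here are placeholders, not the corpus used during training.

```python
import math
import torch

# Assumes `model` and `tokenizer` were loaded as shown in the Usage section.
eval_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "def add(a, b):\n    return a + b",
]

model.eval()
total_loss, num_texts = 0.0, 0

with torch.no_grad():
    for text in eval_texts:
        # Truncate to the model's 256-token context window.
        enc = tokenizer(
            text, return_tensors="pt", truncation=True, max_length=256
        ).to(model.device)
        # With labels == input_ids, a standard causal-LM head returns the mean
        # cross-entropy loss over the sequence (assumption for this custom model).
        out = model(**enc, labels=enc["input_ids"])
        total_loss += out.loss.item()
        num_texts += 1

mean_loss = total_loss / num_texts
print(f"mean loss: {mean_loss:.3f}  perplexity: {math.exp(mean_loss):.2f}")
```

Note that this averages per-text losses rather than weighting by token count, which is fine for a quick sanity check but not a strict corpus-level perplexity.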
---

## Intended Use

* Research experiments
* Educational purposes
* Model scaling studies
* Training pipeline benchmarking
* Custom fine-tuning experiments

Not recommended for production or safety-critical applications.

---

## Reproducibility

The model was trained using:

* Custom PyTorch training loop
* Streaming datasets via HuggingFace
* Mixed precision training
* Gradient accumulation
* Periodic checkpointing
* Full monitoring (loss, perplexity, gradient norm, attention entropy)

A minimal sketch of how these pieces fit together is included in the appendix at the end of this card. If you’d like the full training code or configs, feel free to reach out.

---

## License

This model follows the license of the underlying datasets and tokenizer. Please ensure compliance before commercial usage.

---

## Acknowledgements

* HuggingFace 🤗
* PyTorch
* GPT-2 tokenizer
* Open research community
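
---

## Appendix: Streaming + Mixed-Precision Training Sketch

The Reproducibility section lists the main ingredients of the pipeline: streaming HuggingFace datasets, AMP mixed precision, gradient accumulation, and gradient clipping at 1.0. The sketch below shows how those pieces typically fit together under the hyperparameters stated above. It is **not** the actual training script: the accumulation step count, the `"text"` field name on streamed records, and the use of the released checkpoint (rather than a from-scratch initialization) are illustrative assumptions.

```python
import torch
from torch.optim import AdamW
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "NNEngine/TinyWay-1.2.0", trust_remote_code=True
).to(device)

# Stream the corpus instead of downloading it locally.
stream = load_dataset("shivendrra/consolidated-datasets", split="train", streaming=True)

# Optimizer settings from the Training Configuration table.
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
accum_steps = 8  # assumed micro-batch accumulation factor

model.train()
optimizer.zero_grad()
for step, example in enumerate(stream):
    # Assumes each streamed record exposes a "text" field.
    enc = tokenizer(
        example["text"], return_tensors="pt", truncation=True, max_length=256
    ).to(device)

    with torch.autocast(device_type=device, dtype=amp_dtype):
        # Assumes the custom model returns a loss when labels are provided.
        loss = model(**enc, labels=enc["input_ids"]).loss / accum_steps

    scaler.scale(loss).backward()

    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at 1.0
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()

    if step >= 100:  # short demo run; the real training ran ~60k steps
        break
```

A learning-rate scheduler (cosine decay with warmup), periodic checkpointing, and the monitoring hooks mentioned above would sit around this loop in the full pipeline.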