EasyGPT-303M (Trained on OpenWebText)

A 303M-parameter GPT-2-style model trained from scratch on the OpenWebText dataset. It reaches a validation loss of 2.887, comparable to GPT-2 Medium.


1. Model Introduction

This is a decoder-only Transformer language model trained with Andrej Karpathy's nanoGPT framework, extended with modern components such as RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU, and Grouped-Query Attention (GQA). It was trained from scratch on the OpenWebText dataset, an open-source reproduction of the dataset used to train OpenAI's GPT-2.
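For readers unfamiliar with these components, the snippet below is an illustrative PyTorch sketch of two of them (RMSNorm and a SwiGLU feed-forward block). The class names and dimensions are illustrative and not taken from the EasyGPT source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features,
    with a learned gain and no mean subtraction or bias term."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```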

Key Specifications

Attribute         Value
----------------  ------------------------------------------------------------
Parameters        303 million (comparable to GPT-2 Medium)
Architecture      GPT-2 style (1024-token context window, RoPE/standard embeddings)
Dataset           OpenWebText (~17 GB, cleaned)
Tokenizer         GPT-2 BPE (via tiktoken)
Training Steps    15,000
Batch Size        ~0.5M tokens per step (gradient accumulation)
Total Tokens      ~7.3 billion
Final Val Loss    2.887 (perplexity ≈ 18.0)

Training Details

  • Hardware: Single NVIDIA RTX 3090 (24GB VRAM)
  • Optimizer: AdamW
  • Learning Rate: Peak 3.2e-4 with cosine decay (800 warmup steps); see the schedule sketch after this list
  • Precision: BF16 (bfloat16) mixed precision
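
The following is a minimal sketch of the learning-rate schedule described above (linear warmup to the peak, then cosine decay). Only the peak LR (3.2e-4), the warmup length (800 steps), and the total step count (15,000) come from this card; the minimum LR and the exact decay horizon are assumptions.

```python
import math

PEAK_LR = 3.2e-4
MIN_LR = PEAK_LR / 10      # assumption: decay to 10% of the peak
WARMUP_STEPS = 800
MAX_STEPS = 15_000

def get_lr(step: int) -> float:
    if step < WARMUP_STEPS:                      # linear warmup
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step >= MAX_STEPS:                        # after decay: hold at the floor
        return MIN_LR
    # cosine decay from PEAK_LR down to MIN_LR
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (PEAK_LR - MIN_LR)
```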

Capabilities

As a Base Model (not instruction-tuned), it excels at:

  • Text Completion: Coherent story generation and article writing.
  • In-Context Learning: Can perform tasks such as sentiment analysis when given a few examples (see the prompt sketch after this list).
  • Syntax & Structure: Produces grammatically correct English with high consistency.
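
A hypothetical few-shot prompt for sentiment analysis is shown below; the review texts are made up for illustration. A base model like this one is expected to continue the pattern after the final "Sentiment:".

```python
# Build a few-shot sentiment prompt; feed it to the generation code in
# section 2 and read the model's next few tokens as the predicted label.
prompt = (
    "Review: The food was cold and the service was slow.\n"
    "Sentiment: negative\n\n"
    "Review: Absolutely loved the new album, every track is great.\n"
    "Sentiment: positive\n\n"
    "Review: The battery lasts two full days and charges quickly.\n"
    "Sentiment:"
)
```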

2. How to Use

Since this model is based on nanoGPT and uses a custom checkpoint format (.pt), you need the original model definition to load it. The loading code is available at https://github.com/ssyzhang/EasyGPT.
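
The sketch below shows one way to load and sample from such a checkpoint, assuming it follows nanoGPT's conventions (a dict with "model_args" and a "model" state dict, and a GPT/GPTConfig definition in model.py). The file name "ckpt.pt", these keys, and the generate() signature are assumptions based on nanoGPT; check the EasyGPT repository for the exact names.

```python
import torch
import tiktoken
from model import GPT, GPTConfig  # model definition from the EasyGPT repo (assumed layout)

device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt = torch.load("ckpt.pt", map_location=device)

model = GPT(GPTConfig(**ckpt["model_args"]))
# Checkpoints saved after torch.compile carry an "_orig_mod." prefix in nanoGPT
state_dict = {
    (k[len("_orig_mod."):] if k.startswith("_orig_mod.") else k): v
    for k, v in ckpt["model"].items()
}
model.load_state_dict(state_dict)
model.to(device).eval()

enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("The history of the Roman Empire")], device=device)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```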

3. License

This project is licensed under the MIT License. See the LICENSE file for the full license text.
