EasyGPT-303M (Trained on OpenWebText)
A 303M parameter GPT-2 style model trained from scratch on the OpenWebText dataset.
It reaches a validation loss of 2.887, comparable to GPT-2 Medium.
1. Model Introduction
This is a decoder-only Transformer language model trained with Andrej Karpathy's nanoGPT framework, extended with modern components such as RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU feed-forward layers, and Grouped-Query Attention (GQA). It was trained from scratch on OpenWebText, an open-source reproduction of the WebText dataset used to train OpenAI's GPT-2.
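For illustration, below is a minimal PyTorch sketch of two of these components, RMSNorm and a SwiGLU feed-forward block. The class names and hyperparameters are placeholders and may not match the actual EasyGPT implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias term."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the RMS of the last dimension, then apply a learned scale.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```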
Key Specifications
| Attribute | Value |
|---|---|
| Parameters | 303 Million (comparable to GPT-2 Medium) |
| Architecture | GPT-2 style decoder (1024-token context window; RoPE or standard positional embeddings) |
| Dataset | OpenWebText (~17GB cleaned) |
| Tokenizer | GPT-2 BPE (via tiktoken) |
| Training Steps | 15,000 steps |
| Batch Size | ~0.5M tokens per step (Gradient Accumulation) |
| Total Tokens | ~7.3 Billion tokens |
| Final Val Loss | 2.887 (PPL 18.0) |
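The Total Tokens figure follows from the step count and the effective batch size. As a rough check, a hypothetical configuration with 1024-token sequences, a micro-batch of 12, and 40 gradient-accumulation steps reproduces the Batch Size and Total Tokens rows above (the actual micro-batch and accumulation settings may differ):

```python
# Hypothetical batch configuration; actual EasyGPT training values may differ.
block_size = 1024          # context window (tokens per sequence)
micro_batch = 12           # sequences per forward/backward pass
grad_accum_steps = 40      # accumulation steps per optimizer update
train_steps = 15_000

tokens_per_step = block_size * micro_batch * grad_accum_steps
total_tokens = tokens_per_step * train_steps

print(f"{tokens_per_step / 1e6:.2f}M tokens per step")  # ~0.49M
print(f"{total_tokens / 1e9:.2f}B tokens total")         # ~7.37B
```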
Training Details
- Hardware: Single NVIDIA RTX 3090 (24GB VRAM)
- Optimizer: AdamW
- Learning Rate: Peak 3.2e-4 with cosine decay and an 800-step warmup (see the schedule sketch below)
- Precision: BF16 (bfloat16) mixed precision
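The learning-rate schedule can be expressed as a nanoGPT-style warmup-plus-cosine function. The minimum learning rate below is an assumed value, not taken from the training configuration:

```python
import math

max_lr = 3.2e-4        # peak learning rate (from the table above)
min_lr = max_lr / 10   # floor of the cosine decay (assumed value)
warmup_steps = 800
max_steps = 15_000

def get_lr(step):
    # Linear warmup to the peak LR over the first 800 steps.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Hold at the minimum LR once training exceeds max_steps.
    if step > max_steps:
        return min_lr
    # Cosine decay from max_lr down to min_lr in between.
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))
    return min_lr + coeff * (max_lr - min_lr)
```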
Capabilities
As a Base Model (not instruction-tuned), it excels at:
- Text Completion: Coherent story generation and article writing.
- In-Context Learning: Can perform tasks such as sentiment analysis when given a few examples (see the prompt sketch below).
- Syntax & Structure: Produces grammatically correct English with high consistency.
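Because this is a base model, few-shot tasks are phrased as plain text completion. A hypothetical sentiment prompt might look like the following, with the model expected to continue with the next label:

```python
# Few-shot sentiment prompt: the model should complete the final line with "Positive".
prompt = (
    "Review: The plot was dull and the acting was worse.\n"
    "Sentiment: Negative\n"
    "Review: A heartfelt film with stunning cinematography.\n"
    "Sentiment: Positive\n"
    "Review: I couldn't stop smiling the whole way through.\n"
    "Sentiment:"
)
```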
2. How to Use
Since this model is based on nanoGPT and uses a custom checkpoint format (.pt), you need the original model definition to load it. See https://github.com/ssyzhang/EasyGPT for the model code.
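A minimal loading and sampling sketch is shown below. It assumes a nanoGPT-style checkpoint containing "model" and "model_args" keys and a GPT/GPTConfig model definition with a generate method, as in upstream nanoGPT; the module and key names in the EasyGPT repository may differ.

```python
import torch
import tiktoken
from model import GPT, GPTConfig  # model definition from the EasyGPT repo (assumed module/class names)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed nanoGPT-style checkpoint layout: {"model": state_dict, "model_args": {...}, ...}
ckpt = torch.load("ckpt.pt", map_location=device)
model = GPT(GPTConfig(**ckpt["model_args"]))

# Strip the "_orig_mod." prefix that torch.compile adds to state dict keys, if present.
state_dict = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model"].items()}
model.load_state_dict(state_dict)
model.to(device).eval()

# GPT-2 BPE tokenizer via tiktoken, matching the training tokenizer.
enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("The history of the Roman Empire")], device=device)

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```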
3. License
This project is licensed under the MIT License. See the LICENSE file for the full license text.