Commit 488dd0b by siyzhang (verified)
1 parent: fdabffb

Upload folder using huggingface_hub

Files changed (3)
  1. LICENSE +22 -0
  2. README.md +63 -0
  3. easygpt.pt +3 -0
LICENSE ADDED
@@ -0,0 +1,22 @@
+ MIT License
+
+ Copyright (c) 2025 Siyuan Zhang
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ language:
+ - en
+ license: mit
+ library_name: pytorch
+ tags:
+ - nanogpt
+ - transformer
+ - llm
+ - causal-lm
+ datasets:
+ - openwebtext
+ ---
+
+ # EasyGPT-303M (Trained on OpenWebText)
+
+ <div align="center">
+
+ **A 303M parameter GPT-2 style model trained from scratch on the OpenWebText dataset.**
+ *Reaching a validation loss of 2.887, comparable to GPT-2 Medium.*
+
+ </div>
+
+ ---
+
+ ## 1. Model Introduction
+
+ This is a **decoder-only Transformer** language model built on Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) framework and extended with RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU, and grouped-query attention (GQA). It was trained from scratch on the **OpenWebText** dataset, an open-source reproduction of the dataset used to train OpenAI's GPT-2.
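
Of the components named in this introduction, RMSNorm is the simplest to show in isolation. The following is a minimal pure-Python sketch of the general technique, not the EasyGPT implementation: the function name `rms_norm`, the `eps` value, and the use of plain lists instead of tensors are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Divide x by its root mean square, then apply a per-feature gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)  # eps guards against division by zero
    return [v * inv_rms * w for v, w in zip(x, weight)]


# With a gain of 1.0 everywhere, the output has a mean square close to 1.
normalized = rms_norm([3.0, -4.0], weight=[1.0, 1.0])
```

In a real model the same computation would run over the last dimension of a tensor, with `weight` as a learned parameter.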
+
+ ### Key Specifications
+ | Attribute | Value |
+ | :--- | :--- |
+ | **Parameters** | **303 million** (comparable to GPT-2 Medium) |
+ | **Architecture** | GPT-2 style (1024-token context window, RoPE/standard embeddings) |
+ | **Dataset** | [OpenWebText](https://huggingface.co/datasets/openwebtext) (~17 GB, cleaned) |
+ | **Tokenizer** | GPT-2 BPE (via `tiktoken`) |
+ | **Training Steps** | 15,000 |
+ | **Batch Size** | ~0.5M tokens per step (via gradient accumulation) |
+ | **Total Tokens** | ~7.3 billion |
+ | **Final Val Loss** | **2.887** (PPL $\approx$ 18) |
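
The perplexity and token totals in the table follow from simple arithmetic. The check below assumes the validation loss is a mean per-token cross-entropy in nats, and that "~0.5M tokens per step" means nanoGPT's default of 12 sequences × 1024 tokens × 40 accumulation steps. The second point is an assumption, since the exact batch configuration is not stated here.

```python
import math

# Perplexity is the exponential of the mean per-token cross-entropy (in nats).
val_loss = 2.887
perplexity = math.exp(val_loss)

# Assumed batch layout (nanoGPT default): micro-batch 12, block size 1024,
# 40 gradient-accumulation steps. The README only states "~0.5M tokens per step".
tokens_per_step = 12 * 1024 * 40
steps = 15_000
total_tokens = steps * tokens_per_step

print(f"perplexity ~ {perplexity:.1f}")
print(f"tokens per step = {tokens_per_step:,}")
print(f"total tokens ~ {total_tokens / 1e9:.2f}B")
```

Under these assumptions the total is a little under 7.4 billion tokens and the perplexity is roughly 18, in line with the figures reported in the table.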
+
+ ### Training Details
+ - **Hardware**: Single NVIDIA RTX 3090 (24 GB VRAM)
+ - **Optimizer**: AdamW
+ - **Learning Rate**: Peak 3.2e-4 with cosine decay (800 warmup steps)
+ - **Precision**: BF16 (bfloat16) mixed precision
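
The learning-rate schedule described in the list can be sketched as a function of the step index. The peak rate, warmup length, and step count come from this README. The linear warmup shape, the decay horizon equal to the full 15,000 steps, and the floor of one tenth of the peak (the nanoGPT convention) are assumptions, as is the function name `get_lr`.

```python
import math

PEAK_LR = 3.2e-4        # from the README
WARMUP_STEPS = 800      # from the README
TOTAL_STEPS = 15_000    # from the README; assumed to also be the decay horizon
MIN_LR = PEAK_LR / 10   # assumed floor, not stated in the README


def get_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step >= TOTAL_STEPS:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return MIN_LR + cosine * (PEAK_LR - MIN_LR)


for step in (0, 800, 7_900, 15_000):
    print(step, f"{get_lr(step):.2e}")
```

In a training loop the returned value would be written into each optimizer parameter group before the step.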
+
+ ### Capabilities
+ As a **base model** (not instruction-tuned), it is best suited to:
+ - **Text Completion**: Coherent story generation and article writing.
+ - **In-Context Learning**: Performing simple tasks (such as sentiment analysis) when given a few examples in the prompt.
+ - **Syntax & Structure**: Producing grammatically correct English with high consistency.
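
Because a base model only continues text, in-context learning amounts to formatting the examples so that the desired answer is the most natural continuation. The sketch below builds such a prompt for sentiment analysis. The template, the example reviews, and the function name `build_few_shot_prompt` are hypothetical and not taken from the EasyGPT repository.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format labeled examples followed by an unlabeled query for completion."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # The final block stops right after "Sentiment:" so the model fills in the label.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A charming film with a weak ending.")
print(prompt)
```

The prompt would then be tokenized and passed to the model's generation routine, and generation stopped after the first line of output.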
+
+ ---
+
+ ## 2. How to Use
+
+ Since this model is based on `nanoGPT` and uses a custom checkpoint format (`.pt`), you need the original model definition to load it. See https://github.com/ssyzhang/EasyGPT for the model code.
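
As a starting point, the checkpoint can be opened and inspected with plain PyTorch before wiring it into the model class. This is a hedged sketch rather than the repository's loading code: the helper names are hypothetical, and the `"model"` key and `_orig_mod.` prefix are nanoGPT conventions that are assumed, not confirmed, to apply to `easygpt.pt`.

```python
from typing import Any, Dict

# torch.compile wraps the module, which prefixes saved parameter names.
COMPILE_PREFIX = "_orig_mod."


def strip_compile_prefix(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict with the torch.compile prefix removed from keys."""
    return {
        (key[len(COMPILE_PREFIX):] if key.startswith(COMPILE_PREFIX) else key): value
        for key, value in state_dict.items()
    }


def load_checkpoint(path: str = "easygpt.pt") -> Dict[str, Any]:
    """Load the raw checkpoint dictionary onto the CPU."""
    import torch  # imported lazily so the helper above works without PyTorch

    # weights_only=False unpickles arbitrary objects; use it only on files you trust.
    return torch.load(path, map_location="cpu", weights_only=False)


# Typical use (commented out because it needs the 3.6 GB file and the model class):
# checkpoint = load_checkpoint()
# print(checkpoint.keys())  # inspect the keys before assuming their names
# state_dict = strip_compile_prefix(checkpoint["model"])  # "model" key is assumed
# model.load_state_dict(state_dict)  # `model` built from the EasyGPT definition
```

Printing the checkpoint keys first is a cheap way to confirm which of these assumptions hold before building the model.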
+
+ ## 3. License
+ This project is licensed under the MIT License.
+ See the [LICENSE](./LICENSE) file for the full license text.
easygpt.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd3dec53fd970fa63427056583cdc09fd75e65121d3915d78df05655df53544d
+ size 3638880424
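
The three lines above are a Git LFS pointer, not the weights themselves; the pointer records the SHA-256 digest and byte size of the real file. A downloaded copy can be checked against it along these lines. The function names are hypothetical, and the local path `easygpt.pt` in the final comment is an assumption about where the file was saved.

```python
import hashlib
from typing import Dict

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:bd3dec53fd970fa63427056583cdc09fd75e65121d3915d78df05655df53544d
size 3638880424
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dictionary."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte checkpoint never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


pointer = parse_lfs_pointer(POINTER_TEXT)
expected_digest = pointer["oid"].split(":", 1)[1]
expected_size = int(pointer["size"])
print(f"expected size: {expected_size / 1e9:.2f} GB")

# After downloading: sha256_of_file("easygpt.pt") == expected_digest
```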