Upload folder using huggingface_hub

Files changed (3)

LICENSE +22 -0
README.md +63 -0
easygpt.pt +3 -0

LICENSE ADDED

	@@ -0,0 +1,22 @@

+MIT License
+Copyright (c) 2025 Siyuan Zhang
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

	@@ -0,0 +1,63 @@

+---
+language:
+- en
+license: mit
+library_name: pytorch
+tags:
+- nanogpt
+- transformer
+- llm
+- causal-lm
+datasets:
+- openwebtext
+---
+# EasyGPT-303M (Trained on OpenWebText)
+<div align="center">
+**A 303M parameter GPT-2 style model trained from scratch on the OpenWebText dataset.**
+*Reaching a validation loss of 2.887, comparable to GPT-2 Medium.*
+</div>
+---
+## 1. Model Introduction
+This is a **decoder-only Transformer** language model built on Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) framework, extended with RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU, and grouped-query attention (GQA). It was trained from scratch on the **OpenWebText** dataset, an open-source reproduction of the WebText dataset used to train OpenAI's GPT-2.
+### Key Specifications
+| Attribute | Value |
+| :--- | :--- |
+| **Parameters** | **303 Million** (comparable to GPT-2 Medium) |
+| **Architecture** | GPT-2 style (1024-token context window, RoPE/standard embeddings) |
+| **Dataset** | [OpenWebText](https://huggingface.co/datasets/openwebtext) (~17GB cleaned) |
+| **Tokenizer** | GPT-2 BPE (via `tiktoken`) |
+| **Training Steps** | 15,000 steps |
+| **Batch Size** | ~0.5M tokens per step (via gradient accumulation) |
+| **Total Tokens** | ~7.3 Billion tokens |
+| **Final Val Loss** | **2.887** (PPL $\approx$ 18.0) |
+### Training Details
+- **Hardware**: Single NVIDIA RTX 3090 (24GB VRAM)
+- **Optimizer**: AdamW
+- **Learning Rate**: Peak 3.2e-4 with Cosine Decay (warmup 800 steps)
+- **Precision**: BF16 (bfloat16) mixed precision
+### Capabilities
+As a **Base Model** (not instruction-tuned), it excels at:
+- **Text Completion**: Coherent story generation and article writing.
+- **In-Context Learning**: Can perform tasks (like sentiment analysis) given a few examples.
+- **Syntax & Structure**: Produces grammatically correct English with high consistency.
+---
+## 2. How to Use
+Since this model is based on `nanoGPT` and uses a custom checkpoint format (`.pt`), you need the original model definition to load it. Refer to https://github.com/ssyzhang/EasyGPT.
+## 3. License
+This project is licensed under the MIT License.
+See the [LICENSE](./LICENSE) file for the full license text.
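
The training figures in the README above can be cross-checked with a little arithmetic. The sketch below is not taken from the EasyGPT repository: it recomputes perplexity from the reported validation loss, the tokens per step implied by the stated totals, and a linear-warmup plus cosine-decay schedule similar in shape to the one nanoGPT uses. The decay floor `MIN_LR` and the choice to decay over all 15,000 steps are assumptions, since the README does not state them, and the function names are hypothetical.

```python
import math

# Values stated in the README.
PEAK_LR = 3.2e-4
WARMUP_STEPS = 800
TOTAL_STEPS = 15_000
FINAL_VAL_LOSS = 2.887        # cross-entropy, assumed to be in nats
TOTAL_TOKENS = 7.3e9

# ASSUMPTION: the README does not give a learning-rate floor;
# one tenth of the peak is used here purely for illustration.
MIN_LR = PEAK_LR / 10


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step >= TOTAL_STEPS:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return MIN_LR + cosine * (PEAK_LR - MIN_LR)


def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(loss_nats)


def tokens_per_step(total_tokens: float, steps: int) -> float:
    """Average tokens consumed per optimizer step."""
    return total_tokens / steps


if __name__ == "__main__":
    print(f"perplexity      : {perplexity(FINAL_VAL_LOSS):.2f}")
    print(f"tokens per step : {tokens_per_step(TOTAL_TOKENS, TOTAL_STEPS):,.0f}")
    for step in (0, 400, 800, 7_500, 15_000):
        print(f"lr at step {step:>6}: {lr_at(step):.2e}")
```

With these inputs the perplexity comes out near 17.9, in line with the README's rounded figure of 18.0, and the totals imply roughly 487K tokens per step, consistent with the stated ~0.5M.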

easygpt.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd3dec53fd970fa63427056583cdc09fd75e65121d3915d78df05655df53544d
+size 3638880424
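
The three lines above are a Git LFS pointer, not the checkpoint itself: they record the hash algorithm, the digest, and the byte size of the real file. As a minimal sketch (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is an assumption), a downloaded `easygpt.pt` could be checked against the pointer like this:

```python
import hashlib
from pathlib import Path

# Pointer contents as shown in the diff above.
EXAMPLE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bd3dec53fd970fa63427056583cdc09fd75e65121d3915d78df05655df53544d
size 3638880424
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("+").strip()  # tolerate diff '+' prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Stream in chunks so a multi-gigabyte checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(EXAMPLE_POINTER)
    print(pointer["algo"], pointer["size"])
    local_file = Path("easygpt.pt")  # ASSUMPTION: checkpoint downloaded here
    if local_file.exists():
        print("checkpoint matches pointer:", matches_pointer(local_file, pointer))
```

Comparing the size before hashing is a deliberate choice: an interrupted download is rejected immediately instead of after reading several gigabytes.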