---
license: mit
---

# Templar-I: Permissionless Distributed Training

> A 1.2B-parameter causal language model trained with **Gauntlet**, an incentive system that rewards permissionless contributors for useful pseudo-gradients on the Bittensor network. [[Paper]](https://arxiv.org/abs/2505.21684)

---

## Overview

* **Setting:** Fully open, permissionless, internet-scale training; no control over who registers or what hardware they bring.
* **Mechanism:** Two-stage peer filtering (uptime, reliability, synchronization) followed by scoring of each peer's pseudo-gradient quality.
* **Run:** 20K communication rounds on FineWeb-Edu data; the top **15** peers aggregated per round, out of up to 250 registered peers.
* **Result:** On a per-iteration basis, convergence outpaced a centralized AdamW baseline; downstream metrics are competitive.

---

## Gauntlet

* **Stage 1:** Filters peers by uptime, reliability, and synchronization.
* **Stage 2:** Estimates the loss before and after applying each peer's pseudo-gradient to measure its contribution.
* **Ratings:** Uses **OpenSkill** to track peer competitiveness over time.
* **Aggregation:** In each round, aggregates updates from the top-scoring **G = 15** peers (see the sketch below).
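
A minimal sketch of the Stage 2 signal and top-G aggregation, assuming plain PyTorch and uncompressed pseudo-gradients. The helper names (`estimate_loss`, `score_peer`, `aggregate_round`) and the SGD-style trial step are illustrative, not the production implementation; in the real system the per-round scores feed OpenSkill ratings rather than being used raw.

```python
import torch

def estimate_loss(model, batch):
    """Mean causal-LM loss on an evaluation batch (hypothetical helper)."""
    with torch.no_grad():
        return model(**batch, labels=batch["input_ids"]).loss.item()

def score_peer(model, batch, pseudo_grad, lr=1e-4):
    """Stage 2 (sketch): loss improvement from provisionally applying one
    peer's pseudo-gradient as a plain SGD-style step, then rolling it back."""
    loss_before = estimate_loss(model, batch)
    with torch.no_grad():
        for p, g in zip(model.parameters(), pseudo_grad):
            p.add_(g, alpha=-lr)      # trial-apply the peer's update
    loss_after = estimate_loss(model, batch)
    with torch.no_grad():
        for p, g in zip(model.parameters(), pseudo_grad):
            p.add_(g, alpha=lr)       # undo the trial step
    return loss_before - loss_after   # positive => the update helped

def aggregate_round(peer_grads, scores, G=15):
    """Average the pseudo-gradients of the top-G scoring peers."""
    top = sorted(scores, key=scores.get, reverse=True)[:G]
    agg = [sum(gs) / len(top) for gs in zip(*(peer_grads[pid] for pid in top))]
    return top, agg
```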

---

## Training setup

* **Data:** FineWeb-Edu [11].
* **Rounds:** 20,000 communication rounds (evaluation windows matched rounds).
* **Tokens:** 100–200B (see the scale check below).
* **Baseline:** a centralized AdamW run trained for 120B tokens.
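
As a back-of-the-envelope scale check (plain arithmetic on the numbers above, not a reported figure), the token budget works out to roughly 5–10M tokens per communication round:

```python
# Average tokens per communication round, from the reported totals.
for total in (100e9, 200e9):
    print(f"{total / 20_000 / 1e6:.0f}M tokens/round")  # -> 5M and 10M
```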

---

## Quickstart

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "tplr/TEMPLAR-I"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Generate a short continuation as a smoke test.
inputs = tok("Distributed training is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Results

### Downstream Benchmarks (zero-shot)

| Model        | Dataset     | Tokens    | HellaSwag (acc_norm) | PIQA (acc_norm) | ARC-E (acc) |
|--------------|-------------|-----------|---------------------:|----------------:|------------:|
| TEMPLAR-1B   | FineWeb-Edu | 100B–200B |                 51.0 |            71.4 |        59.2 |
| DeMo 1B [12] | Dolma       | 100B      |                 48.0 |            70.0 |        55.0 |
| AdamW DDP 1B | FineWeb-Edu | 120B      |                 51.0 |            71.9 |        58.9 |
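
These zero-shot numbers should be reproducible with EleutherAI's lm-evaluation-harness; the README does not state the exact evaluation stack, so the harness choice, task names, and settings below are assumptions:

```python
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tplr/TEMPLAR-I,dtype=bfloat16",
    tasks=["hellaswag", "piqa", "arc_easy"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```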

### Per-Iteration Loss

![Training loss](./figures/per_iteration_loss.png)

---

## Citation

If you use this model or Gauntlet, please cite it as follows:

```bibtex
@article{lidin2025incentivizing,
  title={Incentivizing Permissionless Distributed Learning of LLMs},
  author={Lidin, Joel and Sarfi, Amir and Pappas, Evangelos and Dare, Samuel and Belilovsky, Eugene and Steeves, Jacob},
  journal={arXiv preprint arXiv:2505.21684},
  year={2025}
}
```