# Templar-I: Permissionless Distributed Training

> A 1.2B-parameter causal language model trained with **Gauntlet**, an incentive system that rewards permissionless contributors for useful pseudo-gradients on the Bittensor network. [[Paper]](https://arxiv.org/abs/2505.21684)

---

## Overview

* **Setting:** Fully open, permissionless, internet-scale training; no control over who registers or what hardware they run.
* **Mechanism:** Two-stage peer filtering (uptime, reliability, synchronization) followed by per-peer scoring of gradient quality.
* **Run:** 20K communication rounds on FineWeb-Edu; the top **15** peers are aggregated each round, out of up to 250 registered peers.
* **Result:** On a per-iteration basis, convergence outpaced a centralized AdamW baseline; downstream metrics are competitive.

---

## Gauntlet

* **Stage 1:** Filters peers by uptime, reliability, and synchronization.
* **Stage 2:** Estimates the loss before and after applying each peer’s pseudo-gradient to score its contribution.
* **Ratings:** Uses **OpenSkill** to track peer competitiveness over time.
* **Aggregation:** Each round, aggregates updates from the top-scoring **G = 15** peers.
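
Concretely, the two-stage flow can be sketched on a toy problem. This is an illustrative sketch only, not the Templar implementation: the real system evaluates pseudo-gradients on held-out data and feeds the scores into OpenSkill ratings, whereas here we use a simple quadratic loss and made-up peer names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: L(w) = ||w||^2 / 2, whose exact gradient is w itself.
def loss(w):
    return 0.5 * float(w @ w)

w = rng.normal(size=8)  # current shared model parameters

# Stage 1 (illustrative): drop peers below an uptime threshold.
uptime = {f"peer{i}": rng.uniform(0.5, 1.0) for i in range(8)}
active = [p for p, u in uptime.items() if u > 0.6]

# Each active peer submits a pseudo-gradient; honest peers send a noisy
# version of the true gradient, while one sends pure noise.
grads = {p: w + rng.normal(scale=0.1, size=8) for p in active}
grads[active[0]] = rng.normal(size=8)  # a low-quality contributor

# Stage 2: score each peer by the loss drop its update alone would produce.
lr = 0.1
scores = {p: loss(w) - loss(w - lr * g) for p, g in grads.items()}

# Aggregation: average the pseudo-gradients of the top-G scoring peers
# (G = 15 in the paper; smaller here to fit the toy setup).
G = 3
top = sorted(scores, key=scores.get, reverse=True)[:G]
w_next = w - lr * np.mean([grads[p] for p in top], axis=0)
```

Averaging only the top-scoring updates keeps a single noisy or adversarial peer (the `active[0]` saboteur above) from polluting the aggregate, since its loss-drop score typically ranks it below honest contributors.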

---

## Training setup

* **Data:** FineWeb-Edu \[11].
* **Rounds:** 20,000 communication rounds (evaluation windows matched rounds).
* **Tokens:** 100–200B.
* **Baseline for comparison:** Centralized AdamW trained on 120B tokens.
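
As a sanity check on the numbers above (assuming the token budget is spread evenly across rounds, which the actual run need not do), 100–200B tokens over 20,000 rounds works out to roughly 5–10M tokens per communication round:

```python
rounds = 20_000
low, high = 100e9, 200e9                   # total training-token budget
per_round = (low / rounds, high / rounds)  # tokens per communication round
print(per_round)  # (5000000.0, 10000000.0)
```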

---

## Quickstart

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "tplr/TEMPLAR-I"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Generate a short completion.
inputs = tok("Distributed training is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

## Results

### Downstream Benchmarks (zero-shot)

| Model        | Dataset     | Tokens    | HellaSwag (acc_norm) | PIQA (acc_norm) | ARC-E (acc) |
|--------------|-------------|-----------|---------------------:|----------------:|------------:|
| TEMPLAR-1B   | FineWeb-Edu | 100B–200B |                 51.0 |            71.4 |        59.2 |
| DeMo 1B [12] | Dolma       | 100B      |                 48.0 |            70.0 |        55.0 |
| AdamW DDP 1B | FineWeb-Edu | 120B      |                 51.0 |            71.9 |        58.9 |

### Per-Iteration Loss

![Per-iteration training loss curve]()

---

## Citation

If you use this model or Gauntlet, please cite it as follows:

```bibtex
@article{lidin2025incentivizing,
  title   = {Incentivizing Permissionless Distributed Learning of LLMs},
  author  = {Lidin, Joel and Sarfi, Amir and Pappas, Evangelos and Dare, Samuel and Belilovsky, Eugene and Steeves, Jacob},
  journal = {arXiv preprint arXiv:2505.21684},
  year    = {2025}
}
```