# Templar-I: Permissionless Distributed Training

> A 1.2B-parameter causal language model trained with **Gauntlet**, an incentive system that rewards permissionless contributors for useful pseudo-gradients on the Bittensor network. [[Paper]](https://arxiv.org/abs/2505.21684)

---
## Overview

* **Setting:** Fully open, permissionless, internet-scale training; no control over who registers or what hardware they use.
* **Mechanism:** Two-stage peer filtering (uptime, reliability, synchronization) followed by scoring of each peer's pseudo-gradient quality.
* **Run:** 20,000 communication rounds on FineWeb-edu data; updates from the top **15** scoring peers aggregated per round, with up to 250 peers registered.
* **Result:** On a per-iteration basis, convergence outpaced a centralized AdamW baseline, and downstream benchmark results are competitive.

---
## Gauntlet

* **Stage 1:** Filters peers by uptime, reliability, and synchronization.
* **Stage 2:** Estimates the loss before and after applying each peer's pseudo-gradient to evaluate that peer's contribution.
* **Ratings:** Uses **OpenSkill** to track peer competitiveness over time.
* **Aggregation:** In each round, aggregates updates from the top-scoring **G = 15** peers; a sketch of this scoring-and-aggregation loop follows below.
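
A minimal sketch of the Stage 2 idea under simplifying assumptions: each peer is scored by the loss improvement its pseudo-gradient yields on an evaluation batch, and the pseudo-gradients of the top **G** scorers are averaged into the update. The helper names (`loss_delta`, `round_update`), the plain SGD-style application, and the Hugging-Face-style `model(**batch).loss` interface are illustrative assumptions, not Templar's actual implementation.

```python
# Illustrative sketch only: score peers by loss delta, then aggregate the
# top-G pseudo-gradients. Names and the SGD-style update are assumptions.
import copy
import torch

G = 15  # number of top-scoring peers aggregated per round

def loss_delta(model, batch, pseudo_grad, lr=1.0):
    """Loss before minus loss after applying one peer's pseudo-gradient."""
    before = model(**batch).loss.item()
    trial = copy.deepcopy(model)  # try the update without mutating the live model
    with torch.no_grad():
        for p, g in zip(trial.parameters(), pseudo_grad):
            p -= lr * g
    after = trial(**batch).loss.item()
    return before - after  # positive means the peer's update reduced the loss

def round_update(model, batch, peer_grads, lr=1.0):
    """Score every peer, then apply the mean pseudo-gradient of the top G."""
    scores = {pid: loss_delta(model, batch, g, lr) for pid, g in peer_grads.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:G]
    with torch.no_grad():
        for i, p in enumerate(model.parameters()):
            p -= lr * sum(peer_grads[pid][i] for pid in top) / len(top)
    return scores, top
```

In the full system, these per-round comparisons feed the OpenSkill ratings, so a peer's standing reflects its history of contributions rather than any single round.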

---
## Training setup

* **Data:** FineWeb-edu [11].
* **Rounds:** 20,000 communication rounds (evaluation windows matched rounds).
* **Tokens:** 100B–200B.
* **Baseline for comparison:** Centralized AdamW trained for 120B tokens.

---
## Quickstart

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tplr/TEMPLAR-I"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load weights in bf16 to cut memory use
    device_map="auto",           # place layers on available GPU(s)/CPU automatically
)
```
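
A quick generation check using the standard `generate` API (the prompt here is arbitrary):

```python
prompt = "Distributed training over the open internet"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```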

---
## Results

### Downstream Benchmarks (zero-shot)

| Model        | Dataset     | Tokens    | HellaSwag (acc_norm) | PIQA (acc_norm) | ARC-E (acc) |
|--------------|-------------|-----------|---------------------:|----------------:|------------:|
| TEMPLAR-1B   | FineWeb-edu | 100B–200B |                 51.0 |            71.4 |        59.2 |
| DeMo 1B [12] | Dolma       | 100B      |                 48.0 |            70.0 |        55.0 |
| AdamW DDP 1B | FineWeb-edu | 120B      |                 51.0 |            71.9 |        58.9 |

### Per-Iteration Loss

![Loss Curve](assets/loss_curve.png)

---
## Citation

If you use this model or Gauntlet, please cite it as follows:

```bibtex
@article{lidin2025incentivizing,
  title={Incentivizing Permissionless Distributed Learning of LLMs},
  author={Lidin, Joel and Sarfi, Amir and Pappas, Evangelos and Dare, Samuel and Belilovsky, Eugene and Steeves, Jacob},
  journal={arXiv preprint arXiv:2505.21684},
  year={2025}
}
```