# Templar-I: Permissionless Distributed Training
> A 1.2B-parameter causal language model trained with **Gauntlet**, an incentive system that rewards permissionless contributors for useful pseudo-gradients on the Bittensor network. [[Paper]](https://arxiv.org/abs/2505.21684)
---
## Overview
* **Setting:** Fully open, permissionless, internet-scale training; no control over who registers or their hardware.
* **Mechanism:** Two-stage peer filtering (uptime, reliability, synchronization) plus per-peer scoring of gradient quality.
* **Run:** 20K communication rounds on FineWeb-Edu; updates from the top **15** peers aggregated per round, out of up to 250 registered peers.
* **Result:** On a per-iteration basis, convergence outpaced a centralized AdamW baseline; downstream metrics are competitive.
---
## Gauntlet
* **Stage 1:** Filters peers by uptime, reliability, and synchronization.
* **Stage 2:** Estimates the loss before and after applying each peer’s pseudo-gradients to score the quality of its contribution.
* **Ratings:** Uses **OpenSkill** ratings to track each peer’s competitiveness over time.
* **Aggregation:** Each round aggregates updates from the top-scoring **G = 15** peers, as sketched below.
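
A toy sketch of this scoring-and-selection loop, assuming the `openskill` Python package (v5-style `PlackettLuce` API); the peer names, loss deltas, and `G` value here are illustrative stand-ins, not the run's actual code:

```python
# Rank peers by how much their pseudo-gradient reduced the loss, update
# OpenSkill ratings from that ranking, then keep the top-G peers.
from openskill.models import PlackettLuce

skill = PlackettLuce()
ratings = {p: skill.rating(name=p) for p in ["peer_a", "peer_b", "peer_c"]}

# Illustrative loss deltas from applying each peer's pseudo-gradient
# (more negative = more useful contribution).
loss_delta = {"peer_a": -0.012, "peer_b": -0.003, "peer_c": 0.001}

# Treat the round as a free-for-all match: rank 0 is the best peer.
order = sorted(loss_delta, key=loss_delta.get)
teams = [[ratings[p]] for p in order]
for p, (new_r,) in zip(order, skill.rate(teams, ranks=list(range(len(order))))):
    ratings[p] = new_r

# Aggregate only the top-G peers by conservative skill estimate.
G = 2
top_peers = sorted(ratings, key=lambda p: ratings[p].ordinal(), reverse=True)[:G]
print(top_peers)
```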
---
## Training setup
* **Data:** FineWeb-Edu [11].
* **Rounds:** 20,000 communication rounds (evaluation windows matched rounds).
* **Tokens:** 100B–200B.
* **Baseline for comparison:** Centralized AdamW trained for 120B tokens.
---
## Quickstart
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "tplr/TEMPLAR-I"

# Load the tokenizer and the bfloat16 checkpoint; device_map="auto" lets
# accelerate place the weights on the available GPU(s).
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
```
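
Continuing from the snippet above, a quick generation smoke test (the prompt and decoding settings are illustrative, not anything specified by the model card):

```python
# Greedy decoding of a short continuation; prompt is arbitrary.
prompt = "Permissionless distributed training works by"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```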
---
## Results
### Downstream Benchmarks (zero-shot)
| Model | Dataset | Tokens | HellaSwag (acc_norm, %) | PIQA (acc_norm, %) | ARC-E (acc, %) |
|-----------------|-------------|------------|------------------------:|-------------------:|---------------:|
| TEMPLAR-1B | FineWeb-Edu | 100B–200B | 51.0 | 71.4 | 59.2 |
| DeMo 1B [12] | Dolma | 100B | 48.0 | 70.0 | 55.0 |
| AdamW DDP 1B | FineWeb-Edu | 120B | 51.0 | 71.9 | 58.9 |
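
Zero-shot numbers of this kind are commonly obtained with EleutherAI's `lm-evaluation-harness`. A minimal reproduction sketch, assuming the harness's standard `simple_evaluate` entry point; this is not confirmed to be the authors' exact evaluation setup:

```python
# Illustrative zero-shot evaluation via lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tplr/TEMPLAR-I,dtype=bfloat16",
    tasks=["hellaswag", "piqa", "arc_easy"],
)
print(results["results"])
```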
### Per-Iteration Loss
![Training loss](./figures/per_iteration_loss.png)
---
## Citation
If you use this model or Gauntlet, please cite it as follows:
```bibtex
@article{lidin2025incentivizing,
  title   = {Incentivizing Permissionless Distributed Learning of LLMs},
  author  = {Lidin, Joel and Sarfi, Amir and Pappas, Evangelos and Dare, Samuel and Belilovsky, Eugene and Steeves, Jacob},
  journal = {arXiv preprint arXiv:2505.21684},
  year    = {2025}
}
```