---
base_model:
- mistralai/Mistral-7B-v0.3
datasets:
- allenai/dolma
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# TESS 2 v0.3 Base

This is the diffusion-adapted TESS 2 base model: a simplex-based diffusion language model adapted from Mistral v0.3 7B and further trained on Dolma 1.7.
For more details, please check out our paper [TESS-2: A Large-Scale, Generalist Diffusion Language Model](https://arxiv.org/abs/2502.13917).
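
To give a feel for what "simplex-based diffusion" means, here is a toy NumPy sketch of the general idea: token ids are mapped to near-one-hot logits over the vocabulary (+k at the true token, -k elsewhere), and Gaussian noise is mixed in according to a schedule. This is only an illustration of the concept; the value of `k`, the linear schedule, and the function names below are assumptions for the sketch, not the actual TESS 2 implementation (see the paper and codebase for the real details).

```python
import numpy as np

def tokens_to_simplex_logits(token_ids, vocab_size, k=5.0):
    """Map token ids to near-one-hot logits: +k at the true token, -k elsewhere."""
    logits = np.full((len(token_ids), vocab_size), -k)
    logits[np.arange(len(token_ids)), token_ids] = k
    return logits

def add_noise(logits, t, num_steps, k=5.0, rng=None):
    """Corrupt the logits with Gaussian noise; a toy linear schedule for illustration."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = 1.0 - t / num_steps       # 1.0 = clean, 0.0 = pure noise
    noise = rng.normal(scale=k, size=logits.shape)
    return np.sqrt(alpha_bar) * logits + np.sqrt(1.0 - alpha_bar) * noise

# Three tokens from a 4-word vocabulary, noised halfway through the process.
clean = tokens_to_simplex_logits([2, 0, 1], vocab_size=4)
noisy = add_noise(clean, t=500, num_steps=1000)
```

During training, the model sees the noisy logits (after a softmax) and learns to predict the original tokens; at inference, it iteratively denoises from pure noise.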

**This is the diffusion-adapted base model, which has not yet undergone instruction tuning. We recommend further tuning this model on your dataset of interest, or checking out the [instruction tuned version](https://huggingface.co/hamishivi/tess2).**

This model will only work with our custom codebase, found [here](https://github.com/hamishivi/tess-2); see that repository for details on how to run training.

## Citation

If you find this work useful, please cite it as follows:

```bibtex
@misc{taeivison2025tess2,
  title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
  author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
  year={2025},
  eprint={2502.13917},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13917},
}
```