---
base_model:
  - mistralai/Mistral-7B-v0.3
datasets:
  - allenai/dolma
language:
  - en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# TESS 2 v0.3 Base

This is TESS 2, a simplex-based diffusion language model adapted from Mistral v0.3 7B and further trained on Dolma 1.7. For more details, please check out our paper [TESS 2: A Large-Scale Generalist Diffusion Language Model](https://arxiv.org/abs/2502.13917).
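To give a rough sense of what "simplex-based" means, the toy sketch below maps token ids to almost-one-hot logit vectors, perturbs them with Gaussian noise, and softmaxes back onto the probability simplex. This is an illustrative approximation only (the function names and the k-logit parameterization are assumptions for the sketch), not the code from our repository; see the custom codebase for the actual training and inference logic.

```python
import numpy as np

def tokens_to_logits(token_ids, vocab_size, k=5.0):
    """Map token ids to almost-one-hot logit vectors:
    +k at the token's index, -k everywhere else."""
    logits = np.full((len(token_ids), vocab_size), -k)
    logits[np.arange(len(token_ids)), token_ids] = k
    return logits

def noisy_simplex(logits, noise_scale, rng):
    """Add Gaussian noise in logit space, then softmax the
    result back onto the probability simplex."""
    noised = logits + noise_scale * rng.standard_normal(logits.shape)
    e = np.exp(noised - noised.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = tokens_to_logits([2, 0, 1], vocab_size=4, k=5.0)
probs = noisy_simplex(logits, noise_scale=1.0, rng=rng)
# Each row is a valid distribution over the vocabulary; at low noise
# the original token typically retains most of the probability mass.
print(probs.sum(axis=-1))
```

The diffusion model is then trained to denoise such noised simplex representations back toward the original tokens.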

This is the diffusion-adapted base model, which has not yet undergone instruction tuning. We recommend further tuning it on your dataset of interest, or checking out the instruction-tuned version.

This model will only work with our custom codebase, found here; please see that repository for details on how to run training.

## Citation

If you find this work useful, please cite it as follows:

```bibtex
@misc{taeivison2025tess2,
  title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
  author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
  year={2025},
  eprint={2502.13917},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13917},
}
```