Headless Language Models: Learning without Predicting with Contrastive Weight Tying
Paper: arXiv:2309.08351
This model uses a BERT-base architecture trained on OpenWebText-2 with the Contrastive Weight Tying (CWT) objective.
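With CWT, the model has no output vocabulary projection ("headless"): instead of predicting token probabilities with a softmax over the vocabulary, the contextual output vectors are contrasted against the tied input embeddings of the true tokens, with the other tokens in the batch acting as negatives. A minimal NumPy sketch of such an in-batch contrastive loss, for illustration only (`cwt_loss` and its shapes are assumptions here, not the paper's actual code):

```python
import numpy as np

def cwt_loss(hidden, tgt_embeds):
    """In-batch contrastive loss between output states and tied input embeddings.

    hidden:     (N, d) contextual output vectors at the training positions
    tgt_embeds: (N, d) input embeddings of the corresponding true tokens
    """
    # Similarity of every output vector to every target embedding in the batch.
    logits = hidden @ tgt_embeds.T                      # (N, N)
    # Numerically stable log-softmax over the in-batch candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i is candidate i (the diagonal).
    return -np.mean(np.diag(log_probs))

# Toy check: outputs perfectly aligned with their target embeddings
# give a near-zero loss, while uninformative outputs give log(N).
aligned = cwt_loss(np.eye(4) * 10.0, np.eye(4))
uniform = cwt_loss(np.zeros((5, 8)), np.zeros((5, 8)))
```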
BibTeX:
@misc{godey2023headless,
      title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying},
      author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
      year={2023},
      eprint={2309.08351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}