---
library_name: transformers
tags:
- rm
- latent
datasets:
- openai/gsm8k
base_model:
- openai-community/gpt2
pipeline_tag: token-classification
---
# LatentRM
The Latent Reward Model (LatentRM) is a learned scorer for latent reasoning models, which reason in continuous hidden space rather than over discrete tokens.
LatentRM provides the aggregation signal needed for parallel test-time scaling of such models, enabling techniques like best-of-N selection and beam search even though latent models expose no explicit token-level probabilities.
<p align="center">
<a href="https://arxiv.org/pdf/2510.07745"><b>Paper Link</b>👁️</a>
</p>
<p align="center">
<a href="https://github.com/YRYangang/LatentTTS"><b>GitHub Repo</b>🐙</a>
</p>
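## Usage (sketch)

A minimal loading and scoring sketch, assuming the checkpoint loads with `AutoModelForTokenClassification` (matching the card's pipeline tag) and that latent thought states are passed via `inputs_embeds`. The repo id below is a placeholder and the final-step aggregation is purely illustrative; see the GitHub repo for the actual scoring interface.

```python
# Hedged sketch: repo id, inputs_embeds interface, and aggregation are assumptions.
import torch
from transformers import AutoModelForTokenClassification

repo_id = "YRYangang/LatentRM"  # placeholder -- replace with the actual model id

model = AutoModelForTokenClassification.from_pretrained(repo_id)
model.eval()

# LatentRM scores continuous latent "thought" states rather than discrete tokens,
# so hidden vectors are passed via inputs_embeds. Random latents are used here
# only to show the call shape; in practice they come from a latent reasoning model.
hidden_size = model.config.hidden_size  # 768 for the GPT-2 base model
latent_trace = torch.randn(1, 6, hidden_size)  # (batch, num_latent_steps, hidden)

with torch.no_grad():
    logits = model(inputs_embeds=latent_trace).logits  # (1, num_latent_steps, num_labels)

# One plausible aggregation: take the first label's logit at the final step
# and use it to rank candidate trajectories for best-of-N or beam search.
score = logits[0, -1, 0].item()
print(f"trajectory score: {score:.4f}")
```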
## Citation
```
@misc{you2025paralleltesttimescalinglatent,
  title={Parallel Test-Time Scaling for Latent Reasoning Models},
  author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
  year={2025},
  eprint={2510.07745},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.07745},
}
```