---
library_name: transformers
tags:
- rm
- latent
datasets:
- openai/gsm8k
base_model:
- openai-community/gpt2
pipeline_tag: token-classification
---

# LatentRM

The Latent Reward Model (LatentRM) is a learned scorer for latent reasoning models, which reason in a continuous hidden space rather than through sampled tokens. Because such models expose no explicit token-level probabilities, they lack the usual signal for ranking and aggregating parallel samples. LatentRM supplies that signal, which enables parallel test-time scaling techniques such as best-of-N selection and beam search for latent models.
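The snippet below is a minimal, framework-agnostic sketch of how a scalar scorer can drive best-of-N selection and step-level beam search over latent trajectories. It assumes only that the reward model maps a trajectory (here, a list of hidden-state vectors) to a single float. `toy_score` and `toy_expand` are hypothetical stand-ins for the reward model and the latent reasoning step; they are not LatentRM and do not reflect its actual loading or inference API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# A latent trajectory: one hidden-state vector per reasoning step (assumed representation).
Trajectory = List[List[float]]


def best_of_n(candidates: Sequence[T], score_fn: Callable[[T], float]) -> T:
    """Best-of-N: return the single candidate the scorer rates highest."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=score_fn)


def beam_search(
    init: Trajectory,
    expand_fn: Callable[[Trajectory], List[Trajectory]],
    score_fn: Callable[[Trajectory], float],
    beam_width: int,
    num_steps: int,
) -> List[Trajectory]:
    """Step-level beam search: expand every beam, rescore all children, keep the top-k."""
    beam = [init]
    for _ in range(num_steps):
        children = [child for traj in beam for child in expand_fn(traj)]
        # The reward score stands in for the token log-probabilities a text model would use.
        children.sort(key=score_fn, reverse=True)
        beam = children[:beam_width]
    return beam


# --- Hypothetical toy stand-ins (NOT the LatentRM model) ---
TARGET = [1.0, 1.0]


def toy_score(traj: Trajectory) -> float:
    """Hypothetical scorer: negative squared distance of the last latent state to TARGET."""
    last = traj[-1]
    return -sum((a - b) ** 2 for a, b in zip(last, TARGET))


def toy_expand(traj: Trajectory) -> List[Trajectory]:
    """Hypothetical latent step: branch into three perturbed next states."""
    x, y = traj[-1]
    deltas = [(0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
    return [traj + [[x + dx, y + dy]] for dx, dy in deltas]


if __name__ == "__main__":
    beams = beam_search([[0.0, 0.0]], toy_expand, toy_score, beam_width=2, num_steps=2)
    print(best_of_n(beams, toy_score))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Best-of-N scores only complete trajectories, while beam search applies the scorer after every latent step to prune branches early; in both cases the scorer is the only ranking signal, which is the role this card describes for LatentRM.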

## Citation

```
@misc{you2025paralleltesttimescalinglatent,
      title={Parallel Test-Time Scaling for Latent Reasoning Models},
      author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2025},
      eprint={2510.07745},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07745},
}
```