Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published 30 days ago • 104
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2 • 25
SmolLM3 pretraining datasets Collection datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12 • 40
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11 • 101