LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Paper • 2506.17930 • Published Jun 22, 2025
ReDit: Reward Dithering for Improved LLM Policy Optimization Paper • 2506.18631 • Published Jun 23, 2025