RL papers
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Paper
• 2412.05718
• Published • 4
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
• 2412.16145
• Published • 38
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Paper
• 2412.15797
• Published • 18
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper
• 2412.18319
• Published • 39
Cosmos World Foundation Model Platform for Physical AI
Paper
• 2501.03575
• Published • 82
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published • 55
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Paper
• 2501.05707
• Published • 20
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper
• 2501.11425
• Published • 109
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published • 443
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper
• 2501.10799
• Published • 15
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
• 2501.17161
• Published • 125
Towards General-Purpose Model-Free Reinforcement Learning
Paper
• 2501.16142
• Published • 31