Collections including paper arxiv:2411.08147

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142

- Self-Taught Self-Correction for Small Language Models
  Paper • 2503.08681 • Published • 15
- Self-Improving Robust Preference Optimization
  Paper • 2406.01660 • Published • 20
- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
  Paper • 2503.00735 • Published • 23
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
  Paper • 2407.19594 • Published • 21

- Video Creation by Demonstration
  Paper • 2412.09551 • Published • 9
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 48
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- APOLLO: SGD-like Memory, AdamW-level Performance
  Paper • 2412.05270 • Published • 38

- Large Language Models Can Self-Improve in Long-context Reasoning
  Paper • 2411.08147 • Published • 66
- SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2411.10958 • Published • 55
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 97

- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 37
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 47

- Phi-4 Technical Report
  Paper • 2412.08905 • Published • 122
- Evaluating and Aligning CodeLLMs on Human Preference
  Paper • 2412.05210 • Published • 50
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 48
- Yi-Lightning Technical Report
  Paper • 2412.01253 • Published • 28

- Large Language Models Can Self-Improve in Long-context Reasoning
  Paper • 2411.08147 • Published • 66
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 23
- Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework
  Paper • 2410.06328 • Published • 2
- Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
  Paper • 2411.19943 • Published • 63

- Large Language Models Can Self-Improve in Long-context Reasoning
  Paper • 2411.08147 • Published • 66
- Reverse Thinking Makes LLMs Stronger Reasoners
  Paper • 2411.19865 • Published • 23
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 90
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 104
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142
-
Self-Taught Self-Correction for Small Language Models
Paper • 2503.08681 • Published • 15 -
Self-Improving Robust Preference Optimization
Paper • 2406.01660 • Published • 20 -
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 23 -
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper • 2407.19594 • Published • 21
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 72 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
Phi-4 Technical Report
Paper • 2412.08905 • Published • 122 -
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 50 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 28
-
Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 66 -
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 23 -
Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework
Paper • 2410.06328 • Published • 2 -
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 63
-
Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 66 -
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
Paper • 2411.10958 • Published • 55 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97
-
Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 66 -
Reverse Thinking Makes LLMs Stronger Reasoners
Paper • 2411.19865 • Published • 23 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104