papers-to-read
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 250
A Survey of Context Engineering for Large Language Models
Paper • 2507.13334 • Published • 259
MemOS: A Memory OS for AI System
Paper • 2507.03724 • Published • 157
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 133
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper • 2507.16784 • Published • 122
WebSailor: Navigating Super-human Reasoning for Web Agent
Paper • 2507.02592 • Published • 123
4KAgent: Agentic Any Image to 4K Super-Resolution
Paper • 2507.07105 • Published • 105
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Paper • 2507.22827 • Published • 99
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 238
Paper • 2508.10104 • Published • 291
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 195
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper • 2508.04026 • Published • 161
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper • 2508.05629 • Published • 180
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 160
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 661
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 347
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 228
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 190
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Paper • 2508.21148 • Published • 140
Why Language Models Hallucinate
Paper • 2509.04664 • Published • 195
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper • 2509.07980 • Published • 101
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
Paper • 2509.13312 • Published • 105
Scaling Agents via Continual Pre-training
Paper • 2509.13310 • Published • 117
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper • 2509.06501 • Published • 79
Towards a Unified View of Large Language Model Post-Training
Paper • 2509.04419 • Published • 75
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper • 2509.01055 • Published • 76
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Paper • 2509.06806 • Published • 63
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 114