Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Paper
• 2505.01441
• Published
• 39
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper
• 2504.16078
• Published
• 21
Emergent Agentic Transformer from Chain of Hindsight Experience
Paper
• 2305.16554
• Published
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
Paper
• 2504.02882
• Published
• 7
ATLAS: Learning to Optimally Memorize the Context at Test Time
Paper
• 2505.23735
• Published
• 23
Self-Challenging Language Model Agents
Paper
• 2506.01716
• Published
• 10
Matrix-Game: Interactive World Foundation Model
Paper
• 2506.18701
• Published
• 72
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper
• 2508.09736
• Published
• 58
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper
• 2509.06501
• Published
• 82
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published
• 107
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Paper
• 2510.24699
• Published
• 71
AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Paper
• 2511.08522
• Published
• 18
ResearchRubrics: A Benchmark of Prompts and Rubrics for Evaluating Deep Research Agents
Paper
• 2511.07685
• Published
• 10
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published
• 84