Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation Paper • 2602.03806 • Published about 21 hours ago • 3
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 223
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents Paper • 2509.26354 • Published Sep 30, 2025 • 18
On Evaluating the Durability of Safeguards for Open-Weight LLMs Paper • 2412.07097 • Published Dec 10, 2024 • 1
Dynamic Risk Assessments for Offensive Cybersecurity Agents Paper • 2505.18384 • Published May 23, 2025 • 8
Dynamic Risk Assessments for Offensive Cybersecurity Agents Paper • 2505.18384 • Published May 23, 2025 • 8
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published Apr 21, 2025 • 44
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19, 2025 • 13
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees Paper • 2311.08384 • Published Nov 14, 2023
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL Paper • 2402.19446 • Published Feb 29, 2024
Autonomous Evaluation and Refinement of Digital Agents Paper • 2404.06474 • Published Apr 9, 2024 • 3