π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published Oct 29
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training Paper • 2510.06710 • Published Oct 8
RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation Paper • 2509.15965 • Published Sep 19
What Can RL Bring to VLA Generalization? An Empirical Study Paper • 2505.19789 • Published May 26
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning Paper • 2505.22094 • Published May 28
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Paper • 2506.16054 • Published Jun 19
Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization Paper • 2502.04686 • Published Feb 7
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Paper • 2506.02387 • Published Jun 3
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16, 2024