Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 6 days ago • 77
Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Article • 3 days ago • 41
Go-Explore: a New Approach for Hard-Exploration Problems Paper • 1901.10995 • Published Jan 30, 2019 • 1
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024 • 20