Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Paper • 2508.09726 • Published Aug 13 • 15
The BrowserGym Ecosystem for Web Agent Research Paper • 2412.05467 • Published Dec 6, 2024 • 23
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments Paper • 2511.07317 • Published 28 days ago • 13
Article Transformers v5: Simple model definitions powering the AI ecosystem • 8 days ago • 225
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 137
Article Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation • Sep 16 • 15
An efficient probabilistic hardware architecture for diffusion-like models Paper • 2510.23972 • Published Oct 28 • 3
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29 • 44
Article 3+ Years of ML & Society at Hugging Face • Oct 29 • 13
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning • Oct 27 • 71
gpt-oss-safeguard Collection: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated Oct 29 • 58
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Paper • 2402.12030 • Published Feb 19, 2024 • 3
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 247