StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2, 2025 • 57
Are Reasoning Models More Prone to Hallucination? Paper • 2505.23646 • Published May 29, 2025 • 24 • 2
AdaptThink: Reasoning Models Can Learn When to Think Paper • 2505.13417 • Published May 19, 2025 • 83
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26, 2025 • 23
ADELIE: Aligning Large Language Models on Information Extraction Paper • 2405.05008 • Published May 8, 2024 • 2
OpenSAE-LLaMA-3.1-8B Collection OpenSAE checkpoints for LLaMA 3.1 8B base model • 38 items • Updated Jan 29, 2025 • 5
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament Paper • 2501.13007 • Published Jan 22, 2025 • 19 • 3
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21, 2024 • 17
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21, 2024 • 26 • 2