The Rogue Scalpel: Activation Steering Compromises LLM Safety • arXiv:2509.22067 • Published Sep 26, 2025
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds • arXiv:2508.12782 • Published Aug 18, 2025
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models • arXiv:2506.06395 • Published Jun 5, 2025