Shenao Zhang

ZhangShenao

https://shenao-zhang.github.io/

ShenaoZhang

authored a paper 4 months ago

Learning to Reason as Action Abstractions with Scalable Mid-Training RL

Paper • 2509.25810 • Published Sep 30, 2025 • 6

authored a paper 8 months ago

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

Paper • 2505.20561 • Published May 26, 2025 • 7

authored 4 papers about 1 year ago

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Paper • 2405.16436 • Published May 26, 2024 • 1

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Paper • 2410.08067 • Published Oct 10, 2024 • 2

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Paper • 2411.13611 • Published Nov 20, 2024

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38

authored a paper over 1 year ago

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

Paper • 2405.19332 • Published May 29, 2024 • 22