Peng Wang

stillarrow · https://peter-peng-w.github.io/

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 6 hours ago

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 220

upvoted a paper 1 day ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published 6 days ago • 173

upvoted a collection 4 days ago

NeMo Gym

Collection of RL verifiable data for NeMo Gym • 22 items • Updated 1 day ago • 52

upvoted a collection 19 days ago

BFS-Prover

LLM Step-Provers in Lean4 • 5 items • Updated Oct 7, 2025 • 7

upvoted a paper 24 days ago

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23, 2025 • 14

upvoted a paper 26 days ago

Learning to Repair Lean Proofs from Compiler Feedback

Paper • 2602.02990 • Published Feb 3 • 29

upvoted a paper about 1 month ago

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 71

upvoted a paper about 2 months ago

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 102

upvoted an article about 2 months ago

Open Responses: What you need to know

Article • Jan 15 • 109

upvoted 2 papers about 2 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 110

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158

upvoted a paper 2 months ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28, 2025 • 37

upvoted a collection 2 months ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 185

upvoted an article 3 months ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • Aug 9, 2025 • 104

upvoted an article 4 months ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 405

upvoted a paper 4 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 133

upvoted 2 papers 5 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 82

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

upvoted a collection 6 months ago

Qwen3-VL

37 items • Updated Dec 31, 2025 • 670

upvoted a paper 6 months ago

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24, 2025 • 122