
Mann Patel

manncodes


AI & ML interests

NLP, Mech Interp, Reasoning, MLSystems


Organizations

None yet

upvoted 2 papers about 8 hours ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published 1 day ago • 102

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Paper • 2512.20578 • Published 17 days ago • 68

upvoted a paper 10 days ago

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Paper • 2512.04220 • Published Dec 3, 2025 • 13

upvoted a paper 16 days ago

When Reasoning Meets Its Laws

Paper • 2512.17901 • Published 21 days ago • 56

upvoted a paper 20 days ago

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Paper • 2512.13607 • Published 25 days ago • 30

liked a model 21 days ago

XiaomiMiMo/MiMo-V2-Flash

Text Generation • 310B • Updated 23 days ago • 31.4k • 558

upvoted an article 25 days ago

Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance

Article • Dec 9, 2025 • 82

upvoted a collection 25 days ago

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 17 days ago • 56

liked a model 25 days ago

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8

Text Generation • 32B • Updated 1 day ago • 650k • 229

New activity in allenai/Dolci-Think-RL-32B 26 days ago

decoding the coding ground truths

#1 opened 26 days ago by

upvoted a collection about 1 month ago

Tiny-A2D

Small diffusion language models adapted from AR models • 4 items • Updated Dec 6, 2025 • 13

upvoted a paper about 1 month ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 96

liked 2 datasets about 1 month ago

Anthropic/AnthropicInterviewer

Viewer • Updated 4 days ago • 1.25k • 5.86k • 350

nvidia/AceReason-Math

Viewer • Updated Jun 18, 2025 • 49.6k • 1.04k • 44

updated a collection about 1 month ago

RL data

8 items • Updated Dec 7, 2025

liked 2 datasets about 1 month ago

meta-math/MetaMathQA

Viewer • Updated Dec 21, 2023 • 395k • 8.08k • 432

qwedsacf/competition_math

Viewer • Updated Jan 28, 2023 • 12.5k • 7.98k • 76

updated a collection about 1 month ago

RL data

8 items • Updated Dec 7, 2025

liked a dataset about 1 month ago

nvidia/ToolScale

Viewer • Updated 24 days ago • 4.06k • 1.24k • 171

updated a collection about 1 month ago

RL data

8 items • Updated Dec 7, 2025