
Lutalica


https://github.com/RewindL

RewindL

AI & ML interests

Computer vision, Image Processing

Recent Activity


commented on a paper 2 months ago

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

Paper • 2509.26313 • Published Sep 30 • 4

upvoted a paper 5 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14 • 89

commented on a paper 7 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29 • 98

upvoted a paper 8 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 138

commented on a paper 8 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 138

New activity in monology/pile-uncopyrighted about 1 year ago

Format issue when loading dataset

#1 opened about 2 years ago by