PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary Paper • 2601.10201 • Published 2 days ago • 5
HanningZhang/deepseek_only_conjecture_claude_deepseek_train_data_max1_5e-7_bs32_decay1e-6_2ep_ep1 Text Generation • 7B • Updated 4 days ago • 76
HanningZhang/physlean_ds_prover_noapply_curri_grpo_1e-6_bs256_step35 Text Generation • 7B • Updated 13 days ago • 75
HanningZhang/physlean_ds_prover_noapply_grpo_1e-6_bs256_step35 Text Generation • 7B • Updated 15 days ago • 14
HanningZhang/physlean_ds_prover_grpo_1e-6_bs256_step90 Text Generation • 7B • Updated 16 days ago • 24
HanningZhang/physicslean_kimina_train_gen_from_claude_and_grok_deepseek_one_sample_ep1 Text Generation • 8B • Updated Dec 14, 2025
HanningZhang/physicslean_kimina_train_gen_from_claude_and_grok_deepseek_one_sample_ep2 Text Generation • 8B • Updated Dec 14, 2025
HanningZhang/physicslean_kimina_train_gen_from_claude_and_grok_deepseek_all_ep2 Text Generation • 8B • Updated Dec 14, 2025
HanningZhang/physicslean_kimina_train_gen_from_claude_and_grok_deepseek_all_ep1 Text Generation • 8B • Updated Dec 14, 2025
HanningZhang/physicslean_kimina_train_gen_from_grok_deepseek_one_sample_ep2 Text Generation • 8B • Updated Dec 14, 2025
HanningZhang/physicslean_kimina_train_gen_from_grok_deepseek_one_sample_ep1 Text Generation • 8B • Updated Dec 14, 2025