arxiv:2510.05592
ZhuofengLi
AI & ML interests
Agents, Reasoning LLMs/VLLMs, RL
Models: 17
ZhuofengLi/Qwen3-4B-Instruct-2507-DeepReview-lora-sft-ms-swift-v2 • 4B • 17
ZhuofengLi/Qwen3-4B-Instruct-2507-DeepReview-lora-sft-ms-swift-new • 4B • 18
ZhuofengLi/Qwen3-4B-Instruct-2507-DeepReview-lora-sft-ms-swift • 4B • 10
ZhuofengLi/Qwen3-4B-Instruct-2507-DeepReview-lora-sft • 4B • 28
ZhuofengLi/torl-qwen2.5-7b-instruct • 8B • 3
ZhuofengLi/octo-science-qwen2.5-7b-grpo-step-40-v2 • 2B • 6
ZhuofengLi/octo-search-qwen2.5-7b-grpo-155-step-v1 • 8B • 6
ZhuofengLi/octo-search-qwen2.5-7b-grpo-step-60-v1.5 • 2B • 5
ZhuofengLi/tool-n1-multi-turn-reason-lora-sft-1180-step • Text Generation • 8B • 5
ZhuofengLi/xlam-reason-lora-sft-1340-step • Text Generation • 3B • 8
Datasets: 17
ZhuofengLi/fineweb_corpus • 14.9M • 113
ZhuofengLi/fineweb_indexes • 3
ZhuofengLi/lambda-sft-code-data-gen-st-debug • 5 • 25
ZhuofengLi/lambda-sft-math-data-gen-st-debug • 5 • 32
ZhuofengLi/deepreview-fast-sft-v2 • 13.3k • 14
ZhuofengLi/ICLR_26 • 19.6k • 35
ZhuofengLi/deepreview-fast-sft • 13.4k • 38
ZhuofengLi/deepreview-sft • 41.4k • 20
ZhuofengLi/deepreview-synthesis-sft • 13.4k • 7
ZhuofengLi/sft_data • 8.4k • 7