arxiv:2410.15460
Shahradmz
AI & ML interests
LLMs, Graph Learning, Temporal Graph Learning, RL, Continual RL, Optimization
models 115
Shahradmz/Qwen2.5-0.5B-Instruct_cppo-reward_REWARD_1
0.5B
Shahradmz/Qwen2.5-0.5B-Instruct_cppo-reward_REWARD_0
0.5B
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_CPPO_1
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_CPPO_0
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_PPO_1
Shahradmz/Qwen2-0.5B-Instruct_continual_data_debug_PPO_0
Shahradmz/Qwen2-1.5B-Instruct_cppo-reward_REWARD_0
2B
• 1
Shahradmz/Qwen2-1.5B-Instruct_cppo-reward_REWARD_1
Shahradmz/Qwen2-0.5B-Reward_debug_mas
Text Classification • 0.5B
Shahradmz/Qwen2-0.5B-Reward
datasets 12
Shahradmz/education_qna_hinted_qwen05
• 1 • 9
Shahradmz/education_qna_hinted
• 1 • 13
Shahradmz/education_summary_expert
• 1 • 13
Shahradmz/education_qna_hinted_static
• 1 • 14
Shahradmz/cppo_continual_dataset_rl_others
• 75.7k • 5
Shahradmz/cppo_continual_dataset_rl_relationships
• 93.9k • 7
Shahradmz/cppo_continual_dataset_reward_others
• 78.5k • 5
Shahradmz/cppo_continual_dataset_reward_relationships
• 97.4k • 7
Shahradmz/ca_constitution_1
• 33.7k • 6
Shahradmz/ca_constitution_2
• 35.8k • 5