wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Llama-3.1-8B-Instruct_bw1p6_fw0p4_ema0p999_ep30 Text Generation • 8B • Updated 1 day ago • 220
wgcyeo/ci-feedback_both_ema_Llama-3.1-8B-Instruct_jsd_b0p8_ema0p999_ep30 Text Generation • 8B • Updated 1 day ago • 116
wgcyeo/ci-feedback_disallowed_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • 8B • Updated 2 days ago • 23
wgcyeo/ci-feedback_disallowed_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • 7B • Updated 2 days ago • 23
wgcyeo/ci-feedback_disallowed_ema_Llama-3.1-8B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • 8B • Updated 3 days ago • 33
wgcyeo/ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • 8B • Updated 3 days ago • 368
wgcyeo/ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30ref Text Generation • 8B • Updated 3 days ago • 311
wgcyeo/ci-grpo_Olmo-3-7B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • 7B • Updated 4 days ago • 168
wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-7B-Instruct_bw1p6_fw0p4_ema0p999_ep30 Text Generation • Updated 12 days ago • 11
wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-7B-Instruct_bw1p0_fw1p0_ema0p999_ep30 Text Generation • Updated 12 days ago • 12
wgcyeo/ci-feedback_both_ema_plus_interp_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_stw0p3_ep30 Text Generation • Updated 12 days ago • 14
wgcyeo/ci-feedback_allowed_ema_Qwen2.5-7B-Instruct_forward_kl_ema0p999_ep30 Text Generation • Updated 12 days ago • 13
wgcyeo/ci-feedback_disallowed_none_Qwen2.5-7B-Instruct_from_Qwen2.5-7B-Instruct_jsd_b0p8_ep30 Text Generation • Updated 13 days ago • 13
wgcyeo/ci-feedback_disallowed_ema_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_ep30 Text Generation • Updated 13 days ago • 12
wgcyeo/ci-feedback_both_none_Qwen2.5-7B-Instruct_from_Qwen2.5-7B-Instruct_jsd_b0p8_ep30 Text Generation • Updated 13 days ago • 14
wgcyeo/ci-feedback_both_interp_Qwen2.5-7B-Instruct_from_Qwen2.5-7B-Instruct_jsd_b0p8_stw0p3_ep30 Text Generation • Updated 13 days ago • 13
wgcyeo/ci-feedback_both_ema_plus_interp_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_stw0p3_ep30 Text Generation • Updated 13 days ago • 14
wgcyeo/ci-feedback_both_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated 13 days ago • 14
wgcyeo/ci-feedback_both_ema_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_ep30 Text Generation • Updated 13 days ago • 13