Models

19,776

Active filters: grpo

wanglab/bioreason-pro-rl

Reinforcement Learning • 4B • Updated 3 days ago • 27 • 4

vinhnx90/gemma-3-1b-thinking-v2-Q4_K_M-GGUF

1.0B • Updated Mar 22, 2025 • 149 • 4

Pinkstack/Superthoughts-lite-v2-MOE-Llama3.2

Text Generation • 4B • Updated May 6, 2025 • 3 • 3

Pinkstack/Superthoughts-lite-v2-MOE-Llama3.2-bf16

Text Generation • 4B • Updated May 6, 2025 • 1

lightx2v/Wan2.1-T2V-1.3B-longcat-step1500

Text-to-Video • Updated Feb 11 • 118 • 7

Crownelius/Poe-8B-GLM5-Opus4.6-Sonnet4.5-Kimi-Grok-Gemini-3-pro-preview-HERETIC

9B • Updated 8 days ago • 660 • 1

YangZhou24/RealGRPO

Text-to-Image • Updated 21 days ago • 9 • 1

DATEXIS/DeepICD-R1-Llama-8B

Text Generation • 8B • Updated 7 days ago • 414 • 1

mradermacher/DeepICD-R1-Llama-8B-GGUF

8B • Updated 5 days ago • 521 • 1

mradermacher/DeepICD-R1-Llama-8B-i1-GGUF

8B • Updated 5 days ago • 3.38k • 1

AaryanK/ModelGate

Text Classification • 2B • Updated about 7 hours ago • 24 • 1

Chun121/Qwen3-4B-RPG-Roleplay-V2

Text Generation • 4B • Updated Aug 24, 2025 • 10.4k • 43

onuryozcu/llama

Text Generation • 0.1B • Updated Mar 10, 2025 • 6

amiguel/promptTuning

8B • Updated Feb 16, 2025

sergiopaniego/Qwen2-0.5B-GRPO-test

Updated Oct 3, 2025

Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF

1B • Updated Jan 28, 2025 • 203 • 4

nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora

Updated Jan 28, 2025

sergiopaniego/Qwen2-0.5B-GRPO

Updated Jan 31, 2025

philschmid/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Jan 30, 2025 • 5 • 8

spinech/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Apr 28, 2025 • 1

Dongwei/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 2, 2025 • 2 • 1

yooneo/qwen-0.5b-r1-aha

Updated Jan 31, 2025

yooneo/qwen-1.5b-r1-aha

Updated Jan 31, 2025

spinech/qwen2.5-3b-r1-rearc-stage1

Text Generation • 3B • Updated Apr 28, 2025 • 10

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO

Text Generation • 8B • Updated Feb 3, 2025 • 6 • 1

MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured

Text Generation • 2B • Updated Feb 3, 2025 • 5 • 5

mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF

2B • Updated Feb 3, 2025 • 105 • 2

hyunw3/qwen-2.5-0.5b-r1-countdown

Text Generation • 0.5B • Updated Apr 30, 2025

hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6

Text Generation • 0.5B • Updated Jun 3, 2025 • 1

mgaimm/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Feb 1, 2025 • 1