Active filters: grpo
Reinforcement Learning
• 4B • Updated • 27
• 4
vinhnx90/gemma-3-1b-thinking-v2-Q4_K_M-GGUF
1.0B • Updated • 149
• 4
Pinkstack/Superthoughts-lite-v2-MOE-Llama3.2
Text Generation
• 4B • Updated • 3
• 3
Pinkstack/Superthoughts-lite-v2-MOE-Llama3.2-bf16
Text Generation
• 4B • Updated • 1
lightx2v/Wan2.1-T2V-1.3B-longcat-step1500
Text-to-Video
• Updated • 118
• 7
Crownelius/Poe-8B-GLM5-Opus4.6-Sonnet4.5-Kimi-Grok-Gemini-3-pro-preview-HERETIC
9B • Updated • 660
• 1
Text-to-Image
• Updated • 9
• 1
DATEXIS/DeepICD-R1-Llama-8B
Text Generation
• 8B • Updated • 414
• 1
mradermacher/DeepICD-R1-Llama-8B-GGUF
8B • Updated • 521
• 1
mradermacher/DeepICD-R1-Llama-8B-i1-GGUF
8B • Updated • 3.38k
• 1
Text Classification
• 2B • Updated • 24
• 1
Chun121/Qwen3-4B-RPG-Roleplay-V2
Text Generation
• 4B • Updated • 10.4k
• 43
Text Generation
• 0.1B • Updated • 6
8B • Updated
sergiopaniego/Qwen2-0.5B-GRPO-test
Updated
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF
1B • Updated • 203
• 4
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
Updated
sergiopaniego/Qwen2-0.5B-GRPO
Updated
philschmid/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 5
• 8
spinech/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 1
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 2
• 1
spinech/qwen2.5-3b-r1-rearc-stage1
Text Generation
• 3B • Updated • 10
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO
Text Generation
• 8B • Updated • 6
• 1
MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured
Text Generation
• 2B • Updated • 5
• 5
mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF
2B • Updated • 105
• 2
hyunw3/qwen-2.5-0.5b-r1-countdown
Text Generation
• 0.5B • Updated
hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6
Text Generation
• 0.5B • Updated • 1
mgaimm/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 1