Models (447)

Active filters: rlhf

THU-KEG/WildReward-8B

Text Classification • 8B • Updated 11 days ago • 33 • 3

gyung/lfm2-1.2b-koen-mt-v8-rl-10k-adapter

Text Generation • Updated Dec 29, 2025 • 7 • 2

dorukardahan/senti-qwen3-8b-dpo

Text Generation • Updated Jan 4

chrisvoncsefalvay/dx-reasoning-qwen2.5-grpo

Text Generation • Updated Jan 6 • 4

akseljoonas/Qwen3-1.7B-DPO-hh-rlhf

Text Generation • 2B • Updated Jan 13 • 44

mayiwen/PaperAudit_Models

vemz/pythia-410m-rloo-imdb

Text Generation • Updated Jan 9 • 3

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-2Bit

0.7B • Updated Jan 12 • 17

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-3Bit

0.9B • Updated Jan 12 • 24

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-4Bit

1B • Updated Jan 12 • 18

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-5Bit

1B • Updated Jan 12 • 14

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-6Bit

7B • Updated Jan 12 • 28

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-8Bit

7B • Updated Jan 12 • 39

alexgusevski/CapybaraHermes-2.5-Mistral-7B-mlx-fp16

7B • Updated Jan 12 • 51

amoeba04/KVL-DPO

Image-Text-to-Text • 15B • Updated Jan 14 • 9 • 1

percyraskova/llm-training

Text Generation • Updated Jan 14

anthonym21/gemma-3-4b-it-slipstream-grpo

4B • Updated 28 days ago • 24

jinn33/kanana-1.5-8b-rlhf

Updated 28 days ago

Sachinkry/qwen3-imdb-reward-0.6b

Text Classification • 0.6B • Updated 24 days ago • 31

dgonier/debate-qwen-32b-iter3-grpoD

Text Generation • 31B • Updated 19 days ago • 6

HowieHwong/ppopt

Text Generation • Updated 9 days ago • 10

kikansha-Tomasu/sft-dpo-sft-qwen-cot-merged

Text Generation • 4B • Updated about 12 hours ago • 5

ragtag1/qwen3-4b-historical-final

Updated 6 days ago

ragtag1/llama32-3b-historical-grpo

Updated 6 days ago

ragtag1/llama32-3b-historical-final

Updated 6 days ago

ragtag1/mistral7b-historical-grpo

Updated 5 days ago

ragtag1/mistral7b-historical-final

Updated 5 days ago