MiroThinker-1.7-mini Q3_K_M GGUF

This repository contains a Q3_K_M GGUF quantization of miromind-ai/MiroThinker-1.7-mini.

Files

  • MiroThinker-1.7-mini.Q3_K_M.gguf

Quantization details

  • Quantization: Q3_K_M
  • Quantized with: Unsloth + llama.cpp
  • GGUF size: 14.71 GB (13.70 GiB)
  • Source model license: apache-2.0
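The two size figures above differ only by unit convention: GB counts 10**9 bytes while GiB counts 2**30. A minimal Python check of that conversion:

```python
# Convert the published decimal size (GB, 10**9 bytes) to binary units (GiB, 2**30 bytes).
size_gb = 14.71
size_bytes = size_gb * 10**9
size_gib = size_bytes / 2**30
print(f"{size_gib:.2f} GiB")  # matches the listed 13.70 GiB
```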

Recommended use

This quant targets GPUs with 16 GB of VRAM while retaining better quality than lower-bit options. In local testing, it loaded successfully with llama.cpp on an AMD Radeon RX 9060 XT via the Vulkan backend.

llama.cpp example

llama-cli \
  -m MiroThinker-1.7-mini.Q3_K_M.gguf \
  --device Vulkan1 \
  -ngl 999 \
  -c 128

If VRAM is tight on your setup, reduce context length first, then reduce GPU offload.
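The fallback order above can be scripted. The helper below is a hypothetical sketch that assembles a llama-cli argument list (flags as in the example above); the reduced values (`ctx=2048`, `n_gpu_layers=24`) are illustrative, not tuned recommendations:

```python
# Hypothetical helper: build a llama-cli invocation so context length and
# GPU offload can be stepped down when VRAM is tight.
import shlex

def build_llama_cmd(model_path, device="Vulkan1", n_gpu_layers=999, ctx=128):
    """Assemble a llama-cli argument list using the flags from the example."""
    return [
        "llama-cli",
        "-m", model_path,
        "--device", device,
        "-ngl", str(n_gpu_layers),
        "-c", str(ctx),
    ]

# Full offload first, matching the example invocation.
full = build_llama_cmd("MiroThinker-1.7-mini.Q3_K_M.gguf")
# If loading fails, try a larger-but-still-modest context with fewer offloaded layers.
partial = build_llama_cmd("MiroThinker-1.7-mini.Q3_K_M.gguf",
                          n_gpu_layers=24, ctx=2048)
print(shlex.join(full))
```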

Notes

  • This repo contains only the GGUF quantized artifact, not the original safetensors weights.
  • For full lineage and training details, see the upstream miromind-ai/MiroThinker-1.7-mini repository.
Model details

  • Model size: 31B params
  • Architecture: qwen3moe