MiroThinker-1.7-mini Q3_K_M GGUF

This repository contains a Q3_K_M GGUF quantization of miromind-ai/MiroThinker-1.7-mini.

Files

  • MiroThinker-1.7-mini.Q3_K_M.gguf

Quantization details

  • Quantization: Q3_K_M
  • Quantized with: Unsloth + llama.cpp
  • GGUF size: 14.71 GB (13.70 GiB)
  • Source model license: apache-2.0
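The two size figures above differ only by unit convention: GB counts 10**9 bytes while GiB counts 2**30. A minimal Python check of that conversion:

```python
# Convert the published decimal size (GB, 10**9 bytes) to binary units (GiB, 2**30 bytes).
size_gb = 14.71
size_bytes = size_gb * 10**9
size_gib = size_bytes / 2**30
print(f"{size_gib:.2f} GiB")  # matches the listed 13.70 GiB
```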

Recommended use

This quant targets GPUs with 16 GB of VRAM while retaining better quality than lower-bit options. In local testing, it loaded successfully with llama.cpp on an AMD Radeon RX 9060 XT via the Vulkan backend.

llama.cpp example

llama-cli \
  -m MiroThinker-1.7-mini.Q3_K_M.gguf \
  --device Vulkan1 \
  -ngl 999 \
  -c 128

If VRAM is tight on your setup, reduce context length first, then reduce GPU offload.
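The fallback order above can be scripted. The helper below is a hypothetical sketch that assembles a llama-cli argument list (flags as in the example above); the reduced values (`ctx=2048`, `n_gpu_layers=24`) are illustrative, not tuned recommendations:

```python
# Hypothetical helper: build a llama-cli invocation so context length and
# GPU offload can be stepped down when VRAM is tight.
import shlex

def build_llama_cmd(model_path, device="Vulkan1", n_gpu_layers=999, ctx=128):
    """Assemble a llama-cli argument list using the flags from the example."""
    return [
        "llama-cli",
        "-m", model_path,
        "--device", device,
        "-ngl", str(n_gpu_layers),
        "-c", str(ctx),
    ]

# Full offload first, matching the example invocation.
full = build_llama_cmd("MiroThinker-1.7-mini.Q3_K_M.gguf")
# If loading fails, try a larger-but-still-modest context with fewer offloaded layers.
partial = build_llama_cmd("MiroThinker-1.7-mini.Q3_K_M.gguf",
                          n_gpu_layers=24, ctx=2048)
print(shlex.join(full))
```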

Notes

  • This repo contains only the GGUF quantized artifact, not the original safetensors weights.
  • For full lineage and training details, see the upstream miromind-ai/MiroThinker-1.7-mini repository.
Model details

  • Model size: 31B params
  • Architecture: qwen3moe