Quantization

#1
by hadadrjt - opened

Are there any plans for quantization, such as 2-bit and 4-bit builds for Ollama? This could reduce resource usage.

@hadadrjt have you tried to quantize the model by yourself?

@andreaschandra waddap, it's me

@hadadrjt how did you do it?

@hadadrjt I've pushed a 4-bit quantized version here: https://ollama.com/csalab/sahabatai2/tags

@budikomarudin great! Is there a technical blog post to reproduce this?

@andreaschandra I converted it to GGUF and quantized it with llama.cpp (installed via Homebrew).
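For anyone looking to reproduce this, a minimal sketch of the GGUF-conversion-and-quantize workflow using llama.cpp's bundled tools — the model directory name and output filenames below are placeholders, not the exact paths used above:

```shell
# Clone and build llama.cpp (provides the conversion script and llama-quantize)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# 1. Convert the Hugging Face checkpoint to GGUF (path is a placeholder)
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf

# 2. Quantize the GGUF file to 4-bit (Q4_K_M is a common quality/size tradeoff)
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# 3. Optionally import into Ollama via a Modelfile that points at the GGUF
echo 'FROM ./model-q4_k_m.gguf' > Modelfile
ollama create my-quantized-model -f Modelfile
```

If you installed llama.cpp via Homebrew instead of building from source, the quantize binary and conversion script ship with the formula, so steps 1–2 are the same with the installed paths.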
