Quantization

#1
by hadadrjt - opened

Are there any plans for quantization, such as 2-bit and 4-bit builds for Ollama? This could reduce resource usage.

@hadadrjt have you tried to quantize the model by yourself?

@andreaschandra waddap, it's me

@hadadrjt how did you do it?

@hadadrjt I've pushed a 4-bit quantized version here: https://ollama.com/csalab/sahabatai2/tags

@budikomarudin great! Is there a technical blog post to reproduce this?

@andreaschandra I converted it to GGUF and quantized it with llama.cpp (installed via Homebrew).
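For anyone looking to reproduce this, a minimal sketch of the GGUF-conversion-and-quantize workflow using llama.cpp's bundled tools — the model directory name and output filenames below are placeholders, not the exact paths used above:

```shell
# Clone and build llama.cpp (provides the conversion script and llama-quantize)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# 1. Convert the Hugging Face checkpoint to GGUF (path is a placeholder)
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf

# 2. Quantize the GGUF file to 4-bit (Q4_K_M is a common quality/size tradeoff)
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# 3. Optionally import into Ollama via a Modelfile that points at the GGUF
echo 'FROM ./model-q4_k_m.gguf' > Modelfile
ollama create my-quantized-model -f Modelfile
```

If you installed llama.cpp via Homebrew instead of building from source, the quantize binary and conversion script ship with the formula, so steps 1–2 are the same with the installed paths.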
