Please quant SWE-Swiss-32B https://huggingface.co/SWE-Swiss/SWE-Swiss-32B

#1581
by walter-tiliashvili - opened

I was surprised to find that nobody had created a quant of the apparently fantastic SWE-Swiss (self-proclaimed by them...). There is also a "*-SFT" repo, though I can't tell the difference from the files alone (and I'm not able to run it locally). I would be grateful if you could quantize one of them.
Thanks for all of your great contributions to making these tools usable by all!

walter-tiliashvili changed discussion title from Please quant SWE-Swiss-32B to Please quant SWE-Swiss-32B https://huggingface.co/SWE-Swiss/SWE-Swiss-32B

While converting the model to a GGUF works, it fails to load in llama.cpp, like all SWE-Swiss-based models, because they modified the original Qwen2ForCausalLM architecture. I unfortunately don't think this can be fixed, which is such a shame, as I'm Swiss. Here is the error I'm getting:

llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 835, got 771
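That error means the converter wrote more (or other) tensors than llama.cpp's Qwen2 graph knows how to map. One way to narrow down which tensors are the odd ones out is to diff the tensor names in the failing GGUF against those of a known-good Qwen2 GGUF. A minimal sketch with made-up names (the `diff_tensors` helper and the name lists are hypothetical; in practice the names would be read from the GGUF files, e.g. with the gguf-py `GGUFReader` that ships with llama.cpp):

```python
# Sketch: diff two tensor-name lists to locate an architecture mismatch.
# The names below are hypothetical examples; real lists would be read
# from the GGUF files (e.g. via gguf-py's GGUFReader).

def diff_tensors(reference: set[str], candidate: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing_from_candidate, unexpected_in_candidate)."""
    return reference - candidate, candidate - reference

# A known-good Qwen2 GGUF's tensor names (tiny illustrative subset):
reference = {"blk.0.attn_q.weight", "blk.0.attn_q.bias", "blk.0.attn_k.weight"}
# The failing model's tensor names:
candidate = {"blk.0.attn_q.weight", "blk.0.attn_k.weight"}

missing, unexpected = diff_tensors(reference, candidate)
print(missing)      # tensors llama.cpp would expect but didn't find
print(unexpected)   # tensors present in the file but never mapped
```

With the real files, a per-layer pattern in the diff (the error shows a gap of 64 tensors) would point at exactly which part of Qwen2ForCausalLM was modified.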

Thank you very much for trying anyway!

What is especially painful about this not working in llama.cpp is that I liked this model so much that I created an abliterated version of it: https://huggingface.co/nicoboss/SWE-Swiss-32B-abliterated

I might try patching out https://github.com/ggml-org/llama.cpp/blob/db97837385edfbc772230debbd49e5efae843a71/src/llama-model-loader.cpp#L845 and seeing what happens, but it will almost certainly just crash later, as they simply modified the Qwen2ForCausalLM architecture too much for llama.cpp to understand what it is supposed to do with this model. One would probably need to add a dedicated architecture for SWE-Swiss-based models inside llama.cpp.
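To see why patching out that check alone is unlikely to help, here is a rough sketch of the kind of sanity check the loader performs (a paraphrase in Python for illustration, not the actual C++ source): after building the compute graph it compares the number of tensors it mapped against the number present in the file, and aborts on a mismatch. Removing the check merely hides the fact that some tensors were never wired into the graph, so inference would run with those weights silently ignored, or crash when a missing mapping is needed.

```python
# Illustrative paraphrase of the loader's tensor-count sanity check.
# Names (done_getting_tensors, n_expected, n_created) mirror the error
# message but are not the real llama.cpp symbols.

def done_getting_tensors(n_expected: int, n_created: int) -> None:
    if n_created != n_expected:
        raise RuntimeError(
            f"done_getting_tensors: wrong number of tensors; "
            f"expected {n_expected}, got {n_created}"
        )

# The numbers from the error above: 64 tensors in the file were never
# mapped by the Qwen2 graph (or vice versa).
try:
    done_getting_tensors(835, 771)
except RuntimeError as e:
    print(e)
```

A dedicated architecture entry, as suggested above, would instead teach the loader what those extra tensors are and where they belong in the graph.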
