Please quant SWE-Swiss-32B https://huggingface.co/SWE-Swiss/SWE-Swiss-32B
I was surprised to find that nobody had created a quant of the apparently fantastic SWE-Swiss (self-proclaimed, admittedly...). There is also a "*-SFT" repo, though I can't tell the difference between the two from the files alone (and I'm not able to run it locally to compare). I would be grateful if you could quantize one of them.
Thanks for all of your great contributions to making these tools usable by all!
While converting the model to a GGUF works, it fails to load in llama.cpp, like all SWE-Swiss based models, because they modified the original Qwen2ForCausalLM architecture. I unfortunately don't think this can be fixed, which is such a shame, as I'm Swiss. Here is the error I'm getting:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 835, got 771
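For anyone wondering what that error actually means: it comes from a sanity check that runs after the architecture-specific loading code has claimed every tensor it knows about. Here is a toy reproduction of the logic (my simplification, not the literal llama.cpp source):

```cpp
#include <cstdio>
#include <stdexcept>

// Toy reproduction of the check in src/llama-model-loader.cpp:
// n_tensors = tensors stored in the GGUF file, n_created = tensors the
// architecture's loading code actually claimed and mapped.
static void done_getting_tensors(int n_tensors, int n_created) {
    if (n_created != n_tensors) {
        char msg[128];
        snprintf(msg, sizeof(msg),
                 "done_getting_tensors: wrong number of tensors; expected %d, got %d",
                 n_tensors, n_created);
        throw std::runtime_error(msg);
    }
}

int main() {
    try {
        done_getting_tensors(835, 771); // the numbers from the log above
    } catch (const std::exception & e) {
        fprintf(stderr, "llama_model_load: error loading model: %s\n", e.what());
    }
    return 0;
}
```

In other words, the GGUF holds 835 tensors, but the stock Qwen2ForCausalLM loader only claimed 771 of them; the 64 leftover tensors presumably belong to whatever SWE-Swiss added on top of the architecture.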
Thank you very much for trying anyway!
What is especially painful about this not working in llama.cpp is that I liked this model so much that I created an abliterated version of it: https://huggingface.co/nicoboss/SWE-Swiss-32B-abliterated
I might try patching out https://github.com/ggml-org/llama.cpp/blob/db97837385edfbc772230debbd49e5efae843a71/src/llama-model-loader.cpp#L845 and see what happens, but it will almost certainly just crash later, as they simply modified the Qwen2ForCausalLM architecture too much for llama.cpp to understand what it is supposed to do with this model. One would probably need to add a dedicated architecture for SWE-Swiss based models inside llama.cpp.
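For the record, "patching out" that check would just mean demoting the hard error to a warning, roughly like this (untested guess on my part, not a real fix):

```cpp
// Hypothetical patch to src/llama-model-loader.cpp: warn instead of
// throwing, so loading continues with the 64 unclaimed tensors unused.
void llama_model_loader::done_getting_tensors() const {
    if (n_created != n_tensors) {
        LLAMA_LOG_WARN("%s: wrong number of tensors; expected %d, got %d -- ignoring\n",
                __func__, n_tensors, n_created);
    }
}
```

Even if that gets past loading, the extra tensors were presumably part of the forward pass the model was trained with, so llama.cpp would be running a different computation graph than the model expects, which is why I expect garbage output or a crash rather than a working model.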