Please quant SWE-Swiss-32B https://huggingface.co/SWE-Swiss/SWE-Swiss-32B
I was surprised to find that nobody had created a quant of the apparently fantastic SWE-Swiss (self-proclaimed, admittedly...). There is also a "*-SFT" repo, though I can't tell the difference between the two from the files alone (and I'm not able to run it locally to compare). I would be grateful if you could quantize one of them.
Thanks for all of your great contributions to making these tools usable by all!
While converting the model to a GGUF works, it fails to load in llama.cpp, like all SWE-Swiss based models, because they modified the original Qwen2ForCausalLM architecture. I unfortunately don't think this can be fixed, which is such a shame, as I'm Swiss. Here is the error I'm getting:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 835, got 771
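For anyone wondering what that error actually means: it comes from a sanity check that runs after the architecture-specific loading code has claimed every tensor it knows about. Here is a toy reproduction of the logic (my simplification, not the literal llama.cpp source):

```cpp
#include <cstdio>
#include <stdexcept>

// Toy reproduction of the check in src/llama-model-loader.cpp:
// n_tensors = tensors stored in the GGUF file, n_created = tensors the
// architecture's loading code actually claimed and mapped.
static void done_getting_tensors(int n_tensors, int n_created) {
    if (n_created != n_tensors) {
        char msg[128];
        snprintf(msg, sizeof(msg),
                 "done_getting_tensors: wrong number of tensors; expected %d, got %d",
                 n_tensors, n_created);
        throw std::runtime_error(msg);
    }
}

int main() {
    try {
        done_getting_tensors(835, 771); // the numbers from the log above
    } catch (const std::exception & e) {
        fprintf(stderr, "llama_model_load: error loading model: %s\n", e.what());
    }
    return 0;
}
```

In other words, the GGUF holds 835 tensors, but the stock Qwen2ForCausalLM loader only claimed 771 of them; the 64 leftover tensors presumably belong to whatever SWE-Swiss added on top of the architecture.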
Thank you very much for trying anyway!
What is especially painful about this not working in llama.cpp is that I liked this model so much that I created an abliterated version of it: https://huggingface.co/nicoboss/SWE-Swiss-32B-abliterated
I might try patching out https://github.com/ggml-org/llama.cpp/blob/db97837385edfbc772230debbd49e5efae843a71/src/llama-model-loader.cpp#L845 and see what happens, but it will almost certainly just crash later, as they simply modified the Qwen2ForCausalLM architecture too much for llama.cpp to understand what it is supposed to do with this model. One would probably need to add a dedicated architecture for SWE-Swiss based models inside llama.cpp.
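For the record, "patching out" that check would just mean demoting the hard error to a warning, roughly like this (untested guess on my part, not a real fix):

```cpp
// Hypothetical patch to src/llama-model-loader.cpp: warn instead of
// throwing, so loading continues with the 64 unclaimed tensors unused.
void llama_model_loader::done_getting_tensors() const {
    if (n_created != n_tensors) {
        LLAMA_LOG_WARN("%s: wrong number of tensors; expected %d, got %d -- ignoring\n",
                __func__, n_tensors, n_created);
    }
}
```

Even if that gets past loading, the extra tensors were presumably part of the forward pass the model was trained with, so llama.cpp would be running a different computation graph than the model expects, which is why I expect garbage output or a crash rather than a working model.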