Multilingual is ruined
In short, multilingual is ruined. I tested Russian in particular, and the model even started thinking in English. Almost half the words in the response were truncated, and the rest were either Chinese characters or replaced with English words. So that's the problem with the pruning. The second quant has no such problems; the UD-Q2_K_XL is better. I tested the Q4_K_S from Bartowski.
hey @sovetboga, our calibration data mix for pruning is focused on coding + tool calling, so multilingual capabilities might be affected. if you're interested in conversational Russian for this model, you can curate this specific data mix and prune the original GLM4.5-Air model using our code: https://github.com/CerebrasResearch/reap
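for illustration, a minimal sketch of what curating such a mix could look like with the huggingface `datasets` library — the dataset names below are placeholders, not anything shipped with the REAP repo:

```python
# minimal sketch: build a Russian conversational calibration set.
# "your-org/russian-dialogues" and "your-org/reap-ru-calibration" are
# hypothetical names, not datasets from the REAP repo.
from datasets import load_dataset

calib = load_dataset("your-org/russian-dialogues", split="train")

# sub-sample a few thousand examples; the right calibration size is an assumption here
calib = calib.shuffle(seed=0).select(range(2000))

# publish the mix so the pruning code can load it by dataset name
calib.push_to_hub("your-org/reap-ru-calibration")
```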
at least 2x rtx 6000 pro, or 192 gb vram, for a semi-production-grade deploy
4-bit version on 1x rtx 6000 pro
@lazarevich in order to prune it for a specific topic, are you referring to this part in the readme:
DATASET_NAME: (default: theblackcat102/evol-codealpaca-v1)
So basically you would replace that with a dataset on a different topic, right? e.g., pointing that entry at a curated calibration set instead (the name below is the hypothetical one from the sketch above, not a real dataset):

DATASET_NAME: your-org/reap-ru-calibration
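how DATASET_NAME is actually picked up (environment variable, config file, or script argument) isn't shown in this thread — check the repo's README for the exact invocation before relying on the line above.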