https://huggingface.co/nightmedia/Qwen3-4B-Element8-Eva-Xiaolong-Heretic

#1723
by nightmedia - opened

Dear Team Radermacher,
I have two creative models; if you could quant them, that would be awesome :)

https://huggingface.co/nightmedia/Qwen3-4B-Element8-Eva-Xiaolong-Heretic

https://huggingface.co/nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic

Thank you,
-G

It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary pages at https://hf.tst.eu/model#Qwen3-4B-Element8-Eva-Xiaolong-Heretic-GGUF
and https://hf.tst.eu/model#Qwen3-4B-Element8-Eva-Hermes-Heretic-GGUF
for quants to appear.

still waiting for MoEMoEMoE 1T model =)

Working on something :)

just moe all your models lol

There is a movement in that direction, but they need to like each other long enough :)

Meanwhile this one is very good, the "hottest" I have gotten the 30B on its own, without weird models, so if you have the bandwidth to quant it, it would do well. In the Element series I saw at least a 10-point ARC increase with every merge, so it is possible to hit 0.6, it just needs the right combination of smarts :)

https://huggingface.co/nightmedia/Qwen3-30B-A3B-Element7-1M

I have enough bandwidth to download all of your models and MoE them if you tell me how =)

Ah yeah, the how :)

Here is the formula for Element7. I use a nuslerp at 1.6/0.4 with every merge, the proportion set according to how strong the embed is. DASD is a "beginner" while Element6 comes from SOTA level, so Element6 needs to dominate. 1.5/0.5 works fine when the models are fairly equal in "brainwave". It doesn't really matter what a model knows; if it aligns cognitively, it will merge.

With every merge, the original Qwen goes away, loses its Qwen-ness so to speak, and becomes, well, an Element :)

None of this can be done without numbers. Everybody claims their model is the best; I have the numbers to back that up :)

qwen_moe42e7_Qwen3-30B-A3B-Element7.yaml
models:
  - model: Qwen3-30B-A3B-Element6
    parameters:
      weight: 1.6
  - model: Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview
    parameters:
      weight: 0.4
merge_method: nuslerp
tokenizer_source: base
dtype: bfloat16
name: Qwen3-30B-A3B-Element7
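
For reference, a config like this runs with mergekit's CLI, something like:

mergekit-yaml qwen_moe42e7_Qwen3-30B-A3B-Element7.yaml ./Qwen3-30B-A3B-Element7

(the output directory is up to you; this assumes mergekit is installed.)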

hm, what if I merge all your models with weight 1 lol?
also, what are you using to merge? I completely forgot everything since FATLLAMA-1.7T ...

Believe me, I tried. It doesn't work that way.

Every merge is a fusion of sorts. The emerging model (sic.) needs to establish its own brainstem, so to speak. That's where nuslerp helps.

The first merge is always a multislerp of a few models. Look at that as the "basement". That will be used for scaffolding.
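
A basement config looks roughly like this (assuming mergekit's multislerp method; the model names and equal weights are placeholders, not the actual Element recipe):

models:
  - model: placeholder/model-a
    parameters:
      weight: 1.0
  - model: placeholder/model-b
    parameters:
      weight: 1.0
  - model: placeholder/model-c
    parameters:
      weight: 1.0
merge_method: multislerp
dtype: bfloat16
name: Element-Basement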

From there on I only do nuslerp of 2 models each. I tried more; it doesn't work, the metrics go down because of friction.

Every step of the way I look at the ARC numbers. All the others will follow, or not; it doesn't really matter. Some models bring the other numbers down a bit, or up a bit, depending on what they contribute. For example, no funny models past stage 3, so no MiroMind, no QwenLong; those were already in the basement, and introducing them late will destabilize the way the model learned about itself. The closer you get to the top, the more you should add models that bring simple, structured information built on stable bases.
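
(For illustration only, and not necessarily the exact tooling used here: the per-step ARC numbers can be produced with an eval harness such as EleutherAI's lm-evaluation-harness, e.g. lm_eval --model hf --model_args pretrained=path/to/merged-model --tasks arc_challenge,arc_easy)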

If I want something special but it's not up to snuff, I merge it first with an Engineer model that is simple enough to want to learn new things (versus fusing in what it wants to learn directly), and with that combination leveled up, I merge it into another Element.
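
As a rough sketch of that two-step pattern (model names are placeholders, and 1.5/0.5 is just the "fairly equal" ratio from above), the intermediate level-up merge would look like:

models:
  - model: placeholder/Engineer-Model
    parameters:
      weight: 1.5
  - model: placeholder/Special-But-Rough-Model
    parameters:
      weight: 0.5
merge_method: nuslerp
tokenizer_source: base
dtype: bfloat16
name: Special-LeveledUp

The result then goes into the next Element with the same 2-model nuslerp recipe as Element7 above.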

I only use the MLX tools and mergekit.

ah, sad...
well, good luck with everything, let me know if you need more quants =)

I noticed the "AttributeError 'list'" on the 30B and removed the "extra_special_tokens" entry from tokenizer_config.json that might be causing the issue.

ok, submitted to redownload =)
