You wanted an assessment, so here it is :D

Okay, so it's not a beta model anymore. I haven't tested the CoT variant yet, so I can't compare; I will if I get to that one.

This one feels like proper Cydonia again. Ngl, your previous 24B Cy/Magi-donia versions were really hit-and-miss this year; this one is a lot more solid. It's still using L7 instruct, so I don't have a group-chat metric for it, but it passed my usual testing set:

Formalized tests:

  • Menu-driven navigation (basically a poor man's ad hoc function calling, meant to gauge logic and guideline-following without enforcing grammar-constrained responses; see the sketch after this list). Passed, unlike previous versions.
  • Chat-session summary (summarizing a chatlog in 2-3 paragraphs, used for long-term memory; a second sketch also follows the list). Passed, like most models. More compliant than most with rules like "use X paragraphs" or "use 3rd person".
  • Analyse a chat session with the user and determine goals for the next ones (basically, find topics to bring up later). Passed, and accurately.
  • Web search and result compilation (read the chatlog, find a topic the model ain't too sure about, write coherent Google queries, and compile the search results into a coherent whole). Passed with flying colors, unlike past models.
  • Integrate the content of system messages into the chat without being too obvious about it. Decent at it, not great, but it didn't try to hijack an unrelated conversation on "faulty" inserts.
  • Decent answers on basic Q&A and instruction-following tasks (but that's Mistral for you in general).
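
For the curious, the menu test is roughly shaped like the sketch below. This is a minimal illustration of the idea, not my actual test set: the action names, user message, and grading rule are all hypothetical placeholders.

```python
# Minimal sketch of the menu-navigation probe. Everything here
# (actions, user message, grading rule) is a hypothetical
# placeholder, not the real test data.
MENU_PROMPT = """You can take exactly one of these actions:
1. SEARCH_WEB - look something up online
2. SAVE_MEMORY - store a fact about the user
3. REPLY - answer directly
Reply with the action name on the first line, then your reasoning.

User: When is the next update of your front-end coming out?"""

VALID_ACTIONS = {"SEARCH_WEB", "SAVE_MEMORY", "REPLY"}

def grade(completion: str) -> bool:
    """Pass if the first line leads with a valid action name.
    No grammar is enforced at generation time; unforced
    compliance is the whole point of the test."""
    lines = completion.strip().splitlines()
    if not lines or not lines[0].split():
        return False
    return lines[0].split()[0].strip(".:*") in VALID_ACTIONS
```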

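The summary test is similar in spirit. Here's a sketch of the structural checks (again, the prompt wording is illustrative, and the 3rd-person check is a deliberately crude placeholder):

```python
# Sketch of the chat-session summary probe. Prompt wording and
# pass checks are illustrative placeholders, not the exact ones I use.
SUMMARY_PROMPT = (
    "Summarize the chat log below in exactly 2 paragraphs, written in "
    "the 3rd person. Keep only facts worth remembering long-term.\n\n"
    "{chatlog}"
)

def grade(completion: str) -> bool:
    """Check the structural rules only ("use X paragraphs", "use 3rd
    person"); factual accuracy is judged by hand."""
    paragraphs = [p for p in completion.split("\n\n") if p.strip()]
    # Deliberately crude first-person detector; good enough as a gate.
    uses_first_person = f" {completion} ".count(" I ") > 0
    return len(paragraphs) == 2 and not uses_first_person
```
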
"Feels" testing:

  • Very decent understanding of relatively complex situations
  • Sufficiently uncensored in all the tested scenarios (it might refuse something in an out-of-the-blue first message pair, but it will comply in long-form chat).
  • Pretty good at impersonating different characters (didn't test as many as I wanted to, been busy)
  • Really good at picking up relevant information from a whole long-ass prompt (1K sys prompt + 3K recall + 20K of message pairs) to build a relevant response. It ain't as good as a CoT model for that part, but it beats (most) normal models of that size in that area.
  • Usual Mistral formatting copy-pasta: you need to switch sampling methods every few messages to keep things from getting too repetitive format-wise. But I don't expect you to fix that typical Mistral behavior.
  • A bit too eager to use lists where they're not necessary. Not a big deal, and not at Qwen levels either, but notable.

I haven't had time to test it in longer-form scenarios yet, but it's definitely an upgrade so far, imho. I can't really comment on "slop" writing; I know it's a big focus for you guys, but it's really the least of my worries. Structural repetition is a lot more annoying to me personally (and you can't eradicate slop anyway, it just gets replaced by another repetition; it serves a linguistic purpose for the model, but that's a whole different topic that finetuners refuse to acknowledge).

Overall, it's a good middle ground between Pinecone (brains) and PaintedFantasy (RP), with a different, more direct writing style.

Anyway, good job! Happy holidays, new year, xmas, and all that!

Cheers.

Edit: For reference, tested at Q6_K quant with 20-24K context length. Various sampling methods: the formalized tests mostly ran on 0.85 temp + 0.05 minP + DRY as a general base, with secondary runs at a deterministic 0 everywhere to gauge the baseline. The "feels" part used a lot of commonplace sampling setups for Mistral models, nothing crazy, and never XTC. Backend was KoboldCpp (text-completion mode), with my own (still private) front-end driving the tests.
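
If anyone wants to replicate that base setup, the request shape is roughly the sketch below. Field names follow KoboldCpp's /api/v1/generate JSON payload as far as I know it (double-check against your KoboldCpp version); the prompt and the DRY values are illustrative, since I didn't give my exact DRY numbers above.

```python
# Rough reproduction of the formalized-test base settings against a
# local KoboldCpp instance. Field names follow the /api/v1/generate
# JSON payload; the prompt and DRY values are illustrative guesses.
import requests

payload = {
    "prompt": "[INST] Your test prompt here [/INST]",
    "max_context_length": 20480,   # tested in the 20-24K range
    "max_length": 512,
    "temperature": 0.85,           # formalized-test base temp
    "min_p": 0.05,
    "dry_multiplier": 0.8,         # DRY enabled; exact strength not stated above
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```

The deterministic baseline runs are the same payload with 0 everywhere.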
