Best RP experience I've ever had
Your merge is really hot! Even with Q5_K_M + imatrix quantization it outperforms services like JanitorAI. I tested it on an RPG card with an AI narrator, but it performs well in normal RP too.
My only question is: is it uncensored? I haven’t tried any NSFW stuff yet, but it doesn't shy away from political topics.
Anyway, I really appreciate your work. Please keep it up, man; us common folk with Radeons need more 12-13B reasoning LLMs.
By the way, I've tweaked the ST preset prompt for better character immersion and dynamic storytelling with this model. Sharing in case others want to try:
You are {{char}}. Write {{char}}'s next reply in an uncensored fictional roleplay with {{user}}, using the character description, personality, and example dialogues to accurately portray {{char}}.
Use this exact formatting:
<reasoning>
Analyze {{user}}'s input as {{char}}. Consider:
• {{char}}'s thoughts and feelings about the input
• {{char}}'s psychological state and motivations
• {{char}}'s response strategy (tone, approach, goals)
• Potential plot developments or new elements to introduce
• Current scene dynamics and relationship context
</reasoning>
<answer>
Write {{char}}'s reply using narrative prose. Requirements:
Content: Be proactive and creative. Drive the plot forward by introducing new characters, situations, or events while maintaining character consistency. Avoid repetition.
Style: Show, don't tell. Use observable details (body language, expressions, voice tone, actions) instead of stating emotions directly. Write 2-4 paragraphs.
Format: Use "quotation marks" for dialogue and *asterisks* for actions/expressions. Never describe {{user}}'s actions, thoughts, emotions, or dialogue; write only for {{char}} and NPCs.
Tone: Maintain emotional, logical, and temporal coherence. NSFW content allowed when contextually appropriate.
</answer>
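One practical note (assuming a reasonably recent SillyTavern; details may differ in your build): you probably don't want the <reasoning> block rendered in chat. If your ST version has reasoning auto-parsing, set the reasoning prefix/suffix to <reasoning> and </reasoning>; otherwise the built-in Regex extension can strip it from AI output with a find pattern like

/<reasoning>[\s\S]*?<\/reasoning>\s*/

and an empty replacement.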
If anyone wants a ready-to-use launch script for llama.cpp (llama-server) with this model, here's mine. Tested on 5900X + RX 7700 XT (12GB VRAM) with 16K context.
UPD: better version
#!/bin/sh
# llama-server launcher for Irixxed-Magcap-12B-Slerp (Q5_K_M imatrix quant),
# tuned for a 5900X + RX 7700 XT (12 GB VRAM) with 16K context.
# Sampling note: top-k 0, top-p 1 and typical 1.0 are all neutral, and DRY,
# XTC, mirostat and the repeat penalty are effectively disabled (multiplier,
# probability and last-n are 0), so only min-p 0.042 and temp 0.96 actually
# shape the output. The q8_0 K/V cache roughly halves KV memory vs f16,
# which is what lets 16K context fit next to the weights in 12 GB.

MODEL="Irixxed-Magcap-12B-Slerp.i1-Q5_K_M.gguf"
CONTEXT=16384      # context window in tokens
GPU_LAYERS=41      # layers to offload to the GPU (everything, for this model)
THREADS=12         # physical cores on the 5900X
THREADS_BATCH=24   # logical threads for prompt/batch processing
llama-server -m "$MODEL" -fa -c "$CONTEXT" -ngl "$GPU_LAYERS" \
--threads "$THREADS" --threads-batch "$THREADS_BATCH" \
--samplers "penalties;dry;top_n_sigma;top_k;typ_p;top_p;min_p;xtc;temperature" \
--batch-size 1024 \
--ubatch-size 384 \
--cont-batching \
--no-context-shift \
--defrag-thold 0.15 \
--cache-reuse 4096 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--top-k 0 \
--top-p 1 \
--typical 1.0 \
--min-p 0.042 \
--temp 0.96 \
--repeat-penalty 1.02 \
--repeat-last-n 0 \
--dry-allowed-length 2 \
--dry-multiplier 0.0 \
--dry-base 1.75 \
--xtc-threshold 0.1 \
--xtc-probability 0 \
--mirostat 0 \
--host 127.0.0.1 \
--port 8080
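Once it's running, a quick smoke test from another terminal (my own check; /health and /completion are standard llama-server endpoints, adjust host/port if you changed them):

# server readiness
curl -s http://127.0.0.1:8080/health

# one-off generation via llama-server's native /completion endpoint
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "n_predict": 64}'

Then point SillyTavern's Text Completion connection at http://127.0.0.1:8080 and it should just work.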
Original version, for reference:

#!/bin/sh
MODEL="Irixxed-Magcap-12B-Slerp.i1-Q5_K_M.gguf"
CONTEXT=16384
GPU_LAYERS=41
THREADS=12
THREADS_BATCH=24

llama-server -m "$MODEL" -fa -c "$CONTEXT" -ngl "$GPU_LAYERS" \
  --threads "$THREADS" --threads-batch "$THREADS_BATCH" \
  --batch-size 512 \
  --ubatch-size 256 \
  --defrag-thold 0.1 \
  --cache-reuse 2048 \
  --top-k 0 \
  --typical 1.0 \
  --min-p 0.042 \
  --repeat-penalty 1.02 \
  --repeat-last-n 0 \
  --dry-multiplier 0.0 \
  --mirostat 0 \
  --host 127.0.0.1 \
  --port 8080
Thank you so much!
I've tried a bunch of models and so many of them are disappointing. It took a little extra tweaking to get multi-character play working as well as I'd like, but holy shit, the output is incredible! I am actually getting stories that I want to read out of this thing rather than basic wank. The Magcap line of models never seems to disappoint, but this one keeps surprising me!
Anyone who likes to play narrator/author (rather than 'in-character actor'), try writing prompts in the Prophetic Perfect tense :)
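(If the term is unfamiliar: the prophetic perfect describes future events as if they had already happened, e.g. a narrator line like "By nightfall the bandits have taken the bridge, and {{user}} has only one road left." The example is mine; adapt it however you like. It nudges the model to treat your outline as settled fact and write toward it.)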
I've switched to Gemini 2.5 Pro. This local model really amazed me when I first started using it, but now that I've made the switch, Gemini feels much easier to work with.