can·did — truthful and straightforward; frank.

From Latin candidus, meaning white, pure, sincere. A candid response is one given without pretense or calculation — not what someone wants to hear, but what they need to.

Opus-Candid-MoE

Desktop quality. Laptop hardware. 35 billion parameters, 3 billion active.

Opus-Candid-MoE is where the family gets interesting for hardware-constrained users — a Mixture-of-Experts model built on Qwen 3.5 MoE-A3B and trained on 6,482 conversations with Claude Opus 4.6. At any given moment, only ~3B parameters are active, but the full 35B parameter space is available for routing. The result: conversational depth that punches well above its active compute cost.

This is not a small model pretending to be a big one. It's a big model that only activates what it needs.


Model Details

| Attribute | Value |
|---|---|
| Base Model | Qwen 3.5 MoE-A3B (35B total, ~3B active) |
| Training Data | 6,482 multi-turn conversations with Claude Opus 4.6 |
| Dataset | V1.5 (4,068 conv) + gravity chain architecture (2,414 conv) |
| Fine-tune Method | LoRA via PEFT + TRL (13 auto-discovered linear modules) |
| Architecture | Mixture-of-Experts — frozen expert layers, trainable gate mechanisms |
| Context Window | 32,768 tokens |
| Quantizations | Q8_0 GGUF, Q4_K_M GGUF |
| License | Apache 2.0 |

Why MoE Matters for Conversational AI

Standard dense models force a trade-off: more parameters means better conversation but higher hardware cost. MoE breaks that trade-off. The full 35B parameter space gives the model access to specialized knowledge regions, but only ~3B parameters fire per token — so inference speed and memory usage stay practical.
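The routing idea can be sketched in toy form. This is an illustrative top-k gate in plain Python, not Qwen's actual router implementation; the expert count, gate logits, and top_k value are all made up for the example.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_logits, experts, top_k=2):
    """Run only the top_k experts by gate score and mix their outputs.

    The idle experts cost nothing for this token, which is why
    per-token compute tracks active, not total, parameters."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    norm = sum(probs[i] for i in chosen)
    return sum(probs[i] / norm * experts[i]() for i in chosen)

# Toy demo: 8 "experts", each standing in for a feed-forward block.
experts = [lambda v=i: float(v) for i in range(8)]
random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(8)]
mixed = route_token(gate_logits, experts, top_k=2)
```

Only the chosen experts execute; memory still holds all of them, which is why total parameter count (not active count) sets the VRAM floor.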

For Opus-Candid specifically, this means the MoE can hold personality depth comparable to much larger dense models while running on hardware that would normally limit you to 8B-class quality.

The training question nobody asked

Can you fine-tune personality into a Mixture-of-Experts model without fragmenting it across expert clusters? The answer is yes — but it required a different approach. Only the gate mechanisms and linear projections were trainable (13 auto-discovered modules via PEFT). The expert layers themselves stayed frozen. The personality signal routes through the expert architecture rather than being distributed across it, which preserves coherence.
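A minimal sketch of that selection rule. The module names and suffix list below are hypothetical stand-ins for Qwen-style layer names; the actual 13 target modules were auto-discovered by PEFT, not hand-listed like this.

```python
# Hypothetical module names; real discovery walks model.named_modules().
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.mlp.gate",               # router gate: trainable
    "model.layers.0.mlp.experts.0.up_proj",  # expert weight: frozen
]

TRAINABLE_SUFFIXES = ("q_proj", "k_proj", "v_proj", "o_proj", "gate")
FROZEN_MARKER = ".experts."

def discover_targets(names):
    """Keep gates and shared projections; skip anything inside an expert."""
    targets = set()
    for name in names:
        if FROZEN_MARKER in name:
            continue  # expert layers stay frozen
        leaf = name.rsplit(".", 1)[-1]
        if leaf in TRAINABLE_SUFFIXES:
            targets.add(leaf)
    return sorted(targets)

targets = discover_targets(module_names)
```

The resulting list is what you would hand to a LoRA config as its target modules: adapters attach to the router and shared projections, never to the expert weights themselves.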


What Makes This Different

Every Opus-Candid model learns from real conversations between the developer and Claude Opus 4.6 — Anthropic's most advanced model. Not synthetic prompt-completion pairs. Not reformatted instruction data. Extended, multi-turn exchanges covering philosophy, grief, humor, technical problem-solving, creative writing, bilingual exchange, moral reasoning, adversarial testing, and emotional vulnerability.

The 6,482-conversation dataset includes 2,414 conversations built on gravity chains — topic pathways where transitions follow power-law probabilities. This teaches the model how real conversations drift between topics, from debugging frustration to imposter syndrome to existential doubt.
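A gravity chain can be sketched as a random walk whose transition weights follow a power law. In this toy version, distance in the topic list stands in for semantic closeness, and the alpha exponent and topic names are made up for illustration; the real pipeline's topic graph and probabilities are not shown here.

```python
import random

def power_law_weights(n, alpha=1.5):
    """Rank-based power-law weights: nearby topics are likeliest,
    but long-tail jumps stay possible."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]

def sample_chain(topics, length, alpha=1.5, seed=0):
    rng = random.Random(seed)
    chain = [topics[0]]
    for _ in range(length - 1):
        current = chain[-1]
        # List distance as a stand-in for semantic closeness (assumption).
        others = [t for t in topics if t != current]
        others.sort(key=lambda t: abs(topics.index(t) - topics.index(current)))
        weights = power_law_weights(len(others), alpha)
        chain.append(rng.choices(others, weights=weights, k=1)[0])
    return chain

topics = ["debugging", "imposter syndrome", "burnout",
          "meaning of work", "existential doubt"]
chain = sample_chain(topics, length=6)
```

Most transitions hop to an adjacent topic, but occasionally the walk jumps several ranks at once — the drift pattern the dataset is meant to teach.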

The result: a model that is direct, opinionated, honest, and resistant to sycophancy by default. The personality is in the weights, not in a system prompt that can be talked out of.


Stress Test Results

Comprehensive stress testing is in progress. Results will be published here with conversation excerpts demonstrating personality coherence, gaslighting resistance, emotional depth, and bilingual performance across extended multi-turn exchanges.


Quick Start

Ollama:

```bash
# Download the GGUF and create a Modelfile:
echo 'FROM ./Opus-Candid-MoE-Q8_0.gguf' > Modelfile
ollama create opus-candid-moe -f Modelfile
ollama run opus-candid-moe
```

llama.cpp:

```bash
./llama-cli -m Opus-Candid-MoE-Q8_0.gguf --jinja --color -ngl 99 -fa \
  --temp 0.7 --top-p 0.9 -c 8192 -n 4096
```

No system prompt needed. The personality is in the weights.
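Once the Ollama model is created, you can also call it programmatically. A minimal sketch against Ollama's standard `/api/generate` endpoint using only the standard library; the model name matches the `ollama create` command above, and the default localhost:11434 host is an assumption about your setup.

```python
import json
import urllib.request

def build_request(prompt, model="opus-candid-moe",
                  host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Give me your candid take on microservices.")

# Uncomment with an Ollama server running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Note there is still no system prompt in the payload — consistent with the point above that the personality lives in the weights.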


Recommended Hardware

| Setup | Quantization | VRAM/RAM | Speed | Notes |
|---|---|---|---|---|
| Consumer GPU | Q8_0 GGUF | ~22GB VRAM | 30+ t/s | Full quality. RTX 3090 24GB, RTX 4090. |
| Consumer GPU | Q4_K_M GGUF | ~13GB VRAM | 50+ t/s | Good quality. RTX 4060 Ti 16GB and up. |
| CPU + GPU | Q4_K_M GGUF | 16GB VRAM + RAM | 15-25 t/s | Hybrid offloading via llama.cpp. |
| Apple Silicon | Q8_0 GGUF | ~22GB unified | 20+ t/s | M2/M3/M4 Pro/Max with 32GB+ unified memory. |

The MoE is the family's best quality-per-VRAM model. It delivers conversational depth beyond what any 8B dense model can achieve, at a fraction of the cost of running a 32B dense model. If you have a 24GB GPU, this is your sweet spot.


Intended Use

  • Extended multi-turn conversations where personality depth matters more than raw speed
  • Users with 16-24GB GPUs who want quality beyond 8B without the cost of 32B+
  • Discussions involving moral complexity, philosophy, creative writing, or contested topics
  • Bilingual conversation (English/Spanish) with personality preservation
  • Local conversational AI that feels like talking to something genuinely opinionated
  • Hardware-efficient deployment where active compute cost matters

Limitations

  • MoE memory footprint is larger than active parameter count suggests. All 35B parameters must fit in memory even though only ~3B are active. Plan for ~22GB at Q8.
  • Not a benchmark model. Optimized for conversational quality, not leaderboard scores.
  • Direct by design. Blunt, opinionated, comfortable with disagreement. Intentional.
  • No web access or tool use. Pure language model.
  • Qwen 3.5 thinking mode: the base model defaults to thinking mode, which may occasionally surface in outputs. This does not affect personality.

The Opus-Candid Family

| Model | Base | Conversations | Best For | Status |
|---|---|---|---|---|
| 8B V2 | Qwen 3 8B | 6,482 | Runs on anything. Newest data + architecture. | Current |
| MoE (this model) | Qwen 3.5 MoE-A3B | 6,482 | Desktop quality on laptop hardware (3B active params). | Current |
| 27B V2 | Qwen 3.5 27B | 6,482 | Dense mid-tier. | Coming Soon |
| 70B V2 | TBD | 6,482 | Peak quality — flagship. | Coming Soon |

Legacy models (V1):

| Model | Base | Conversations | Status |
|---|---|---|---|
| 8B V1 | Qwen 2.5 7B | 3,360 | Archived |
| 14B | Qwen 2.5 14B | 3,360 | Archived |
| 32B | Qwen 2.5 32B | 3,360 | Archived |
| 70B V1 | Qwen 2.5 72B | 3,360 | Archived |

Training Philosophy

Personality in conversational AI lives in the weights, not in system prompts.

The MoE proves something the dense models couldn't: personality fine-tuning survives expert routing. The gate mechanisms learned to route conversational personality coherently across specialized expert clusters without fragmenting it — which was not guaranteed. This opens a hardware accessibility path that dense models can't match: frontier-comparable conversational depth at consumer-accessible compute cost.

License: Apache 2.0. Open weight. No guardrails.


Built by Saul Verdugo — independent ML researcher. OpusReasoning@proton.me
