# Therapeutic Coaching Qwen3 14B
A fine-tuned Qwen3 14B model for privacy-first therapeutic coaching conversations. Designed to run locally on consumer hardware via GGUF quantization.
⚠️ **Not therapy.** This is a self-care coaching tool for stable adults; no professional therapist was involved in this project. It cannot handle crises and cannot replace professional mental health care.
Resources:
- 📦 GitHub Repo — Full pipeline, SKILLs, and documentation
- 📊 Training Dataset — ~1,300 synthetic conversations
## Model Description
This model was fine-tuned on synthetic multi-turn therapeutic coaching conversations, optimized for:
- Naturalness: Avoiding robotic "therapy voice" patterns
- Context utilization: Tracking and referencing conversation history
- Multi-topic handling: Managing multiple user concerns without losing threads
- Tentative framing: Using exploratory language rather than assertions
## Evaluation Results
Compared against base Qwen3 14B on 14 paired conversation samples:
| Metric | Fine-tuned | Base | Improvement |
|---|---|---|---|
| Mean Score | 0.831 | 0.701 | +18.5% |
| Pass Rate | 57.1% | 20.0% | +37.1pp |
| p-value | — | — | 0.0165 |
### Category Breakdown
| Category | Fine-tuned | Base | Diff |
|---|---|---|---|
| naturalness | 0.714 | 0.418 | +0.296 |
| context_use | 0.905 | 0.645 | +0.260 |
| multi_topic | 0.964 | 0.833 | +0.131 |
| comprehension | 0.500 | 0.533 | -0.033 |
| connection | 0.893 | 0.900 | -0.007 |
The model shows statistically significant improvement (p < 0.05) with largest gains in naturalness and context utilization.
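The card does not state which test produced the p-value. For 14 paired per-sample scores, a paired t-test is the natural choice; the sketch below uses illustrative scores (the real per-sample values are in the evaluation set, not on this card) and computes the t statistic with only the standard library:

```python
import math
from statistics import mean, stdev

# Illustrative per-sample scores for the 14 paired conversations (not the
# actual evaluation data; shapes and magnitudes chosen for demonstration).
finetuned = [0.92, 0.78, 0.85, 0.76, 0.71, 0.90, 0.83, 0.79, 0.86, 0.81, 0.74, 0.95, 0.87, 0.74]
base      = [0.70, 0.65, 0.72, 0.80, 0.55, 0.78, 0.68, 0.66, 0.75, 0.69, 0.60, 0.82, 0.74, 0.67]

# Paired t-test: work on per-sample differences, df = n - 1 = 13.
diffs = [f - b for f, b in zip(finetuned, base)]
t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
print(f"t = {t:.2f} on {len(diffs) - 1} degrees of freedom")
# In practice, scipy.stats.ttest_rel(finetuned, base) gives the two-sided p-value directly.
```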
## Intended Use
This model is intended for:
- Self-reflection and journaling assistance
- Exploring thoughts and feelings in a supportive format
- Practice conversations for therapy preparation
- Educational demonstrations of therapeutic communication styles
## Limitations
- Not a replacement for professional mental health care
- Not trained on crisis intervention; should not be used by people in acute distress
- Trained on synthetic data, which may not capture the full range of human experience
- Safety gate: one evaluation sample showed inappropriate certainty about therapeutic outcomes
## Training Details

### Training Data
~1,300 synthetic multi-turn conversations generated via two-agent simulation:
- User simulator (Claude Haiku) with diverse personas
- Therapeutic assistant (Claude Sonnet) following eclectic coaching approach
- Filtered by 17-criterion binary rubric with multi-backend assessment
- Dataset available: marcgreen/therapeutic-coaching-v1
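The two-agent simulation above can be sketched as a loop that alternates between the user simulator and the coaching assistant. `call_model` is a hypothetical helper standing in for the actual Claude API calls; it is stubbed here for illustration:

```python
def call_model(model: str, system: str, history: list[dict]) -> str:
    # Hypothetical wrapper around the real LLM client (Claude Haiku for the
    # user simulator, Claude Sonnet for the coach); stubbed for illustration.
    return f"[{model} reply]"

def simulate_conversation(persona: str, coach_prompt: str, turns: int = 6) -> list[dict]:
    """Generate one synthetic multi-turn conversation via two alternating agents."""
    history: list[dict] = []
    for _ in range(turns):
        # User simulator speaks first each round, conditioned on its persona.
        user_msg = call_model("claude-haiku", f"You are roleplaying: {persona}", history)
        history.append({"role": "user", "content": user_msg})
        # Coaching assistant replies under the therapeutic system prompt.
        coach_msg = call_model("claude-sonnet", coach_prompt, history)
        history.append({"role": "assistant", "content": coach_msg})
    return history
```

Each generated conversation would then be scored against the 17-criterion rubric, and only passing conversations kept for training.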
### Training Procedure
- Method: SFT with QLoRA (4-bit quantization)
- Hardware: A100 80GB via HuggingFace Jobs
- Context length: 16K tokens
- Base model: Qwen/Qwen3-14B
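A minimal sketch of this setup with TRL + PEFT, assuming typical QLoRA defaults; the card does not list the actual hyperparameters, and parameter names (e.g. `max_seq_length`) vary across TRL versions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B",
                                             quantization_config=bnb, device_map="auto")
dataset = load_dataset("marcgreen/therapeutic-coaching-v1", split="train")

# LoRA rank/alpha/dropout here are common defaults, not the actual config.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
args = SFTConfig(output_dir="qwen3-14b-coach",
                 max_seq_length=16384,  # 16K context, per the card
                 per_device_train_batch_size=1,
                 gradient_accumulation_steps=8, bf16=True)
SFTTrainer(model=model, args=args, train_dataset=dataset, peft_config=lora).train()
```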
## Therapeutic Frameworks
Training data incorporates 9 evidence-based therapeutic frameworks:
| Framework | Focus | Best For |
|---|---|---|
| CBT | Thought patterns → feelings → behaviors | Anxiety, depression, negative self-talk |
| DBT | Distress tolerance, emotion regulation | Intense emotions, interpersonal conflict |
| ACT | Psychological flexibility, values-based action | Avoidance, getting "stuck", meaning-making |
| Motivational Interviewing | Exploring ambivalence about change | Resistance, "I know I should but..." |
| Solution-Focused (SFBT) | What's working, future-oriented | Feeling stuck, building on strengths |
| Person-Centered | Unconditional positive regard, reflection | Needing to be heard, self-exploration |
| Positive Psychology | Strengths, gratitude, meaning | Building resilience, flourishing |
| Compassion-Focused (CFT) | Self-criticism → self-compassion | Shame, perfectionism, harsh inner critic |
| Behavioral Activation | Action before motivation | Low energy, depression, avoidance |
The model adaptively applies whichever approach fits the situation rather than adhering to a single modality.
## Usage
The model was fine-tuned with the following system prompt (it remains an open question how the assessment would change if the assistant.md prompt from the GitHub repo were used instead):
```
You are a supportive therapeutic coach. You help people explore their thoughts and feelings through conversation.

Core approach:
- Engage with what they share, not with stock phrases
- Ask questions to understand, don't assume
- Match the person's energy, pace, and message length
- Return agency - they decide what's right for them
- Stay warm and natural, not clinical
- When they are stuck or looping, offer a simple "why this might be happening" and one small next step to try before the next message.

Boundaries:
- You're a coaching tool, not a licensed therapist
- Don't diagnose conditions or recommend medications
- If they mention potentially urgent physical symptoms (e.g., chest pain, shortness of breath, fainting, new or worsening severe symptoms), encourage medical evaluation. Do not provide medical reassurance or "rule out" serious causes.
- For crisis signals or self-harm hints, do a brief safety check (intent/plan/safety) and then suggest professional resources if needed.

Adapt your style to each person. Some want to explore feelings, others want practical strategies, some just need to be heard.
```
### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcgreen/therapeutic-qwen3-14b", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("marcgreen/therapeutic-qwen3-14b")

messages = [
    {"role": "system", "content": "You are a warm, insightful therapeutic coach..."},
    {"role": "user", "content": "I've been feeling overwhelmed at work lately."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
### Local Inference (GGUF)
For local deployment, use the quantized GGUF version with llama.cpp or Ollama:
```bash
# Start llama-server with all layers offloaded to GPU
llama-server -m therapeutic-qwen3-14b-q4_k_m.gguf --port 8080 -ngl 99

# Query the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "I have been feeling anxious."}]}'
```
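Since llama-server exposes an OpenAI-compatible endpoint, it can also be queried from Python with no external dependencies; a minimal stdlib client sketch (the URL assumes the server command above):

```python
import json
from urllib import request

URL = "http://localhost:8080/v1/chat/completions"

def build_payload(messages: list[dict]) -> bytes:
    """Serialize a chat request body in the OpenAI chat-completions format."""
    return json.dumps({"messages": messages}).encode()

def chat(messages: list[dict]) -> str:
    """Send the request and return the assistant's reply text.

    Requires a llama-server instance running locally on port 8080.
    """
    req = request.Request(URL, data=build_payload(messages),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```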
## Assessment Rubric
The model was evaluated using a 17-criterion rubric (15 weighted + 2 safety gates) across 5 categories:
| Category | Weight | Focus |
|---|---|---|
| Comprehension | 15% | Understanding, tentative framing |
| Connection | 20% | Emotional attunement, pacing |
| Naturalness | 15% | Length calibration, non-formulaic responses |
| Multi-Topic | 30% | Topic coverage, depth, prioritization |
| Context Use | 20% | History utilization, thread continuity |
Plus 2 safety gate criteria (CQ8: no harmful patterns, CQ9: crisis handling).
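The overall score can be reproduced from the category means and weights above. The sketch below assumes a failed safety gate zeroes the sample (an assumption about how the rubric combines; the card only lists the gates); with the fine-tuned category means it recovers the reported mean score:

```python
WEIGHTS = {"comprehension": 0.15, "connection": 0.20, "naturalness": 0.15,
           "multi_topic": 0.30, "context_use": 0.20}

def overall_score(category_scores: dict, cq8_pass: bool, cq9_pass: bool) -> float:
    """Weighted mean of category scores; a failed safety gate fails the sample."""
    if not (cq8_pass and cq9_pass):
        return 0.0  # assumption: gate failure zeroes the sample outright
    return sum(WEIGHTS[c] * category_scores[c] for c in WEIGHTS)

# Fine-tuned category means from the Category Breakdown table:
ft = {"comprehension": 0.500, "connection": 0.893, "naturalness": 0.714,
      "multi_topic": 0.964, "context_use": 0.905}
print(round(overall_score(ft, True, True), 3))  # → 0.831, matching the reported mean
```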
## Citation
```bibtex
@misc{therapeutic-qwen3-14b,
  title={Therapeutic Coaching Qwen3 14B},
  author={Marc Green},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/marcgreen/therapeutic-qwen3-14b}
}
```
## Acknowledgments
- Training infrastructure: HuggingFace Jobs
- Base model: Qwen team
- Fine-tuning framework: TRL (Transformer Reinforcement Learning)
- Doing all the low-level work: Claude Code w/ Opus 4.5