# Therapeutic Coaching Qwen3 14B
A fine-tuned Qwen3 14B model for privacy-first therapeutic coaching conversations. Designed to run locally on consumer hardware via GGUF quantization.
⚠️ **Not therapy.** This is a self-care coaching tool for stable adults; no professional therapist was involved in this project. It cannot handle crises and cannot replace professional mental health care.
Resources:
- 📦 GitHub Repo — Full pipeline, SKILLs, and documentation
- 📊 Training Dataset — ~1,300 synthetic conversations
## Model Description
This model was fine-tuned on synthetic multi-turn therapeutic coaching conversations, optimized for:
- Naturalness: Avoiding robotic "therapy voice" patterns
- Context utilization: Tracking and referencing conversation history
- Multi-topic handling: Managing multiple user concerns without losing threads
- Tentative framing: Using exploratory language rather than assertions
## Evaluation Results
Compared against base Qwen3 14B on 14 paired conversation samples:
| Metric | Fine-tuned | Base | Improvement |
|---|---|---|---|
| Mean Score | 0.831 | 0.701 | +18.5% |
| Pass Rate | 57.1% | 20.0% | +37.1pp |
| p-value | — | — | 0.0165 |
### Category Breakdown
| Category | Fine-tuned | Base | Diff |
|---|---|---|---|
| naturalness | 0.714 | 0.418 | +0.296 |
| context_use | 0.905 | 0.645 | +0.260 |
| multi_topic | 0.964 | 0.833 | +0.131 |
| comprehension | 0.500 | 0.533 | -0.033 |
| connection | 0.893 | 0.900 | -0.007 |
The model shows statistically significant improvement (p < 0.05) with largest gains in naturalness and context utilization.
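The card does not state which test produced the p-value. For 14 paired per-sample scores, a paired t-test is the natural choice; the sketch below uses illustrative scores (the real per-sample values are in the evaluation set, not on this card) and computes the t statistic with only the standard library:

```python
import math
from statistics import mean, stdev

# Illustrative per-sample scores for the 14 paired conversations (not the
# actual evaluation data; shapes and magnitudes chosen for demonstration).
finetuned = [0.92, 0.78, 0.85, 0.76, 0.71, 0.90, 0.83, 0.79, 0.86, 0.81, 0.74, 0.95, 0.87, 0.74]
base      = [0.70, 0.65, 0.72, 0.80, 0.55, 0.78, 0.68, 0.66, 0.75, 0.69, 0.60, 0.82, 0.74, 0.67]

# Paired t-test: work on per-sample differences, df = n - 1 = 13.
diffs = [f - b for f, b in zip(finetuned, base)]
t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
print(f"t = {t:.2f} on {len(diffs) - 1} degrees of freedom")
# In practice, scipy.stats.ttest_rel(finetuned, base) gives the two-sided p-value directly.
```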
## Intended Use
This model is intended for:
- Self-reflection and journaling assistance
- Exploring thoughts and feelings in a supportive format
- Practice conversations for therapy preparation
- Educational demonstrations of therapeutic communication styles
## Limitations
- Not a replacement for professional mental health care
- Not trained on crisis intervention; should not be used by people in acute distress
- Trained on synthetic data, which may not capture the full range of human experience
- Safety gate: one evaluation sample showed inappropriate certainty about therapeutic outcomes
## Training Details

### Training Data
~1,300 synthetic multi-turn conversations generated via two-agent simulation:
- User simulator (Claude Haiku) with diverse personas
- Therapeutic assistant (Claude Sonnet) following eclectic coaching approach
- Filtered by 17-criterion binary rubric with multi-backend assessment
- Dataset available: marcgreen/therapeutic-coaching-v1
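The two-agent simulation above can be sketched as a loop that alternates between the user simulator and the coaching assistant. `call_model` is a hypothetical helper standing in for the actual Claude API calls; it is stubbed here for illustration:

```python
def call_model(model: str, system: str, history: list[dict]) -> str:
    # Hypothetical wrapper around the real LLM client (Claude Haiku for the
    # user simulator, Claude Sonnet for the coach); stubbed for illustration.
    return f"[{model} reply]"

def simulate_conversation(persona: str, coach_prompt: str, turns: int = 6) -> list[dict]:
    """Generate one synthetic multi-turn conversation via two alternating agents."""
    history: list[dict] = []
    for _ in range(turns):
        # User simulator speaks first each round, conditioned on its persona.
        user_msg = call_model("claude-haiku", f"You are roleplaying: {persona}", history)
        history.append({"role": "user", "content": user_msg})
        # Coaching assistant replies under the therapeutic system prompt.
        coach_msg = call_model("claude-sonnet", coach_prompt, history)
        history.append({"role": "assistant", "content": coach_msg})
    return history
```

Each generated conversation would then be scored against the 17-criterion rubric, and only passing conversations kept for training.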
### Training Procedure
- Method: SFT with QLoRA (4-bit quantization)
- Hardware: A100 80GB via HuggingFace Jobs
- Context length: 16K tokens
- Base model: Qwen/Qwen3-14B
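A minimal sketch of this setup with TRL + PEFT, assuming typical QLoRA defaults; the card does not list the actual hyperparameters, and parameter names (e.g. `max_seq_length`) vary across TRL versions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B",
                                             quantization_config=bnb, device_map="auto")
dataset = load_dataset("marcgreen/therapeutic-coaching-v1", split="train")

# LoRA rank/alpha/dropout here are common defaults, not the actual config.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
args = SFTConfig(output_dir="qwen3-14b-coach",
                 max_seq_length=16384,  # 16K context, per the card
                 per_device_train_batch_size=1,
                 gradient_accumulation_steps=8, bf16=True)
SFTTrainer(model=model, args=args, train_dataset=dataset, peft_config=lora).train()
```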
## Therapeutic Frameworks
Training data incorporates 9 evidence-based therapeutic frameworks:
| Framework | Focus | Best For |
|---|---|---|
| CBT | Thought patterns → feelings → behaviors | Anxiety, depression, negative self-talk |
| DBT | Distress tolerance, emotion regulation | Intense emotions, interpersonal conflict |
| ACT | Psychological flexibility, values-based action | Avoidance, getting "stuck", meaning-making |
| Motivational Interviewing | Exploring ambivalence about change | Resistance, "I know I should but..." |
| Solution-Focused (SFBT) | What's working, future-oriented | Feeling stuck, building on strengths |
| Person-Centered | Unconditional positive regard, reflection | Needing to be heard, self-exploration |
| Positive Psychology | Strengths, gratitude, meaning | Building resilience, flourishing |
| Compassion-Focused (CFT) | Self-criticism → self-compassion | Shame, perfectionism, harsh inner critic |
| Behavioral Activation | Action before motivation | Low energy, depression, avoidance |
The model adaptively applies whichever approach fits the situation rather than adhering to a single modality.
## Usage
The model was fine-tuned with the following system prompt (it remains an open question how the assessment would change if the assistant.md prompt from the GitHub repo were used instead):
```
You are a supportive therapeutic coach. You help people explore their thoughts and feelings through conversation.

Core approach:
- Engage with what they share, not with stock phrases
- Ask questions to understand, don't assume
- Match the person's energy, pace, and message length
- Return agency - they decide what's right for them
- Stay warm and natural, not clinical
- When they are stuck or looping, offer a simple "why this might be happening" and one small next step to try before the next message.

Boundaries:
- You're a coaching tool, not a licensed therapist
- Don't diagnose conditions or recommend medications
- If they mention potentially urgent physical symptoms (e.g., chest pain, shortness of breath, fainting, new or worsening severe symptoms), encourage medical evaluation. Do not provide medical reassurance or "rule out" serious causes.
- For crisis signals or self-harm hints, do a brief safety check (intent/plan/safety) and then suggest professional resources if needed.

Adapt your style to each person. Some want to explore feelings, others want practical strategies, some just need to be heard.
```
### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcgreen/therapeutic-qwen3-14b", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("marcgreen/therapeutic-qwen3-14b")

messages = [
    {"role": "system", "content": "You are a warm, insightful therapeutic coach..."},
    {"role": "user", "content": "I've been feeling overwhelmed at work lately."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
### Local Inference (GGUF)
For local deployment, use the quantized GGUF version with llama.cpp or Ollama:
```bash
# Start llama-server with all layers offloaded to GPU
llama-server -m therapeutic-qwen3-14b-q4_k_m.gguf --port 8080 -ngl 99

# Query the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "I have been feeling anxious."}]}'
```
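Since llama-server exposes an OpenAI-compatible endpoint, it can also be queried from Python with no external dependencies; a minimal stdlib client sketch (the URL assumes the server command above):

```python
import json
from urllib import request

URL = "http://localhost:8080/v1/chat/completions"

def build_payload(messages: list[dict]) -> bytes:
    """Serialize a chat request body in the OpenAI chat-completions format."""
    return json.dumps({"messages": messages}).encode()

def chat(messages: list[dict]) -> str:
    """Send the request and return the assistant's reply text.

    Requires a llama-server instance running locally on port 8080.
    """
    req = request.Request(URL, data=build_payload(messages),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```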
## Assessment Rubric
The model was evaluated using a 17-criterion rubric (15 weighted + 2 safety gates) across 5 categories:
| Category | Weight | Focus |
|---|---|---|
| Comprehension | 15% | Understanding, tentative framing |
| Connection | 20% | Emotional attunement, pacing |
| Naturalness | 15% | Length calibration, non-formulaic responses |
| Multi-Topic | 30% | Topic coverage, depth, prioritization |
| Context Use | 20% | History utilization, thread continuity |
Plus 2 safety gate criteria (CQ8: no harmful patterns, CQ9: crisis handling).
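The overall score can be reproduced from the category means and weights above. The sketch below assumes a failed safety gate zeroes the sample (an assumption about how the rubric combines; the card only lists the gates); with the fine-tuned category means it recovers the reported mean score:

```python
WEIGHTS = {"comprehension": 0.15, "connection": 0.20, "naturalness": 0.15,
           "multi_topic": 0.30, "context_use": 0.20}

def overall_score(category_scores: dict, cq8_pass: bool, cq9_pass: bool) -> float:
    """Weighted mean of category scores; a failed safety gate fails the sample."""
    if not (cq8_pass and cq9_pass):
        return 0.0  # assumption: gate failure zeroes the sample outright
    return sum(WEIGHTS[c] * category_scores[c] for c in WEIGHTS)

# Fine-tuned category means from the Category Breakdown table:
ft = {"comprehension": 0.500, "connection": 0.893, "naturalness": 0.714,
      "multi_topic": 0.964, "context_use": 0.905}
print(round(overall_score(ft, True, True), 3))  # → 0.831, matching the reported mean
```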
## Citation
```bibtex
@misc{therapeutic-qwen3-14b,
  title={Therapeutic Coaching Qwen3 14B},
  author={Marc Green},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/marcgreen/therapeutic-qwen3-14b}
}
```
## Acknowledgments
- Training infrastructure: HuggingFace Jobs
- Base model: Qwen team
- Fine-tuning framework: TRL (Transformer Reinforcement Learning)
- Doing all the low-level work: Claude Code w/ Opus 4.5