Therapeutic Coaching Qwen3 14B

A fine-tuned Qwen3 14B model for privacy-first therapeutic coaching conversations. Designed to run locally on consumer hardware via GGUF quantization.

⚠️ Not therapy. This is a self-care coaching tool for stable adults. No professional therapist was involved in this project. Cannot handle crises. Cannot replace professional mental health care.

Model Description

This model was fine-tuned on synthetic multi-turn therapeutic coaching conversations, optimized for:

  • Naturalness: Avoiding robotic "therapy voice" patterns
  • Context utilization: Tracking and referencing conversation history
  • Multi-topic handling: Managing multiple user concerns without losing threads
  • Tentative framing: Using exploratory language rather than assertions

Evaluation Results

Compared against base Qwen3 14B on 14 paired conversation samples:

| Metric | Fine-tuned | Base | Improvement |
|---|---|---|---|
| Mean Score | 0.831 | 0.701 | +18.5% |
| Pass Rate | 57.1% | 20.0% | +37.1pp |
| p-value (paired) | 0.0165 | | |
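The headline numbers can be cross-checked with a couple of lines of arithmetic (a sanity check, not part of the evaluation code):

```python
# Sanity-check the headline metrics against the category means.
ft_mean, base_mean = 0.831, 0.701
ft_pass, base_pass = 57.1, 20.0

rel_improvement = 100 * (ft_mean - base_mean) / base_mean  # relative, in percent
pass_gain_pp = ft_pass - base_pass                         # absolute, in percentage points

print(f"{rel_improvement:.1f}%")   # → 18.5%
print(f"{pass_gain_pp:.1f}pp")     # → 37.1pp
```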

Category Breakdown

| Category | Fine-tuned | Base | Diff |
|---|---|---|---|
| naturalness | 0.714 | 0.418 | +0.296 |
| context_use | 0.905 | 0.645 | +0.260 |
| multi_topic | 0.964 | 0.833 | +0.131 |
| comprehension | 0.500 | 0.533 | -0.033 |
| connection | 0.893 | 0.900 | -0.007 |

The model shows a statistically significant improvement (p < 0.05), with the largest gains in naturalness and context utilization.

Intended Use

This model is intended for:

  • Self-reflection and journaling assistance
  • Exploring thoughts and feelings in a supportive format
  • Practice conversations for therapy preparation
  • Educational demonstrations of therapeutic communication styles

Limitations

  • Not a replacement for professional mental health care
  • Not trained on crisis intervention: should not be used by people in acute distress
  • Synthetic training data: may not capture the full range of human experience
  • Safety gate: One evaluation sample showed inappropriate certainty about therapeutic outcomes

Training Details

Training Data

~1,300 synthetic multi-turn conversations generated via two-agent simulation:

  • User simulator (Claude Haiku) with diverse personas
  • Therapeutic assistant (Claude Sonnet) following eclectic coaching approach
  • Filtered by a 17-criterion binary rubric with multi-backend assessment
  • Dataset available: marcgreen/therapeutic-coaching-v1
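The two-agent generation loop can be sketched as follows. The function and callable names here (`simulate_user`, `coach_reply`, `passes_rubric`) are hypothetical stand-ins for the real LLM calls (Claude Haiku persona agent, Claude Sonnet coach agent, and the 17-criterion rubric judge):

```python
from typing import Callable

def generate_conversation(
    simulate_user: Callable[[list], str],  # stand-in for the Claude Haiku persona agent
    coach_reply: Callable[[list], str],    # stand-in for the Claude Sonnet coach agent
    turns: int = 4,
) -> list[dict]:
    """Alternate user and assistant turns to build one training conversation."""
    messages: list[dict] = []
    for _ in range(turns):
        messages.append({"role": "user", "content": simulate_user(messages)})
        messages.append({"role": "assistant", "content": coach_reply(messages)})
    return messages

def filter_dataset(
    conversations: list[list[dict]],
    passes_rubric: Callable[[list], bool],
) -> list[list[dict]]:
    """Keep only conversations that clear the binary rubric."""
    return [c for c in conversations if passes_rubric(c)]
```

In the real pipeline each callable wraps an API call, and the rubric judgment aggregates the 17 binary criteria across multiple judge backends.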

Training Procedure

  • Method: SFT with QLoRA (4-bit quantization)
  • Hardware: A100 80GB via HuggingFace Jobs
  • Context length: 16K tokens
  • Base model: Qwen/Qwen3-14B
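For reference, a setup like the one above can be sketched with TRL. The exact hyperparameters (LoRA rank/alpha, batch size, learning rate) were not published, so the values below are illustrative placeholders, and the `SFTConfig` field names follow recent TRL versions:

```python
# QLoRA SFT sketch -- hyperparameter values are illustrative, not the actual run's settings.
from datasets import load_dataset
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("marcgreen/therapeutic-coaching-v1", split="train")

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

peft_config = LoraConfig(
    r=16,                           # illustrative rank
    lora_alpha=32,                  # illustrative alpha
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="therapeutic-qwen3-14b",
    max_length=16384,               # 16K-token context (max_seq_length in older TRL)
    per_device_train_batch_size=1,  # illustrative
    gradient_accumulation_steps=8,  # illustrative
    learning_rate=2e-4,             # illustrative
    bf16=True,
    model_init_kwargs={"quantization_config": bnb, "device_map": "auto"},
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-14B",
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```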

Therapeutic Frameworks

Training data incorporates 9 evidence-based therapeutic frameworks:

| Framework | Focus | Best For |
|---|---|---|
| CBT | Thought patterns → feelings → behaviors | Anxiety, depression, negative self-talk |
| DBT | Distress tolerance, emotion regulation | Intense emotions, interpersonal conflict |
| ACT | Psychological flexibility, values-based action | Avoidance, getting "stuck", meaning-making |
| Motivational Interviewing | Exploring ambivalence about change | Resistance, "I know I should but..." |
| Solution-Focused (SFBT) | What's working, future-oriented | Feeling stuck, building on strengths |
| Person-Centered | Unconditional positive regard, reflection | Needing to be heard, self-exploration |
| Positive Psychology | Strengths, gratitude, meaning | Building resilience, flourishing |
| Compassion-Focused (CFT) | Self-criticism → self-compassion | Shame, perfectionism, harsh inner critic |
| Behavioral Activation | Action before motivation | Low energy, depression, avoidance |

The model adaptively applies whichever approach fits the situation rather than adhering to a single modality.

Usage

The model was fine-tuned with the following system prompt (an open question is how assessment would change if the assistant.md prompt from the GitHub repo were used instead):

You are a supportive therapeutic coach. You help people explore their thoughts and feelings through conversation.

Core approach:
- Engage with what they share, not with stock phrases
- Ask questions to understand, don't assume
- Match the person's energy, pace, and message length
- Return agency - they decide what's right for them
- Stay warm and natural, not clinical
- When they are stuck or looping, offer a simple "why this might be happening" and one small next step to try before the next message.

Boundaries:
- You're a coaching tool, not a licensed therapist
- Don't diagnose conditions or recommend medications
- If they mention potentially urgent physical symptoms (e.g., chest pain, shortness of breath, fainting, new or worsening severe symptoms), encourage medical evaluation. Do not provide medical reassurance or "rule out" serious causes.
- For crisis signals or self-harm hints, do a brief safety check (intent/plan/safety) and then suggest professional resources if needed.

Adapt your style to each person. Some want to explore feelings, others want practical strategies, some just need to be heard.

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcgreen/therapeutic-qwen3-14b", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("marcgreen/therapeutic-qwen3-14b")

messages = [
    {"role": "system", "content": "You are a warm, insightful therapeutic coach..."},
    {"role": "user", "content": "I've been feeling overwhelmed at work lately."}
]

# add_generation_prompt=True appends the assistant-turn header the model expects
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Local Inference (GGUF)

For local deployment, use the quantized GGUF version with llama.cpp or Ollama:

# With llama-server
llama-server -m therapeutic-qwen3-14b-q4_k_m.gguf --port 8080 -ngl 99

# Query
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "I have been feeling anxious."}]}'

Assessment Rubric

The model was evaluated using a 17-criterion rubric (15 weighted + 2 safety gates) across 5 categories:

| Category | Weight | Focus |
|---|---|---|
| Comprehension | 15% | Understanding, tentative framing |
| Connection | 20% | Emotional attunement, pacing |
| Naturalness | 15% | Length calibration, non-formulaic responses |
| Multi-Topic | 30% | Topic coverage, depth, prioritization |
| Context Use | 20% | History utilization, thread continuity |

Plus 2 safety gate criteria (CQ8: no harmful patterns, CQ9: crisis handling).
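Given these weights, the reported mean scores are consistent with a simple weighted sum of the per-category scores. A stdlib sketch (the zeroing-on-gate-failure behavior is an assumption based on the "safety gate" wording, not documented):

```python
# Category weights from the rubric table above.
WEIGHTS = {
    "comprehension": 0.15,
    "connection": 0.20,
    "naturalness": 0.15,
    "multi_topic": 0.30,
    "context_use": 0.20,
}

def overall_score(category_scores: dict, safety_gates_pass: bool = True) -> float:
    """Weighted mean of category scores; a failed safety gate zeroes the score
    (the gating behavior is an assumption)."""
    if not safety_gates_pass:
        return 0.0
    return sum(WEIGHTS[c] * s for c, s in category_scores.items())

fine_tuned = {"comprehension": 0.500, "connection": 0.893,
              "naturalness": 0.714, "multi_topic": 0.964, "context_use": 0.905}
base = {"comprehension": 0.533, "connection": 0.900,
        "naturalness": 0.418, "multi_topic": 0.833, "context_use": 0.645}

print(overall_score(fine_tuned))  # ≈ 0.831, matching the reported fine-tuned mean
print(overall_score(base))        # ≈ 0.701, matching the reported base mean
```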

Citation

@misc{therapeutic-qwen3-14b,
  title={Therapeutic Coaching Qwen3 14B},
  author={Marc Green},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/marcgreen/therapeutic-qwen3-14b}
}

Acknowledgments

  • Training infrastructure: HuggingFace Jobs
  • Base model: Qwen team
  • Fine-tuning framework: TRL (Transformer Reinforcement Learning)
  • Doing all the low-level work: Claude Code w/ Opus 4.5