# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
**Current Status:** Phases 1-14 COMPLETE (Foundation through Demo Submission).
## Development Commands
```bash
# Install all dependencies (including dev)
make install # or: uv sync --all-extras && uv run pre-commit install
# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
make check
# Individual commands
make test # uv run pytest tests/unit/ -v
make lint # uv run ruff check src tests
make format # uv run ruff format src tests
make typecheck # uv run mypy src
make test-cov # uv run pytest --cov=src --cov-report=term-missing
# Run single test
uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
# Integration tests (real APIs)
uv run pytest -m integration
```
## Architecture
**Pattern**: Search-and-judge loop with multi-tool orchestration.
```text
User Question → Orchestrator
        ↓
Search Loop:
  1. Query PubMed, ClinicalTrials.gov, Europe PMC
  2. Gather evidence
  3. Judge quality ("Do we have enough?")
  4. If NO → Refine query, search more
  5. If YES → Synthesize findings (+ optional Modal analysis)
        ↓
Research Report with Citations
```
**Key Components**:
- `src/orchestrators/` - Unified orchestrator package
- `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
- `factory.py` - Auto-selects backend based on API key presence
- `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
- `src/clients/` - LLM backend adapters
- `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
- `huggingface.py` - HuggingFace adapter for free tier
- `src/tools/pubmed.py` - PubMed E-utilities search
- `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- `src/tools/europepmc.py` - Europe PMC search
- `src/tools/search_handler.py` - Scatter-gather orchestration
- `src/services/embeddings.py` - Local embeddings (sentence-transformers, in-memory)
- `src/services/llamaindex_rag.py` - Premium embeddings (OpenAI, persistent ChromaDB)
- `src/services/embedding_protocol.py` - Protocol interface for embedding services
- `src/services/research_memory.py` - Shared memory layer for research state
- `src/utils/service_loader.py` - Tiered service selection (free vs premium)
- `src/agent_factory/judges.py` - LLM-based evidence assessment
- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- `src/utils/models.py` - Evidence, Citation, SearchResult models
- `src/utils/exceptions.py` - Exception hierarchy
- `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
**Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
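The loop and its break conditions can be sketched as follows (an illustrative sketch only; the real orchestrator lives in `src/orchestrators/advanced.py`, and the `search`/`judge`/`synthesize` callables here are stand-ins, not the actual API):

```python
# Illustrative sketch of the search-and-judge loop with its break conditions:
# judge approval, token budget (50K), or max iterations (default 10).
MAX_ITERATIONS = 10
TOKEN_BUDGET = 50_000

def research(question, search, judge, synthesize):
    evidence, tokens_used, query = [], 0, question
    for _ in range(MAX_ITERATIONS):
        results, cost = search(query)        # scatter-gather over PubMed etc.
        evidence.extend(results)
        tokens_used += cost
        verdict = judge(question, evidence)  # "Do we have enough?"
        if verdict["enough"] or tokens_used >= TOKEN_BUDGET:
            break                            # judge approval or budget hit
        query = verdict["refined_query"]     # refine and search again
    return synthesize(question, evidence)
```

The refined query feeds back into the next iteration, which is what makes the loop autonomous rather than a one-shot search.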
## Configuration
Settings via pydantic-settings from `.env`:
- `LLM_PROVIDER`: "openai" or "huggingface"
- `OPENAI_API_KEY`: OpenAI API key (enables the paid tier)
- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- `MAX_ITERATIONS`: 1-50, default 10
- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
## Exception Hierarchy
```text
DeepBonerError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
├── ConfigurationError
└── EmbeddingError
```
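In Python, the tree above amounts to a few empty subclasses (class names are from the hierarchy; the docstrings are illustrative, and the real definitions live in `src/utils/exceptions.py`):

```python
class DeepBonerError(Exception):
    """Base exception for all DeepBoner errors."""

class SearchError(DeepBonerError):
    """A biomedical search backend failed."""

class RateLimitError(SearchError):
    """A backend rejected the request for exceeding its rate limit."""

class JudgeError(DeepBonerError):
    """LLM-based evidence assessment failed."""

class ConfigurationError(DeepBonerError):
    """Invalid or missing settings."""

class EmbeddingError(DeepBonerError):
    """Embedding computation or storage failed."""
```

Because everything derives from `DeepBonerError`, callers can catch the base class to handle any agent failure uniformly, or catch `SearchError` to also cover rate limits.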
## Testing
- **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- **Markers**: `unit`, `integration`, `slow`
- **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
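The mocking pattern looks roughly like this (a stdlib `unittest.mock` sketch; real tests use `respx` to intercept httpx traffic, and `search_pubmed` here is a hypothetical stand-in for the tool under test):

```python
from unittest.mock import MagicMock

def search_pubmed(client, term):
    # Hypothetical shape of a tool function under test.
    resp = client.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"term": term},
    )
    return resp.json()["esearchresult"]["idlist"]

def test_search_pubmed_returns_ids():
    # Stub out the HTTP client so no network call is made.
    client = MagicMock()
    client.get.return_value.json.return_value = {
        "esearchresult": {"idlist": ["12345", "67890"]}
    }
    assert search_pubmed(client, "testosterone HSDD") == ["12345", "67890"]
```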
## LLM Model Defaults (December 2025)
Default models in `src/utils/config.py`:
- **OpenAI:** `gpt-5` - Flagship model
- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
---
## ⚠️ OpenAI API Keys
**If you have a valid OpenAI API key, it will work. Period.**
- BYOK (Bring Your Own Key) auto-detects `sk-...` prefix and routes to OpenAI
- If you get errors, the key is **invalid or expired** - NOT an access tier issue
- **NEVER suggest "access tier" or "upgrade your plan"** - this is not how OpenAI works for API keys
- Valid keys work. Invalid keys don't. That's it.
---
## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
**THIS IS IMPORTANT - READ BEFORE CHANGING THE FREE TIER MODEL**
HuggingFace has TWO execution paths for inference:
| Path | Host | Reliability | Model Size |
|------|------|-------------|------------|
| **Native Serverless** | HuggingFace infrastructure | ✅ High | < 30B params |
| **Inference Providers** | Third-party (Novita, Hyperbolic) | ❌ Unreliable | 70B+ params |
**The Trap:** When you request a large model (70B+) without a paid API key, HuggingFace **silently routes** the request to third-party providers. These providers have:
- 500 Internal Server Errors (Novita - current)
- 401 "Staging Mode" auth failures (Hyperbolic - past)
**The Rule:** Free Tier MUST use models < 30B to stay on native infrastructure.
**Current Safe Models (Dec 2025):**
| Model | Size | Status |
|-------|------|--------|
| `Qwen/Qwen2.5-7B-Instruct` | 7B | ✅ **DEFAULT** - Native, reliable |
| `mistralai/Mistral-Nemo-Instruct-2407` | 12B | ✅ Native, reliable |
| `Qwen/Qwen2.5-72B-Instruct` | 72B | ❌ Routed to Novita (500 errors) |
| `meta-llama/Llama-3.1-70B-Instruct` | 70B | ❌ Routed to Hyperbolic (401 errors) |
**See:** `HF_FREE_TIER_ANALYSIS.md` for full analysis.
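One way to enforce The Rule mechanically is a guard that parses the parameter count out of the model id (a hypothetical helper, not code from the repo; it relies on the `NNB` naming convention the models above follow):

```python
import re

FREE_TIER_MAX_PARAMS_B = 30  # The Rule: free tier stays under 30B params

def assert_free_tier_safe(model_id: str) -> None:
    """Raise if the model id names a size >= 30B, which HF would silently
    route to an unreliable third-party provider on the free tier."""
    match = re.search(r"(\d+(?:\.\d+)?)B", model_id)
    if match and float(match.group(1)) >= FREE_TIER_MAX_PARAMS_B:
        raise ValueError(
            f"{model_id} is too large for the free tier: HuggingFace would "
            "silently route it to a third-party Inference Provider"
        )
```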
---
## Git Workflow
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Remote `origin`: GitHub (source of truth for PRs/code review)
- Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
**HuggingFace Spaces Collaboration:**
- Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- GitHub is the source of truth; HuggingFace is for deployment/demo
- Consider using git hooks to prevent accidental pushes to protected branches
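A pre-push hook along these lines could implement that guard (a hypothetical sketch, not a script shipped in the repo; git invokes `pre-push` with the remote name and URL as arguments and feeds `<local ref> <local sha> <remote ref> <remote sha>` lines on stdin):

```python
#!/usr/bin/env python3
import sys

PROTECTED = {"refs/heads/main", "refs/heads/dev"}

def blocked_refs(remote_name: str, remote_url: str, push_lines: list[str]) -> list[str]:
    """Return the protected remote refs a push to HuggingFace would touch."""
    if "huggingface" not in remote_name and "huggingface.co" not in remote_url:
        return []  # pushes to GitHub remotes are unrestricted
    refs = []
    for line in push_lines:
        parts = line.split()
        # Each stdin line: <local ref> <local sha> <remote ref> <remote sha>
        if len(parts) == 4 and parts[2] in PROTECTED:
            refs.append(parts[2])
    return refs

if __name__ == "__main__" and len(sys.argv) >= 3:
    bad = blocked_refs(sys.argv[1], sys.argv[2], sys.stdin.readlines())
    if bad:
        sys.exit(f"Refusing to push {', '.join(bad)} to HuggingFace; use yourname-dev")
```

Exiting non-zero aborts the push, so `main` and `dev` on the Spaces remote stay untouched while `yourname-dev` branches go through.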