# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
**Current Status:** Phases 1-14 COMPLETE (Foundation through Demo Submission).
## Development Commands
```bash
# Install all dependencies (including dev)
make install # or: uv sync --all-extras && uv run pre-commit install
# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
make check
# Individual commands
make test # uv run pytest tests/unit/ -v
make lint # uv run ruff check src tests
make format # uv run ruff format src tests
make typecheck # uv run mypy src
make test-cov # uv run pytest --cov=src --cov-report=term-missing
# Run single test
uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
# Integration tests (real APIs)
uv run pytest -m integration
```
## Architecture
**Pattern**: Search-and-judge loop with multi-tool orchestration.
```text
User Question → Orchestrator
        ↓
Search Loop:
  1. Query PubMed, ClinicalTrials.gov, Europe PMC
  2. Gather evidence
  3. Judge quality ("Do we have enough?")
  4. If NO → Refine query, search more
  5. If YES → Synthesize findings (+ optional Modal analysis)
        ↓
Research Report with Citations
```
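The loop above can be sketched in plain Python. Everything here is illustrative (the function names, the `verdict` dict shape, and the stub collaborators are assumptions, not the project's real API); the actual orchestrator lives in `src/orchestrators/simple.py`:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str   # e.g. "pubmed", "clinicaltrials", "europepmc"
    summary: str

def run_research(question, search, judge, synthesize,
                 max_iterations=10, token_budget=50_000):
    """Minimal search-and-judge loop (illustrative, not the real API)."""
    evidence, tokens_used, query = [], 0, question
    for _ in range(max_iterations):
        results, cost = search(query)        # scatter-gather over sources
        evidence.extend(results)
        tokens_used += cost
        verdict = judge(question, evidence)  # "Do we have enough?"
        if verdict["sufficient"] or tokens_used >= token_budget:
            break
        query = verdict["refined_query"]     # refine and search again
    return synthesize(question, evidence)

# Stub collaborators to show the control flow:
def fake_search(q):
    return [Evidence("pubmed", f"hit for {q!r}")], 1_000

def fake_judge(question, evidence):
    return {"sufficient": len(evidence) >= 2,
            "refined_query": question + " AND randomized trial"}

report = run_research("testosterone for HSDD", fake_search, fake_judge,
                      lambda q, ev: f"{len(ev)} evidence items for {q!r}")
print(report)  # 2 evidence items for 'testosterone for HSDD'
```

The break conditions (judge approval, token budget, max iterations) all live in the loop header and the single `if`, which is what keeps the control flow auditable.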
**Key Components**:
- `src/orchestrators/` - Orchestrator package (simple, advanced, langgraph modes)
- `simple.py` - Main search-and-judge loop
- `advanced.py` - Multi-agent Magentic mode
- `langgraph_orchestrator.py` - LangGraph-based workflow
- `src/tools/pubmed.py` - PubMed E-utilities search
- `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- `src/tools/europepmc.py` - Europe PMC search
- `src/tools/code_execution.py` - Modal sandbox execution
- `src/tools/search_handler.py` - Scatter-gather orchestration
- `src/services/embeddings.py` - Local embeddings (sentence-transformers, in-memory)
- `src/services/llamaindex_rag.py` - Premium embeddings (OpenAI, persistent ChromaDB)
- `src/services/embedding_protocol.py` - Protocol interface for embedding services
- `src/services/research_memory.py` - Shared memory layer for research state
- `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- `src/utils/service_loader.py` - Tiered service selection (free vs premium)
- `src/agent_factory/judges.py` - LLM-based evidence assessment
- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- `src/utils/models.py` - Evidence, Citation, SearchResult models
- `src/utils/exceptions.py` - Exception hierarchy
- `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
**Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
## Configuration
Settings via pydantic-settings from `.env`:
- `LLM_PROVIDER`: "openai" or "anthropic"
- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- `MAX_ITERATIONS`: 1-50, default 10
- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
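A minimal `.env` sketch (all values are placeholders, not real keys):

```bash
# .env — placeholder values only
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here
NCBI_API_KEY=your-ncbi-key   # optional: raises PubMed rate limits
MAX_ITERATIONS=10
LOG_LEVEL=INFO
```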
## Exception Hierarchy
```text
DeepBonerError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
├── ConfigurationError
└── EmbeddingError
```
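The tree above maps directly onto exception classes (a sketch mirroring `src/utils/exceptions.py`; docstrings here are my own glosses). The practical payoff is that catching the base class covers every project error:

```python
class DeepBonerError(Exception):
    """Base class for all DeepBoner errors."""

class SearchError(DeepBonerError):
    """A biomedical search backend failed."""

class RateLimitError(SearchError):
    """A backend returned a rate-limit response (e.g. HTTP 429)."""

class JudgeError(DeepBonerError):
    """Evidence assessment by the judge LLM failed."""

class ConfigurationError(DeepBonerError):
    """Invalid or missing settings."""

class EmbeddingError(DeepBonerError):
    """An embedding service call failed."""

# One except clause handles any project error:
try:
    raise RateLimitError("PubMed: HTTP 429")
except DeepBonerError as err:
    print(type(err).__name__)  # RateLimitError
```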
## Testing
- **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- **Markers**: `unit`, `integration`, `slow`
- **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
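A dependency-free sketch of the TDD style (the repo itself mocks httpx with `respx`; the function, URL handling, and test name here are stand-ins using stdlib `unittest.mock`, not the real `src/tools` code):

```python
from unittest.mock import MagicMock

class RateLimitError(Exception):
    """Stand-in for src.utils.exceptions.RateLimitError."""

def search_pubmed(client, term):
    # Toy version of a PubMed E-utilities call, just enough to test against.
    resp = client.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": term},
    )
    if resp.status_code == 429:
        raise RateLimitError("PubMed rate limit hit")
    return resp.json()["esearchresult"]["idlist"]

def test_search_pubmed_raises_on_429():
    client = MagicMock()
    client.get.return_value.status_code = 429
    try:
        search_pubmed(client, "HSDD")
    except RateLimitError:
        return
    raise AssertionError("expected RateLimitError")

test_search_pubmed_raises_on_429()
```

Under pytest the final call is unnecessary; the function is discovered by name and would carry the `unit` marker.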
## LLM Model Defaults (November 2025)
As of November 29, 2025, DeepBoner uses the following default LLM models in its configuration (`src/utils/config.py`):
- **OpenAI:** `gpt-5`
- Current flagship model (November 2025). Requires Tier 5 access.
- **Anthropic:** `claude-sonnet-4-5-20250929`
- This is the mid-range Claude 4.5 model, released on September 29, 2025.
- The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- This remains the default for the free tier, subject to quota limits.
It is crucial to keep these defaults updated as the LLM landscape evolves.
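The defaults above can be read as a provider-to-model mapping. This is only an illustrative shape (the dict name and layout are assumptions); the real defaults are defined in `src/utils/config.py`:

```python
DEFAULT_MODELS = {
    "openai": "gpt-5",                                   # requires Tier 5 access
    "anthropic": "claude-sonnet-4-5-20250929",           # Opus 4.5 is a configurable upgrade
    "huggingface": "meta-llama/Llama-3.1-70B-Instruct",  # free tier, quota-limited
}
```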
## Git Workflow
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Remote `origin`: GitHub (source of truth for PRs/code review)
- Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
**HuggingFace Spaces Collaboration:**
- Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- GitHub is the source of truth; HuggingFace is for deployment/demo
- Consider using git hooks to prevent accidental pushes to protected branches
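As a sketch of such a hook (assuming the remote is named `huggingface-upstream` as above; adapt and install as `.git/hooks/pre-push`, then `chmod +x` it):

```bash
#!/bin/sh
# Sketch of a pre-push guard. Git invokes pre-push as:
#   pre-push <remote-name> <remote-url>
# with one "<local ref> <local sha> <remote ref> <remote sha>" line
# per pushed ref on stdin. The check is factored into a function here.

# Returns 0 (block) for main/dev on the HuggingFace remote, 1 otherwise.
blocked_ref() {
  [ "$1" = "huggingface-upstream" ] || return 1
  case "$2" in
    refs/heads/main|refs/heads/dev) return 0 ;;
    *) return 1 ;;
  esac
}

# The installed hook body would be:
#   remote="$1"
#   while read -r _lref _lsha rref _rsha; do
#     blocked_ref "$remote" "$rref" &&
#       { echo "pre-push: use your own *-dev branch on HuggingFace" >&2; exit 1; }
#   done

blocked_ref huggingface-upstream refs/heads/main && echo "would block main"
blocked_ref huggingface-upstream refs/heads/vcms-dev || echo "would allow vcms-dev"
```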