# Agent-Tool-State Contract Registry

> **Status**: Canonical Source of Truth
> **Last Updated**: 2025-12-06
> **Purpose**: Developer reference for multi-agent coordination

This document defines the exact contracts between agents, tools, and shared state. Use this when:

- Adding new agents or tools
- Modifying agent behavior
- Debugging coordination issues
- Understanding "if I change X, what breaks?"

---

## Table of Contents

1. [System Overview](#system-overview)
2. [Agent Contracts](#agent-contracts)
3. [Judge Decision Criteria](#judge-decision-criteria)
4. [Shared State (ResearchMemory)](#shared-state-researchmemory)
5. [Tool Contracts](#tool-contracts)
6. [Event Flow](#event-flow)
7. [Break Conditions](#break-conditions)
8. [Dependency Matrix](#dependency-matrix)
9. [Critical Thresholds](#critical-thresholds)
10. [Developer Checklist](#developer-checklist)

---

## System Overview

```
┌─────────────────────────────────────────────────────────────┐
│             ORCHESTRATOR (AdvancedOrchestrator)             │
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │   Manager   │───▶│   Agents    │───▶│   Memory    │      │
│  │  (Magentic) │    │ (ChatAgent) │    │(ResearchMem)│      │
│  └─────────────┘    └─────────────┘    └─────────────┘      │
│         │                  │                  │             │
│         │                  ▼                  ▼             │
│         │           ┌──────────────┐   ┌─────────────┐      │
│         └──────────▶│    Tools     │──▶│ Embeddings  │      │
│                     │(@ai_function)│   │ (ChromaDB)  │      │
│                     └──────────────┘   └─────────────┘      │
└─────────────────────────────────────────────────────────────┘
```

### Agent Inventory

| Agent | File | Role | Tools |
|-------|------|------|-------|
| **SearchAgent** | `magentic_agents.py` | Evidence gathering | search_pubmed, search_clinical_trials, search_preprints |
| **JudgeAgent** | `magentic_agents.py` | Evidence evaluation | None (LLM only) |
| **HypothesisAgent** | `magentic_agents.py` | Mechanism generation | None (LLM only) |
| **ReportAgent** | `magentic_agents.py` | Report synthesis | get_bibliography |
| **RetrievalAgent** | `retrieval_agent.py` | Web search | search_web |

> **⚠️ Dead Code Warning:** RetrievalAgent is implemented but NOT wired into `magentic_agents.py`.
> The orchestrator only uses SearchAgent (PubMed, ClinicalTrials.gov, Europe PMC), not web search.
> See GitHub issue #134 for the decision on whether to delete it or wire it in.

---

## Agent Contracts

### SearchAgent

**Factory**: `create_search_agent(chat_client, domain, api_key) -> ChatAgent`

#### Input

```python
# Manager instruction (string)
"Search for testosterone and libido mechanisms in peer-reviewed literature"
```

#### Output

```python
# ChatMessage with:
message.text = """
Found 15 sources (12 new added to context):
- [Title 1](url): Abstract excerpt...
- [Title 2](url): Abstract excerpt...
"""
message.additional_properties = {
    "evidence": [Evidence.model_dump(), ...]
}
```

#### State Access

| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | `memory.query` | str | Current research question |
| **READ** | `memory.evidence_ids` | list[str] | Existing evidence URLs |
| **WRITE** | `memory._evidence_cache` | dict[str, Evidence] | Caches Evidence objects |
| **WRITE** | `memory.evidence_ids` | list[str] | Appends new URLs |
| **WRITE** | `embedding_service` | VectorDB | Stores embeddings |

#### Side Effects

1. Calls external APIs (PubMed, ClinicalTrials.gov, Europe PMC)
2. Deduplicates via semantic similarity (0.9 threshold)
3. Stores embeddings in the vector database

#### Error Behavior

- API failure → Returns `"No results found for: {query}"`
- Rate limit → Raises `RateLimitError` (caught by orchestrator)
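
The two error paths behave differently, which matters when changing tool code: generic failures are turned into text the LLM can read, while rate limits escape to the orchestrator. A minimal sketch of that contract, assuming a hypothetical `fetch` coroutine and a locally defined `RateLimitError` (the project's own exception class and tool internals may differ):

```python
import asyncio
from collections.abc import Awaitable, Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for the project's rate-limit exception."""


async def run_search_tool(
    fetch: Callable[[str], Awaitable[list[str]]], query: str
) -> str:
    """Apply the documented error contract around one search call (sketch)."""
    try:
        results = await fetch(query)
    except RateLimitError:
        raise  # propagate: the orchestrator catches this
    except Exception:
        # Any other API failure degrades to a plain-text answer for the LLM.
        return f"No results found for: {query}"
    if not results:
        # Assumption: an empty result set uses the same message.
        return f"No results found for: {query}"
    return "\n".join(f"- {r}" for r in results)


if __name__ == "__main__":
    async def failing_fetch(query: str) -> list[str]:
        raise ConnectionError("upstream down")

    print(asyncio.run(run_search_tool(failing_fetch, "testosterone libido")))
```

Catching `RateLimitError` before the broad `except Exception` is what keeps it from being silently converted into a "no results" string.
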

---

### JudgeAgent

**Factory**: `create_judge_agent(chat_client, domain, api_key) -> ChatAgent`

#### Input

```python
# Manager instruction with evidence context
"Evaluate if we have sufficient evidence to answer: {query}"
# + Evidence list in context
```

#### Output

```python
# ChatMessage with:
message.text = """
## Assessment
✅ SUFFICIENT EVIDENCE (confidence: 85%). STOP SEARCHING.
### Scores
- Mechanism: 8/10
- Clinical: 7/10
### Reasoning
Strong evidence for testosterone-AR pathway...
"""
message.additional_properties = {
    "assessment": JudgeAssessment.model_dump()
}
```

#### State Access

| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | Evidence from context | list[Evidence] | Passed by Manager |
| **WRITE** | None | - | Read-only evaluation |

#### Side Effects

- None (pure evaluation)

#### Critical Output Signal

- `"✅ SUFFICIENT EVIDENCE"` → Manager delegates to ReportAgent
- `"❌ INSUFFICIENT"` → Manager calls SearchAgent with suggested queries

---

### HypothesisAgent

**Factory**: `create_hypothesis_agent(chat_client, domain, api_key) -> ChatAgent`

#### Input

```python
# Manager instruction
"Generate mechanistic hypotheses for: {query}"
```

#### Output

```python
# ChatMessage with:
message.text = """
## Hypothesis 1 (Confidence: 75%)
**Mechanism**: Testosterone → Androgen Receptor → BDNF → Libido
**Suggested searches**: testosterone BDNF, androgen receptor signaling
## Primary Hypothesis
Testosterone → AR → dopamine release → reward pathway
## Knowledge Gaps
- Dose-response relationship unclear
"""
message.additional_properties = {
    "assessment": HypothesisAssessment.model_dump()
}
```

#### State Access

| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | `memory.query` | str | Research question |
| **READ** | Evidence from context | list[Evidence] | Current evidence |
| **WRITE** | `evidence_store["hypotheses"]` | list | Appends hypotheses |

---

### ReportAgent

**Factory**: `create_report_agent(chat_client, domain, api_key) -> ChatAgent`

#### Input

```python
# Manager instruction
"Generate final research report for: {query}"
```

#### Output

```python
# ChatMessage with:
message.text = ResearchReport.to_markdown()  # Full markdown report
message.additional_properties = {
    "report": ResearchReport.model_dump()
}
```

#### State Access

| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | `memory.get_all_evidence()` | list[Evidence] | All collected evidence |
| **READ** | `evidence_store["hypotheses"]` | list | Generated hypotheses |
| **READ** | `evidence_store["last_assessment"]` | JudgeAssessment | Final assessment |
| **WRITE** | `evidence_store["final_report"]` | ResearchReport | Stores report |

#### Tool: get_bibliography()

```python
@ai_function
def get_bibliography() -> str:
    """Returns formatted reference list from all evidence."""
    # `state`: the per-query MagenticState (see Shared State below)
    evidence = state.memory.get_all_evidence()
    return format_as_references(evidence)
```

---

## Judge Decision Criteria

### Scoring Dimensions

**Mechanism Score (0-10)**

| Score | Meaning |
|-------|---------|
| 0-3 | Minimal mechanism understanding |
| 4-5 | Partial mechanism (some targets identified) |
| 6-7 | Clear mechanism (targets + pathways) |
| 8-9 | Comprehensive (multiple pathways, regulation) |
| 10 | Complete understanding |

**Clinical Evidence Score (0-10)**

| Score | Meaning |
|-------|---------|
| 0-3 | Preclinical only or weak human evidence |
| 4-5 | Some human evidence (small trials, case reports) |
| 6-7 | Strong human evidence (RCTs) |
| 8-9 | Robust (meta-analysis, large RCTs) |
| 10 | Definitive clinical proof |

### Sufficiency Decision

```python
# SUFFICIENT (recommendation="synthesize")
if (
    confidence >= 0.7  # 70%
    and mechanism_score >= 6
    and clinical_evidence_score >= 6
):
    sufficient = True
    recommendation = "synthesize"
# INSUFFICIENT (recommendation="continue")
else:
    sufficient = False
    recommendation = "continue"
    next_search_queries = ["suggested query 1", "suggested query 2"]
```
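
To unit-test the threshold logic in isolation, the same rule can be written as a pure function. A sketch, where `decide_sufficiency` and `SufficiencyDecision` are illustrative names rather than the project's API; all three comparisons are inclusive, as in the block above:

```python
from typing import Literal, NamedTuple

CONFIDENCE_THRESHOLD = 0.7
MECHANISM_THRESHOLD = 6
CLINICAL_THRESHOLD = 6


class SufficiencyDecision(NamedTuple):
    sufficient: bool
    recommendation: Literal["continue", "synthesize"]


def decide_sufficiency(
    confidence: float, mechanism_score: int, clinical_evidence_score: int
) -> SufficiencyDecision:
    """All three thresholds must be met (inclusive) to stop searching."""
    if (
        confidence >= CONFIDENCE_THRESHOLD
        and mechanism_score >= MECHANISM_THRESHOLD
        and clinical_evidence_score >= CLINICAL_THRESHOLD
    ):
        return SufficiencyDecision(True, "synthesize")
    return SufficiencyDecision(False, "continue")
```

With the sample judge output shown under JudgeAgent (confidence 85%, mechanism 8/10, clinical 7/10) every check passes, so the result is `synthesize`; a single score of 5 is enough to keep searching.
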

### JudgeAssessment Model

```python
from typing import Literal

from pydantic import BaseModel


class JudgeAssessment(BaseModel):
    details: AssessmentDetails
    mechanism_score: int  # 0-10
    mechanism_reasoning: str  # min 10 chars
    clinical_evidence_score: int  # 0-10
    clinical_reasoning: str  # min 10 chars
    drug_candidates: list[str]
    key_findings: list[str]
    sufficient: bool  # Ready for synthesis?
    confidence: float  # 0.0-1.0
    recommendation: Literal["continue", "synthesize"]
    next_search_queries: list[str]  # If continue
    reasoning: str  # min 20 chars
```
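
The inline comments imply several invariants. A stdlib-only sketch for checking them outside pydantic (for example, in a test helper); `check_assessment` is hypothetical, and the last two rules are assumptions inferred from the Sufficiency Decision code and the `# If continue` comment rather than documented validators:

```python
from typing import Any

SCORE_FIELDS = ("mechanism_score", "clinical_evidence_score")
MIN_LENGTHS = {"mechanism_reasoning": 10, "clinical_reasoning": 10, "reasoning": 20}


def check_assessment(a: dict[str, Any]) -> list[str]:
    """Return the violated invariants for a JudgeAssessment-shaped dict."""
    problems: list[str] = []
    for key in SCORE_FIELDS:
        if not 0 <= a[key] <= 10:
            problems.append(f"{key} out of range 0-10")
    for key, min_len in MIN_LENGTHS.items():
        if len(a[key]) < min_len:
            problems.append(f"{key} shorter than {min_len} chars")
    if not 0.0 <= a["confidence"] <= 1.0:
        problems.append("confidence out of range 0.0-1.0")
    if a["recommendation"] not in ("continue", "synthesize"):
        problems.append("unknown recommendation")
    # Assumption: the decision code always sets these two together.
    if a["sufficient"] != (a["recommendation"] == "synthesize"):
        problems.append("sufficient/recommendation mismatch")
    # Assumption: a "continue" verdict should suggest follow-up queries.
    if a["recommendation"] == "continue" and not a["next_search_queries"]:
        problems.append("continue without next_search_queries")
    return problems
```

Applied to a `model_dump()` result, an empty list means the assessment satisfies every rule above.
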

---

## Shared State (ResearchMemory)

### Initialization

```python
# Per-query isolation via ContextVar
state = init_magentic_state(query, embedding_service)
# Returns MagenticState wrapping ResearchMemory
```
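
Per-query isolation with `contextvars.ContextVar` works because every asyncio task runs in a copy of the context that was current when the task was created, so a value set inside one query's task is invisible to another. A minimal sketch in which `Memory`, `init_state`, and `get_state` are hypothetical stand-ins for `ResearchMemory`, `init_magentic_state`, and whatever accessor the tools use:

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical stand-in for ResearchMemory."""

    query: str
    evidence_ids: list[str] = field(default_factory=list)


_current: ContextVar[Memory] = ContextVar("current_memory")


def init_state(query: str) -> Memory:
    memory = Memory(query)
    _current.set(memory)  # affects only the current context (task)
    return memory


def get_state() -> Memory:
    return _current.get()  # LookupError if init_state() never ran here


async def run_query(query: str) -> list[str]:
    init_state(query)
    await asyncio.sleep(0)  # yield so the two queries interleave
    get_state().evidence_ids.append(f"https://example.org/{query}")
    return get_state().evidence_ids


async def main() -> list[list[str]]:
    # gather() wraps each coroutine in its own Task, each with a copied context.
    return await asyncio.gather(run_query("a"), run_query("b"))
```

Running `asyncio.run(main())` gives each query its own evidence list, even though both coroutines read the same `_current` variable while interleaved.
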

### Memory Structure

```python
class ResearchMemory:
    query: str  # Research question
    hypotheses: list[Hypothesis]  # Generated hypotheses
    conflicts: list[Conflict]  # Detected conflicts
    evidence_ids: list[str]  # URLs (unique keys)
    _evidence_cache: dict[str, Evidence]  # URL -> Evidence
    iteration_count: int  # Current iteration
    _embedding_service: EmbeddingServiceProtocol
```

### Key Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `store_evidence(evidence)` | `list[str]` | Store with dedup, return new IDs |
| `get_all_evidence()` | `list[Evidence]` | All accumulated evidence |
| `get_relevant_evidence(n)` | `list[Evidence]` | Top N by semantic similarity |
| `get_context_summary()` | `str` | Markdown summary for fallback |
| `add_hypothesis(h)` | `None` | Append hypothesis |
| `get_confirmed_hypotheses()` | `list[Hypothesis]` | Confidence > 0.8 |
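
`store_evidence` deduplicates at two levels: exact URL match (URLs are the unique keys) and semantic similarity against stored embeddings, with the 0.9 threshold listed under Critical Thresholds. A stdlib-only sketch, assuming embeddings are plain float vectors compared by cosine similarity and that a score at or above the threshold counts as a duplicate; `MiniMemory`, `Item`, and the `embed` callable are hypothetical stand-ins for `ResearchMemory`, `Evidence`, and the embedding service:

```python
import math
from collections.abc import Callable, Sequence
from dataclasses import dataclass, field

DEDUP_THRESHOLD = 0.9


@dataclass
class Item:
    url: str
    text: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MiniMemory:
    embed: Callable[[str], list[float]]
    evidence_ids: list[str] = field(default_factory=list)
    _cache: dict[str, Item] = field(default_factory=dict, init=False)
    _vectors: list[list[float]] = field(default_factory=list, init=False)

    def store_evidence(self, items: list[Item]) -> list[str]:
        """Store items, skipping URL and near-duplicate matches; return new IDs."""
        new_ids: list[str] = []
        for item in items:
            if item.url in self._cache:
                continue  # exact duplicate by URL
            vec = self.embed(item.text)
            # Assumption: similarity >= threshold counts as a duplicate.
            if any(cosine(vec, v) >= DEDUP_THRESHOLD for v in self._vectors):
                continue
            self._cache[item.url] = item
            self._vectors.append(vec)
            self.evidence_ids.append(item.url)
            new_ids.append(item.url)
        return new_ids
```

The linear scan over stored vectors is only for illustration; the real system delegates the similarity lookup to ChromaDB.
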

### State Flow

```
User Query
    │
    ▼
┌────────────────────────────────────────┐
│   ResearchMemory initialized (empty)   │
└────────────────────────────────────────┘
    │
    ▼
SearchAgent ──▶ store_evidence([Evidence]) ──▶ evidence_ids grows
    │
    ▼
JudgeAgent ──▶ reads evidence from context ──▶ returns assessment
    │
    ├── INSUFFICIENT ──▶ SearchAgent (with next_search_queries)
    │
    └── SUFFICIENT ──▶ ReportAgent
                           │
                           ▼
               get_all_evidence() ──▶ ResearchReport
```

---

## Tool Contracts

### search_pubmed

**File**: `src/agents/tools.py`

```python
@ai_function
async def search_pubmed(query: str, max_results: int = 10) -> str:
    """Search PubMed for biomedical research papers."""
```

| Aspect | Value |
|--------|-------|
| External API | NCBI E-utilities |
| Rate Limit | 3 requests/sec (10 requests/sec with `NCBI_API_KEY`) |
| Output | Formatted string with titles/abstracts |
| Side Effect | Stores Evidence in memory |
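
NCBI E-utilities permit 3 requests per second without an API key and 10 with one, so concurrent `search_pubmed` calls need shared pacing. A sketch of one approach, a minimum-interval limiter; `MinIntervalLimiter` and `pubmed_rate` are hypothetical, and reading the key from an `NCBI_API_KEY` environment variable is an assumption about configuration:

```python
import asyncio
import os
import time


class MinIntervalLimiter:
    """Space out call starts so that at most `rate` begin per second."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()  # safe to create outside a loop on 3.10+
        self._next_start = 0.0

    async def wait(self) -> None:
        async with self._lock:  # callers queue here one at a time
            now = time.monotonic()
            delay = self._next_start - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_start = max(now, self._next_start) + self._interval


def pubmed_rate() -> float:
    # NCBI policy: 10 requests/s with an API key, 3 requests/s without.
    return 10.0 if os.environ.get("NCBI_API_KEY") else 3.0
```

A single limiter instance would be shared by every PubMed call in the process; awaiting `limiter.wait()` before each request spaces request starts by `1 / rate` seconds.
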

### search_clinical_trials

```python
@ai_function
async def search_clinical_trials(query: str, max_results: int = 10) -> str:
    """Search ClinicalTrials.gov for clinical studies."""
```

| Aspect | Value |
|--------|-------|
| External API | ClinicalTrials.gov (uses `requests`, not `httpx`) |
| Rate Limit | Standard HTTP limits |
| Output | Trial status, conditions, interventions |
| Side Effect | Stores Evidence in memory |
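
Because `search_clinical_trials` is an `async` tool but uses the synchronous `requests` library, a blocking call made directly in its body would stall the event loop, and every agent streaming on it, for the length of the HTTP round trip. This document does not say how the tool handles that; one standard pattern is `asyncio.to_thread`. A sketch with a hypothetical `blocking_fetch` standing in for the `requests.get(...)` call:

```python
import asyncio
import time


def blocking_fetch(query: str) -> str:
    """Hypothetical stand-in for a synchronous requests.get(...) call."""
    time.sleep(0.2)  # simulates network latency while holding the thread
    return f"trials for {query}"


async def search_clinical_trials_sketch(query: str) -> str:
    # Run the blocking call in the default thread pool so the loop stays free.
    return await asyncio.to_thread(blocking_fetch, query)


async def demo() -> tuple[list[str], float]:
    start = time.monotonic()
    results = await asyncio.gather(
        search_clinical_trials_sketch("a"), search_clinical_trials_sketch("b")
    )
    return results, time.monotonic() - start
```

In `demo()`, the two simulated 200 ms requests overlap in worker threads, so the total is close to 200 ms rather than 400 ms.
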

### search_preprints

```python
@ai_function
async def search_preprints(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
```

| Aspect | Value |
|--------|-------|
| External API | Europe PMC REST API |
| Output | Papers with PMIDs, DOIs |
| Side Effect | Stores Evidence in memory |

### get_bibliography

```python
@ai_function
def get_bibliography() -> str:
    """Get formatted reference list from all collected evidence."""
```

| Aspect | Value |
|--------|-------|
| External API | None |
| Reads | `memory.get_all_evidence()` |
| Output | Numbered reference list |

### search_web

```python
@ai_function
async def search_web(query: str, max_results: int = 10) -> str:
    """Search web using DuckDuckGo."""
```

| Aspect | Value |
|--------|-------|
| External API | DuckDuckGo |
| Output | Web results with URLs |
| Side Effect | Stores Evidence in memory |
| Used By | RetrievalAgent only (not wired in; see Dead Code Warning) |

---

## Event Flow

### AgentEvent Types

| Type | When Emitted | Data |
|------|--------------|------|
| `started` | Workflow begins | None |
| `thinking` | Before first agent event | None |
| `searching` | SearchAgent active | agent_id |
| `search_complete` | SearchAgent done | evidence count |
| `judging` | JudgeAgent active | agent_id |
| `judge_complete` | JudgeAgent done | assessment |
| `hypothesizing` | HypothesisAgent active | agent_id |
| `synthesizing` | ReportAgent active | agent_id |
| `streaming` | Real-time text | text, agent_id |
| `complete` | Workflow done | report, iterations |
| `error` | Error occurred | error message |
| `progress` | Status update | status message |

### Typical Sequence

```
1. started → "Starting research..."
2. progress → "Loading embedding service..."
3. thinking → "Multi-agent reasoning..."
4. streaming (searcher) → "Found 15 sources..."
5. streaming (judge) → "✅ SUFFICIENT..."
6. streaming (reporter) → "## Research Report..."
7. complete → Final report
```

---

## Break Conditions

The orchestrator exits when ANY of these occur:

### 1. Judge Approval

```python
if "SUFFICIENT EVIDENCE" in judge_response:
    # Manager delegates to ReportAgent;
    # the workflow ends when ReportAgent completes.
    ...
```
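
A plain substring test for `"SUFFICIENT EVIDENCE"` is also true for `"INSUFFICIENT EVIDENCE"`. The documented negative signal is `"❌ INSUFFICIENT"`, so whether this bites depends on the judge's exact wording, but a word-boundary match removes the ambiguity. A sketch using `re`; `judge_says_sufficient` is a hypothetical helper, not the orchestrator's code:

```python
import re

# \b requires a word boundary, so "INSUFFICIENT EVIDENCE" does not match.
_SUFFICIENT = re.compile(r"\bSUFFICIENT EVIDENCE\b")


def judge_says_sufficient(judge_response: str) -> bool:
    return bool(_SUFFICIENT.search(judge_response))
```

Where the structured result is available, reading `sufficient` from `message.additional_properties["assessment"]` avoids parsing text at all.
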

### 2. Max Rounds Reached

```python
# MagenticBuilder config
max_round_count = 5  # Default

# After 5 manager rounds:
if not reporter_ran:
    # Force fallback synthesis
    async for event in _synthesize_fallback(iteration, "max_rounds"):
        yield event
```

### 3. Timeout

```python
try:
    # asyncio.timeout() requires Python 3.11+
    async with asyncio.timeout(settings.advanced_timeout):  # 600s default
        async for event in workflow.run_stream(task):
            yield event
except TimeoutError:
    async for event in _synthesize_fallback(iteration, "timeout"):
        yield event
```

### 4. Token Budget

```python
# Implicit via PydanticAI/LLM client
# ~50K tokens per query (from settings)
# Individual agent calls handle retries
```

---

## Dependency Matrix

### "If I change X, what breaks?"

| Changed Component | Affected Components | Impact |
|-------------------|---------------------|--------|
| **Evidence model** | All agents, Memory, Tools | HIGH - Core data type |
| **JudgeAssessment** | Judge, Orchestrator | HIGH - Decision flow |
| **ResearchMemory** | All agents | HIGH - Shared state |
| **search_pubmed** | SearchAgent | MEDIUM - One tool |
| **get_bibliography** | ReportAgent | MEDIUM - References |
| **AgentEvent** | Orchestrator, UI | MEDIUM - Streaming |
| **EmbeddingService** | Memory, Dedup | MEDIUM - Similarity |
| **Judge thresholds** | Workflow loop count | LOW - Tuning |
| **System prompts** | Agent behavior | LOW - Prompt engineering |

### Agent Dependencies

```
SearchAgent
├── REQUIRES: MagenticState, EmbeddingService
├── WRITES TO: ResearchMemory (evidence)
└── NO DEPS ON: Other agents

JudgeAgent
├── REQUIRES: Evidence context (from Manager)
├── WRITES TO: Nothing
└── CONTROLS: SearchAgent (continue) or ReportAgent (synthesize)

HypothesisAgent
├── REQUIRES: Evidence context
├── WRITES TO: evidence_store["hypotheses"]
└── NO DEPS ON: Other agents

ReportAgent
├── REQUIRES: ResearchMemory, hypotheses, assessment
├── READS FROM: All prior state
└── WRITES TO: evidence_store["final_report"]
```

---

## Critical Thresholds

| Threshold | Value | Location | Impact |
|-----------|-------|----------|--------|
| Confidence threshold | 0.7 (70%) | JudgeAssessment | Sufficiency decision |
| Mechanism score threshold | 6 | Judge criteria | Sufficiency decision |
| Clinical score threshold | 6 | Judge criteria | Sufficiency decision |
| Max manager rounds | 5 | AdvancedOrchestrator | Loop termination |
| Max stall count | 3 | MagenticBuilder | Stall detection |
| Dedup similarity | 0.9 | EmbeddingService | Evidence dedup |
| Max evidence for judge | 30 | prompts/judge.py | Context limit |
| Confirmed hypothesis | 0.8 | ResearchMemory | High-confidence filter |
| Timeout | 600s | settings.advanced_timeout | Workflow timeout |

---

## Developer Checklist

When modifying agents:

- [ ] Update this document if contracts change
- [ ] Verify state access (read/write) is correct
- [ ] Check tool side effects
- [ ] Test with `make check`
- [ ] Verify event emission

When adding new agents:

- [ ] Create factory function in `magentic_agents.py`
- [ ] Define input/output contract
- [ ] Document state access
- [ ] Add to Agent Inventory table
- [ ] Update Dependency Matrix

When changing Judge criteria:

- [ ] Update JudgeAssessment model
- [ ] Update Critical Thresholds table
- [ ] Test workflow loop behavior
- [ ] Verify fallback synthesis triggers correctly

---

*This document is the source of truth for multi-agent coordination.*