Claude committed
Commit f81b58b · unverified · 1 Parent(s): 59ce7b1

docs: Add Agent-Tool-State Contract Registry

Add critical documentation for multi-agent coordination:

- docs/architecture/agent-tool-state-contracts.md
  - Complete agent input/output contracts
  - Judge decision criteria and thresholds
  - Shared state (ResearchMemory) access patterns
  - Tool contracts with side effects
  - Event flow documentation
  - Break conditions (judge approval, max rounds, timeout)
  - Dependency matrix ("if I change X, what breaks?")

Also:
- Update docs/README.md to feature new contract registry
- Fix technical debt registry (remove DEBT-001 about intentionally
  duplicate CLAUDE.md/AGENTS.md/GEMINI.md files)
- Renumber remaining debt items (now 13 total)

This is the source of truth for agent coordination in DeepBoner.

docs/README.md CHANGED
@@ -27,6 +27,7 @@ docs/
27
  │
28
  ├── architecture/ # System design documentation
29
  │   ├── overview.md # High-level architecture
 
30
  │   ├── system-registry.md # Service registry (canonical wiring)
31
  │   ├── workflow-diagrams.md # Visual workflow diagrams
32
  │   ├── component-inventory.md # Complete component catalog
@@ -100,10 +101,11 @@ docs/
100
  3. [Configuration Reference](reference/configuration.md) - All options
101
 
102
  ### For Understanding the Codebase
103
- 1. [Component Inventory](architecture/component-inventory.md) - All modules
104
- 2. [Data Models](architecture/data-models.md) - Core types
105
- 3. [System Registry](architecture/system-registry.md) - Service wiring
106
- 4. [Technical Debt](technical-debt/index.md) - Known issues
 
107
 
108
  ## Related Documentation
109
 
 
27
  │
28
  ├── architecture/ # System design documentation
29
  │   ├── overview.md # High-level architecture
30
+ │   ├── agent-tool-state-contracts.md # Agent/Tool/State contracts (CRITICAL)
31
  │   ├── system-registry.md # Service registry (canonical wiring)
32
  │   ├── workflow-diagrams.md # Visual workflow diagrams
33
  │   ├── component-inventory.md # Complete component catalog
 
101
  3. [Configuration Reference](reference/configuration.md) - All options
102
 
103
  ### For Understanding the Codebase
104
+ 1. [Agent-Tool-State Contracts](architecture/agent-tool-state-contracts.md) - **CRITICAL** - Agent coordination contracts
105
+ 2. [Component Inventory](architecture/component-inventory.md) - All modules
106
+ 3. [Data Models](architecture/data-models.md) - Core types
107
+ 4. [System Registry](architecture/system-registry.md) - Service wiring
108
+ 5. [Technical Debt](technical-debt/index.md) - Known issues
109
 
110
  ## Related Documentation
111
 
docs/architecture/agent-tool-state-contracts.md ADDED
@@ -0,0 +1,596 @@
1
+ # Agent-Tool-State Contract Registry
2
+
3
+ > **Status**: Canonical Source of Truth
4
+ > **Last Updated**: 2025-12-06
5
+ > **Purpose**: Developer reference for multi-agent coordination
6
+
7
+ This document defines the exact contracts between agents, tools, and shared state. Use this when:
8
+ - Adding new agents or tools
9
+ - Modifying agent behavior
10
+ - Debugging coordination issues
11
+ - Understanding "if I change X, what breaks?"
12
+
13
+ ---
14
+
15
+ ## Table of Contents
16
+
17
+ 1. [System Overview](#system-overview)
18
+ 2. [Agent Contracts](#agent-contracts)
19
+ 3. [Judge Decision Criteria](#judge-decision-criteria)
20
+ 4. [Shared State (ResearchMemory)](#shared-state-researchmemory)
21
+ 5. [Tool Contracts](#tool-contracts)
22
+ 6. [Event Flow](#event-flow)
23
+ 7. [Break Conditions](#break-conditions)
24
+ 8. [Dependency Matrix](#dependency-matrix)
25
+
26
+ ---
27
+
28
+ ## System Overview
29
+
30
+ ```
31
+ ┌─────────────────────────────────────────────────────────────────────┐
32
+ │                 ORCHESTRATOR (AdvancedOrchestrator)                 │
33
+ │                                                                     │
34
+ │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐             │
35
+ │  │   Manager    │──▶│    Agents    │──▶│    Memory    │             │
36
+ │  │  (Magentic)  │   │ (ChatAgent)  │   │(ResearchMem) │             │
37
+ │  └──────────────┘   └──────────────┘   └──────────────┘             │
38
+ │         │                  │                  │                     │
39
+ │         │                  ▼                  ▼                     │
40
+ │         │           ┌──────────────┐   ┌──────────────┐             │
41
+ │         └──────────▶│    Tools     │──▶│  Embeddings  │             │
42
+ │                     │(@ai_function)│   │  (ChromaDB)  │             │
43
+ │                     └──────────────┘   └──────────────┘             │
44
+ └─────────────────────────────────────────────────────────────────────┘
45
+ ```
46
+
47
+ ### Agent Inventory
48
+
49
+ | Agent | File | Role | Tools |
50
+ |-------|------|------|-------|
51
+ | **SearchAgent** | `magentic_agents.py` | Evidence gathering | search_pubmed, search_clinical_trials, search_preprints |
52
+ | **JudgeAgent** | `magentic_agents.py` | Evidence evaluation | None (LLM only) |
53
+ | **HypothesisAgent** | `magentic_agents.py` | Mechanism generation | None (LLM only) |
54
+ | **ReportAgent** | `magentic_agents.py` | Report synthesis | get_bibliography |
55
+ | **RetrievalAgent** | `retrieval_agent.py` | Web search | search_web |
56
+
57
+ ---
58
+
59
+ ## Agent Contracts
60
+
61
+ ### SearchAgent
62
+
63
+ **Factory**: `create_search_agent(chat_client, domain, api_key) -> ChatAgent`
64
+
65
+ #### Input
66
+ ```python
67
+ # Manager instruction (string)
68
+ "Search for testosterone and libido mechanisms in peer-reviewed literature"
69
+ ```
70
+
71
+ #### Output
72
+ ```python
73
+ # ChatMessage with:
74
+ message.text = """
75
+ Found 15 sources (12 new added to context):
76
+ - [Title 1](url): Abstract excerpt...
77
+ - [Title 2](url): Abstract excerpt...
78
+ """
79
+ message.additional_properties = {
80
+     "evidence": [Evidence.model_dump(), ...]
81
+ }
82
+ ```
83
+
84
+ #### State Access
85
+
86
+ | Operation | Key | Type | Description |
87
+ |-----------|-----|------|-------------|
88
+ | **READ** | `memory.query` | str | Current research question |
89
+ | **READ** | `memory.evidence_ids` | list[str] | Existing evidence URLs |
90
+ | **WRITE** | `memory._evidence_cache` | dict[str, Evidence] | Caches Evidence objects |
91
+ | **WRITE** | `memory.evidence_ids` | list[str] | Appends new URLs |
92
+ | **WRITE** | `embedding_service` | VectorDB | Stores embeddings |
93
+
94
+ #### Side Effects
95
+ 1. Calls external APIs (PubMed, ClinicalTrials, Europe PMC)
96
+ 2. Deduplicates via semantic similarity (0.9 threshold)
97
+ 3. Stores in vector database
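
The 0.9 similarity threshold in the list above can be pictured with a small sketch. It assumes evidence is compared by cosine similarity between embedding vectors and that a score at or above the threshold counts as a duplicate. `cosine_similarity`, `filter_new`, and the toy two-dimensional vectors are hypothetical and are not taken from the DeepBoner code, which delegates this work to its embedding service.

```python
import math

DEDUP_THRESHOLD = 0.9  # value quoted in this document


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_new(candidates, existing, threshold=DEDUP_THRESHOLD):
    """Return the (url, vector) candidates that are not near-duplicates.

    A candidate is dropped when it is at least `threshold` similar to any
    vector already stored or already accepted earlier in the same batch.
    Treating ">= threshold" as a duplicate is an assumption of this sketch.
    """
    kept = list(existing)
    fresh = []
    for url, vec in candidates:
        if any(cosine_similarity(vec, seen) >= threshold for _, seen in kept):
            continue
        kept.append((url, vec))
        fresh.append((url, vec))
    return fresh
```

Under these assumptions, a "Found 15 sources (12 new added to context)" message would correspond to three candidates being filtered out.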
98
+
99
+ #### Error Behavior
100
+ - API failure → Returns "No results found for: {query}"
101
+ - Rate limit → Raises `RateLimitError` (caught by orchestrator)
102
+
103
+ ---
104
+
105
+ ### JudgeAgent
106
+
107
+ **Factory**: `create_judge_agent(chat_client, domain, api_key) -> ChatAgent`
108
+
109
+ #### Input
110
+ ```python
111
+ # Manager instruction with evidence context
112
+ "Evaluate if we have sufficient evidence to answer: {query}"
113
+ # + Evidence list in context
114
+ ```
115
+
116
+ #### Output
117
+ ```python
118
+ # ChatMessage with:
119
+ message.text = """
120
+ ## Assessment
121
+ ✅ SUFFICIENT EVIDENCE (confidence: 85%). STOP SEARCHING.
122
+
123
+ ### Scores
124
+ - Mechanism: 8/10
125
+ - Clinical: 7/10
126
+
127
+ ### Reasoning
128
+ Strong evidence for testosterone-AR pathway...
129
+ """
130
+ message.additional_properties = {
131
+     "assessment": JudgeAssessment.model_dump()
132
+ }
133
+ ```
134
+
135
+ #### State Access
136
+
137
+ | Operation | Key | Type | Description |
138
+ |-----------|-----|------|-------------|
139
+ | **READ** | Evidence from context | list[Evidence] | Passed by Manager |
140
+ | **WRITE** | None | - | Read-only evaluation |
141
+
142
+ #### Side Effects
143
+ - None (pure evaluation)
144
+
145
+ #### Critical Output Signal
146
+ - `"✅ SUFFICIENT EVIDENCE"` → Manager delegates to ReportAgent
147
+ - `"❌ INSUFFICIENT"` → Manager calls SearchAgent with suggested queries
148
+
149
+ ---
150
+
151
+ ### HypothesisAgent
152
+
153
+ **Factory**: `create_hypothesis_agent(chat_client, domain, api_key) -> ChatAgent`
154
+
155
+ #### Input
156
+ ```python
157
+ # Manager instruction
158
+ "Generate mechanistic hypotheses for: {query}"
159
+ ```
160
+
161
+ #### Output
162
+ ```python
163
+ # ChatMessage with:
164
+ message.text = """
165
+ ## Hypothesis 1 (Confidence: 75%)
166
+ **Mechanism**: Testosterone → Androgen Receptor → BDNF → Libido
167
+ **Suggested searches**: testosterone BDNF, androgen receptor signaling
168
+
169
+ ## Primary Hypothesis
170
+ Testosterone → AR → dopamine release → reward pathway
171
+
172
+ ## Knowledge Gaps
173
+ - Dose-response relationship unclear
174
+ """
175
+ message.additional_properties = {
176
+     "assessment": HypothesisAssessment.model_dump()
177
+ }
178
+ ```
179
+
180
+ #### State Access
181
+
182
+ | Operation | Key | Type | Description |
183
+ |-----------|-----|------|-------------|
184
+ | **READ** | `memory.query` | str | Research question |
185
+ | **READ** | Evidence from context | list[Evidence] | Current evidence |
186
+ | **WRITE** | `evidence_store["hypotheses"]` | list | Appends hypotheses |
187
+
188
+ ---
189
+
190
+ ### ReportAgent
191
+
192
+ **Factory**: `create_report_agent(chat_client, domain, api_key) -> ChatAgent`
193
+
194
+ #### Input
195
+ ```python
196
+ # Manager instruction
197
+ "Generate final research report for: {query}"
198
+ ```
199
+
200
+ #### Output
201
+ ```python
202
+ # ChatMessage with:
203
+ message.text = ResearchReport.to_markdown() # Full markdown report
204
+ message.additional_properties = {
205
+     "report": ResearchReport.model_dump()
206
+ }
207
+ ```
208
+
209
+ #### State Access
210
+
211
+ | Operation | Key | Type | Description |
212
+ |-----------|-----|------|-------------|
213
+ | **READ** | `memory.get_all_evidence()` | list[Evidence] | All collected evidence |
214
+ | **READ** | `evidence_store["hypotheses"]` | list | Generated hypotheses |
215
+ | **READ** | `evidence_store["last_assessment"]` | JudgeAssessment | Final assessment |
216
+ | **WRITE** | `evidence_store["final_report"]` | ResearchReport | Stores report |
217
+
218
+ #### Tool: get_bibliography()
219
+ ```python
220
+ @ai_function
221
+ def get_bibliography() -> str:
222
+     """Returns formatted reference list from all evidence."""
223
+     evidence = state.memory.get_all_evidence()
224
+     return format_as_references(evidence)
225
+ ```
226
+
227
+ ---
228
+
229
+ ## Judge Decision Criteria
230
+
231
+ ### Scoring Dimensions
232
+
233
+ **Mechanism Score (0-10)**
234
+
235
+ | Score | Meaning |
236
+ |-------|---------|
237
+ | 0-3 | Minimal mechanism understanding |
238
+ | 4-5 | Partial mechanism (some targets identified) |
239
+ | 6-7 | Clear mechanism (targets + pathways) |
240
+ | 8-9 | Comprehensive (multiple pathways, regulation) |
241
+ | 10 | Complete understanding |
242
+
243
+ **Clinical Evidence Score (0-10)**
244
+
245
+ | Score | Meaning |
246
+ |-------|---------|
247
+ | 0-3 | Preclinical only or weak human evidence |
248
+ | 4-5 | Some human evidence (small trials, case reports) |
249
+ | 6-7 | Strong human evidence (RCTs) |
250
+ | 8-9 | Robust (meta-analysis, large RCTs) |
251
+ | 10 | Definitive clinical proof |
252
+
253
+ ### Sufficiency Decision
254
+
255
+ ```python
256
+ # SUFFICIENT (recommendation="synthesize")
257
+ if (
258
+     confidence >= 0.7  # 70%
259
+     and mechanism_score >= 6
260
+     and clinical_evidence_score >= 6
261
+ ):
262
+     sufficient = True
263
+     recommendation = "synthesize"
264
+
265
+ # INSUFFICIENT (recommendation="continue")
266
+ else:
267
+     sufficient = False
268
+     recommendation = "continue"
269
+     next_search_queries = ["suggested query 1", "suggested query 2"]
270
+ ```
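
The rule above can also be written as a small pure function, which makes the boundary cases easy to check. This is a minimal sketch under the thresholds stated in this document; `Decision` and `decide` are hypothetical names, not part of the codebase, and in the real system the judge LLM fills these fields itself.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """Hypothetical container for the three fields the rule above sets."""
    sufficient: bool
    recommendation: str
    next_search_queries: list = field(default_factory=list)


def decide(confidence, mechanism_score, clinical_evidence_score, suggested_queries=()):
    """Apply the documented thresholds: confidence >= 0.7 and both scores >= 6."""
    if confidence >= 0.7 and mechanism_score >= 6 and clinical_evidence_score >= 6:
        return Decision(sufficient=True, recommendation="synthesize")
    # Any single failing criterion keeps the search loop going.
    return Decision(
        sufficient=False,
        recommendation="continue",
        next_search_queries=list(suggested_queries),
    )
```

All three conditions are inclusive, so an assessment sitting exactly on the thresholds (0.7, 6, 6) is treated as sufficient.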
271
+
272
+ ### JudgeAssessment Model
273
+
274
+ ```python
275
+ class JudgeAssessment(BaseModel):
276
+     details: AssessmentDetails
277
+     mechanism_score: int  # 0-10
278
+     mechanism_reasoning: str  # min 10 chars
279
+     clinical_evidence_score: int  # 0-10
280
+     clinical_reasoning: str  # min 10 chars
281
+     drug_candidates: list[str]
282
+     key_findings: list[str]
283
+
284
+     sufficient: bool  # Ready for synthesis?
285
+     confidence: float  # 0.0-1.0
286
+     recommendation: Literal["continue", "synthesize"]
287
+     next_search_queries: list[str]  # If continue
288
+     reasoning: str  # min 20 chars
289
+ ```
290
+
291
+ ---
292
+
293
+ ## Shared State (ResearchMemory)
294
+
295
+ ### Initialization
296
+
297
+ ```python
298
+ # Per-query isolation via ContextVar
299
+ state = init_magentic_state(query, embedding_service)
300
+ # Returns MagenticState wrapping ResearchMemory
301
+ ```
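
Per-query isolation via `ContextVar` can be illustrated with a short sketch. It assumes each query runs in its own asyncio task; `DemoState`, `init_demo_state`, and `get_demo_state` are hypothetical stand-ins for `MagenticState` and `init_magentic_state`, whose real signatures may differ.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class DemoState:
    """Hypothetical stand-in for MagenticState."""
    query: str
    evidence_ids: list = field(default_factory=list)


_state_var: ContextVar = ContextVar("demo_state")


def init_demo_state(query: str) -> DemoState:
    """Create fresh state and bind it to the current context."""
    state = DemoState(query=query)
    _state_var.set(state)
    return state


def get_demo_state() -> DemoState:
    """Return the state bound to the current context."""
    return _state_var.get()


async def run_query(query: str, delay: float):
    init_demo_state(query)
    await asyncio.sleep(delay)  # let the other task run in between
    state = get_demo_state()    # still this task's own state
    state.evidence_ids.append(f"url-for-{query}")
    return state.query, list(state.evidence_ids)


async def main():
    # gather() wraps each coroutine in a task, and each task gets its own
    # copy of the context, so the two set() calls do not interfere.
    return await asyncio.gather(run_query("q1", 0.02), run_query("q2", 0.01))
```

Running `asyncio.run(main())` shows each query reading back only the state it initialized, even though the two tasks interleave.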
302
+
303
+ ### Memory Structure
304
+
305
+ ```python
306
+ class ResearchMemory:
307
+     query: str  # Research question
308
+     hypotheses: list[Hypothesis]  # Generated hypotheses
309
+     conflicts: list[Conflict]  # Detected conflicts
310
+     evidence_ids: list[str]  # URLs (unique keys)
311
+     _evidence_cache: dict[str, Evidence]  # URL -> Evidence
312
+     iteration_count: int  # Current iteration
313
+     _embedding_service: EmbeddingServiceProtocol
314
+ ```
315
+
316
+ ### Key Methods
317
+
318
+ | Method | Returns | Description |
319
+ |--------|---------|-------------|
320
+ | `store_evidence(evidence)` | `list[str]` | Store with dedup, return new IDs |
321
+ | `get_all_evidence()` | `list[Evidence]` | All accumulated evidence |
322
+ | `get_relevant_evidence(n)` | `list[Evidence]` | Top N by semantic similarity |
323
+ | `get_context_summary()` | `str` | Markdown summary for fallback |
324
+ | `add_hypothesis(h)` | `None` | Append hypothesis |
325
+ | `get_confirmed_hypotheses()` | `list[Hypothesis]` | Confidence > 0.8 |
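
The method contracts in the table above can be exercised against an in-memory stand-in. The sketch below is an assumption-laden simplification: it deduplicates by URL only (the real `store_evidence` is described as also using semantic similarity), and `DemoEvidence`, `DemoHypothesis`, and `DemoMemory` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DemoEvidence:
    url: str
    title: str


@dataclass
class DemoHypothesis:
    statement: str
    confidence: float


@dataclass
class DemoMemory:
    """Hypothetical, URL-keyed stand-in for ResearchMemory."""
    query: str
    evidence_ids: list = field(default_factory=list)
    _evidence_cache: dict = field(default_factory=dict)
    hypotheses: list = field(default_factory=list)

    def store_evidence(self, evidence):
        """Store unseen items and return only the newly added IDs (URLs)."""
        new_ids = []
        for item in evidence:
            if item.url in self._evidence_cache:
                continue  # already known: not reported as new
            self._evidence_cache[item.url] = item
            self.evidence_ids.append(item.url)
            new_ids.append(item.url)
        return new_ids

    def get_all_evidence(self):
        """All accumulated evidence, in insertion order."""
        return [self._evidence_cache[url] for url in self.evidence_ids]

    def add_hypothesis(self, hypothesis):
        self.hypotheses.append(hypothesis)

    def get_confirmed_hypotheses(self):
        """Hypotheses with confidence strictly above 0.8, as in the table."""
        return [h for h in self.hypotheses if h.confidence > 0.8]
```

Because `store_evidence` returns only new IDs, a caller can report counts such as "15 found, 12 new" without a second lookup.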
326
+
327
+ ### State Flow
328
+
329
+ ```
330
+ User Query
331
+     │
332
+     ▼
333
+ ┌─────────────────────────────────────────────────────────────┐
334
+ │ ResearchMemory initialized (empty)                          │
335
+ └─────────────────────────────────────────────────────────────┘
336
+     │
337
+     ▼
338
+ SearchAgent ──▶ store_evidence([Evidence]) ──▶ evidence_ids grows
339
+     │
340
+     ▼
341
+ JudgeAgent ──▶ reads evidence from context ──▶ returns assessment
342
+     │
343
+     ├─── INSUFFICIENT ──▶ SearchAgent (with next_search_queries)
344
+     │
345
+     └─── SUFFICIENT ──▶ ReportAgent
346
+                              │
347
+                              ▼
348
+                     get_all_evidence() ──▶ ResearchReport
349
+ ```
350
+
351
+ ---
352
+
353
+ ## Tool Contracts
354
+
355
+ ### search_pubmed
356
+
357
+ **File**: `src/agents/tools.py`
358
+
359
+ ```python
360
+ @ai_function
361
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
362
+     """Search PubMed for biomedical research papers."""
363
+ ```
364
+
365
+ | Aspect | Value |
366
+ |--------|-------|
367
+ | External API | NCBI E-utilities |
368
+ | Rate Limit | 3/sec (10/sec with NCBI_API_KEY) |
369
+ | Output | Formatted string with titles/abstracts |
370
+ | Side Effect | Stores Evidence in memory |
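
One way to stay under the 3 requests/sec limit in the table above is to space calls by a minimum interval. The sketch below is an illustration only, not the project's rate limiter: `MinIntervalLimiter` is a hypothetical class, and even spacing is a deliberately conservative reading of "3/sec". The clock and sleep function are injectable so the behavior can be checked without real waiting.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Hypothetical limiter: at most one call per 1/max_per_second seconds."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=asyncio.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    async def wait(self):
        """Block until the caller's reserved time slot arrives."""
        now = self._clock()
        start = max(now, self._next_allowed)
        # Reserve the slot before awaiting, so concurrent callers on the
        # same event loop each receive a distinct slot.
        self._next_allowed = start + self._interval
        if start > now:
            await self._sleep(start - now)
```

A tool would call `await limiter.wait()` immediately before each outgoing request; passing `max_per_second=10` would match the higher limit the table lists for requests made with `NCBI_API_KEY`.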
371
+
372
+ ### search_clinical_trials
373
+
374
+ ```python
375
+ @ai_function
376
+ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
377
+     """Search ClinicalTrials.gov for clinical studies."""
378
+ ```
379
+
380
+ | Aspect | Value |
381
+ |--------|-------|
382
+ | External API | ClinicalTrials.gov (uses `requests` not httpx) |
383
+ | Rate Limit | Standard HTTP limits |
384
+ | Output | Trial status, conditions, interventions |
385
+ | Side Effect | Stores Evidence in memory |
386
+
387
+ ### search_preprints
388
+
389
+ ```python
390
+ @ai_function
391
+ async def search_preprints(query: str, max_results: int = 10) -> str:
392
+     """Search Europe PMC for preprints and papers."""
393
+ ```
394
+
395
+ | Aspect | Value |
396
+ |--------|-------|
397
+ | External API | Europe PMC REST API |
398
+ | Output | Papers with PMIDs, DOIs |
399
+ | Side Effect | Stores Evidence in memory |
400
+
401
+ ### get_bibliography
402
+
403
+ ```python
404
+ @ai_function
405
+ def get_bibliography() -> str:
406
+     """Get formatted reference list from all collected evidence."""
407
+ ```
408
+
409
+ | Aspect | Value |
410
+ |--------|-------|
411
+ | External API | None |
412
+ | Reads | `memory.get_all_evidence()` |
413
+ | Output | Numbered reference list |
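
The "numbered reference list" output could look like the following sketch. The document names `format_as_references` but does not show it, so this body, the `DemoCitation` type, and the exact line format are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DemoCitation:
    """Hypothetical minimal view of an Evidence item."""
    title: str
    url: str


def format_as_references(evidence):
    """Render evidence as a numbered list, one 'N. Title. URL' line per item."""
    if not evidence:
        return "No references collected."
    return "\n".join(
        f"{index}. {item.title}. {item.url}"
        for index, item in enumerate(evidence, start=1)
    )
```

Numbering from 1 in insertion order keeps the references stable across calls, provided the evidence order does not change.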
414
+
415
+ ### search_web
416
+
417
+ ```python
418
+ @ai_function
419
+ async def search_web(query: str, max_results: int = 10) -> str:
420
+     """Search web using DuckDuckGo."""
421
+ ```
422
+
423
+ | Aspect | Value |
424
+ |--------|-------|
425
+ | External API | DuckDuckGo |
426
+ | Output | Web results with URLs |
427
+ | Side Effect | Stores Evidence in memory |
428
+
429
+ ---
430
+
431
+ ## Event Flow
432
+
433
+ ### AgentEvent Types
434
+
435
+ | Type | When Emitted | Data |
436
+ |------|--------------|------|
437
+ | `started` | Workflow begins | None |
438
+ | `thinking` | Before first agent event | None |
439
+ | `searching` | SearchAgent active | agent_id |
440
+ | `search_complete` | SearchAgent done | evidence count |
441
+ | `judging` | JudgeAgent active | agent_id |
442
+ | `judge_complete` | JudgeAgent done | assessment |
443
+ | `hypothesizing` | HypothesisAgent active | agent_id |
444
+ | `synthesizing` | ReportAgent active | agent_id |
445
+ | `streaming` | Real-time text | text, agent_id |
446
+ | `complete` | Workflow done | report, iterations |
447
+ | `error` | Error occurred | error message |
448
+ | `progress` | Status update | status message |
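
A consumer of these events might be structured as in the sketch below. The event type names come from the table above; the `DemoEvent` shape (`type`, `message`, `data`) and the `final_report` helper are hypothetical and may not match the real `AgentEvent` model.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Event type names as listed in the table above.
EVENT_TYPES = {
    "started", "thinking", "searching", "search_complete", "judging",
    "judge_complete", "hypothesizing", "synthesizing", "streaming",
    "complete", "error", "progress",
}


@dataclass(frozen=True)
class DemoEvent:
    """Hypothetical event shape used only for this illustration."""
    type: str
    message: str = ""
    data: Optional[Any] = None

    def __post_init__(self):
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type!r}")


def final_report(events):
    """Return the message of the first `complete` event, raising on `error`."""
    for event in events:
        if event.type == "error":
            raise RuntimeError(event.message)
        if event.type == "complete":
            return event.message
    return None  # stream ended without a terminal event
```

Validating the type at construction time means a misspelled event name fails where it is emitted rather than being silently ignored by the UI.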
449
+
450
+ ### Typical Sequence
451
+
452
+ ```
453
+ 1. started → "Starting research..."
454
+ 2. progress → "Loading embedding service..."
455
+ 3. thinking → "Multi-agent reasoning..."
456
+ 4. streaming (searcher) → "Found 15 sources..."
457
+ 5. streaming (judge) → "✅ SUFFICIENT..."
458
+ 6. streaming (reporter) → "## Research Report..."
459
+ 7. complete → Final report
460
+ ```
461
+
462
+ ---
463
+
464
+ ## Break Conditions
465
+
466
+ The orchestrator exits when ANY of these occur:
467
+
468
+ ### 1. Judge Approval ✅
469
+
470
+ ```python
471
+ if "SUFFICIENT EVIDENCE" in judge_response:
472
+     # Manager delegates to ReportAgent
473
+     ...  # ReportAgent completes → workflow ends
474
+ ```
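
One caveat with a plain substring test: the string "INSUFFICIENT EVIDENCE" also contains "SUFFICIENT EVIDENCE". If a judge response ever used that wording, the check above would match it. The sketch below tests the negative marker first. `JudgeSignal` and `parse_judge_signal` are hypothetical names, and reading `recommendation` from the structured `assessment` in `additional_properties` may be a more robust alternative where it is available.

```python
from enum import Enum


class JudgeSignal(Enum):
    SUFFICIENT = "sufficient"
    INSUFFICIENT = "insufficient"
    UNKNOWN = "unknown"


def parse_judge_signal(text: str) -> JudgeSignal:
    """Classify a judge response by its textual marker."""
    upper = text.upper()
    # Check the negative marker first, because "INSUFFICIENT" contains
    # "SUFFICIENT" as a substring.
    if "INSUFFICIENT" in upper:
        return JudgeSignal.INSUFFICIENT
    if "SUFFICIENT EVIDENCE" in upper:
        return JudgeSignal.SUFFICIENT
    return JudgeSignal.UNKNOWN
```

Returning `UNKNOWN` rather than guessing lets the caller decide whether to re-ask the judge or fall back to another round of searching.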
475
+
476
+ ### 2. Max Rounds Reached 🔄
477
+
478
+ ```python
479
+ # MagenticBuilder config
480
+ max_round_count = 5 # Default
481
+
482
+ # After 5 manager rounds:
483
+ if not reporter_ran:
484
+     # Force fallback synthesis
485
+     async for event in _synthesize_fallback(iteration, "max_rounds"):
486
+         yield event
487
+ ```
488
+
489
+ ### 3. Timeout ⏱️
490
+
491
+ ```python
492
+ try:
493
+     async with asyncio.timeout(settings.advanced_timeout):  # 600s default
494
+         async for event in workflow.run_stream(task):
495
+             yield event
496
+ except TimeoutError:
497
+     async for event in _synthesize_fallback(iteration, "timeout"):
498
+         yield event
499
+ ```
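
The timeout-then-fallback pattern above can be tried in isolation with the sketch below. It swaps `asyncio.timeout` (Python 3.11+) for a per-event `asyncio.wait_for` against a shared deadline so that it also runs on older interpreters. `slow_workflow`, `fallback`, and `run_with_deadline` are hypothetical stand-ins for the workflow stream and `_synthesize_fallback`.

```python
import asyncio


async def slow_workflow():
    """Stand-in workflow: emits one event, then stalls."""
    yield "started"
    await asyncio.sleep(10)
    yield "complete"


async def fallback(reason):
    """Stand-in for fallback synthesis."""
    yield f"fallback:{reason}"


async def run_with_deadline(stream, timeout_s):
    """Relay events from `stream`; switch to the fallback once time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    iterator = stream.__aiter__()
    try:
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                raise asyncio.TimeoutError
            try:
                # Wait for the next event, but never past the shared deadline.
                event = await asyncio.wait_for(iterator.__anext__(), remaining)
            except StopAsyncIteration:
                return  # workflow finished normally
            yield event
    except asyncio.TimeoutError:
        async for event in fallback("timeout"):
            yield event
```

Events produced before the deadline are still delivered, so the consumer sees partial progress followed by the fallback output rather than an abrupt failure.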
500
+
501
+ ### 4. Token Budget 💾
502
+
503
+ ```python
504
+ # Implicit via PydanticAI/LLM client
505
+ # ~50K tokens per query (from settings)
506
+ # Individual agent calls handle retries
507
+ ```
508
+
509
+ ---
510
+
511
+ ## Dependency Matrix
512
+
513
+ ### "If I change X, what breaks?"
514
+
515
+ | Changed Component | Affected Components | Impact |
516
+ |-------------------|---------------------|--------|
517
+ | **Evidence model** | All agents, Memory, Tools | HIGH - Core data type |
518
+ | **JudgeAssessment** | Judge, Orchestrator | HIGH - Decision flow |
519
+ | **ResearchMemory** | All agents | HIGH - Shared state |
520
+ | **search_pubmed** | SearchAgent | MEDIUM - One tool |
521
+ | **get_bibliography** | ReportAgent | MEDIUM - References |
522
+ | **AgentEvent** | Orchestrator, UI | MEDIUM - Streaming |
523
+ | **EmbeddingService** | Memory, Dedup | MEDIUM - Similarity |
524
+ | **Judge thresholds** | Workflow loop count | LOW - Tuning |
525
+ | **System prompts** | Agent behavior | LOW - Prompt engineering |
526
+
527
+ ### Agent Dependencies
528
+
529
+ ```
530
+ SearchAgent
531
+ ├── REQUIRES: MagenticState, EmbeddingService
532
+ ├── WRITES TO: ResearchMemory (evidence)
533
+ └── NO DEPS ON: Other agents
534
+
535
+ JudgeAgent
536
+ ├── REQUIRES: Evidence context (from Manager)
537
+ ├── WRITES TO: Nothing
538
+ └── CONTROLS: SearchAgent (continue) or ReportAgent (synthesize)
539
+
540
+ HypothesisAgent
541
+ ├── REQUIRES: Evidence context
542
+ ├── WRITES TO: evidence_store["hypotheses"]
543
+ └── NO DEPS ON: Other agents
544
+
545
+ ReportAgent
546
+ ├── REQUIRES: ResearchMemory, hypotheses, assessment
547
+ ├── READS FROM: All prior state
548
+ └── WRITES TO: evidence_store["final_report"]
549
+ ```
550
+
551
+ ---
552
+
553
+ ## Critical Thresholds
554
+
555
+ | Threshold | Value | Location | Impact |
556
+ |-----------|-------|----------|--------|
557
+ | Confidence threshold | 0.7 (70%) | JudgeAssessment | Sufficiency decision |
558
+ | Mechanism score threshold | 6 | Judge criteria | Sufficiency decision |
559
+ | Clinical score threshold | 6 | Judge criteria | Sufficiency decision |
560
+ | Max manager rounds | 5 | AdvancedOrchestrator | Loop termination |
561
+ | Max stall count | 3 | MagenticBuilder | Stall detection |
562
+ | Dedup similarity | 0.9 | EmbeddingService | Evidence dedup |
563
+ | Max evidence for judge | 30 | prompts/judge.py | Context limit |
564
+ | Confirmed hypothesis | 0.8 | ResearchMemory | High-confidence filter |
565
+ | Timeout | 600s | settings.advanced_timeout | Workflow timeout |
566
+
567
+ ---
568
+
569
+ ## Developer Checklist
570
+
571
+ When modifying agents:
572
+
573
+ - [ ] Update this document if contracts change
574
+ - [ ] Verify state access (read/write) is correct
575
+ - [ ] Check tool side effects
576
+ - [ ] Test with `make check`
577
+ - [ ] Verify event emission
578
+
579
+ When adding new agents:
580
+
581
+ - [ ] Create factory function in `magentic_agents.py`
582
+ - [ ] Define input/output contract
583
+ - [ ] Document state access
584
+ - [ ] Add to Agent Inventory table
585
+ - [ ] Update Dependency Matrix
586
+
587
+ When changing Judge criteria:
588
+
589
+ - [ ] Update JudgeAssessment model
590
+ - [ ] Update Critical Thresholds table
591
+ - [ ] Test workflow loop behavior
592
+ - [ ] Verify fallback synthesis triggers correctly
593
+
594
+ ---
595
+
596
+ *This document is the source of truth for multi-agent coordination.*
docs/technical-debt/debt-registry.md CHANGED
@@ -8,46 +8,19 @@ This document tracks all known technical debt items in the DeepBoner codebase.
8
 
9
  | Category | Open | In Progress | Resolved |
10
  |----------|------|-------------|----------|
11
- | Architecture | 3 | 0 | 0 |
12
  | Code Quality | 4 | 0 | 0 |
13
  | Testing | 2 | 0 | 0 |
14
  | Documentation | 2 | 0 | 0 |
15
  | Performance | 2 | 0 | 0 |
16
  | Dependencies | 1 | 0 | 0 |
17
- | **Total** | **14** | **0** | **0** |
18
 
19
  ---
20
 
21
  ## Architecture
22
 
23
- ### DEBT-001: Duplicate Agent Guide Files
24
-
25
- **Category:** Architecture
26
- **Severity:** Low
27
- **Added:** 2025-12-06
28
- **Status:** Open
29
-
30
- **Description:**
31
- CLAUDE.md, AGENTS.md, and GEMINI.md contain ~95% identical content. This violates DRY (Don't Repeat Yourself) and makes maintenance difficult.
32
-
33
- **Impact:**
34
- - Changes must be made in 3 places
35
- - Risk of documentation drift
36
- - Confusion about which file is canonical
37
-
38
- **Current Workaround:**
39
- Manual synchronization when updating.
40
-
41
- **Proposed Solution:**
42
- 1. Keep CLAUDE.md as the canonical reference
43
- 2. Make AGENTS.md and GEMINI.md symlinks or include-references
44
- 3. Or consolidate into single DEVELOPMENT.md
45
-
46
- **Effort Estimate:** S
47
-
48
- ---
49
-
50
- ### DEBT-002: Reserved but Empty Directories
51
 
52
  **Category:** Architecture
53
  **Severity:** Low
@@ -71,7 +44,7 @@ Either implement the features or remove the directories.
71
 
72
  ---
73
 
74
- ### DEBT-003: Experimental LangGraph Orchestrator
75
 
76
  **Category:** Architecture
77
  **Severity:** Medium
@@ -98,7 +71,7 @@ Either promote to production status with full testing, or deprecate and remove.
98
 
99
  ## Code Quality
100
 
101
- ### DEBT-004: Complex Orchestrator Logic
102
 
103
  **Category:** Code Quality
104
  **Severity:** Medium
@@ -123,7 +96,7 @@ Refactor into smaller, focused methods. Consider command pattern for orchestrati
123
 
124
  ---
125
 
126
- ### DEBT-005: Magic Numbers in Code
127
 
128
  **Category:** Code Quality
129
  **Severity:** Low
@@ -147,7 +120,7 @@ Move to configuration or constants module with documentation.
147
 
148
  ---
149
 
150
- ### DEBT-006: Global Singleton Pattern
151
 
152
  **Category:** Code Quality
153
  **Severity:** Low
@@ -171,7 +144,7 @@ Consider dependency injection for settings, especially in tests.
171
 
172
  ---
173
 
174
- ### DEBT-007: ClinicalTrials Uses requests Instead of httpx
175
 
176
  **Category:** Code Quality
177
  **Severity:** Low
@@ -198,7 +171,7 @@ Documented in code comments and pyproject.toml.
198
 
199
  ## Testing
200
 
201
- ### DEBT-008: Integration Tests Require Real APIs
202
 
203
  **Category:** Testing
204
  **Severity:** Medium
@@ -225,7 +198,7 @@ Integration tests are not run in CI by default.
225
 
226
  ---
227
 
228
- ### DEBT-009: Incomplete E2E Test Coverage
229
 
230
  **Category:** Testing
231
  **Severity:** Medium
@@ -254,7 +227,7 @@ Expand E2E test suite with more scenarios, especially:
254
 
255
  ## Documentation
256
 
257
- ### DEBT-010: Outdated Inline Comments
258
 
259
  **Category:** Documentation
260
  **Severity:** Low
@@ -278,7 +251,7 @@ Systematic review of comments during code review process.
278
 
279
  ---
280
 
281
- ### DEBT-011: Missing API Documentation
282
 
283
  **Category:** Documentation
284
  **Severity:** Low
@@ -304,7 +277,7 @@ Consider generating API docs with Sphinx or mkdocs.
304
 
305
  ## Performance
306
 
307
- ### DEBT-012: Model Loading on First Request
308
 
309
  **Category:** Performance
310
  **Severity:** Low
@@ -329,7 +302,7 @@ Docker pre-downloads the model during build.
329
 
330
  ---
331
 
332
- ### DEBT-013: No Connection Pooling
333
 
334
  **Category:** Performance
335
  **Severity:** Low
@@ -355,7 +328,7 @@ Audit and optimize connection handling for external APIs.
355
 
356
  ## Dependencies
357
 
358
- ### DEBT-014: Pinned Beta Dependencies
359
 
360
  **Category:** Dependencies
361
  **Severity:** Medium
 
8
 
9
  | Category | Open | In Progress | Resolved |
10
  |----------|------|-------------|----------|
11
+ | Architecture | 2 | 0 | 0 |
12
  | Code Quality | 4 | 0 | 0 |
13
  | Testing | 2 | 0 | 0 |
14
  | Documentation | 2 | 0 | 0 |
15
  | Performance | 2 | 0 | 0 |
16
  | Dependencies | 1 | 0 | 0 |
17
+ | **Total** | **13** | **0** | **0** |
18
 
19
  ---
20
 
21
  ## Architecture
22
 
23
+ ### DEBT-001: Reserved but Empty Directories
24
 
25
  **Category:** Architecture
26
  **Severity:** Low
 
44
 
45
  ---
46
 
47
+ ### DEBT-002: Experimental LangGraph Orchestrator
48
 
49
  **Category:** Architecture
50
  **Severity:** Medium
 
71
 
72
  ## Code Quality
73
 
74
+ ### DEBT-003: Complex Orchestrator Logic
75
 
76
  **Category:** Code Quality
77
  **Severity:** Medium
 
96
 
97
  ---
98
 
99
+ ### DEBT-004: Magic Numbers in Code
100
 
101
  **Category:** Code Quality
102
  **Severity:** Low
 
120
 
121
  ---
122
 
123
+ ### DEBT-005: Global Singleton Pattern
124
 
125
  **Category:** Code Quality
126
  **Severity:** Low
 
144
 
145
  ---
146
 
147
+ ### DEBT-006: ClinicalTrials Uses requests Instead of httpx
148
 
149
  **Category:** Code Quality
150
  **Severity:** Low
 
171
 
172
  ## Testing
173
 
174
+ ### DEBT-007: Integration Tests Require Real APIs
175
 
176
  **Category:** Testing
177
  **Severity:** Medium
 
198
 
199
  ---
200
 
201
+ ### DEBT-008: Incomplete E2E Test Coverage
202
 
203
  **Category:** Testing
204
  **Severity:** Medium
 
227
 
228
  ## Documentation
229
 
230
+ ### DEBT-009: Outdated Inline Comments
231
 
232
  **Category:** Documentation
233
  **Severity:** Low
 
251
 
252
  ---
253
 
254
+ ### DEBT-010: Missing API Documentation
255
 
256
  **Category:** Documentation
257
  **Severity:** Low
 
277
 
278
  ## Performance
279
 
280
+ ### DEBT-011: Model Loading on First Request
281
 
282
  **Category:** Performance
283
  **Severity:** Low
 
302
 
303
  ---
304
 
305
+ ### DEBT-012: No Connection Pooling
306
 
307
  **Category:** Performance
308
  **Severity:** Low
 
328
 
329
  ## Dependencies
330
 
331
+ ### DEBT-013: Pinned Beta Dependencies
332
 
333
  **Category:** Dependencies
334
  **Severity:** Medium