fix(p2): Resolve duplicate report content and first-turn timeout bugs (#123)
## Summary
Fixes two P2 bugs + addresses CodeRabbit review feedback:
**P2 Bug Fixes:**
- **Duplicate Report Content**: Stateful deduplication in `run()` loop via `_handle_final_event()`
- **First Turn Timeout**: Reduced `max_results_per_tool` 10→5, increased `advanced_timeout` 300→600s
**CodeRabbit Fixes:**
- Added missing `@patch` decorator + return type to timeout test
- Removed unreachable dead code from `_process_event()`
- Created P3 tech debt doc for Modal integration removal
All 318 tests pass.
- docs/ARCHITECTURE.md +17 -18
- docs/architecture/HF_FREE_TIER_ANALYSIS.md +45 -0
- docs/bugs/ACTIVE_BUGS.md +14 -1
- docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md +151 -0
- docs/bugs/P2_FIRST_TURN_TIMEOUT.md +160 -0
- docs/bugs/{P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md → archive/P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md} +0 -0
- docs/bugs/{P2_7B_MODEL_GARBAGE_OUTPUT.md → archive/P2_7B_MODEL_GARBAGE_OUTPUT.md} +0 -0
- docs/future-roadmap/P3_MODAL_INTEGRATION_REMOVAL.md +78 -0
- docs/index.md +1 -1
- src/agents/search_agent.py +1 -1
- src/orchestrators/advanced.py +50 -27
- src/utils/config.py +2 -2
- tests/unit/agents/test_search_agent.py +3 -3
- tests/unit/orchestrators/test_advanced_orchestrator.py +3 -3
docs/ARCHITECTURE.md
CHANGED
@@ -1,6 +1,6 @@
  # DeepBoner Architecture
  
- > **Last Updated**: 2025-12-
+ > **Last Updated**: 2025-12-03
  
  ---
  
@@ -15,7 +15,7 @@
  │ NO (Free Tier)        YES (Paid Tier)  │
  │ ┌──────────────┐     ┌───────────────┐ │
  │ HuggingFace backend   OpenAI backend   │
- │ Qwen 2.5
+ │ Qwen 2.5 7B (free)    GPT-5 (paid)     │
  │                                        │
  │ SAME orchestration logic for both      │
  │ ONE codebase, different LLM backends   │
@@ -28,12 +28,21 @@
  
  ## Current Status
  
- **
- 
- 
- 
- 
- 
+ **Both tiers are WORKING** as of December 2025.
+ 
+ - **Free Tier**: Uses Accumulator Pattern to bypass upstream repr bug
+ - **Paid Tier**: Full OpenAI GPT-5 integration
+ 
+ ---
+ 
+ ## Key Fixes Applied
+ 
+ | Issue | Fix | PR |
+ |-------|-----|-----|
+ | Tool execution failure | Removed premature `__function_invoking_chat_client__` marker | fix/P1-free-tier |
+ | Repr garbage in output | Accumulator Pattern bypasses upstream bug | PR #117 |
+ | 72B model routing failures | Switched to 7B (native HF infra) | PR #118 |
+ | Evidence deduplication | Cross-source dedup by PMID/DOI | PR #122 |
  
  ---
  
@@ -65,7 +74,7 @@ def get_chat_client():
  | Condition | Backend | Model |
  |-----------|---------|-------|
  | User provides OpenAI key | OpenAI | GPT-5 |
- | No API key provided | HuggingFace | Qwen 2.5
+ | No API key provided | HuggingFace | Qwen 2.5 7B (free) |
  
  ---
  
@@ -87,16 +96,6 @@ def get_chat_client():
  
  ---
  
- ## Upstream Blocker
- 
- **Bug:** Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.
- 
- **Fix:** [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) - waiting to merge.
- 
- **Tracking:** [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)
- 
- ---
- 
  ## References
  
  - [Pydantic AI](https://ai.pydantic.dev/) - Structured outputs framework
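The backend-selection table under `get_chat_client()` boils down to a single key check. The sketch below is purely illustrative: the backend classes and model identifiers are placeholder assumptions, not the repo's actual implementation.

```python
from dataclasses import dataclass


# Placeholder backend types, only to illustrate the Condition/Backend/Model table above.
@dataclass
class OpenAIBackend:
    api_key: str
    model: str = "gpt-5"


@dataclass
class HuggingFaceBackend:
    model: str = "Qwen/Qwen2.5-7B-Instruct"


def get_chat_client(openai_api_key: str | None = None):
    """Mirror of the routing rule: key provided -> paid OpenAI tier, otherwise free HF tier."""
    if openai_api_key:
        return OpenAIBackend(api_key=openai_api_key)  # Paid tier (GPT-5)
    return HuggingFaceBackend()  # Free tier (Qwen 2.5 7B)
```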
docs/architecture/HF_FREE_TIER_ANALYSIS.md
CHANGED
@@ -64,5 +64,50 @@ For the Unified Chat Client architecture:
  1. **Tier 0 (Free):** Hardcoded to Native Models (Qwen 7B, Mistral Nemo).
  2. **Tier 1 (BYO Key):** Allow user to select any model (70B+), assuming they provide a key that grants access to premium providers or PRO tier.
  
+ ---
+ 
+ ## 5. Known Content Quality Limitations (7B Models)
+ 
+ **Status**: As of December 2025, the Free Tier (Qwen 2.5 7B) produces **working multi-agent orchestration** but with notable content quality limitations.
+ 
+ ### What Works Well
+ - Multi-agent coordination (Manager → Search → Hypothesis → Report)
+ - Clean streaming output (no garbage tokens, no raw JSON)
+ - Proper agent handoffs and progress tracking
+ - Coherent narrative structure
+ 
+ ### Known Limitations
+ 
+ | Issue | Description | Severity |
+ |-------|-------------|----------|
+ | **Hallucinated Citations** | Model generates plausible-sounding but fake paper titles/authors instead of using actual search results | Medium |
+ | **Anatomical Confusion** | May apply male anatomy (e.g., "penile rigidity") to female health queries | High |
+ | **Nonsensical Medical Claims** | May generate claims like "prostate cancer risk" in context of female patients | High |
+ | **Duplicate Content** | Final reports sometimes contain repeated sections | Low |
+ 
+ ### Why This Happens
+ 
+ 7B parameter models have limited:
+ - **World knowledge**: Can't reliably recall specific paper titles/authors
+ - **Context grounding**: May ignore search results and hallucinate instead
+ - **Domain reasoning**: Complex medical topics exceed reasoning capacity
+ 
+ ### User Guidance
+ 
+ **Free Tier is best for:**
+ - Understanding the research workflow
+ - Getting general topic overviews
+ - Testing the system before committing to paid tier
+ 
+ **For accurate medical research:**
+ - Use Paid Tier (GPT-5) for citation accuracy
+ - Always verify citations against actual databases
+ - Treat Free Tier output as "draft quality"
+ 
+ ### Not a Stack Bug
+ 
+ These are **model capability limitations**, not bugs in the DeepBoner architecture. The orchestration, streaming, and agent coordination are working correctly.
+ 
  ---
  *Analysis performed by Gemini CLI Agent, Dec 2, 2025*
+ *Content quality section added Dec 3, 2025*
docs/bugs/ACTIVE_BUGS.md
CHANGED
@@ -1,6 +1,6 @@
  # Active Bugs
  
- > Last updated: 2025-12-
+ > Last updated: 2025-12-04
  >
  > **Note:** Completed bug docs archived to `docs/bugs/archive/`
  > **See also:** [ARCHITECTURE.md](../ARCHITECTURE.md) for unified architecture plan
@@ -33,6 +33,17 @@
  
  ---
  
+ ### P3 - Remove Modal Integration
+ 
+ **File:** `docs/future-roadmap/P3_MODAL_INTEGRATION_REMOVAL.md`
+ **Status:** OPEN - Tech Debt
+ 
+ **Problem:** Modal (cloud functions) is integrated in 9 files but was decided against for this project. Creates dead code paths and confusion.
+ 
+ **Fix:** Remove all Modal references from codebase (config, services, agents, tools).
+ 
+ ---
+ 
  ## Resolved Bugs (December 2025)
  
  All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
@@ -55,6 +66,8 @@ All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
  - **P1 Simple Mode Removed Breaks Free Tier UX** - FIXED via Accumulator Pattern (PR #117)
  
  ### P2 Bugs (All FIXED)
+ - **P2 Duplicate Report Content** - FIXED in PR fix/p2-double-bug-squash, stateful deduplication in `run()` loop
+ - **P2 First Turn Timeout** - FIXED in PR fix/p2-double-bug-squash, reduced results per tool (10→5), increased timeout (5→10 min)
  - **P2 7B Model Garbage Output** - SUPERSEDED by P1 Free Tier fix (root cause was premature marker, not model capacity)
  - **P2 Advanced Mode Cold Start No Feedback** - FIXED, all phases complete
  - **P2 Architectural BYOK Gaps** - FIXED, end-to-end BYOK support in PR #119
docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md
ADDED
@@ -0,0 +1,151 @@
# P2 Bug: Duplicate Report Content in Output

**Date**: 2025-12-03
**Status**: FIXED (PR fix/p2-double-bug-squash)
**Severity**: P2 (UX - Duplicate content confuses users)
**Component**: `src/orchestrators/advanced.py`
**Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)

---

## Executive Summary

This is a **confirmed stack bug**, NOT a model limitation. The duplicate report appears because:

1. Streaming events yield the full report content character-by-character
2. Final events (`MagenticFinalResultEvent`/`WorkflowOutputEvent`) contain the SAME content
3. No deduplication exists between streamed content and final event content
4. Both are appended to the output

---

## Symptom

The final research report appears **twice** in the UI output:
1. First as streaming content (with the `**STREAMING**:` prefix)
2. Then again as a complete event (without prefix)

---

## Root Cause

The `_process_event()` method handles final events but has **no access to buffer state**. The buffer was already cleared at line 337 before these events arrive.

```python
# Line 337: Buffer cleared
current_message_buffer = ""
continue

# Line 341: Final events processed WITHOUT buffer context
agent_event = self._process_event(event, iteration)  # No buffer info!
```

---

## The Fix (Consensus: Stateful Orchestrator Logic)

**Location**: `src/orchestrators/advanced.py` `run()` method

**Strategy**: Handle final events **inline in the run() loop** where buffer state exists. Track streaming volume to decide whether to re-emit content.

### Why This Is Correct

| Rejected Approach | Why Wrong |
|-------------------|-----------|
| UI-side string comparison | Wrong layer, fragile, treats symptom |
| Stateless `_process_event` fix | No state = can't know if streaming occurred |
| **Stateful run() loop** | ✅ Only place with full lifecycle visibility |

The `run()` loop is the **single source of truth** for the request lifecycle. It "saw" the content stream out. It must decide whether to re-emit.

### Implementation

```python
# In run() method, add tracking variable after line 302:
last_streamed_length: int = 0

# Before clearing buffer at line 337, save its length:
last_streamed_length = len(current_message_buffer)
current_message_buffer = ""
continue

# Replace lines 340-345 with inline handling of final events:
if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
    final_event_received = True

    # DECISION: Did we stream substantial content?
    if last_streamed_length > 100:
        # YES: Final event is a SIGNAL, not a payload
        yield AgentEvent(
            type="complete",
            message="Research complete.",
            data={"iterations": iteration, "streamed_chars": last_streamed_length},
            iteration=iteration,
        )
    else:
        # NO: Final event must carry the payload (tool-only turn, cache hit)
        if isinstance(event, MagenticFinalResultEvent):
            text = self._extract_text(event.message) if event.message else "No result"
        else:  # WorkflowOutputEvent
            text = self._extract_text(event.data) if event.data else "Research complete"
        yield AgentEvent(
            type="complete",
            message=text,
            data={"iterations": iteration},
            iteration=iteration,
        )
    continue

# Keep existing fallback for other events:
agent_event = self._process_event(event, iteration)
```

### Why Threshold of 100 Chars?

- `> 0` is too aggressive (might catch single-word streams)
- `> 500` is too conservative (might miss short but complete responses)
- `> 100` distinguishes "real content was streamed" from "just status messages"

---

## Edge Cases Handled

| Scenario | `last_streamed_length` | Action |
|----------|------------------------|--------|
| Normal streaming report | 5000+ | Emit "Research complete." |
| Tool call, no text | 0 | Emit full content from final event |
| Very short response | 50 | Emit full content (fallback) |
| Agent switch mid-stream | Reset on switch | Tracks only final agent |

---

## Files to Modify

| File | Lines | Change |
|------|-------|--------|
| `src/orchestrators/advanced.py` | 296-345 | Add `last_streamed_length`, handle final events inline |
| `src/orchestrators/advanced.py` | 532-552 | Optional: remove dead code from `_process_event()` |

---

## Test Plan

1. **Happy Path**: Run query, verify report appears ONCE
2. **Fallback**: Mock tool-only turn (no streaming), verify full content emitted
3. **Both Tiers**: Test Free Tier and Paid Tier

---

## Validation

This fix was independently validated by two AI agents (Claude and Gemini) analyzing the architecture. Both concluded:

> "The Stateful Orchestrator Fix is the correct engineering solution. The 'Source of Truth' is the Orchestrator's runtime state."

---

## Related

- **Not related to model quality** - This is a stack bug
- P1 Free Tier fix enabled streaming, exposing this bug
- SPEC-17 Accumulator Pattern addressed repr bug but created this side effect
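The happy-path half of the test plan above can also be pinned down at the unit level. A minimal sketch (not part of this PR) that follows the `@patch` pattern from `tests/unit/orchestrators/test_advanced_orchestrator.py` and relies only on the `_handle_final_event()` signature shown in the diff:

```python
from unittest.mock import MagicMock, patch

from src.orchestrators.advanced import AdvancedOrchestrator


@patch("src.orchestrators.advanced.get_chat_client")
def test_final_event_is_signal_after_streaming(mock_get_client: MagicMock) -> None:
    """If substantial content already streamed, the final event must not re-emit the report."""
    mock_get_client.return_value = MagicMock()
    orch = AdvancedOrchestrator()

    # The >100-char branch never inspects the event payload, so a bare mock is enough here;
    # the fallback branch would need real (or spec'd) framework event instances instead.
    event = MagicMock()
    result = orch._handle_final_event(event, iteration=3, last_streamed_length=5000)

    assert result.type == "complete"
    assert result.message == "Research complete."
    assert result.data["streamed_chars"] == 5000
```

The fallback branch (item 2 of the test plan) depends on concrete `MagenticFinalResultEvent`/`WorkflowOutputEvent` instances, so it is better covered by the mocked tool-only integration run described above.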
docs/bugs/P2_FIRST_TURN_TIMEOUT.md
ADDED
@@ -0,0 +1,160 @@
# P2 Bug: First Agent Turn Exceeds Workflow Timeout

**Date**: 2025-12-03
**Status**: FIXED (PR fix/p2-double-bug-squash)
**Severity**: P2 (UX - Workflow always times out on complex queries)
**Component**: `src/orchestrators/advanced.py` + `src/agents/search_agent.py`
**Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)

---

## Executive Summary

The search agent's first turn can exceed the 5-minute workflow timeout, causing:
1. `iterations=0` at timeout (no agent completed a turn)
2. `_handle_timeout()` synthesizes from partial evidence
3. Users get incomplete research results

This is a **performance/architecture bug**, not a model issue.

---

## Symptom

```
[warning] Workflow timed out iterations=0
```

The workflow times out with `iterations=0` - meaning the first agent (search agent) never completed its turn before the 5-minute timeout.

---

## Root Cause

The search agent's first turn is **extremely expensive**:

```
Search Agent First Turn:
├── Manager assigns task
├── Search agent starts
│   ├── Calls PubMed search tool (10 results)
│   ├── Calls ClinicalTrials search tool (10 results)
│   ├── Calls EuropePMC search tool (10 results)
│   └── For EACH result (30 total):
│       ├── Generate embedding (OpenAI API call)
│       ├── Check for duplicates (ChromaDB query)
│       └── Store in ChromaDB
│
│   TOTAL: 30 results × (embedding + dedup + store) = 90+ API/DB operations
│
└── Agent turn completes (if timeout hasn't fired)
```

**The timeout is on the WORKFLOW, not individual agent turns.** A single greedy agent can consume the entire timeout budget.

---

## Impact

| Aspect | Impact |
|--------|--------|
| UX | Queries always timeout on first turn |
| Research quality | Synthesis happens on partial evidence |
| Confusion | `iterations=0` looks like nothing happened |

---

## The Fix (Consensus)

**Reduce work per turn + increase timeout budget.**

### Implementation

**1. Reduce results per tool (immediate)**

`src/agents/search_agent.py` line 70:
```python
# Change from 10 to 5
result: SearchResult = await self._handler.execute(query, max_results_per_tool=5)
```

**2. Increase workflow timeout (immediate)**

`src/utils/config.py`:
```python
advanced_timeout: float = Field(
    default=600.0,  # Was 300.0 (5 min), now 10 min
    ge=60.0,
    le=900.0,
    description="Timeout for Advanced mode in seconds",
)
```

### Why NOT Per-Turn Timeout

**DANGER**: The SearchHandler uses `asyncio.gather()`:

```python
# src/tools/search_handler.py line 163-164
results = await asyncio.gather(*tasks, return_exceptions=True)
```

This is an **all-or-nothing** operation. If you wrap it with `asyncio.timeout()` and the timeout fires, you get **zero results**, not partial results.

```python
# DON'T DO THIS - yields nothing on timeout
async with asyncio.timeout(60):
    result = await self._handler.execute(query)  # Cancelled = zero results
```

Per-turn timeout requires `SearchHandler` to support cancellation with partial results. That's a separate architectural change (see Future Work).

---

## Future Work (Streaming Evidence Ingestion)

For a proper fix, `SearchHandler.execute()` should:
1. Yield results as they arrive (async generator)
2. Support cancellation with partial results
3. Allow agent to return "what we have so far" on timeout

```python
# Future architecture
async def execute_streaming(self, query: str) -> AsyncIterator[Evidence]:
    for tool in self.tools:
        async for evidence in tool.search_streaming(query):
            yield evidence  # Can be cancelled at any point
```

This is out of scope for the immediate fix.

---

## Test Plan

1. Run query with 10-minute timeout
2. Verify first agent turn completes before timeout
3. Verify `iterations >= 1` at workflow end

---

## Verification Data

From diagnostic run:
```
=== RAW FRAMEWORK EVENTS ===
MagenticAgentDeltaEvent: 284
MagenticOrchestratorMessageEvent: 3
...
NO MagenticAgentMessageEvent → Agent never completed a turn!

[warning] Workflow timed out iterations=0
```

---

## Related

- P2 Duplicate Report Bug (separate issue, happens after successful completion)
- `_handle_timeout()` correctly synthesizes, but with partial evidence
- Not related to model quality - this is infrastructure/performance
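For the future per-turn budget mentioned above, the "partial results instead of zero" requirement can also be met without a full streaming rewrite. A rough, illustrative sketch only; the helper name and shape are assumptions, not existing repo code:

```python
import asyncio
from typing import Any


async def gather_with_budget(tasks: list[asyncio.Task[Any]], budget_s: float) -> list[Any]:
    """Return whatever finished inside the budget; cancel the rest instead of losing everything."""
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        # Unlike asyncio.timeout() wrapped around gather(), completed work survives the budget.
        task.cancel()
    return [t.result() for t in done if not t.cancelled() and t.exception() is None]
```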
docs/bugs/{P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md → archive/P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md}
RENAMED
File without changes

docs/bugs/{P2_7B_MODEL_GARBAGE_OUTPUT.md → archive/P2_7B_MODEL_GARBAGE_OUTPUT.md}
RENAMED
File without changes
docs/future-roadmap/P3_MODAL_INTEGRATION_REMOVAL.md
ADDED
@@ -0,0 +1,78 @@
# P3 Tech Debt: Modal Integration Removal

**Date**: 2025-12-04
**Status**: OPEN - Tech Debt (Future Roadmap)
**Severity**: P3 (Tech Debt - Not blocking functionality)
**Component**: Multiple files

---

## Executive Summary

Modal (cloud function execution platform) is integrated throughout the codebase but was decided against for this project. This creates potential confusion and dead code paths that should be cleaned up when time permits.

---

## Affected Files

The following files contain Modal references:

| File | Usage |
|------|-------|
| `src/utils/config.py` | `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` settings |
| `src/utils/service_loader.py` | Modal service initialization |
| `src/services/llamaindex_rag.py` | Modal integration for RAG |
| `src/agents/code_executor_agent.py` | Modal sandbox execution |
| `src/utils/exceptions.py` | Modal-related exceptions |
| `src/tools/code_execution.py` | Modal code execution tool |
| `src/services/statistical_analyzer.py` | Modal statistical analysis |
| `src/mcp_tools.py` | Modal MCP tool wrappers |
| `src/agents/analysis_agent.py` | Modal analysis agent |

---

## Context

Modal was originally integrated for:
1. **Code Execution Sandbox**: Running untrusted code in isolated containers
2. **Statistical Analysis**: Offloading heavy statistical computations
3. **LlamaIndex RAG**: Premium embeddings with persistent storage

However, the project decided against Modal because:
- Added infrastructure complexity
- Free Tier doesn't need cloud functions
- Paid Tier uses OpenAI directly

---

## Recommended Fix

1. Remove Modal-related code from all affected files
2. Remove `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` from config
3. Remove Modal from dependencies in `pyproject.toml`
4. Update any documentation referencing Modal

---

## Impact If Not Fixed

- Confusion for new contributors
- Dead code in production
- Unnecessary dependencies
- Config settings that do nothing

---

## Test Plan

1. Remove Modal code
2. Run `make check` to ensure no breakage
3. Verify Free Tier and Paid Tier still work
4. Search codebase for any remaining Modal references

---

## Related

- `P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md` - Similar tech debt for Anthropic
- ARCHITECTURE.md - Current architecture excludes Modal
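Step 4 of the test plan can be backed by a throwaway scan like the one below (illustrative only; it assumes it is run from the repo root and that all Modal usage lives under `src/`):

```python
# Throwaway helper: list any remaining Modal references under src/.
from pathlib import Path

for path in sorted(Path("src").rglob("*.py")):
    for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        if "modal" in line.lower():
            print(f"{path}:{lineno}: {line.strip()}")
```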
docs/index.md
CHANGED
@@ -103,5 +103,5 @@ User Question → Research Agent (Orchestrator)
  |-------|--------|
  | Phases 1-14 | ✅ COMPLETE |
  
- **Tests**:
+ **Tests**: 318 passing, 0 warnings
  **Known Issues**: See [Active Bugs](bugs/ACTIVE_BUGS.md)
src/agents/search_agent.py
CHANGED
@@ -67,7 +67,7 @@ class SearchAgent(BaseAgent):  # type: ignore[misc]
      )
  
      # Execute search
-     result: SearchResult = await self._handler.execute(query, max_results_per_tool=10)
+     result: SearchResult = await self._handler.execute(query, max_results_per_tool=5)
  
      # Track what to show in response (initialized to search results as default)
      evidence_to_show: list[Evidence] = result.evidence
src/orchestrators/advanced.py
CHANGED
@@ -301,6 +301,7 @@ The final output should be a structured research report."""
      # to repr(message) if text is empty. We must reconstruct text from Deltas.
      current_message_buffer: str = ""
      current_agent_id: str | None = None
+     last_streamed_length: int = 0  # Track for P2 Duplicate Report Bug Fix
  
      try:
          async with asyncio.timeout(self._timeout_seconds):
@@ -333,15 +334,21 @@ The final output should be a structured research report."""
      yield comp_event
      yield prog_event
  
+     # P2 BUG FIX: Save length before clearing
+     last_streamed_length = len(current_message_buffer)
      # Clear buffer after consuming
      current_message_buffer = ""
      continue
  
- # 3. Handle
+     # 3. Handle Final Events Inline (P2 Duplicate Report Fix)
+     if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
+         final_event_received = True
+         yield self._handle_final_event(event, iteration, last_streamed_length)
+         continue
+ 
+     # 4. Handle other events normally
      agent_event = self._process_event(event, iteration)
      if agent_event:
-         if agent_event.type == "complete":
-             final_event_received = True
          yield agent_event
  
  # GUARANTEE: Always emit termination event if stream ends without one
@@ -413,6 +420,40 @@ The final output should be a structured research report."""
  
      return completion_event, progress_event
  
+     def _handle_final_event(
+         self,
+         event: MagenticFinalResultEvent | WorkflowOutputEvent,
+         iteration: int,
+         last_streamed_length: int,
+     ) -> AgentEvent:
+         """Handle final workflow events with duplicate content suppression (P2 Bug Fix)."""
+         # DECISION: Did we stream substantial content?
+         if last_streamed_length > 100:
+             # YES: Final event is a SIGNAL, not a payload
+             return AgentEvent(
+                 type="complete",
+                 message="Research complete.",
+                 data={
+                     "iterations": iteration,
+                     "streamed_chars": last_streamed_length,
+                 },
+                 iteration=iteration,
+             )
+ 
+         # NO: Final event must carry the payload (tool-only turn, cache hit)
+         text = ""
+         if isinstance(event, MagenticFinalResultEvent):
+             text = self._extract_text(event.message) if event.message else "No result"
+         elif isinstance(event, WorkflowOutputEvent):
+             text = self._extract_text(event.data) if event.data else "Research complete"
+ 
+         return AgentEvent(
+             type="complete",
+             message=text,
+             data={"iterations": iteration},
+             iteration=iteration,
+         )
+ 
      def _extract_text(self, message: Any) -> str:
          """
          Defensively extract text from a message object.
@@ -526,30 +567,12 @@ The final output should be a structured research report."""
          iteration=iteration,
      )
  
- # NOTE:
- #
- 
- 
- 
- 
-             type="complete",
-             message=text,
-             data={"iterations": iteration},
-             iteration=iteration,
-         )
- 
-     # NOTE: MagenticAgentDeltaEvent is handled in run() loop with Accumulator Pattern
-     # (see lines 306-320) and never reaches this method due to `continue` statement.
- 
-     elif isinstance(event, WorkflowOutputEvent):
-         if event.data:
-             # Use _extract_text to properly handle ChatMessage objects
-             text = self._extract_text(event.data)
-             return AgentEvent(
-                 type="complete",
-                 message=text if text else "Research complete (no synthesis)",
-                 iteration=iteration,
-             )
+     # NOTE: The following event types are handled inline in run() loop and never reach
+     # this method due to `continue` statements:
+     # - MagenticAgentMessageEvent: Accumulator Pattern (lines 322-335)
+     # - MagenticAgentDeltaEvent: Accumulator Pattern (lines 306-320)
+     # - MagenticFinalResultEvent: P2 Duplicate Fix via _handle_final_event() (lines 343-347)
+     # - WorkflowOutputEvent: P2 Duplicate Fix via _handle_final_event() (lines 343-347)
  
      return None
src/utils/config.py
CHANGED
@@ -71,10 +71,10 @@ class Settings(BaseSettings):
      description="Max coordination rounds for Advanced mode (default 5 for faster demos)",
  )
  advanced_timeout: float = Field(
-     default=300.0,
+     default=600.0,
      ge=60.0,
      le=900.0,
-     description="Timeout for Advanced mode in seconds (default
+     description="Timeout for Advanced mode in seconds (default 10 min)",
  )
  search_timeout: int = Field(default=30, description="Seconds to wait for search")
  magentic_timeout: int = Field(
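Because `Settings` is a pydantic `BaseSettings`, the new 600 s default can still be raised or lowered per deployment without a code change. A sketch of that, assuming the field is picked up from the environment by its own name (i.e. no custom `env_prefix` is configured):

```python
import os

# Assumption: pydantic-settings maps ADVANCED_TIMEOUT -> Settings.advanced_timeout.
os.environ["ADVANCED_TIMEOUT"] = "900"  # Stays within the le=900.0 ceiling defined above.

from src.utils.config import Settings

assert Settings().advanced_timeout == 900.0
```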
tests/unit/agents/test_search_agent.py
CHANGED
@@ -47,7 +47,7 @@ async def test_run_executes_search(mock_handler: AsyncMock) -> None:
      response = await agent.run("test query")
  
      # Check handler called
-     mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=10)
+     mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=5)
  
      # Check store updated
      assert len(store["current"]) == 1
@@ -67,7 +67,7 @@ async def test_run_handles_chat_message_input(mock_handler: AsyncMock) -> None:
      message = ChatMessage(role=Role.USER, text="test query")
      await agent.run(message)
  
-     mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=10)
+     mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=5)
  
  
  @pytest.mark.asyncio
@@ -81,7 +81,7 @@ async def test_run_handles_list_input(mock_handler: AsyncMock) -> None:
          ChatMessage(role=Role.USER, text="test query"),
      ]
      await agent.run(messages)
-     mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=10)
+     mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=5)
  
  
  @pytest.mark.asyncio
tests/unit/orchestrators/test_advanced_orchestrator.py
CHANGED
@@ -28,11 +28,11 @@ class TestAdvancedOrchestratorConfig:
          assert orch._max_rounds == 7
  
      @patch("src.orchestrators.advanced.get_chat_client")
-     def
-     """
+     def test_timeout_default_is_ten_minutes(self, mock_get_client: MagicMock) -> None:
+         """Test that default timeout is 10 minutes (P2 fix)."""
          mock_get_client.return_value = MagicMock()
          orch = AdvancedOrchestrator()
-         assert orch._timeout_seconds ==
+         assert orch._timeout_seconds == 600.0
  
      @patch("src.orchestrators.advanced.get_chat_client")
      def test_explicit_timeout_overrides_settings(self, mock_get_client) -> None: