Architectural Debt Specification
Status: IMPLEMENTED (Phase 1 Complete) Date: 2025-12-04 Scope:
src/orchestrators/advanced.py(Primary), System-Wide (Secondary) Purpose: Roadmap for "DeepMind Status" Code Quality Author: Claude (Senior Review Incorporated)
Executive Summary
The P1/P2 bug fixes in PR #124 introduced technical debt that must be addressed before the PR is considered "done". This spec documents three immediate priorities (DRY violation, redundant imports, magic strings) and five medium-term system-wide improvements.
Part 1: Immediate Cleanup (MUST Complete Before PR Merge)
Priority 1: DRY Violation - Synthesis Methods
Problem: _handle_timeout() (lines 201-248) and _force_synthesis() (lines 250-297) are 95% identical.
_handle_timeout() |
_force_synthesis() |
|---|---|
| Lines 201-248 (47 lines) | Lines 250-297 (47 lines) |
| Yields "Workflow timed out. Synthesizing..." | Yields "Synthesizing research findings..." |
Error data: timeout_synthesis |
Error data: forced_synthesis |
| Everything else is identical | Everything else is identical |
SOLID Violation: DRY (Don't Repeat Yourself). Changes to synthesis logic must be made in two places. This is a maintenance nightmare and a source of future bugs.
Fix: Extract unified method _synthesize_fallback(iteration: int, reason: str):
async def _synthesize_fallback(
self, iteration: int, reason: str
) -> AsyncGenerator[AgentEvent, None]:
"""
Unified fallback synthesis for all termination scenarios.
Args:
iteration: Current workflow iteration
reason: Why synthesis is being forced ("timeout", "no_reporter", "max_rounds")
"""
status_messages = {
"timeout": "Workflow timed out. Synthesizing available evidence...",
"no_reporter": "Synthesizing research findings...",
"max_rounds": "Max rounds reached. Synthesizing findings...",
}
try:
state = get_magentic_state()
evidence_summary = await state.memory.get_context_summary()
report_agent = create_report_agent(self._chat_client, domain=self.domain)
yield AgentEvent(
type="synthesizing",
message=status_messages.get(reason, "Synthesizing..."),
iteration=iteration,
)
synthesis_result = await report_agent.run(
f"Synthesize research report from this evidence. "
f"If evidence is sparse, say so.\n\n{evidence_summary}"
)
yield AgentEvent(
type="complete",
message=synthesis_result.text,
data={"reason": f"{reason}_synthesis", "iterations": iteration},
iteration=iteration,
)
except Exception as synth_error:
logger.error(f"{reason} synthesis failed", error=str(synth_error))
yield AgentEvent(
type="complete",
message=f"Research completed. Synthesis failed: {synth_error}",
data={"reason": f"{reason}_synthesis_failed", "iterations": iteration},
iteration=iteration,
)
Call Sites to Update:
- Line 447:
async for event in self._handle_timeout(iteration):βself._synthesize_fallback(iteration, "timeout") - Line 412:
async for synth_event in self._force_synthesis(iteration):βself._synthesize_fallback(iteration, "no_reporter") - Line 432:
async for synth_event in self._force_synthesis(iteration):βself._synthesize_fallback(iteration, "max_rounds")
Delete After Refactor:
_handle_timeout()method (lines 201-248)_force_synthesis()method (lines 250-297)
Priority 2: Redundant Imports
Problem: Imports inside methods that already exist at module level.
| Location | Import | Already At |
|---|---|---|
| Line 207 | from src.agents.magentic_agents import create_report_agent |
Line 35 |
| Line 208 | from src.agents.state import get_magentic_state |
Missing! |
| Line 257 | from src.agents.magentic_agents import create_report_agent |
Line 35 |
| Line 258 | from src.agents.state import get_magentic_state |
Missing! |
SOLID Violation: SRP (Single Responsibility). Import management is scattered across the file instead of centralized at the top.
Fix:
Add to module-level imports (around line 38):
from src.agents.state import get_magentic_state, init_magentic_stateNote:
init_magentic_stateis already imported at line 38. Addget_magentic_stateto that import.Remove redundant imports from:
- Lines 207-208 (inside
_handle_timeout) - Lines 257-258 (inside
_force_synthesis)
- Lines 207-208 (inside
Priority 3: Magic Strings
Problem: Agent detection relies on string literals that break silently if agents are renamed.
Current Code (line 385):
agent_name = (event.agent_id or "").lower()
if "report" in agent_name: # FRAGILE: Breaks if agent renamed
reporter_ran = True
Also in _get_event_type_for_agent() (lines 593-602):
if "search" in agent_lower: # Magic string
if "judge" in agent_lower: # Magic string
if "hypothes" in agent_lower: # Magic string
if "report" in agent_lower: # Magic string
SOLID Violation: OCP (Open/Closed Principle). Renaming an agent requires changes in multiple locations.
Fix Option A - Constants:
# At module level (after imports)
REPORTER_AGENT_ID = "reporter"
SEARCHER_AGENT_ID = "searcher"
JUDGE_AGENT_ID = "judge"
HYPOTHESIZER_AGENT_ID = "hypothesizer"
Fix Option B - Agent Name Attribute (Preferred):
# In magentic_agents.py, ensure each agent has a .name attribute
# Then in advanced.py:
if event.agent_id == report_agent.name:
reporter_ran = True
Recommendation: Option A is simpler and doesn't require modifying agent factory. Use constants.
Part 2: System-Wide Issues (Future PRs)
These are valid concerns identified during code review but are NOT blockers for the current PR.
Priority 4: Dead Config
Location: src/utils/config.py
Issue: Zombie configuration values that are never used or raise NotImplemented.
magentic_timeout: Deprecated, never readanthropic_api_key: Config exists but factory raises NotImplemented
Fix: Audit config.py, remove dead settings, add deprecation warnings for transitional settings.
Priority 5: Prompt Unification
Location: src/prompts/ vs src/config/domain.py
Issue: Two sources of truth for prompts. src/prompts/ files exist but are ignored. System uses hardcoded strings in domain.py.
Fix: Pick ONE source of truth. Recommendation: Delete src/prompts/ if unused, or migrate domain.py prompts there.
Priority 6: Factory Monolith
Location: src/clients/factory.py
Issue: Hardcoded logic for detecting API key prefixes (sk- β OpenAI, sk-ant- β Anthropic error).
SOLID Violation: OCP. Adding a new provider requires modifying the factory.
Fix: Provider registry pattern with auto-registration, or strategy pattern with key prefix handlers.
Priority 7: State Class
Location: src/orchestrators/advanced.py run() method
Issue: Method manages 6+ loose variables (iteration, reporter_ran, buffer, current_agent_id, last_streamed_length, final_event_received).
Fix: Extract to WorkflowState dataclass:
@dataclass
class WorkflowState:
iteration: int = 0
reporter_ran: bool = False
current_message_buffer: str = ""
current_agent_id: str | None = None
last_streamed_length: int = 0
final_event_received: bool = False
Priority 8: Real Integration Tests
Location: tests/e2e/
Issue: We deleted flaky integration tests. Now we have ZERO automated tests against real APIs.
Fix: Create stable make test-live suite with:
- Real HuggingFace Free Tier test
- Real OpenAI BYOK test
- Proper timeout handling
- Skip markers for CI (run manually or on schedule)
Regression Prevention Strategy
CRITICAL: Each phase MUST pass smoke tests before merge. Unit tests alone are insufficient.
Smoke Test Infrastructure
Add to Makefile:
# Smoke tests - run against real APIs (slow, not for CI)
smoke-free:
@echo "Running Free Tier smoke test..."
uv run python -m pytest tests/e2e/test_smoke.py::test_free_tier_synthesis -v -s --timeout=600
smoke-paid:
@echo "Running Paid Tier smoke test (requires OPENAI_API_KEY)..."
uv run python -m pytest tests/e2e/test_smoke.py::test_paid_tier_synthesis -v -s --timeout=300
smoke: smoke-free # Default to free tier
Smoke Test Implementation
Create tests/e2e/test_smoke.py:
"""
Smoke tests for regression prevention.
These tests run against REAL APIs and verify end-to-end functionality.
They are slow (2-5 minutes) and should NOT run in CI.
Usage:
make smoke-free # Test Free Tier (HuggingFace)
make smoke-paid # Test Paid Tier (OpenAI BYOK)
"""
import pytest
from src.orchestrators.advanced import AdvancedOrchestrator
@pytest.mark.e2e
@pytest.mark.timeout(600) # 10 minute timeout for Free Tier
async def test_free_tier_synthesis():
"""Verify Free Tier produces actual synthesis (not just 'Research complete.')"""
orch = AdvancedOrchestrator(max_rounds=2)
events = []
async for event in orch.run("What is libido?"):
events.append(event)
# MUST have a complete event
complete_events = [e for e in events if e.type == "complete"]
assert len(complete_events) >= 1, "No complete event received"
# Complete event MUST have substantive content (not just signal)
final = complete_events[-1]
assert len(final.message) > 100, f"Synthesis too short: {len(final.message)} chars"
assert "Research complete." not in final.message or len(final.message) > 50, \
"Got empty synthesis signal instead of actual report"
# Should NOT have duplicate content
messages = [e.message for e in events if e.message]
# Check for exact duplicates of long content
long_messages = [m for m in messages if len(m) > 200]
assert len(long_messages) == len(set(long_messages)), "Duplicate content detected"
@pytest.mark.e2e
@pytest.mark.timeout(300) # 5 minute timeout for Paid Tier
async def test_paid_tier_synthesis():
"""Verify Paid Tier (BYOK) produces synthesis."""
import os
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
pytest.skip("OPENAI_API_KEY not set")
orch = AdvancedOrchestrator(max_rounds=2, api_key=api_key)
events = []
async for event in orch.run("What is libido?"):
events.append(event)
complete_events = [e for e in events if e.type == "complete"]
assert len(complete_events) >= 1
assert len(complete_events[-1].message) > 100
Phase Gate Checklist
Before merging ANY refactoring PR:
[ ] make check # All 318+ unit tests pass
[ ] make smoke-free # Free Tier produces real synthesis
[ ] make smoke-paid # Paid Tier works (if you have key)
[ ] CodeRabbit approved # No blocking issues
Execution Strategy
Phase 1: Current PR (REQUIRED)
Implement Priority 1, 2, and 3 before merging PR #124.
Definition of Done:
-
_synthesize_fallback(iteration, reason)implemented -
_handle_timeout()and_force_synthesis()deleted - All synthesis call sites updated
- Redundant imports removed
-
get_magentic_stateadded to module-level imports - Magic strings replaced with constants
- All tests pass (
make check)
Phase 2: Future PRs (Separate Tickets)
Create GitHub issues for Priority 4-8. Do NOT bloat the current bug fix PR.
Appendix: Line Number Reference
| Item | Current Location |
|---|---|
_handle_timeout() |
Lines 201-248 |
_force_synthesis() |
Lines 250-297 |
| Redundant imports (timeout) | Lines 207-208 |
| Redundant imports (force) | Lines 257-258 |
| Magic string detection | Line 385 |
_get_event_type_for_agent() |
Lines 582-602 |
| Module imports | Lines 18-48 |
run() method |
Lines 299-456 |