DeepBoner / SPEC_ARCHITECTURAL_DEBT.md
VibecoderMcSwaggins's picture
docs: Address all remaining CodeRabbit nitpicks
b6ec445
|
raw
history blame
12.2 kB

Architectural Debt Specification

Status: IMPLEMENTED (Phase 1 Complete) Date: 2025-12-04 Scope: src/orchestrators/advanced.py (Primary), System-Wide (Secondary) Purpose: Roadmap for "DeepMind Status" Code Quality Author: Claude (Senior Review Incorporated)


Executive Summary

The P1/P2 bug fixes in PR #124 introduced technical debt that must be addressed before the PR is considered "done". This spec documents three immediate priorities (DRY violation, redundant imports, magic strings) and five medium-term system-wide improvements.


Part 1: Immediate Cleanup (MUST Complete Before PR Merge)

Priority 1: DRY Violation - Synthesis Methods

Problem: _handle_timeout() (lines 201-248) and _force_synthesis() (lines 250-297) are 95% identical.

_handle_timeout() _force_synthesis()
Lines 201-248 (47 lines) Lines 250-297 (47 lines)
Yields "Workflow timed out. Synthesizing..." Yields "Synthesizing research findings..."
Error data: timeout_synthesis Error data: forced_synthesis
Everything else is identical Everything else is identical

SOLID Violation: DRY (Don't Repeat Yourself). Changes to synthesis logic must be made in two places. This is a maintenance nightmare and a source of future bugs.

Fix: Extract unified method _synthesize_fallback(iteration: int, reason: str):

async def _synthesize_fallback(
    self, iteration: int, reason: str
) -> AsyncGenerator[AgentEvent, None]:
    """
    Unified fallback synthesis for all termination scenarios.

    Args:
        iteration: Current workflow iteration
        reason: Why synthesis is being forced ("timeout", "no_reporter", "max_rounds")
    """
    status_messages = {
        "timeout": "Workflow timed out. Synthesizing available evidence...",
        "no_reporter": "Synthesizing research findings...",
        "max_rounds": "Max rounds reached. Synthesizing findings...",
    }

    try:
        state = get_magentic_state()
        evidence_summary = await state.memory.get_context_summary()
        report_agent = create_report_agent(self._chat_client, domain=self.domain)

        yield AgentEvent(
            type="synthesizing",
            message=status_messages.get(reason, "Synthesizing..."),
            iteration=iteration,
        )

        synthesis_result = await report_agent.run(
            f"Synthesize research report from this evidence. "
            f"If evidence is sparse, say so.\n\n{evidence_summary}"
        )

        yield AgentEvent(
            type="complete",
            message=synthesis_result.text,
            data={"reason": f"{reason}_synthesis", "iterations": iteration},
            iteration=iteration,
        )
    except Exception as synth_error:
        logger.error(f"{reason} synthesis failed", error=str(synth_error))
        yield AgentEvent(
            type="complete",
            message=f"Research completed. Synthesis failed: {synth_error}",
            data={"reason": f"{reason}_synthesis_failed", "iterations": iteration},
            iteration=iteration,
        )

Call Sites to Update:

  1. Line 447: async for event in self._handle_timeout(iteration): β†’ self._synthesize_fallback(iteration, "timeout")
  2. Line 412: async for synth_event in self._force_synthesis(iteration): β†’ self._synthesize_fallback(iteration, "no_reporter")
  3. Line 432: async for synth_event in self._force_synthesis(iteration): β†’ self._synthesize_fallback(iteration, "max_rounds")

Delete After Refactor:

  • _handle_timeout() method (lines 201-248)
  • _force_synthesis() method (lines 250-297)

Priority 2: Redundant Imports

Problem: Imports inside methods that already exist at module level.

Location Import Already At
Line 207 from src.agents.magentic_agents import create_report_agent Line 35
Line 208 from src.agents.state import get_magentic_state Missing!
Line 257 from src.agents.magentic_agents import create_report_agent Line 35
Line 258 from src.agents.state import get_magentic_state Missing!

SOLID Violation: SRP (Single Responsibility). Import management is scattered across the file instead of centralized at the top.

Fix:

  1. Add to module-level imports (around line 38):

    from src.agents.state import get_magentic_state, init_magentic_state
    

    Note: init_magentic_state is already imported at line 38. Add get_magentic_state to that import.

  2. Remove redundant imports from:

    • Lines 207-208 (inside _handle_timeout)
    • Lines 257-258 (inside _force_synthesis)

Priority 3: Magic Strings

Problem: Agent detection relies on string literals that break silently if agents are renamed.

Current Code (line 385):

agent_name = (event.agent_id or "").lower()
if "report" in agent_name:  # FRAGILE: Breaks if agent renamed
    reporter_ran = True

Also in _get_event_type_for_agent() (lines 593-602):

if "search" in agent_lower:    # Magic string
if "judge" in agent_lower:     # Magic string
if "hypothes" in agent_lower:  # Magic string
if "report" in agent_lower:    # Magic string

SOLID Violation: OCP (Open/Closed Principle). Renaming an agent requires changes in multiple locations.

Fix Option A - Constants:

# At module level (after imports)
REPORTER_AGENT_ID = "reporter"
SEARCHER_AGENT_ID = "searcher"
JUDGE_AGENT_ID = "judge"
HYPOTHESIZER_AGENT_ID = "hypothesizer"

Fix Option B - Agent Name Attribute (Preferred):

# In magentic_agents.py, ensure each agent has a .name attribute
# Then in advanced.py:
if event.agent_id == report_agent.name:
    reporter_ran = True

Recommendation: Option A is simpler and doesn't require modifying agent factory. Use constants.


Part 2: System-Wide Issues (Future PRs)

These are valid concerns identified during code review but are NOT blockers for the current PR.

Priority 4: Dead Config

Location: src/utils/config.py Issue: Zombie configuration values that are never used or raise NotImplemented.

  • magentic_timeout: Deprecated, never read
  • anthropic_api_key: Config exists but factory raises NotImplemented

Fix: Audit config.py, remove dead settings, add deprecation warnings for transitional settings.


Priority 5: Prompt Unification

Location: src/prompts/ vs src/config/domain.py Issue: Two sources of truth for prompts. src/prompts/ files exist but are ignored. System uses hardcoded strings in domain.py.

Fix: Pick ONE source of truth. Recommendation: Delete src/prompts/ if unused, or migrate domain.py prompts there.


Priority 6: Factory Monolith

Location: src/clients/factory.py Issue: Hardcoded logic for detecting API key prefixes (sk- β†’ OpenAI, sk-ant- β†’ Anthropic error).

SOLID Violation: OCP. Adding a new provider requires modifying the factory.

Fix: Provider registry pattern with auto-registration, or strategy pattern with key prefix handlers.


Priority 7: State Class

Location: src/orchestrators/advanced.py run() method Issue: Method manages 6+ loose variables (iteration, reporter_ran, buffer, current_agent_id, last_streamed_length, final_event_received).

Fix: Extract to WorkflowState dataclass:

@dataclass
class WorkflowState:
    iteration: int = 0
    reporter_ran: bool = False
    current_message_buffer: str = ""
    current_agent_id: str | None = None
    last_streamed_length: int = 0
    final_event_received: bool = False

Priority 8: Real Integration Tests

Location: tests/e2e/ Issue: We deleted flaky integration tests. Now we have ZERO automated tests against real APIs.

Fix: Create stable make test-live suite with:

  • Real HuggingFace Free Tier test
  • Real OpenAI BYOK test
  • Proper timeout handling
  • Skip markers for CI (run manually or on schedule)

Regression Prevention Strategy

CRITICAL: Each phase MUST pass smoke tests before merge. Unit tests alone are insufficient.

Smoke Test Infrastructure

Add to Makefile:

# Smoke tests - run against real APIs (slow, not for CI)
smoke-free:
    @echo "Running Free Tier smoke test..."
    uv run python -m pytest tests/e2e/test_smoke.py::test_free_tier_synthesis -v -s --timeout=600

smoke-paid:
    @echo "Running Paid Tier smoke test (requires OPENAI_API_KEY)..."
    uv run python -m pytest tests/e2e/test_smoke.py::test_paid_tier_synthesis -v -s --timeout=300

smoke: smoke-free  # Default to free tier

Smoke Test Implementation

Create tests/e2e/test_smoke.py:

"""
Smoke tests for regression prevention.

These tests run against REAL APIs and verify end-to-end functionality.
They are slow (2-5 minutes) and should NOT run in CI.

Usage:
    make smoke-free   # Test Free Tier (HuggingFace)
    make smoke-paid   # Test Paid Tier (OpenAI BYOK)
"""
import pytest
from src.orchestrators.advanced import AdvancedOrchestrator

@pytest.mark.e2e
@pytest.mark.timeout(600)  # 10 minute timeout for Free Tier
async def test_free_tier_synthesis():
    """Verify Free Tier produces actual synthesis (not just 'Research complete.')"""
    orch = AdvancedOrchestrator(max_rounds=2)

    events = []
    async for event in orch.run("What is libido?"):
        events.append(event)

    # MUST have a complete event
    complete_events = [e for e in events if e.type == "complete"]
    assert len(complete_events) >= 1, "No complete event received"

    # Complete event MUST have substantive content (not just signal)
    final = complete_events[-1]
    assert len(final.message) > 100, f"Synthesis too short: {len(final.message)} chars"
    assert "Research complete." not in final.message or len(final.message) > 50, \
        "Got empty synthesis signal instead of actual report"

    # Should NOT have duplicate content
    messages = [e.message for e in events if e.message]
    # Check for exact duplicates of long content
    long_messages = [m for m in messages if len(m) > 200]
    assert len(long_messages) == len(set(long_messages)), "Duplicate content detected"

@pytest.mark.e2e
@pytest.mark.timeout(300)  # 5 minute timeout for Paid Tier
async def test_paid_tier_synthesis():
    """Verify Paid Tier (BYOK) produces synthesis."""
    import os
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        pytest.skip("OPENAI_API_KEY not set")

    orch = AdvancedOrchestrator(max_rounds=2, api_key=api_key)

    events = []
    async for event in orch.run("What is libido?"):
        events.append(event)

    complete_events = [e for e in events if e.type == "complete"]
    assert len(complete_events) >= 1
    assert len(complete_events[-1].message) > 100

Phase Gate Checklist

Before merging ANY refactoring PR:

[ ] make check          # All 318+ unit tests pass
[ ] make smoke-free     # Free Tier produces real synthesis
[ ] make smoke-paid     # Paid Tier works (if you have key)
[ ] CodeRabbit approved # No blocking issues

Execution Strategy

Phase 1: Current PR (REQUIRED)

Implement Priority 1, 2, and 3 before merging PR #124.

Definition of Done:

  • _synthesize_fallback(iteration, reason) implemented
  • _handle_timeout() and _force_synthesis() deleted
  • All synthesis call sites updated
  • Redundant imports removed
  • get_magentic_state added to module-level imports
  • Magic strings replaced with constants
  • All tests pass (make check)

Phase 2: Future PRs (Separate Tickets)

Create GitHub issues for Priority 4-8. Do NOT bloat the current bug fix PR.


Appendix: Line Number Reference

Item Current Location
_handle_timeout() Lines 201-248
_force_synthesis() Lines 250-297
Redundant imports (timeout) Lines 207-208
Redundant imports (force) Lines 257-258
Magic string detection Line 385
_get_event_type_for_agent() Lines 582-602
Module imports Lines 18-48
run() method Lines 299-456