feat(SPEC-16): Unified Chat Client Architecture (#115)
* chore: Update dependencies and verify SPEC-16 for Unified Chat Client
* feat: Implement unified ChatClient architecture (SPEC-16) Phase 1
* refactor: Deprecate Simple Mode and map to Unified Advanced Mode (SPEC-16 Phase 2)
* refactor: Complete SPEC-16 cleanup - remove stale dual-mode tests
- Delete obsolete e2e/integration tests referencing removed functions
(check_magentic_requirements, mode="simple", etc.)
- Update unit tests for unified architecture (no mode parameter)
- Fix type errors in HuggingFaceChatClient (add type: ignore for untyped base)
- Remove mode toggle from Gradio UI
- Add ChatClient factory tests
Closes #105, Fixes #113
Refs #114 (tech debt: naming cleanup deferred)
* chore: Sync pre-commit mypy with project dependencies
Add agent-framework-core to pre-commit additional_dependencies so
mypy runs with the same type information in pre-commit hooks as in
`make typecheck`.
Previously, the pre-commit mypy hook ran in isolation without
agent_framework types, causing BaseChatClient to appear as Any.
* style: Format files for CI compliance
* chore: Sync ruff version (0.14.7) between pre-commit and uv.lock
Fixes divergence where pre-commit used v0.14.7 but CI/local used v0.14.6,
causing formatting differences.
* fix: Address CodeRabbit review findings (PR #115)
## Factory (CRITICAL)
- Add case-insensitive provider matching (OpenAI → openai)
- Raise ValueError for unsupported providers (no silent fallback)
- Fix misleading Gemini log (now warns + falls through); see the sketch below
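A minimal sketch of the matching behavior described above; the function body and the string placeholders are illustrative assumptions, not the project's actual factory:

```python
import logging

logger = logging.getLogger(__name__)


def get_chat_client(provider: str | None = None) -> str:
    """Illustrative only: case-insensitive matching with no silent fallback."""
    normalized = provider.strip().lower() if provider else None

    if normalized == "openai":
        return "openai-client"  # stand-in for the real OpenAI client
    if normalized == "gemini":
        # Known but unsupported provider: warn, then fall through to the default chain.
        logger.warning("Gemini requested but not yet supported; using fallback")
    elif normalized is not None:
        # Unknown provider: fail loudly instead of silently returning a fallback.
        raise ValueError(f"Unsupported provider: {provider!r}")

    return "huggingface-client"  # stand-in for the free-tier fallback
```

Failing loudly on unknown providers surfaces configuration typos immediately instead of silently running on the fallback backend.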
## HuggingFace Client (CRITICAL + MAJOR)
- Fix Role enum conversion: use .value, not str(enum)
- str(Role.USER) → "Role.USER" (wrong)
- Role.USER.value → "user" (correct)
- Fix temperature/max_tokens: use `is not None` instead of `or`
- `or` treats 0/0.0 as falsy, breaking temperature=0.0 (see the sketch below)
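A standalone sketch of both pitfalls; the `Role` enum below is a stand-in that only mirrors the behavior (the real type comes from `agent_framework`):

```python
from enum import Enum


class Role(Enum):  # stand-in; the real Role comes from agent_framework
    USER = "user"


str(Role.USER)   # -> "Role.USER"  (wrong payload for an API expecting "user")
Role.USER.value  # -> "user"       (correct)


def resolve_temperature(temperature: float | None) -> float:
    # `temperature or 0.7` would turn a legitimate 0.0 into 0.7;
    # an explicit None check preserves zero values.
    return temperature if temperature is not None else 0.7


assert resolve_temperature(0.0) == 0.0
assert resolve_temperature(None) == 0.7
```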
## Tests
- Add test for unsupported provider ValueError
- Add test for case-insensitive provider matching
- Add test for Role enum conversion
* fix: Apply same defensive patterns codebase-wide
## Case-insensitive provider matching
- llm_factory.py: Normalize llm_provider before comparison
- config.py: Normalize llm_provider in get_api_key()
## Explicit None checks for numeric defaults
- judge.py: total_evidence_count=0 is now honored
These are the same anti-patterns fixed in the CodeRabbit review,
now applied consistently across the codebase.
- .pre-commit-config.yaml +1 -0
- docs/bugs/ACTIVE_BUGS.md +20 -1
- docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md +219 -0
- docs/specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md +246 -175
- pyproject.toml +1 -1
- src/agents/code_executor_agent.py +4 -7
- src/agents/magentic_agents.py +10 -22
- src/agents/retrieval_agent.py +4 -7
- src/app.py +25 -76
- src/clients/__init__.py +0 -0
- src/clients/base.py +19 -0
- src/clients/factory.py +76 -0
- src/clients/huggingface.py +191 -0
- src/orchestrators/__init__.py +16 -15
- src/orchestrators/advanced.py +40 -38
- src/orchestrators/factory.py +20 -73
- src/orchestrators/simple.py +0 -778
- src/prompts/judge.py +2 -1
- src/utils/config.py +18 -4
- src/utils/llm_factory.py +23 -60
- tests/e2e/test_advanced_mode.py +0 -70
- tests/e2e/test_simple_mode.py +0 -65
- tests/integration/test_dual_mode_e2e.py +0 -83
- tests/integration/test_simple_mode_synthesis.py +0 -157
- tests/unit/agents/test_magentic_agents_domain.py +8 -8
- tests/unit/agents/test_magentic_judge_termination.py +26 -14
- tests/unit/clients/__init__.py +1 -0
- tests/unit/clients/test_chat_client_factory.py +211 -0
- tests/unit/orchestrators/test_advanced_orchestrator.py +21 -17
- tests/unit/orchestrators/test_advanced_orchestrator_domain.py +15 -20
- tests/unit/orchestrators/test_factory_domain.py +7 -9
- tests/unit/orchestrators/test_simple_orchestrator_domain.py +0 -47
- tests/unit/orchestrators/test_simple_synthesis.py +0 -320
- tests/unit/orchestrators/test_termination.py +0 -104
- tests/unit/test_app_domain.py +43 -34
- tests/unit/test_gradio_crash.py +2 -2
- tests/unit/test_magentic_fix.py +0 -101
- tests/unit/test_magentic_termination.py +0 -155
- tests/unit/test_orchestrator.py +0 -290
- tests/unit/test_orchestrator_factory.py +20 -25
- tests/unit/test_streaming_fix.py +2 -1
- tests/unit/test_ui_elements.py +38 -18
- uv.lock +23 -23
`.pre-commit-config.yaml`
@@ -18,4 +18,5 @@ repos:
 - pydantic-settings>=2.2
 - tenacity>=8.2
 - pydantic-ai>=0.0.16
+- agent-framework-core>=1.0.0b251120
 args: [--ignore-missing-imports]

`docs/bugs/ACTIVE_BUGS.md`
@@ -7,7 +7,26 @@

 ## P0 - Blocker

+### P0 - Simple Mode Ignores Forced Synthesis (Issue #113)
+**File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
+**Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
+**Found:** 2025-12-01 (Free Tier Testing)
+
+**Problem:** When HuggingFace Inference fails 3 times, the Judge returns `recommendation="synthesize"` but Simple Mode's `_should_synthesize()` ignores it due to strict score thresholds (requires `combined_score >= 10` but forced synthesis has score 0).
+
+**Impact:** Free tier users see 10 iterations of "Gathering more evidence" despite Judge saying "synthesize".
+
+**Root Cause:** Coordination bug between two fixes:
+- **PR #71 (SPEC_06):** Added `_should_synthesize()` with strict thresholds
+- **Commit 5e761eb:** Added `_create_forced_synthesis_assessment()` with `score=0, confidence=0.1`
+- These don't work together - forced synthesis bypasses nothing.
+
+**Strategic Fix:** [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) - **INTEGRATION, NOT DELETION**
+- Create `HuggingFaceChatClient` adapter for Microsoft Agent Framework
+- **INTEGRATE** Simple Mode's free-tier capability into Advanced Mode
+- Users without API keys → Advanced Mode with HuggingFace backend (capability PRESERVED)
+- Retire Simple Mode's redundant orchestration CODE (not the capability!)
+- Bug disappears because Advanced Mode handles termination correctly (Manager agent signals)

 ---

`docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md` (new file)
@@ -0,0 +1,219 @@
# P0 BUG: Simple Mode Ignores Forced Synthesis from HF Inference Failures

**Status**: Open → **Fix via SPEC_16 (Integration)**
**Priority**: P0 (Demo-blocking)
**Discovered**: 2025-12-01
**Affected Component**: `src/orchestrators/simple.py`
**Strategic Fix**: [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md)
**GitHub Issue**: [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)

> **Decision**: Instead of patching Simple Mode, we will **INTEGRATE its capability into Advanced Mode** per SPEC_16.
>
> **What this means:**
> - ✅ Free-tier HuggingFace capability is PRESERVED via `HuggingFaceChatClient`
> - ✅ Users without API keys still get full functionality (Advanced Mode + HuggingFace backend)
> - 🗑️ Simple Mode's redundant orchestration CODE is retired (not the capability!)
> - 🐛 The bug disappears because Advanced Mode's Manager agent handles termination correctly

---

## Problem Statement

When the HuggingFace Inference API fails 3 consecutive times, the `HFInferenceJudgeHandler` correctly returns a "forced synthesis" assessment with `sufficient=True, recommendation="synthesize"`. However, **Simple Mode's `_should_synthesize()` method ignores this signal** because of overly strict code-enforced thresholds.

### Observed Behavior

```
✅ JUDGE_COMPLETE: Assessment: synthesize (confidence: 10%)
🔄 LOOPING: Gathering more evidence...   ← BUG: Should have synthesized!
```

The orchestrator loops **10 full iterations** despite the judge repeatedly saying "synthesize" after iteration 4.

### Expected Behavior

When `HFInferenceJudgeHandler._create_forced_synthesis_assessment()` returns:
- `sufficient=True`
- `recommendation="synthesize"`

The orchestrator should **immediately synthesize**, regardless of score thresholds.

---

## Root Cause Analysis

### The Forced Synthesis Assessment (judges.py:514-549)

```python
def _create_forced_synthesis_assessment(self, question, evidence):
    return JudgeAssessment(
        details=AssessmentDetails(
            mechanism_score=0,          # ❌ Problem 1: Score is 0
            clinical_evidence_score=0,  # ❌ Problem 2: Score is 0
            drug_candidates=["AI analysis required..."],
            key_findings=findings,
        ),
        sufficient=True,                # ✅ Correct: Says sufficient
        confidence=0.1,                 # ❌ Problem 3: Too low for emergency
        recommendation="synthesize",    # ✅ Correct: Says synthesize
        ...
    )
```

### The _should_synthesize Logic (simple.py:159-216)

```python
def _should_synthesize(self, assessment, iteration, max_iterations, evidence_count):
    combined_score = mechanism_score + clinical_evidence_score  # = 0

    # Priority 1: Judge approved - BUT REQUIRES combined_score >= 10!
    if assessment.sufficient and assessment.recommendation == "synthesize":
        if combined_score >= 10:  # ❌ 0 >= 10 is FALSE!
            return True, "judge_approved"

    # Priority 2-5: All require scores or drug candidates we don't have

    # Priority 6: Emergency synthesis
    if is_late_iteration and evidence_count >= 30 and confidence >= 0.5:
        # ❌ 0.1 >= 0.5 is FALSE!
        return True, "emergency_synthesis"

    return False, "continue_searching"  # ← Always ends up here!
```

### The Bug

1. **Priority 1 has the wrong precondition**: It checks `combined_score >= 10` even when the judge explicitly says "synthesize". The score check should be skipped when it's a forced/error-recovery synthesis.

2. **Priority 6's confidence threshold is too high**: 0.5 confidence is reasonable for "emergency" synthesis, but forced synthesis from API failures uses 0.1 confidence to indicate low quality; it should still trigger synthesis.

---

## Impact

- **User sees**: 10 iterations of "Gathering more evidence" with 0% confidence
- **Final output**: Partial synthesis with "Max iterations reached"
- **Time wasted**: ~2-3 minutes of useless API calls
- **UX**: Extremely confusing - the user sees "synthesize" but the system keeps searching

---

## Proposed Fix

### ~~Option A: Patch Simple Mode~~ (REJECTED)

We considered patching `_should_synthesize()` to respect forced synthesis signals. However, this adds MORE complexity to an already complex system that we plan to delete.

### ✅ Strategic Fix: SPEC_16 Unification (APPROVED)

**Delete Simple Mode entirely and unify on Advanced Mode.**

See: [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md)

The implementation path:

1. **Phase 1**: Create `HuggingFaceChatClient` adapter (~150 lines)
   - Implements `agent_framework.BaseChatClient`
   - Wraps `huggingface_hub.InferenceClient`
   - Enables Advanced Mode to work with the free tier

2. **Phase 2**: Delete Simple Mode
   - Remove `src/orchestrators/simple.py` (~778 lines)
   - Remove `src/tools/search_handler.py` (~219 lines)
   - Update factory to always use `AdvancedOrchestrator`

3. **Why this works**: Advanced Mode uses Microsoft Agent Framework's built-in termination. When JudgeAgent returns "SUFFICIENT EVIDENCE" (per SPEC_15), the Manager agent immediately delegates to ReportAgent. **No custom `_should_synthesize()` thresholds needed.**

### Why Unification > Patching

| Approach | Lines Changed | Bug Fixed? | Technical Debt |
|----------|---------------|------------|----------------|
| Patch Simple Mode | +20 lines | Temporarily | Adds complexity |
| **SPEC_16 Unification** | **-997 lines** | **Permanently** | **Eliminates 778 lines** |

---

## Files to DELETE (via SPEC_16)

| File | Lines | Reason |
|------|-------|--------|
| `src/orchestrators/simple.py` | 778 | Contains buggy `_should_synthesize()` - entire file deleted |
| `src/tools/search_handler.py` | 219 | Manager agent handles orchestration in Advanced Mode |

## Files to CREATE (via SPEC_16)

| File | Lines | Purpose |
|------|-------|---------|
| `src/clients/__init__.py` | ~10 | Package exports |
| `src/clients/factory.py` | ~50 | `get_chat_client()` factory |
| `src/clients/huggingface.py` | ~150 | `HuggingFaceChatClient` adapter |

**Net change: 997 lines deleted, ~210 lines added = ~787 lines removed**

---

## Acceptance Criteria (SPEC_16 Implementation)

- [ ] `HuggingFaceChatClient` implements `agent_framework.BaseChatClient`
- [ ] `get_chat_client()` returns HuggingFace client when no OpenAI key
- [ ] `AdvancedOrchestrator` works with HuggingFace backend
- [ ] `simple.py` is deleted (778 lines removed)
- [ ] Free tier users get Advanced Mode with HuggingFace
- [ ] No more "continue_searching" loops when HF fails
- [ ] Manager agent respects "SUFFICIENT EVIDENCE" signal (SPEC_15)

---

## Test Case (SPEC_16 Verification)

```python
@pytest.mark.asyncio
async def test_unified_architecture_handles_hf_failures():
    """
    After SPEC_16: Free tier uses Advanced Mode with HuggingFace backend.
    When HF fails, Manager agent should trigger synthesis via ReportAgent.

    This test replaces the old Simple Mode test because:
    - simple.py is DELETED
    - Advanced Mode handles termination via Manager agent signals
    - No _should_synthesize() thresholds to bypass
    """
    from unittest.mock import patch, MagicMock
    from src.orchestrators.advanced import AdvancedOrchestrator
    from src.clients.factory import get_chat_client

    # Verify factory returns HuggingFace client when no OpenAI key
    with patch("src.utils.config.settings") as mock_settings:
        mock_settings.has_openai_key = False
        mock_settings.has_gemini_key = False
        mock_settings.has_huggingface_key = True

        client = get_chat_client()
        assert "HuggingFace" in type(client).__name__

    # Verify AdvancedOrchestrator accepts HuggingFace client
    # (The actual termination is handled by Manager agent respecting
    # "SUFFICIENT EVIDENCE" signals per SPEC_15)
```

---

## Related Issues & Specs

| Reference | Type | Relationship |
|-----------|------|--------------|
| [SPEC_16](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) | Spec | **THE FIX** - Unified architecture eliminates this bug |
| [SPEC_15](../specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md) | Spec | Manager agent termination logic (already implemented) |
| [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105) | GitHub | Deprecate Simple Mode |
| [Issue #109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109) | GitHub | Simplify Provider Architecture |
| [Issue #110](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/110) | GitHub | Remove Anthropic Support |
| PR #71 (SPEC_06) | PR | Added `_should_synthesize()` - now causes this bug |
| Commit 5e761eb | Commit | Added `_create_forced_synthesis_assessment()` |

---

## References

- `src/orchestrators/simple.py:159-216` - `_should_synthesize()` method
- `src/agent_factory/judges.py:514-549` - `_create_forced_synthesis_assessment()`
- `src/agent_factory/judges.py:477-512` - `_create_quota_exhausted_assessment()`

`docs/specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md`
@@ -1,279 +1,350 @@

The spec was rewritten in this PR. Content removed from the previous revision includes:

- The old header fields: a **Priority** line, an **Issue** line referencing only [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105) and [#109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109), and a **Last Updated** field.
- The earlier framing goals: "2. **Full-Stack Provider Chain**: Prioritize providers that offer both LLM and Embeddings (OpenAI, Gemini, HuggingFace+Local) to ensure a unified environment." and "3. **Fragmentation Reduction**: Remove 'LLM-only' providers (Anthropic) that force complex hybrid dependency chains (e.g., Anthropic LLM + OpenAI Embeddings)."
- The old problem list: "1. **Double Maintenance**: 1,266 lines across two orchestrator systems.", "2. **Namespace Lock-in**: The Advanced Orchestrator is tightly coupled to `OpenAIChatClient` (25 references across 5 files).", "3. **Fragmented Chains**: Using Anthropic requires a 'Frankenstein' chain (Anthropic LLM + OpenAI Embeddings).", "4. **Testing Burden**: Two test suites, two CI paths."
- The "## Proposed Solution: ChatClientFactory" section: a `src/clients/` directory tree (`base.py` re-exporting `BaseChatClient`, `factory.py`, `huggingface.py`, `gemini.py` [Future]) and an earlier `get_chat_client()` draft whose priority order was 1. explicit provider parameter, 2. OpenAI key (Best Function Calling), 3. Gemini key (Best Context/Cost, `GeminiChatClient(model_id="gemini-2.0-flash")`), 4. HuggingFace (Free Fallback).
- A "## Technical Requirements" section noting that `agent_framework.BaseChatClient` requires implementing **2 abstract methods**, plus a config addition to make in `src/utils/config.py` **BEFORE implementation**.
- Migration tables covering type-hint changes in `src/agents/retrieval_agent.py` and `src/agents/code_executor_agent.py`, merging `src/utils/llm_factory.py` into `clients/factory.py`, Anthropic removal across `src/agent_factory/judges.py`, `src/utils/config.py`, `src/utils/llm_factory.py`, `src/app.py`, `src/orchestrators/simple.py`, and `src/agents/hypothesis_agent.py`, and a delete list (`src/orchestrators/simple.py`, 778 lines; `src/tools/search_handler.py`, 219 lines).
- The old Migration Plan phases, an earlier Verification Checklist, and a reference to the Gemini API ([Embeddings + LLM](https://ai.google.dev/gemini-api/docs/embeddings)).

The new revision of the file:

# SPEC_16: Unified Chat Client Architecture

**Status**: Proposed
**Priority**: P0 (Fixes Critical Bug #113)
**Issue**: Updates [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105), [#109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109), **[#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)** (P0 Bug)
**Created**: 2025-12-01
**Last Updated**: 2025-12-01

---

## ⚠️ CRITICAL CLARIFICATION: Integration, Not Deletion

**This spec INTEGRATES Simple Mode's free-tier capability into Advanced Mode.**

| What We're Doing | What We're NOT Doing |
|------------------|----------------------|
| ✅ Integrating HuggingFace support into Advanced Mode | ❌ Removing free-tier capability |
| ✅ Unifying two parallel implementations into one | ❌ Breaking functionality for users without API keys |
| ✅ Deleting redundant orchestration CODE | ❌ Deleting the CAPABILITY that code provided |
| ✅ Making Advanced Mode work with ANY provider | ❌ Locking users into paid-only tiers |

**After this spec:**
- Users WITH OpenAI key → Advanced Mode (OpenAI backend) ✅
- Users WITHOUT any key → Advanced Mode (HuggingFace backend) ✅ **SAME CAPABILITY, UNIFIED ARCHITECTURE**

---

## Summary

Unify Simple Mode and Advanced Mode into a **single orchestration system** by:

1. **Renaming the namespace**: `OpenAIChatClient` → `BaseChatClient` (neutral protocol)
2. **Creating an adapter**: `HuggingFaceChatClient` implements `BaseChatClient`
3. **Retiring parallel code**: Simple Mode's while-loop becomes unnecessary

The result: **One codebase, multiple providers, zero parallel universes.**

> **🔥 P0 Bug Fix**: This also resolves Issue #113. Simple Mode's `_should_synthesize()` has a bug that ignores forced synthesis signals. Advanced Mode's Manager agent handles termination correctly. By integrating, the bug disappears.

---

## The Integration Concept

### Before: Two Parallel Universes (Current)

```text
User Query
    │
    ├── Has API Key? ──Yes──→ Advanced Mode (488 lines)
    │                           └── Microsoft Agent Framework
    │                               └── OpenAIChatClient (hardcoded) ←── THE BOTTLENECK
    │
    └── No API Key? ──────────→ Simple Mode (778 lines)
                                  └── While-loop orchestration (SEPARATE CODE)
                                      └── Pydantic AI + HuggingFace
```

**Problem**: Same capability, two implementations, double maintenance, P0 bug in Simple Mode.

### After: Unified Architecture (This Spec)

```text
User Query
    │
    └──→ Advanced Mode (unified) ←── ONE SYSTEM FOR ALL USERS
           └── Microsoft Agent Framework
               └── get_chat_client() returns: ←── NAMESPACE NEUTRAL
                     │
                     ├── OpenAIChatClient (if OpenAI key present)
                     ├── GeminiChatClient (if Gemini key present) [Future]
                     └── HuggingFaceChatClient (fallback - FREE TIER) ←── INTEGRATED!
```

**Result**: Free-tier users get the SAME Advanced Mode experience, just with HuggingFace as the LLM backend.

---

## What Gets Integrated vs Retired

### ✅ INTEGRATED (Capability Preserved)

| Simple Mode Component | Integration Target | How |
|-----------------------|-------------------|-----|
| HuggingFace LLM calls | `HuggingFaceChatClient` | New adapter (~150 lines) |
| Free-tier access | `get_chat_client()` factory | Auto-selects HF when no key |
| Search tools (PubMed, etc.) | Already shared | `src/agents/tools.py` |
| Evidence models | Already shared | `src/utils/models.py` |

### 🗑️ RETIRED (Redundant Code Removed)

| Simple Mode Component | Why Retired | Replacement in Advanced Mode |
|-----------------------|-------------|------------------------------|
| While-loop orchestration | Redundant | Manager agent orchestrates |
| `_should_synthesize()` thresholds | **BUGGY** (P0 #113) | Manager agent signals |
| `SearchHandler` scatter-gather | Redundant | SearchAgent handles this |
| `JudgeHandler` | Redundant | JudgeAgent handles this |

**Key insight**: We're not losing functionality. We're consolidating two implementations of the SAME functionality into one.

---

## Technical Implementation

### The Single Change That Enables Unification

```python
# BEFORE (hardcoded to OpenAI):
from agent_framework.openai import OpenAIChatClient

class AdvancedOrchestrator:
    def __init__(self, ...):
        self._chat_client = OpenAIChatClient(...)  # ❌ Only OpenAI works

# AFTER (neutral - any provider):
from agent_framework import BaseChatClient
from src.clients.factory import get_chat_client

class AdvancedOrchestrator:
    def __init__(self, ...):
        self._chat_client = get_chat_client()  # ✅ OpenAI, Gemini, OR HuggingFace
```

### HuggingFaceChatClient Adapter

```python
# src/clients/huggingface.py
from agent_framework import BaseChatClient, ChatMessage, ChatResponse
from huggingface_hub import InferenceClient

class HuggingFaceChatClient(BaseChatClient):
    """Adapter that makes HuggingFace work with Microsoft Agent Framework."""

    def __init__(self, model_id: str = "meta-llama/Llama-3.1-70B-Instruct"):
        self._client = InferenceClient(model=model_id)
        self._model_id = model_id

    async def _inner_get_response(
        self,
        messages: list[ChatMessage],
        **kwargs
    ) -> ChatResponse:
        """Convert HuggingFace response to Agent Framework format."""
        # Convert messages to HF format
        hf_messages = [{"role": m.role, "content": m.content} for m in messages]

        # Call HuggingFace
        response = self._client.chat_completion(messages=hf_messages)

        # Convert back to Agent Framework format
        return ChatResponse(
            content=response.choices[0].message.content,
            # ... other fields
        )

    async def _inner_get_streaming_response(self, ...):
        """Streaming version."""
        ...
```

### ChatClientFactory

```python
# src/clients/factory.py
from agent_framework import BaseChatClient
from agent_framework.openai import OpenAIChatClient
from src.utils.config import settings

def get_chat_client(provider: str | None = None) -> BaseChatClient:
    """
    Factory that returns the appropriate chat client.

    Priority:
    1. OpenAI (if key available) - Best function calling, GPT-5
    2. Gemini (if key available) - Good alternative [Future]
    3. HuggingFace (always available) - FREE TIER FALLBACK
    """
    if provider == "openai" or (provider is None and settings.has_openai_key):
        return OpenAIChatClient(
            model_id=settings.openai_model,  # gpt-5
            api_key=settings.openai_api_key,
        )

    # Future: Gemini support
    # if settings.has_gemini_key:
    #     return GeminiChatClient(...)

    # FREE TIER: HuggingFace (no API key required for public models)
    from src.clients.huggingface import HuggingFaceChatClient
    return HuggingFaceChatClient(
        model_id="meta-llama/Llama-3.1-70B-Instruct",
    )
```

---

## Why This Fixes P0 Bug #113

### The Bug (Simple Mode)

```python
# src/orchestrators/simple.py - THE BUG
def _should_synthesize(self, assessment, ...):
    # When HF fails, judge returns: score=0, confidence=0.1, recommendation="synthesize"

    if assessment.sufficient and assessment.recommendation == "synthesize":
        if combined_score >= 10:  # ❌ 0 >= 10 is FALSE
            return True

    if confidence >= 0.5:  # ❌ 0.1 >= 0.5 is FALSE
        return True, "emergency"

    return False, "continue_searching"  # ❌ LOOPS FOREVER
```

### The Fix (Advanced Mode - Already Works Correctly)

```python
# Advanced Mode doesn't have this bug because:
# 1. JudgeAgent says "SUFFICIENT EVIDENCE" in natural language
# 2. Manager agent understands this and delegates to ReportAgent
# 3. No hardcoded thresholds to bypass

# The Manager agent prompt (src/orchestrators/advanced.py:152):
"""
When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
→ IMMEDIATELY delegate to ReportAgent for synthesis
"""
```

**By integrating Simple Mode's capability into Advanced Mode, the bug disappears** because Advanced Mode's termination logic works correctly.

---

## Migration Plan

### Phase 1: Create HuggingFaceChatClient (Enables Integration)

- [ ] Create `src/clients/` package
- [ ] Implement `HuggingFaceChatClient` (~150 lines)
  - Extends `agent_framework.BaseChatClient`
  - Wraps `huggingface_hub.InferenceClient.chat_completion()`
  - Implements required abstract methods
- [ ] Implement `get_chat_client()` factory (~50 lines)
- [ ] Add unit tests

**Exit Criteria**: `get_chat_client()` returns working HuggingFace client when no API key.

### Phase 2: Integrate into Advanced Mode (Fixes P0 Bug)

- [ ] Update `AdvancedOrchestrator` to use `get_chat_client()`
- [ ] Update `magentic_agents.py` type hints: `OpenAIChatClient` → `BaseChatClient`
- [ ] Update `orchestrators/factory.py` to always return `AdvancedOrchestrator`
- [ ] Update `app.py` to remove mode toggle (everyone gets Advanced Mode)
- [ ] Archive `simple.py` to `docs/archive/` (for reference)
- [ ] Migrate Simple Mode tests to Advanced Mode tests

**Exit Criteria**: Free-tier users get Advanced Mode with HuggingFace backend. P0 bug gone.

### Phase 3: Cleanup (Optional)

- [ ] Remove Anthropic provider code (Issue #110)
- [ ] Add Gemini support (Issue #109)
- [ ] Delete archived files after verification period

---

## Files Changed

### New Files (~200 lines)

| File | Lines | Purpose |
|------|-------|---------|
| `src/clients/__init__.py` | ~10 | Package exports |
| `src/clients/factory.py` | ~50 | `get_chat_client()` |
| `src/clients/huggingface.py` | ~150 | HuggingFace adapter |

### Modified Files

| File | Change |
|------|--------|
| `src/orchestrators/advanced.py` | Use `get_chat_client()` instead of `OpenAIChatClient` |
| `src/orchestrators/factory.py` | Always return `AdvancedOrchestrator` |
| `src/agents/magentic_agents.py` | Type hints: `OpenAIChatClient` → `BaseChatClient` |
| `src/app.py` | Remove mode toggle, always use Advanced |

### Archived Files (NOT deleted from git history)

| File | Lines | Reason |
|------|-------|--------|
| `src/orchestrators/simple.py` | 778 | Functionality INTEGRATED, code retired |
| `src/tools/search_handler.py` | 219 | Manager agent handles this now |

---

## Verification Checklist

### Technical Prerequisites (Verified ✅)

- [x] `agent_framework.BaseChatClient` exists
- [x] Abstract methods: `_inner_get_response`, `_inner_get_streaming_response`
- [x] `huggingface_hub.InferenceClient.chat_completion()` exists
- [x] `chat_completion()` has `tools` parameter (verified in 0.36.0)
- [x] HuggingFace supports Llama 3.1 70B via free inference
- [x] **Dependency pinned**: `huggingface-hub>=0.24.0` in pyproject.toml (required for stable tool calling)

### Capability Preservation Checklist

After implementation, verify:

- [ ] User with OpenAI key → Gets Advanced Mode with OpenAI (GPT-5)
- [ ] User with NO key → Gets Advanced Mode with HuggingFace (Llama 3.1 70B)
- [ ] Free-tier search works (PubMed, ClinicalTrials, EuropePMC)
- [ ] Free-tier synthesis works (LLM generates report)
- [ ] No more "continue_searching" infinite loops (P0 bug fixed)

---

## Implementation Notes (From Independent Audit)

### Dependency Requirement ✅ FIXED

The `huggingface-hub` package must be `>=0.24.0` for stable `chat_completion` with tools support.

```toml
# pyproject.toml - ALREADY UPDATED
"huggingface-hub>=0.24.0",  # Required for stable chat_completion with tools
```

### Llama 3.1 Prompt Considerations ⚠️

The Manager agent prompt in `AdvancedOrchestrator._create_task_prompt()` was optimized for GPT-5. When using Llama 3.1 70B via HuggingFace, the prompt **may need tuning** to ensure strict adherence to delegation logic.

**Potential issue**: Llama 3.1 may not immediately delegate to ReportAgent when JudgeAgent says "SUFFICIENT EVIDENCE".

**Mitigation**: During implementation, test with HuggingFace backend and add reinforcement phrases if needed:
- "You MUST delegate to ReportAgent when you see SUFFICIENT EVIDENCE"
- "Do NOT continue searching after Judge approves"

This is a **runtime verification** task, not a spec change.
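As a rough sketch of what that mitigation could look like in code, assuming the prompt is a plain string assembled per backend (the constant and helper names below are illustrative, not the project's actual API):

```python
# Hypothetical sketch: the real prompt is assembled in
# AdvancedOrchestrator._create_task_prompt(); names here are illustrative.
LLAMA_REINFORCEMENT = (
    "\nYou MUST delegate to ReportAgent when you see SUFFICIENT EVIDENCE."
    "\nDo NOT continue searching after Judge approves."
)


def build_manager_prompt(base_prompt: str, backend: str) -> str:
    """Append stricter delegation rules only for the HuggingFace/Llama backend."""
    if backend == "huggingface":
        return base_prompt + LLAMA_REINFORCEMENT
    return base_prompt
```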

---

## References

- Microsoft Agent Framework: `agent_framework.BaseChatClient`
- HuggingFace Inference: `huggingface_hub.InferenceClient`
- Issue #105: Deprecate Simple Mode → **Reframe as "Integrate Simple Mode"**
- Issue #109: Simplify Provider Architecture
- Issue #110: Remove Anthropic Provider Support
- Issue #113: P0 Bug - Simple Mode ignores forced synthesis

`pyproject.toml`
@@ -17,7 +17,7 @@ dependencies = [
     "httpx>=0.27",           # Async HTTP client (PubMed)
     "beautifulsoup4>=4.12",  # HTML parsing
     "xmltodict>=0.13",       # PubMed XML -> dict
-    "huggingface-hub>=0.
+    "huggingface-hub>=0.24.0",  # Hugging Face Inference API - 0.24.0 required for stable chat_completion with tools
     # UI
     "gradio[mcp]>=6.0.0",  # Chat interface with MCP server support (6.0 required for css in launch())
     # Utils

`src/agents/code_executor_agent.py`
@@ -4,10 +4,10 @@ import asyncio

 import structlog
 from agent_framework import ChatAgent, ai_function
-from agent_framework.openai import OpenAIChatClient

+from src.clients.base import BaseChatClient
+from src.clients.factory import get_chat_client
 from src.tools.code_execution import get_code_executor
-from src.utils.config import settings

 logger = structlog.get_logger()

@@ -40,7 +40,7 @@ async def execute_python_code(code: str) -> str:
         return f"Execution failed: {e}"


-def create_code_executor_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_code_executor_agent(chat_client: BaseChatClient | None = None) -> ChatAgent:
     """Create a code executor agent.

     Args:
@@ -49,10 +49,7 @@ def create_code_executor_agent(chat_client: OpenAIChatClient | None = None) -> C
     Returns:
         ChatAgent configured for code execution.
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client()

     return ChatAgent(
         name="CodeExecutorAgent",

`src/agents/magentic_agents.py`
@@ -1,7 +1,6 @@
 """Magentic-compatible agents using ChatAgent pattern."""

 from agent_framework import ChatAgent
-from agent_framework.openai import OpenAIChatClient

 from src.agents.tools import (
     get_bibliography,
@@ -9,12 +8,13 @@ from src.agents.tools import (
     search_preprints,
     search_pubmed,
 )
+from src.clients.base import BaseChatClient
+from src.clients.factory import get_chat_client
 from src.config.domain import ResearchDomain, get_domain_config
-from src.utils.config import settings


 def create_search_agent(
-    chat_client: OpenAIChatClient | None = None,
+    chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
 ) -> ChatAgent:
     """Create a search agent with internal LLM and search tools.
@@ -26,10 +26,7 @@ def create_search_agent(
     Returns:
         ChatAgent configured for biomedical search
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,  # Use configured model
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client()
     config = get_domain_config(domain)

     return ChatAgent(
@@ -55,7 +52,7 @@ related to {config.name}.""",


 def create_judge_agent(
-    chat_client: OpenAIChatClient | None = None,
+    chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
 ) -> ChatAgent:
     """Create a judge agent that evaluates evidence quality.
@@ -67,10 +64,7 @@ def create_judge_agent(
     Returns:
         ChatAgent configured for evidence assessment
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client()
     config = get_domain_config(domain)

     return ChatAgent(
@@ -114,7 +108,7 @@ Be rigorous but fair. Look for:


 def create_hypothesis_agent(
-    chat_client: OpenAIChatClient | None = None,
+    chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
 ) -> ChatAgent:
     """Create a hypothesis generation agent.
@@ -126,10 +120,7 @@ def create_hypothesis_agent(
     Returns:
         ChatAgent configured for hypothesis generation
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client()
     config = get_domain_config(domain)

     return ChatAgent(
@@ -158,7 +149,7 @@ Focus on mechanistic plausibility and existing evidence.""",


 def create_report_agent(
-    chat_client: OpenAIChatClient | None = None,
+    chat_client: BaseChatClient | None = None,
     domain: ResearchDomain | str | None = None,
 ) -> ChatAgent:
     """Create a report synthesis agent.
@@ -170,10 +161,7 @@ def create_report_agent(
     Returns:
         ChatAgent configured for report generation
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client()
     config = get_domain_config(domain)

     return ChatAgent(

`src/agents/retrieval_agent.py`
@@ -2,11 +2,11 @@

 import structlog
 from agent_framework import ChatAgent, ai_function
-from agent_framework.openai import OpenAIChatClient

+from src.clients.base import BaseChatClient
+from src.clients.factory import get_chat_client
 from src.state import get_magentic_state
 from src.tools.web_search import WebSearchTool
-from src.utils.config import settings

 logger = structlog.get_logger()

@@ -50,7 +50,7 @@ async def search_web(query: str, max_results: int = 10) -> str:
     return "\n".join(output)


-def create_retrieval_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgent:
+def create_retrieval_agent(chat_client: BaseChatClient | None = None) -> ChatAgent:
     """Create a retrieval agent.

     Args:
@@ -59,10 +59,7 @@ def create_retrieval_agent(chat_client: OpenAIChatClient | None = None) -> ChatA
     Returns:
         ChatAgent configured for retrieval.
     """
-    client = chat_client or OpenAIChatClient(
-        model_id=settings.openai_model,
-        api_key=settings.openai_api_key,
-    )
+    client = chat_client or get_chat_client()

     return ChatAgent(
         name="RetrievalAgent",

@@ -5,25 +5,15 @@ from collections.abc import AsyncGenerator
 from typing import Any, Literal

 import gradio as gr
-from pydantic_ai.models.anthropic import AnthropicModel
-from pydantic_ai.models.openai import OpenAIChatModel
-from pydantic_ai.providers.anthropic import AnthropicProvider
-from pydantic_ai.providers.openai import OpenAIProvider

-from src.agent_factory.judges import HFInferenceJudgeHandler, JudgeHandler, MockJudgeHandler
 from src.config.domain import ResearchDomain
 from src.orchestrators import create_orchestrator
-from src.tools.clinicaltrials import ClinicalTrialsTool
-from src.tools.europepmc import EuropePMCTool
-from src.tools.openalex import OpenAlexTool
-from src.tools.pubmed import PubMedTool
-from src.tools.search_handler import SearchHandler
 from src.utils.config import settings
 from src.utils.exceptions import ConfigurationError
 from src.utils.models import OrchestratorConfig
 from src.utils.service_loader import warmup_services

-OrchestratorMode = Literal["


 # CSS to force dark mode on API key input

@@ -55,16 +45,19 @@ CUSTOM_CSS = """

 def configure_orchestrator(
     use_mock: bool = False,
-    mode: OrchestratorMode = "
     user_api_key: str | None = None,
     domain: str | ResearchDomain | None = None,
 ) -> tuple[Any, str]:
     """
     Create an orchestrator instance.

     Args:
         use_mock: If True, use MockJudgeHandler (no API key needed)
-        mode: Orchestrator mode ("
         user_api_key: Optional user-provided API key (BYOK) - auto-detects provider
         domain: Research domain (defaults to "sexual_health")

@@ -77,58 +70,35 @@ def configure_orchestrator(
         max_results_per_tool=10,
     )

-    # Create search tools
-    search_handler = SearchHandler(
-        tools=[PubMedTool(), ClinicalTrialsTool(), EuropePMCTool(), OpenAlexTool()],
-        timeout=config.search_timeout,
-    )
-
-    # Create judge (mock, real, or free tier)
-    judge_handler: JudgeHandler | MockJudgeHandler | HFInferenceJudgeHandler
     backend_info = "Unknown"

     # 1. Forced Mock (Unit Testing)
     if use_mock:
-        judge_handler = MockJudgeHandler(domain=domain)
         backend_info = "Mock (Testing)"

     # 2. Paid API Key (User provided or Env)
     elif user_api_key and user_api_key.strip():
-        # Auto-detect provider from key prefix
-        model: AnthropicModel | OpenAIChatModel
         if user_api_key.startswith("sk-ant-"):
-            # Anthropic key
-            anthropic_provider = AnthropicProvider(api_key=user_api_key)
-            model = AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
             backend_info = "Paid API (Anthropic)"
         elif user_api_key.startswith("sk-"):
-            # OpenAI key
-            openai_provider = OpenAIProvider(api_key=user_api_key)
-            model = OpenAIChatModel(settings.openai_model, provider=openai_provider)
             backend_info = "Paid API (OpenAI)"
         else:
             raise ConfigurationError(
                 "Invalid API key format. Expected sk-... (OpenAI) or sk-ant-... (Anthropic)"
             )
-        judge_handler = JudgeHandler(model=model, domain=domain)

     # 3. Environment API Keys (fallback)
     elif settings.has_openai_key:
-        judge_handler = JudgeHandler(model=None, domain=domain)  # Uses env key
         backend_info = "Paid API (OpenAI from env)"

     elif settings.has_anthropic_key:
-        judge_handler = JudgeHandler(model=None, domain=domain)  # Uses env key
         backend_info = "Paid API (Anthropic from env)"

     # 4. Free Tier (HuggingFace Inference)
     else:
-        judge_handler = HFInferenceJudgeHandler(domain=domain)
         backend_info = "Free Tier (Llama 3.1 / Mistral)"

     orchestrator = create_orchestrator(
-        search_handler=search_handler,
-        judge_handler=judge_handler,
         config=config,
         mode=mode,
         api_key=user_api_key,

@@ -139,41 +109,31 @@ def configure_orchestrator(


 def _validate_inputs(
-    mode: str,
     api_key: str | None,
     api_key_state: str | None,
-) -> tuple[
-    """Validate inputs and determine

     Returns:
-        Tuple of (
     """
-    # Validate mode
-    valid_modes: set[str] = {"simple", "magentic", "advanced", "hierarchical"}
-    mode_validated: OrchestratorMode = mode if mode in valid_modes else "simple"  # type: ignore[assignment]
-
     # Determine effective key
     user_api_key = (api_key or api_key_state or "").strip() or None

     # Check available keys
     has_openai = settings.has_openai_key
     has_anthropic = settings.has_anthropic_key
-    is_openai_user_key = (
-        user_api_key and user_api_key.startswith("sk-") and not user_api_key.startswith("sk-ant-")
-    )
     has_paid_key = has_openai or has_anthropic or bool(user_api_key)

-
-    if mode_validated == "advanced" and not (has_openai or is_openai_user_key):
-        mode_validated = "simple"
-
-    return mode_validated, user_api_key, has_paid_key


 async def research_agent(
     message: str,
     history: list[dict[str, Any]],
-    mode: str = "simple",  # Gradio passes strings; validated below
     domain: str = "sexual_health",
     api_key: str = "",
     api_key_state: str = "",

@@ -182,10 +142,12 @@ async def research_agent(
     """
     Gradio chat function that runs the research agent.

     Args:
         message: User's research question
         history: Chat history (Gradio format)
-        mode: Orchestrator mode ("simple" or "advanced")
         domain: Research domain
         api_key: Optional user-provided API key (BYOK - auto-detects provider)
         api_key_state: Persistent API key state (survives example clicks)

@@ -201,15 +163,8 @@ async def research_agent(
     # BUG FIX: Handle None values from Gradio example caching
     domain_str = domain or "sexual_health"

-    # Validate inputs
-
-
-    # Inform user about fallback/tier status
-    if mode == "advanced" and mode_validated == "simple":
-        yield (
-            "⚠️ **Warning**: Advanced mode currently requires OpenAI API key. "
-            "Anthropic keys only work in Simple mode. Falling back to Simple.\n\n"
-        )

     if not has_paid_key:
         yield (

@@ -223,9 +178,10 @@ async def research_agent(

     try:
         # use_mock=False - let configure_orchestrator decide based on available keys
         orchestrator, backend_name = configure_orchestrator(
             use_mock=False,
-            mode=
             user_api_key=user_api_key,
             domain=domain_str,
         )

@@ -297,9 +253,7 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
     Returns:
         Configured Gradio Blocks interface with MCP server enabled
     """
-    additional_inputs_accordion = gr.Accordion(
-        label="⚙️ Mode & API Key (Free tier works!)", open=False
-    )

     # BUG FIX: Add gr.State for API key persistence across example clicks
     api_key_state = gr.State("")

@@ -327,23 +281,22 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
         title="π DeepBoner",
         description=description,
         examples=[
             [
                 "What drugs improve female libido post-menopause?",
-                "simple",
                 "sexual_health",
                 None,
                 None,
             ],
             [
                 "Testosterone therapy for hypoactive sexual desire disorder?",
-                "simple",
                 "sexual_health",
                 None,
                 None,
             ],
             [
                 "Clinical trials for PDE5 inhibitors alternatives?",
-                "advanced",
                 "sexual_health",
                 None,
                 None,

@@ -351,12 +304,8 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
             ],
         additional_inputs_accordion=additional_inputs_accordion,
         additional_inputs=[
-
-
-                value="simple",
-                label="Orchestrator Mode",
-                info="⚡ Simple: Free/Any | 🔬 Advanced: OpenAI (Deep Research)",
-            ),
             gr.Dropdown(
                 choices=[d.value for d in ResearchDomain],
                 value="sexual_health",
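The removed block above detected the provider from the key prefix: "sk-ant-" meant Anthropic, a plain "sk-" prefix meant OpenAI, and anything else was rejected. A small illustrative sketch of that rule; `detect_provider` is a hypothetical helper, not code from this PR:

```python
# Illustrative sketch of the BYOK key-prefix rule described above (hypothetical helper).
def detect_provider(user_api_key: str) -> str:
    # Order matters: "sk-ant-" must be checked before the broader "sk-" prefix.
    if user_api_key.startswith("sk-ant-"):
        return "anthropic"
    if user_api_key.startswith("sk-"):
        return "openai"
    raise ValueError(
        "Invalid API key format. Expected sk-... (OpenAI) or sk-ant-... (Anthropic)"
    )


assert detect_provider("sk-ant-example") == "anthropic"
assert detect_provider("sk-example") == "openai"
```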
 from typing import Any, Literal

 import gradio as gr

 from src.config.domain import ResearchDomain
 from src.orchestrators import create_orchestrator
 from src.utils.config import settings
 from src.utils.exceptions import ConfigurationError
 from src.utils.models import OrchestratorConfig
 from src.utils.service_loader import warmup_services

+OrchestratorMode = Literal["advanced", "hierarchical"]  # Unified Architecture (SPEC-16)


 # CSS to force dark mode on API key input

...

 def configure_orchestrator(
     use_mock: bool = False,
+    mode: OrchestratorMode = "advanced",
     user_api_key: str | None = None,
     domain: str | ResearchDomain | None = None,
 ) -> tuple[Any, str]:
     """
     Create an orchestrator instance.

+    Unified Architecture (SPEC-16): All users get Advanced Mode.
+    Backend auto-selects: OpenAI (if key) → HuggingFace (free fallback).
+
     Args:
         use_mock: If True, use MockJudgeHandler (no API key needed)
+        mode: Orchestrator mode (default "advanced", "hierarchical" for sub-iteration)
         user_api_key: Optional user-provided API key (BYOK) - auto-detects provider
         domain: Research domain (defaults to "sexual_health")

...

         max_results_per_tool=10,
     )

     backend_info = "Unknown"

     # 1. Forced Mock (Unit Testing)
     if use_mock:
         backend_info = "Mock (Testing)"

     # 2. Paid API Key (User provided or Env)
     elif user_api_key and user_api_key.strip():
         if user_api_key.startswith("sk-ant-"):
             backend_info = "Paid API (Anthropic)"
         elif user_api_key.startswith("sk-"):
             backend_info = "Paid API (OpenAI)"
         else:
             raise ConfigurationError(
                 "Invalid API key format. Expected sk-... (OpenAI) or sk-ant-... (Anthropic)"
             )

     # 3. Environment API Keys (fallback)
     elif settings.has_openai_key:
         backend_info = "Paid API (OpenAI from env)"

     elif settings.has_anthropic_key:
         backend_info = "Paid API (Anthropic from env)"

     # 4. Free Tier (HuggingFace Inference)
     else:
         backend_info = "Free Tier (Llama 3.1 / Mistral)"

     orchestrator = create_orchestrator(
         config=config,
         mode=mode,
         api_key=user_api_key,

...

 def _validate_inputs(
     api_key: str | None,
     api_key_state: str | None,
+) -> tuple[str | None, bool]:
+    """Validate inputs and determine key status.
+
+    Unified Architecture (SPEC-16): Mode is always "advanced".
+    Backend auto-selects based on available API keys.

     Returns:
+        Tuple of (effective_user_key, has_paid_key)
     """
     # Determine effective key
     user_api_key = (api_key or api_key_state or "").strip() or None

     # Check available keys
     has_openai = settings.has_openai_key
     has_anthropic = settings.has_anthropic_key
     has_paid_key = has_openai or has_anthropic or bool(user_api_key)

+    return user_api_key, has_paid_key


 async def research_agent(
     message: str,
     history: list[dict[str, Any]],
     domain: str = "sexual_health",
     api_key: str = "",
     api_key_state: str = "",

...

     """
     Gradio chat function that runs the research agent.

+    Unified Architecture (SPEC-16): Always uses Advanced Mode.
+    Backend auto-selects: OpenAI (if key) → HuggingFace (free fallback).
+
     Args:
         message: User's research question
         history: Chat history (Gradio format)
         domain: Research domain
         api_key: Optional user-provided API key (BYOK - auto-detects provider)
         api_key_state: Persistent API key state (survives example clicks)

...

     # BUG FIX: Handle None values from Gradio example caching
     domain_str = domain or "sexual_health"

+    # Validate inputs (SPEC-16: mode is always "advanced")
+    user_api_key, has_paid_key = _validate_inputs(api_key, api_key_state)

     if not has_paid_key:
         yield (

...

     try:
         # use_mock=False - let configure_orchestrator decide based on available keys
+        # SPEC-16: mode is always "advanced" (unified architecture)
         orchestrator, backend_name = configure_orchestrator(
             use_mock=False,
+            mode="advanced",
             user_api_key=user_api_key,
             domain=domain_str,
         )

...

     Returns:
         Configured Gradio Blocks interface with MCP server enabled
     """
+    additional_inputs_accordion = gr.Accordion(label="⚙️ API Key (Free tier works!)", open=False)

     # BUG FIX: Add gr.State for API key persistence across example clicks
     api_key_state = gr.State("")

...

         title="π DeepBoner",
         description=description,
         examples=[
+            # SPEC-16: Mode is always "advanced" (unified architecture)
+            # Examples now only need: [question, domain, api_key, api_key_state]
             [
                 "What drugs improve female libido post-menopause?",
                 "sexual_health",
                 None,
                 None,
             ],
             [
                 "Testosterone therapy for hypoactive sexual desire disorder?",
                 "sexual_health",
                 None,
                 None,
             ],
             [
                 "Clinical trials for PDE5 inhibitors alternatives?",
                 "sexual_health",
                 None,
                 None,

...

             ],
         additional_inputs_accordion=additional_inputs_accordion,
         additional_inputs=[
+            # SPEC-16: Mode toggle removed - everyone gets Advanced Mode
+            # Backend auto-selects: OpenAI (if key) → HuggingFace (free fallback)
             gr.Dropdown(
                 choices=[d.value for d in ResearchDomain],
                 value="sexual_health",
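With the mode parameter gone, `_validate_inputs` now only resolves the effective key and reports whether any paid key is available. A usage sketch of that contract; the second assertion assumes the import path `src.app` and relies on nothing beyond the code shown above:

```python
# Sketch of the simplified _validate_inputs contract shown above.
from src.app import _validate_inputs  # import path assumed from the diff context

user_key, has_paid_key = _validate_inputs(api_key="  sk-ant-demo  ", api_key_state="")
assert user_key == "sk-ant-demo"   # textbox value wins and is stripped
assert has_paid_key is True        # any user-supplied key counts as paid

user_key, _ = _validate_inputs(api_key="", api_key_state="")
assert user_key is None            # no user key: the free HuggingFace tier applies downstream
```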
File without changes
@@ -0,0 +1,19 @@
+"""Base classes for Chat Client implementations.
+
+This module re-exports the BaseChatClient and related types from the core
+agent_framework package to provide a single point of import for the project.
+"""
+
+from agent_framework import (
+    BaseChatClient,
+    ChatMessage,
+    ChatResponse,
+    ChatResponseUpdate,
+)
+
+__all__ = [
+    "BaseChatClient",
+    "ChatMessage",
+    "ChatResponse",
+    "ChatResponseUpdate",
+]
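The new base module is a thin facade: downstream code imports the framework types from one project-owned location, so a later swap of the underlying package touches a single file. A minimal sketch of the intended import style (the helper below is purely illustrative):

```python
# Minimal sketch: import framework types via the project facade, not agent_framework directly.
from src.clients.base import BaseChatClient, ChatMessage


def describe(client: BaseChatClient) -> str:
    # Purely illustrative; relies only on the re-exported names above.
    return f"{type(client).__name__} exchanges {ChatMessage.__name__} objects"
```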
@@ -0,0 +1,76 @@
+"""Chat Client Factory for unified provider selection."""
+
+from typing import Any
+
+import structlog
+from agent_framework import BaseChatClient
+from agent_framework.openai import OpenAIChatClient
+
+from src.clients.huggingface import HuggingFaceChatClient
+from src.utils.config import settings
+
+logger = structlog.get_logger()
+
+
+def get_chat_client(
+    provider: str | None = None,
+    api_key: str | None = None,
+    model_id: str | None = None,
+    **kwargs: Any,
+) -> BaseChatClient:
+    """
+    Factory for creating chat clients.
+
+    Auto-detection priority:
+    1. Explicit provider parameter
+    2. OpenAI key (Best Function Calling)
+    3. Gemini key (Best Context/Cost)
+    4. HuggingFace (Free Fallback)
+
+    Args:
+        provider: Force specific provider ("openai", "gemini", "huggingface")
+        api_key: Override API key for the provider
+        model_id: Override default model ID
+        **kwargs: Additional arguments for the client
+
+    Returns:
+        Configured BaseChatClient instance (Namespace Neutral)
+
+    Raises:
+        ValueError: If an unsupported provider is explicitly requested
+        NotImplementedError: If Gemini is explicitly requested (not yet implemented)
+    """
+    # Normalize provider to lowercase for case-insensitive matching
+    normalized = provider.lower() if provider is not None else None
+
+    # Validate explicit provider requests early
+    valid_providers = (None, "openai", "gemini", "huggingface")
+    if normalized not in valid_providers:
+        raise ValueError(f"Unsupported provider: {provider!r}")
+
+    # 1. OpenAI (Standard / Paid Tier)
+    if normalized == "openai" or (normalized is None and settings.has_openai_key):
+        logger.info("Using OpenAI Chat Client")
+        return OpenAIChatClient(
+            model_id=model_id or settings.openai_model,
+            api_key=api_key or settings.openai_api_key,
+            **kwargs,
+        )
+
+    # 2. Gemini (High Performance / Alternative)
+    if normalized == "gemini":
+        # Explicit request for Gemini - fail loudly
+        raise NotImplementedError("Gemini client not yet implemented (Planned Phase 4)")
+
+    if normalized is None and settings.has_gemini_key:
+        # Implicit (has key but not explicit) - log warning and fall through
+        logger.warning("Gemini key detected but client not yet implemented; falling back")
+
+    # 3. HuggingFace (Free Fallback)
+    # This is the default if no other keys are present
+    logger.info("Using HuggingFace Chat Client (Free Tier)")
+    return HuggingFaceChatClient(
+        model_id=model_id or settings.huggingface_model,
+        api_key=api_key or settings.hf_token,
+        **kwargs,
+    )
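A usage sketch for the factory above. This mirrors the behavior in the code itself; it assumes the project and its dependencies are installed, and `pytest` is used only to express the expected exceptions:

```python
# Usage sketch for get_chat_client (behavior mirrors the factory code above).
import pytest

from src.clients.factory import get_chat_client

# Provider matching is case-insensitive: "OpenAI" normalizes to "openai".
client = get_chat_client(provider="OpenAI", api_key="sk-test")

# Unsupported providers raise instead of silently falling back.
with pytest.raises(ValueError):
    get_chat_client(provider="bedrock")

# Gemini is recognized but explicitly not implemented yet.
with pytest.raises(NotImplementedError):
    get_chat_client(provider="gemini")
```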
@@ -0,0 +1,191 @@
+"""HuggingFace Chat Client adapter for Microsoft Agent Framework.
+
+This client enables the use of HuggingFace Inference API (including the free tier)
+as a backend for the agent framework, allowing "Advanced Mode" to work without
+an OpenAI API key.
+"""
+
+import asyncio
+from collections.abc import AsyncIterable, MutableSequence
+from functools import partial
+from typing import Any, cast
+
+import structlog
+from agent_framework import (
+    BaseChatClient,
+    ChatMessage,
+    ChatOptions,
+    ChatResponse,
+    ChatResponseUpdate,
+)
+from huggingface_hub import InferenceClient
+
+from src.utils.config import settings
+
+logger = structlog.get_logger()
+
+
+class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
+    """Adapter for HuggingFace Inference API."""
+
+    def __init__(
+        self,
+        model_id: str | None = None,
+        api_key: str | None = None,
+        **kwargs: Any,
+    ) -> None:
+        """Initialize the HuggingFace chat client.
+
+        Args:
+            model_id: The HuggingFace model ID (default: configured value or Llama-3.1-70B).
+            api_key: HF_TOKEN (optional, defaults to env var).
+            **kwargs: Additional arguments passed to BaseChatClient.
+        """
+        super().__init__(**kwargs)
+        self.model_id = (
+            model_id or settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+        )
+        self.api_key = api_key or settings.hf_token
+
+        # Initialize the HF Inference Client
+        # timeout=60 to prevent premature timeouts on long reasonings
+        self._client = InferenceClient(
+            model=self.model_id,
+            token=self.api_key,
+            timeout=60,
+        )
+        logger.info("Initialized HuggingFaceChatClient", model=self.model_id)
+
+    def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
+        """Convert framework messages to HuggingFace format."""
+        hf_messages: list[dict[str, Any]] = []
+        for msg in messages:
+            # Basic conversion - extend as needed for multi-modal
+            content = msg.text or ""
+            # msg.role can be string or enum - extract .value for enums
+            # str(Role.USER) -> "Role.USER" (wrong), Role.USER.value -> "user" (correct)
+            if hasattr(msg.role, "value"):
+                role_str = str(msg.role.value)
+            else:
+                role_str = str(msg.role)
+            hf_messages.append({"role": role_str, "content": content})
+        return hf_messages
+
+    async def _inner_get_response(
+        self,
+        *,
+        messages: MutableSequence[ChatMessage],
+        chat_options: ChatOptions,
+        **kwargs: Any,
+    ) -> ChatResponse:
+        """Synchronous response generation using chat_completion."""
+        hf_messages = self._convert_messages(messages)
+
+        # Extract tool configuration
+        tools = chat_options.tools if chat_options.tools else None
+        # HF expects 'tool_choice' to be 'auto', 'none', or specific tool
+        # Framework uses ToolMode enum or dict
+        hf_tool_choice: str | None = None
+        if chat_options.tool_choice is not None:
+            tool_choice_str = str(chat_options.tool_choice)
+            if "AUTO" in tool_choice_str:
+                hf_tool_choice = "auto"
+            # For NONE or other, leave as None
+
+        try:
+            # Use explicit None checks - 'or' treats 0/0.0 as falsy
+            # temperature=0.0 is valid (deterministic output)
+            max_tokens = chat_options.max_tokens if chat_options.max_tokens is not None else 2048
+            temperature = chat_options.temperature if chat_options.temperature is not None else 0.7
+
+            # Use partial to create a callable with keyword args for to_thread
+            call_fn = partial(
+                self._client.chat_completion,
+                messages=hf_messages,
+                tools=tools,
+                tool_choice=hf_tool_choice,
+                max_tokens=max_tokens,
+                temperature=temperature,
+                stream=False,
+            )
+
+            response = await asyncio.to_thread(call_fn)
+
+            # Parse response
+            # HF returns a ChatCompletionOutput
+            choices = response.choices
+            if not choices:
+                return ChatResponse(messages=[], response_id="error-no-choices")
+
+            choice = choices[0]
+            message_content = choice.message.content or ""
+
+            # Construct response message with proper kwargs
+            response_msg = ChatMessage(
+                role=cast(Any, choice.message.role),
+                text=message_content,
+            )
+
+            return ChatResponse(
+                messages=[response_msg],
+                response_id=response.id or "hf-response",
+            )
+
+        except Exception as e:
+            logger.error("HuggingFace API error", error=str(e))
+            raise
+
+    async def _inner_get_streaming_response(
+        self,
+        *,
+        messages: MutableSequence[ChatMessage],
+        chat_options: ChatOptions,
+        **kwargs: Any,
+    ) -> AsyncIterable[ChatResponseUpdate]:
+        """Streaming response generation."""
+        hf_messages = self._convert_messages(messages)
+
+        tools = chat_options.tools if chat_options.tools else None
+        hf_tool_choice: str | None = None
+        if chat_options.tool_choice is not None:
+            if "AUTO" in str(chat_options.tool_choice):
+                hf_tool_choice = "auto"
+
+        try:
+            # Use explicit None checks - 'or' treats 0/0.0 as falsy
+            # temperature=0.0 is valid (deterministic output)
+            max_tokens = chat_options.max_tokens if chat_options.max_tokens is not None else 2048
+            temperature = chat_options.temperature if chat_options.temperature is not None else 0.7
+
+            # Use partial for streaming call
+            call_fn = partial(
+                self._client.chat_completion,
+                messages=hf_messages,
+                tools=tools,
+                tool_choice=hf_tool_choice,
+                max_tokens=max_tokens,
+                temperature=temperature,
+                stream=True,
+            )
+
+            stream = await asyncio.to_thread(call_fn)
+
+            for chunk in stream:
+                # Chunk is ChatCompletionStreamOutput
+                if not chunk.choices:
+                    continue
+                choice = chunk.choices[0]
+                delta = choice.delta
+
+                # Convert to ChatResponseUpdate
+                yield ChatResponseUpdate(
+                    role=cast(Any, delta.role) if delta.role else None,
+                    content=delta.content,
+                )
+
+                # Yield control to event loop
+                await asyncio.sleep(0)
+
+        except Exception as e:
+            logger.error("HuggingFace Streaming error", error=str(e))
+            raise
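Two of the defensive patterns in the adapter above are easy to get wrong, so here is a self-contained sketch of both: enum-to-string conversion via `.value`, and explicit `is not None` checks that preserve valid falsy values such as `temperature=0.0`. The `Role` enum below is a stand-in, not the framework's actual class:

```python
# Stand-in demonstration of the role conversion and explicit-None patterns used above.
from enum import Enum


class Role(Enum):  # stand-in for the framework's Role enum
    USER = "user"


role = Role.USER
assert str(role) == "Role.USER"   # not what a chat API payload expects
assert role.value == "user"       # correct wire value

# Explicit None checks keep legitimate falsy values.
temperature: float | None = 0.0
effective = temperature if temperature is not None else 0.7
assert effective == 0.0           # `temperature or 0.7` would wrongly yield 0.7
```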
@@ -1,27 +1,32 @@
-"""Orchestrators package -

-This package implements the Strategy Pattern
-to switch between different orchestration approaches:

-
-
 - Hierarchical: Sub-iteration middleware with fine-grained control

 Usage:
-    from src.orchestrators import create_orchestrator

-    #
-    orchestrator = create_orchestrator(

-    # Or
-    orchestrator = create_orchestrator(

 Protocols:
     from src.orchestrators import SearchHandlerProtocol, JudgeHandlerProtocol, OrchestratorProtocol

 Design Patterns Applied:
 - Factory Pattern: create_orchestrator() creates appropriate orchestrator
-
 - Facade Pattern: This __init__.py provides a clean public API
 """

@@ -40,9 +45,6 @@ from src.orchestrators.base import (
 # Factory (creational pattern)
 from src.orchestrators.factory import create_orchestrator

-# Orchestrators (Strategy Pattern implementations)
-from src.orchestrators.simple import Orchestrator
-
 if TYPE_CHECKING:
     from src.orchestrators.advanced import AdvancedOrchestrator
     from src.orchestrators.hierarchical import HierarchicalOrchestrator

@@ -101,7 +103,6 @@ def get_magentic_orchestrator() -> type[AdvancedOrchestrator]:

 __all__ = [
     "JudgeHandlerProtocol",
-    "Orchestrator",
     "OrchestratorProtocol",
     "SearchHandlerProtocol",
     "create_orchestrator",
+"""Orchestrators package - Unified Architecture (SPEC-16).

+This package implements the Strategy Pattern with a unified orchestration approach:

+- Advanced: Multi-agent coordination using Microsoft Agent Framework (DEFAULT)
+- Backend auto-selects: OpenAI (if key) → HuggingFace (free fallback)
 - Hierarchical: Sub-iteration middleware with fine-grained control

+Unified Architecture (SPEC-16):
+    All users get Advanced Mode. The chat client factory auto-selects the backend:
+    - With OpenAI key → OpenAIChatClient (GPT-5)
+    - Without key → HuggingFaceChatClient (Llama 3.1 70B, free tier)
+
 Usage:
+    from src.orchestrators import create_orchestrator

+    # Creates AdvancedOrchestrator with auto-selected backend
+    orchestrator = create_orchestrator()

+    # Or with explicit API key
+    orchestrator = create_orchestrator(api_key="sk-...")

 Protocols:
     from src.orchestrators import SearchHandlerProtocol, JudgeHandlerProtocol, OrchestratorProtocol

 Design Patterns Applied:
 - Factory Pattern: create_orchestrator() creates appropriate orchestrator
+- Adapter Pattern: HuggingFaceChatClient adapts HF API to BaseChatClient
+- Strategy Pattern: Different backends (OpenAI, HuggingFace) via ChatClientFactory
 - Facade Pattern: This __init__.py provides a clean public API
 """

...

 # Factory (creational pattern)
 from src.orchestrators.factory import create_orchestrator

 if TYPE_CHECKING:
     from src.orchestrators.advanced import AdvancedOrchestrator
     from src.orchestrators.hierarchical import HierarchicalOrchestrator

...

 __all__ = [
     "JudgeHandlerProtocol",
     "OrchestratorProtocol",
     "SearchHandlerProtocol",
     "create_orchestrator",
@@ -28,7 +28,6 @@ from agent_framework import (
     MagenticOrchestratorMessageEvent,
     WorkflowOutputEvent,
 )
-from agent_framework.openai import OpenAIChatClient

 from src.agents.magentic_agents import (
     create_hypothesis_agent,

@@ -37,10 +36,11 @@ from src.agents.magentic_agents import (
     create_search_agent,
 )
 from src.agents.state import init_magentic_state
 from src.config.domain import ResearchDomain, get_domain_config
 from src.orchestrators.base import OrchestratorProtocol
 from src.utils.config import settings
-from src.utils.llm_factory import check_magentic_requirements
 from src.utils.models import AgentEvent
 from src.utils.service_loader import get_embedding_service_if_available

@@ -69,45 +69,50 @@ class AdvancedOrchestrator(OrchestratorProtocol):

     def __init__(
         self,
-        max_rounds: int
-        chat_client:
         api_key: str | None = None,
-        timeout_seconds: float = 300.0,
         domain: ResearchDomain | str | None = None,
     ) -> None:
-        """Initialize orchestrator.

         Args:
-            max_rounds: Maximum coordination rounds
-            chat_client: Optional
-
-
-            domain: Research domain for customization
         """
-
-
-
-
-
-        self.
-
-
         )
-
-
-        self.
-
-
-
-
-
-
-
-
-
-

     def _init_embedding_service(self) -> "EmbeddingServiceProtocol | None":
         """Initialize embedding service if available."""

@@ -122,10 +127,7 @@ class AdvancedOrchestrator(OrchestratorProtocol):
         report_agent = create_report_agent(self._chat_client, domain=self.domain)

         # Manager chat client (orchestrates the agents)
-        manager_client = self._chat_client
-            model_id=settings.openai_model,  # Use configured model
-            api_key=settings.openai_api_key,
-        )

         return (
             MagenticBuilder()
     MagenticOrchestratorMessageEvent,
     WorkflowOutputEvent,
 )

 from src.agents.magentic_agents import (
     create_hypothesis_agent,

...

     create_search_agent,
 )
 from src.agents.state import init_magentic_state
+from src.clients.base import BaseChatClient
+from src.clients.factory import get_chat_client
 from src.config.domain import ResearchDomain, get_domain_config
 from src.orchestrators.base import OrchestratorProtocol
 from src.utils.config import settings
 from src.utils.models import AgentEvent
 from src.utils.service_loader import get_embedding_service_if_available

...

     def __init__(
         self,
+        max_rounds: int = 5,
+        chat_client: BaseChatClient | None = None,
+        provider: str | None = None,
         api_key: str | None = None,
         domain: ResearchDomain | str | None = None,
+        timeout_seconds: float | None = None,
     ) -> None:
+        """Initialize the advanced orchestrator.

         Args:
+            max_rounds: Maximum number of coordination rounds.
+            chat_client: Optional pre-configured chat client.
+            provider: Optional provider override ("openai", "huggingface").
+            api_key: Optional API key override.
+            domain: Research domain for customization.
+            timeout_seconds: Optional timeout override (defaults to settings).
         """
+        self._max_rounds = max_rounds
+        self.domain = domain or ResearchDomain.SEXUAL_HEALTH
+        self.domain_config = get_domain_config(self.domain)
+        self._timeout_seconds = timeout_seconds or settings.advanced_timeout
+
+        self.logger = logger.bind(orchestrator="advanced")
+
+        # Use provided client or create one via factory
+        self._chat_client = chat_client or get_chat_client(
+            provider=provider,
+            api_key=api_key,
         )
+
+        # Event stream for UI updates
+        self._events: list[AgentEvent] = []
+
+        # Initialize services lazily
+        self._embedding_service: EmbeddingServiceProtocol | None = None
+
+        # Track execution statistics
+        self.stats = {
+            "rounds": 0,
+            "searches": 0,
+            "hypotheses": 0,
+            "reports": 0,
+            "errors": 0,
+        }

     def _init_embedding_service(self) -> "EmbeddingServiceProtocol | None":
         """Initialize embedding service if available."""

...

         report_agent = create_report_agent(self._chat_client, domain=self.domain)

         # Manager chat client (orchestrates the agents)
+        manager_client = self._chat_client

         return (
             MagenticBuilder()
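The widened constructor above lets the orchestrator be built for either tier without touching its call sites. A construction sketch; the keyword names come straight from the diff, but actually instantiating the class requires the project's optional dependencies to be installed:

```python
# Construction sketch for the new AdvancedOrchestrator signature (keywords from the diff).
from src.orchestrators.advanced import AdvancedOrchestrator

# Free tier: force the HuggingFace backend through the chat client factory.
free_orchestrator = AdvancedOrchestrator(provider="huggingface", max_rounds=3)

# Paid tier: pass a key and let the factory select the OpenAI client.
paid_orchestrator = AdvancedOrchestrator(api_key="sk-...", domain="sexual_health")
```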
@@ -19,7 +19,6 @@ from src.orchestrators.base import (
     OrchestratorProtocol,
     SearchHandlerProtocol,
 )
-from src.orchestrators.simple import Orchestrator
 from src.utils.config import settings
 from src.utils.models import OrchestratorConfig

@@ -30,27 +29,15 @@ logger = structlog.get_logger()


 def _get_advanced_orchestrator_class() -> type["AdvancedOrchestrator"]:
-    """Import AdvancedOrchestrator lazily
-
-    This allows the simple mode to work without agent-framework-core installed.
-
-    Returns:
-        The AdvancedOrchestrator class
-
-    Raises:
-        ValueError: If agent-framework-core is not installed
-    """
     try:
         from src.orchestrators.advanced import AdvancedOrchestrator

         return AdvancedOrchestrator
     except ImportError as e:
         logger.error("Failed to import AdvancedOrchestrator", error=str(e))
-
-
-            "Install with: pip install agent-framework-core. "
-            "Or use mode='simple' instead."
-        ) from e


 def create_orchestrator(

@@ -64,80 +51,40 @@ def create_orchestrator(
     """
     Create an orchestrator instance.

-
-
-    2. Available API keys (auto-detection)
-
-    Args:
-        search_handler: The search handler (required for simple mode)
-        judge_handler: The judge handler (required for simple mode)
-        config: Optional configuration (max_iterations, timeouts, etc.)
-            Note: This parameter is only used by simple and hierarchical modes.
-            Advanced mode uses settings.advanced_max_rounds instead.
-        mode: "simple", "magentic", "advanced", or "hierarchical"
-            Note: "magentic" is an alias for "advanced" (kept for backwards compatibility)
-        api_key: Optional API key for advanced mode (OpenAI)
-        domain: Research domain for customization (default: sexual_health)
-
-    Returns:
-        Orchestrator instance implementing OrchestratorProtocol
-
-    Raises:
-        ValueError: If required handlers are missing for simple mode
-        ValueError: If advanced mode is requested but dependencies are missing
     """
     effective_config = config or OrchestratorConfig()
-    effective_mode = _determine_mode(mode
     logger.info("Creating orchestrator", mode=effective_mode, domain=domain)

-    if effective_mode == "advanced":
-        orchestrator_cls = _get_advanced_orchestrator_class()
-        return orchestrator_cls(
-            max_rounds=settings.advanced_max_rounds,
-            api_key=api_key,
-            domain=domain,
-        )
-
     if effective_mode == "hierarchical":
         from src.orchestrators.hierarchical import HierarchicalOrchestrator

         return HierarchicalOrchestrator(config=effective_config, domain=domain)

-    #
-
-
-
-
-
-        judge_handler=judge_handler,
-        config=effective_config,
         domain=domain,
     )


-def _determine_mode(explicit_mode: str | None
     """Determine which mode to use.

-    Priority:
-    1. Explicit mode parameter
-    2. Auto-detect based on available API keys
-
     Args:
         explicit_mode: Mode explicitly requested by caller
-        api_key: API key provided by caller

     Returns:
-        Effective mode string: "
     """
-    if explicit_mode:
-
-
-
-
-
-
-    # Auto-detect: advanced if paid API key available
-    if settings.has_openai_key or (api_key and api_key.startswith("sk-")):
-        return "advanced"
-
-    return "simple"
     OrchestratorProtocol,
     SearchHandlerProtocol,
 )
 from src.utils.config import settings
 from src.utils.models import OrchestratorConfig

...


 def _get_advanced_orchestrator_class() -> type["AdvancedOrchestrator"]:
+    """Import AdvancedOrchestrator lazily."""
     try:
         from src.orchestrators.advanced import AdvancedOrchestrator

         return AdvancedOrchestrator
     except ImportError as e:
         logger.error("Failed to import AdvancedOrchestrator", error=str(e))
+        # With unified architecture, we should never fail here unless installation is broken
+        raise


 def create_orchestrator(

...

     """
     Create an orchestrator instance.

+    Defaults to AdvancedOrchestrator (Unified Architecture).
+    Simple Mode is deprecated and mapped to Advanced Mode.
     """
     effective_config = config or OrchestratorConfig()
+    effective_mode = _determine_mode(mode)
     logger.info("Creating orchestrator", mode=effective_mode, domain=domain)

     if effective_mode == "hierarchical":
         from src.orchestrators.hierarchical import HierarchicalOrchestrator

         return HierarchicalOrchestrator(config=effective_config, domain=domain)

+    # Default: Advanced Mode (Unified)
+    # Handles both Paid (OpenAI) and Free (HuggingFace) tiers
+    orchestrator_cls = _get_advanced_orchestrator_class()
+    return orchestrator_cls(
+        max_rounds=settings.advanced_max_rounds,
+        api_key=api_key,
         domain=domain,
     )


+def _determine_mode(explicit_mode: str | None) -> str:
     """Determine which mode to use.

     Args:
         explicit_mode: Mode explicitly requested by caller

     Returns:
+        Effective mode string: "advanced" (default) or "hierarchical"
     """
+    if explicit_mode == "hierarchical":
+        return "hierarchical"
+
+    # "simple" is deprecated -> upgrade to "advanced"
+    # "magentic" is alias for "advanced"
+    return "advanced"
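After this change `_determine_mode` is a pure mapping: only "hierarchical" survives as a distinct mode, and every legacy value collapses to "advanced". A behavior sketch (it imports a private helper, so this is test-style illustration rather than a public API):

```python
# Behavior sketch for _determine_mode after the unification above.
from src.orchestrators.factory import _determine_mode

assert _determine_mode("hierarchical") == "hierarchical"
assert _determine_mode("simple") == "advanced"    # deprecated, upgraded
assert _determine_mode("magentic") == "advanced"  # alias for advanced
assert _determine_mode(None) == "advanced"        # default
```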
@@ -1,778 +0,0 @@
|
|
| 1 |
-
"""Simple Orchestrator - the basic agent loop connecting Search and Judge.
|
| 2 |
-
|
| 3 |
-
This orchestrator uses a simple loop pattern with pydantic-ai for structured
|
| 4 |
-
LLM outputs. It works with free tier (HuggingFace Inference) or paid APIs
|
| 5 |
-
(OpenAI, Anthropic).
|
| 6 |
-
|
| 7 |
-
Design Pattern: Template Method - defines the skeleton of the search-judge loop
|
| 8 |
-
while allowing handlers to implement specific behaviors.
|
| 9 |
-
"""
|
| 10 |
-
|
| 11 |
-
from __future__ import annotations
|
| 12 |
-
|
| 13 |
-
import asyncio
|
| 14 |
-
from collections.abc import AsyncGenerator
|
| 15 |
-
from typing import TYPE_CHECKING, Any, ClassVar
|
| 16 |
-
|
| 17 |
-
import structlog
|
| 18 |
-
|
| 19 |
-
from src.config.domain import ResearchDomain, get_domain_config
|
| 20 |
-
from src.orchestrators.base import JudgeHandlerProtocol, SearchHandlerProtocol
|
| 21 |
-
from src.prompts.synthesis import format_synthesis_prompt, get_synthesis_system_prompt
|
| 22 |
-
from src.utils.config import settings
|
| 23 |
-
from src.utils.exceptions import JudgeError, ModalError, SearchError
|
| 24 |
-
from src.utils.models import (
|
| 25 |
-
AgentEvent,
|
| 26 |
-
Evidence,
|
| 27 |
-
JudgeAssessment,
|
| 28 |
-
OrchestratorConfig,
|
| 29 |
-
SearchResult,
|
| 30 |
-
)
|
| 31 |
-
|
| 32 |
-
if TYPE_CHECKING:
|
| 33 |
-
from src.services.embeddings import EmbeddingService
|
| 34 |
-
from src.services.statistical_analyzer import StatisticalAnalyzer
|
| 35 |
-
|
| 36 |
-
logger = structlog.get_logger()
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
class Orchestrator:
|
| 40 |
-
"""
|
| 41 |
-
The simple agent orchestrator - runs the Search -> Judge -> Loop cycle.
|
| 42 |
-
|
| 43 |
-
This is a generator-based design that yields events for real-time UI updates.
|
| 44 |
-
Uses pydantic-ai for structured LLM outputs without requiring the full
|
| 45 |
-
Microsoft Agent Framework.
|
| 46 |
-
"""
|
| 47 |
-
|
| 48 |
-
# Termination thresholds (code-enforced, not LLM-decided)
|
| 49 |
-
TERMINATION_CRITERIA: ClassVar[dict[str, float]] = {
|
| 50 |
-
"min_combined_score": 12.0, # mechanism + clinical >= 12
|
| 51 |
-
"min_score_with_volume": 10.0, # >= 10 if 50+ sources
|
| 52 |
-
"min_evidence_for_volume": 50.0, # Priority 3: evidence count threshold
|
| 53 |
-
"late_iteration_threshold": 8.0, # >= 8 in iterations 8+
|
| 54 |
-
"max_evidence_threshold": 100.0, # Force synthesis with 100+ sources
|
| 55 |
-
"emergency_iteration": 8.0, # Last 2 iterations = emergency mode
|
| 56 |
-
"min_confidence": 0.5, # Minimum confidence for emergency synthesis
|
| 57 |
-
"min_evidence_for_emergency": 30.0, # Priority 6: min evidence for emergency
|
| 58 |
-
}
|
| 59 |
-
|
| 60 |
-
def __init__(
|
| 61 |
-
self,
|
| 62 |
-
search_handler: SearchHandlerProtocol,
|
| 63 |
-
judge_handler: JudgeHandlerProtocol,
|
| 64 |
-
config: OrchestratorConfig | None = None,
|
| 65 |
-
enable_analysis: bool = False,
|
| 66 |
-
enable_embeddings: bool = True,
|
| 67 |
-
domain: ResearchDomain | str | None = None,
|
| 68 |
-
):
|
| 69 |
-
"""
|
| 70 |
-
Initialize the orchestrator.
|
| 71 |
-
|
| 72 |
-
Args:
|
| 73 |
-
search_handler: Handler for executing searches
|
| 74 |
-
judge_handler: Handler for assessing evidence
|
| 75 |
-
config: Optional configuration (uses defaults if not provided)
|
| 76 |
-
enable_analysis: Whether to perform statistical analysis (if Modal available)
|
| 77 |
-
enable_embeddings: Whether to use semantic search for ranking/dedup
|
| 78 |
-
domain: Research domain for customization
|
| 79 |
-
"""
|
| 80 |
-
self.search = search_handler
|
| 81 |
-
self.judge = judge_handler
|
| 82 |
-
self.config = config or OrchestratorConfig()
|
| 83 |
-
self.history: list[dict[str, Any]] = []
|
| 84 |
-
self._enable_analysis = enable_analysis and settings.modal_available
|
| 85 |
-
self._enable_embeddings = enable_embeddings
|
| 86 |
-
self.domain = domain
|
| 87 |
-
self.domain_config = get_domain_config(domain)
|
| 88 |
-
|
| 89 |
-
# Lazy-load services (typed for IDE support)
|
| 90 |
-
self._analyzer: StatisticalAnalyzer | None = None
|
| 91 |
-
self._embeddings: EmbeddingService | None = None
|
| 92 |
-
|
| 93 |
-
def _get_analyzer(self) -> StatisticalAnalyzer | None:
|
| 94 |
-
"""Lazy initialization of StatisticalAnalyzer."""
|
| 95 |
-
if self._analyzer is None:
|
| 96 |
-
from src.utils.service_loader import get_analyzer_if_available
|
| 97 |
-
|
| 98 |
-
self._analyzer = get_analyzer_if_available()
|
| 99 |
-
if self._analyzer is None:
|
| 100 |
-
self._enable_analysis = False
|
| 101 |
-
return self._analyzer
|
| 102 |
-
|
| 103 |
-
async def _run_analysis_phase(
|
| 104 |
-
self, query: str, evidence: list[Evidence], iteration: int
|
| 105 |
-
) -> AsyncGenerator[AgentEvent, None]:
|
| 106 |
-
"""Run the optional analysis phase."""
|
| 107 |
-
if not self._enable_analysis:
|
| 108 |
-
return
|
| 109 |
-
|
| 110 |
-
yield AgentEvent(
|
| 111 |
-
type="analyzing",
|
| 112 |
-
message="Running statistical analysis in Modal sandbox...",
|
| 113 |
-
data={},
|
| 114 |
-
iteration=iteration,
|
| 115 |
-
)
|
| 116 |
-
|
| 117 |
-
try:
|
| 118 |
-
analyzer = self._get_analyzer()
|
| 119 |
-
if analyzer is None:
|
| 120 |
-
logger.info("StatisticalAnalyzer not available, skipping analysis phase")
|
| 121 |
-
return
|
| 122 |
-
|
| 123 |
-
# Run Modal analysis (no agent_framework needed!)
|
| 124 |
-
analysis_result = await analyzer.analyze(
|
| 125 |
-
query=query,
|
| 126 |
-
evidence=evidence,
|
| 127 |
-
hypothesis=None, # Could add hypothesis generation later
|
| 128 |
-
)
|
| 129 |
-
|
| 130 |
-
yield AgentEvent(
|
| 131 |
-
type="analysis_complete",
|
| 132 |
-
message=f"Analysis verdict: {analysis_result.verdict}",
|
| 133 |
-
data=analysis_result.model_dump(),
|
| 134 |
-
iteration=iteration,
|
| 135 |
-
)
|
| 136 |
-
|
| 137 |
-
except ModalError as e:
|
| 138 |
-
logger.error("Modal analysis failed", error=str(e), exc_type="ModalError")
|
| 139 |
-
yield AgentEvent(
|
| 140 |
-
type="error",
|
| 141 |
-
message=f"Modal analysis failed: {e}",
|
| 142 |
-
data={"error": str(e), "recoverable": True},
|
| 143 |
-
iteration=iteration,
|
| 144 |
-
)
|
| 145 |
-
except Exception as e:
|
| 146 |
-
# Unexpected error - log with full context for debugging
|
| 147 |
-
logger.error(
|
| 148 |
-
"Modal analysis failed unexpectedly",
|
| 149 |
-
error=str(e),
|
| 150 |
-
exc_type=type(e).__name__,
|
| 151 |
-
)
|
| 152 |
-
yield AgentEvent(
|
| 153 |
-
type="error",
|
| 154 |
-
message=f"Modal analysis failed: {e}",
|
| 155 |
-
data={"error": str(e), "recoverable": True},
|
| 156 |
-
iteration=iteration,
|
| 157 |
-
)
|
| 158 |
-
|
| 159 |
-
def _should_synthesize(
|
| 160 |
-
self,
|
| 161 |
-
assessment: JudgeAssessment,
|
| 162 |
-
iteration: int,
|
| 163 |
-
max_iterations: int,
|
| 164 |
-
evidence_count: int,
|
| 165 |
-
) -> tuple[bool, str]:
|
| 166 |
-
"""
|
| 167 |
-
Code-enforced synthesis decision.
|
| 168 |
-
|
| 169 |
-
Returns (should_synthesize, reason).
|
| 170 |
-
"""
|
| 171 |
-
combined_score = (
|
| 172 |
-
assessment.details.mechanism_score + assessment.details.clinical_evidence_score
|
| 173 |
-
)
|
| 174 |
-
has_drug_candidates = len(assessment.details.drug_candidates) > 0
|
| 175 |
-
confidence = assessment.confidence
|
| 176 |
-
|
| 177 |
-
# Priority 1: LLM explicitly says sufficient with good scores
|
| 178 |
-
if assessment.sufficient and assessment.recommendation == "synthesize":
|
| 179 |
-
if combined_score >= 10:
|
| 180 |
-
return True, "judge_approved"
|
| 181 |
-
|
| 182 |
-
# Priority 2: High scores with drug candidates
|
| 183 |
-
if (
|
| 184 |
-
combined_score >= self.TERMINATION_CRITERIA["min_combined_score"]
|
| 185 |
-
and has_drug_candidates
|
| 186 |
-
):
|
| 187 |
-
return True, "high_scores_with_candidates"
|
| 188 |
-
|
| 189 |
-
# Priority 3: Good scores with high evidence volume
|
| 190 |
-
if (
|
| 191 |
-
combined_score >= self.TERMINATION_CRITERIA["min_score_with_volume"]
|
| 192 |
-
and evidence_count >= self.TERMINATION_CRITERIA["min_evidence_for_volume"]
|
| 193 |
-
):
|
| 194 |
-
return True, "good_scores_high_volume"
|
| 195 |
-
|
| 196 |
-
# Priority 4: Late iteration with acceptable scores (diminishing returns)
|
| 197 |
-
is_late_iteration = iteration >= max_iterations - 2
|
| 198 |
-
if (
|
| 199 |
-
is_late_iteration
|
| 200 |
-
and combined_score >= self.TERMINATION_CRITERIA["late_iteration_threshold"]
|
| 201 |
-
):
|
| 202 |
-
return True, "late_iteration_acceptable"
|
| 203 |
-
|
| 204 |
-
# Priority 5: Very high evidence count (enough to synthesize something)
|
| 205 |
-
if evidence_count >= self.TERMINATION_CRITERIA["max_evidence_threshold"]:
|
| 206 |
-
return True, "max_evidence_reached"
|
| 207 |
-
|
| 208 |
-
# Priority 6: Emergency synthesis (avoid garbage output)
|
| 209 |
-
if (
|
| 210 |
-
is_late_iteration
|
| 211 |
-
and evidence_count >= self.TERMINATION_CRITERIA["min_evidence_for_emergency"]
|
| 212 |
-
and confidence >= self.TERMINATION_CRITERIA["min_confidence"]
|
| 213 |
-
):
|
| 214 |
-
return True, "emergency_synthesis"
|
| 215 |
-
|
| 216 |
-
return False, "continue_searching"
|
| 217 |
-
|
| 218 |
-
async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]: # noqa: PLR0915
|
| 219 |
-
"""
|
| 220 |
-
Run the agent loop for a query.
|
| 221 |
-
|
| 222 |
-
Yields AgentEvent objects for each step, allowing real-time UI updates.
|
| 223 |
-
|
| 224 |
-
Args:
|
| 225 |
-
query: The user's research question
|
| 226 |
-
|
| 227 |
-
Yields:
|
| 228 |
-
AgentEvent objects for each step of the process
|
| 229 |
-
"""
|
| 230 |
-
# Import here to avoid circular deps if any
|
| 231 |
-
from src.agents.graph.state import Hypothesis
|
| 232 |
-
from src.services.research_memory import ResearchMemory
|
| 233 |
-
|
| 234 |
-
logger.info("Starting orchestrator", query=query)
|
| 235 |
-
|
| 236 |
-
yield AgentEvent(
|
| 237 |
-
type="started",
|
| 238 |
-
message=f"Starting research for: {query}",
|
| 239 |
-
iteration=0,
|
| 240 |
-
)
|
| 241 |
-
|
| 242 |
-
# Initialize Shared Memory
|
| 243 |
-
# We keep 'all_evidence' for local tracking/reporting, but use Memory for intelligence
|
| 244 |
-
memory = ResearchMemory(query=query)
|
| 245 |
-
all_evidence: list[Evidence] = []
|
| 246 |
-
current_queries = [query]
|
| 247 |
-
iteration = 0
|
| 248 |
-
|
| 249 |
-
while iteration < self.config.max_iterations:
|
| 250 |
-
iteration += 1
|
| 251 |
-
logger.info("Iteration", iteration=iteration, queries=current_queries)
|
| 252 |
-
|
| 253 |
-
# === SEARCH PHASE ===
|
| 254 |
-
yield AgentEvent(
|
| 255 |
-
type="searching",
|
| 256 |
-
message=f"Searching for: {', '.join(current_queries[:3])}...",
|
| 257 |
-
iteration=iteration,
|
| 258 |
-
)
|
| 259 |
-
|
| 260 |
-
try:
|
| 261 |
-
# Execute searches for all current queries
|
| 262 |
-
search_tasks = [
|
| 263 |
-
self.search.execute(q, self.config.max_results_per_tool)
|
| 264 |
-
for q in current_queries[:3] # Limit to 3 queries per iteration
|
| 265 |
-
]
|
| 266 |
-
search_results = await asyncio.gather(*search_tasks, return_exceptions=True)
|
| 267 |
-
|
| 268 |
-
# Collect evidence from successful searches
|
| 269 |
-
new_evidence: list[Evidence] = []
|
| 270 |
-
errors: list[str] = []
|
| 271 |
-
|
| 272 |
-
for q, result in zip(current_queries[:3], search_results, strict=False):
|
| 273 |
-
if isinstance(result, Exception):
|
| 274 |
-
errors.append(f"Search for '{q}' failed: {result!s}")
|
| 275 |
-
elif isinstance(result, SearchResult):
|
| 276 |
-
new_evidence.extend(result.evidence)
|
| 277 |
-
errors.extend(result.errors)
|
| 278 |
-
else:
|
| 279 |
-
# Should not happen with return_exceptions=True but safe fallback
|
| 280 |
-
errors.append(f"Unknown result type for '{q}': {type(result)}")
|
| 281 |
-
|
| 282 |
-
# === MEMORY INTEGRATION: Store and Deduplicate ===
|
| 283 |
-
# ResearchMemory handles semantic deduplication and persistence
|
| 284 |
-
# It returns IDs of actual NEW evidence
|
| 285 |
-
new_ids = await memory.store_evidence(new_evidence)
|
| 286 |
-
|
| 287 |
-
# Filter new_evidence to only keep what was actually new (based on IDs)
|
| 288 |
-
# Note: This assumes IDs are URLs, which match Citation.url
|
| 289 |
-
unique_new = [e for e in new_evidence if e.citation.url in new_ids]
|
| 290 |
-
|
| 291 |
-
all_evidence.extend(unique_new)
|
| 292 |
-
|
| 293 |
-
yield AgentEvent(
|
| 294 |
-
type="search_complete",
|
| 295 |
-
message=f"Found {len(unique_new)} new sources ({len(all_evidence)} total)",
|
| 296 |
-
data={
|
| 297 |
-
"new_count": len(unique_new),
|
| 298 |
-
"total_count": len(all_evidence),
|
| 299 |
-
},
|
| 300 |
-
iteration=iteration,
|
| 301 |
-
)
|
| 302 |
-
|
| 303 |
-
if errors:
|
| 304 |
-
logger.warning("Search errors", errors=errors)
|
| 305 |
-
|
| 306 |
-
except SearchError as e:
|
| 307 |
-
logger.error("Search phase failed", error=str(e), exc_type="SearchError")
|
| 308 |
-
yield AgentEvent(
|
| 309 |
-
type="error",
|
| 310 |
-
message=f"Search failed: {e!s}",
|
| 311 |
-
data={"recoverable": True, "error_type": "search"},
|
| 312 |
-
iteration=iteration,
|
| 313 |
-
)
|
| 314 |
-
continue
|
| 315 |
-
except Exception as e:
|
| 316 |
-
# Unexpected error - log full context for debugging
|
| 317 |
-
logger.error(
|
| 318 |
-
"Search phase failed unexpectedly",
|
| 319 |
-
error=str(e),
|
| 320 |
-
exc_type=type(e).__name__,
|
| 321 |
-
)
|
| 322 |
-
yield AgentEvent(
|
| 323 |
-
type="error",
|
| 324 |
-
message=f"Search failed: {e!s}",
|
| 325 |
-
data={"recoverable": True, "error_type": "unexpected"},
|
| 326 |
-
iteration=iteration,
|
| 327 |
-
)
|
| 328 |
-
continue
|
| 329 |
-
|
| 330 |
-
# === JUDGE PHASE ===
|
| 331 |
-
yield AgentEvent(
|
| 332 |
-
type="judging",
|
| 333 |
-
message=f"Evaluating evidence (Memory: {len(memory.evidence_ids)} docs)...",
|
| 334 |
-
iteration=iteration,
|
| 335 |
-
)
|
| 336 |
-
|
| 337 |
-
try:
|
| 338 |
-
# Retrieve RELEVANT evidence from memory for the judge
|
| 339 |
-
# This keeps the context window manageable and focused
|
| 340 |
-
judge_context = await memory.get_relevant_evidence(n=30)
|
| 341 |
-
|
| 342 |
-
# Fallback if memory is empty (shouldn't happen if search worked)
|
| 343 |
-
if not judge_context and all_evidence:
|
| 344 |
-
judge_context = all_evidence[-30:]
|
| 345 |
-
|
| 346 |
-
assessment = await self.judge.assess(
|
| 347 |
-
query, judge_context, iteration, self.config.max_iterations
|
| 348 |
-
)
|
| 349 |
-
|
| 350 |
-
# === MEMORY INTEGRATION: Track Hypotheses ===
|
| 351 |
-
# Convert loose strings to structured Hypotheses
|
| 352 |
-
for candidate in assessment.details.drug_candidates:
|
| 353 |
-
h = Hypothesis(
|
| 354 |
-
id=candidate.replace(" ", "_").lower(),
|
| 355 |
-
statement=f"{candidate} is a potential candidate for {query}",
|
| 356 |
-
status="proposed",
|
| 357 |
-
confidence=assessment.confidence,
|
| 358 |
-
reasoning=f" identified in iteration {iteration}",
|
| 359 |
-
)
|
| 360 |
-
memory.add_hypothesis(h)
|
| 361 |
-
|
| 362 |
-
yield AgentEvent(
|
| 363 |
-
type="judge_complete",
|
| 364 |
-
message=(
|
| 365 |
-
f"Assessment: {assessment.recommendation} "
|
| 366 |
-
f"(confidence: {assessment.confidence:.0%})"
|
| 367 |
-
),
|
| 368 |
-
data={
|
| 369 |
-
"sufficient": assessment.sufficient,
|
| 370 |
-
"confidence": assessment.confidence,
|
| 371 |
-
"mechanism_score": assessment.details.mechanism_score,
|
| 372 |
-
"clinical_score": assessment.details.clinical_evidence_score,
|
| 373 |
-
},
|
| 374 |
-
iteration=iteration,
|
| 375 |
-
)
|
| 376 |
-
|
| 377 |
-
# Record this iteration in history
|
| 378 |
-
self.history.append(
|
| 379 |
-
{
|
| 380 |
-
"iteration": iteration,
|
| 381 |
-
"queries": current_queries,
|
| 382 |
-
"evidence_count": len(all_evidence),
|
| 383 |
-
"assessment": assessment.model_dump(),
|
| 384 |
-
}
|
| 385 |
-
)
|
| 386 |
-
|
| 387 |
-
# === DECISION PHASE (Code-Enforced) ===
|
| 388 |
-
should_synth, reason = self._should_synthesize(
|
| 389 |
-
assessment=assessment,
|
| 390 |
-
iteration=iteration,
|
| 391 |
-
max_iterations=self.config.max_iterations,
|
| 392 |
-
evidence_count=len(all_evidence),
|
| 393 |
-
)
|
| 394 |
-
|
| 395 |
-
logger.info(
|
| 396 |
-
"Synthesis decision",
|
| 397 |
-
should_synthesize=should_synth,
|
| 398 |
-
reason=reason,
|
| 399 |
-
iteration=iteration,
|
| 400 |
-
combined_score=assessment.details.mechanism_score
|
| 401 |
-
+ assessment.details.clinical_evidence_score,
|
| 402 |
-
evidence_count=len(all_evidence),
|
| 403 |
-
confidence=assessment.confidence,
|
| 404 |
-
)
|
| 405 |
-
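The decision phase delegates to _should_synthesize(), whose exact thresholds are not shown in this hunk. An illustrative standalone sketch of such a code-enforced gate follows; the combined-score cutoff of 8 is an assumption, while the reason strings "judge_approved" and "high_scores_with_candidates" do appear in this PR's tests:

```python
from src.utils.models import JudgeAssessment


def should_synthesize(
    assessment: JudgeAssessment,
    iteration: int,
    max_iterations: int,
    evidence_count: int,
) -> tuple[bool, str]:
    """Sketch of a code-enforced synthesis decision (not the project's exact logic)."""
    combined = (
        assessment.details.mechanism_score
        + assessment.details.clinical_evidence_score
    )
    # Trust the judge when it explicitly approves.
    if assessment.sufficient:
        return True, "judge_approved"
    # Override a conservative judge when scores and candidates are strong enough.
    if combined >= 8 and assessment.details.drug_candidates and evidence_count > 0:
        return True, "high_scores_with_candidates"
    return False, "insufficient_evidence"
```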
|
| 406 |
-
if should_synth:
|
| 407 |
-
# Log synthesis trigger reason for debugging
|
| 408 |
-
if reason != "judge_approved":
|
| 409 |
-
logger.info(f"Code-enforced synthesis triggered: {reason}")
|
| 410 |
-
|
| 411 |
-
# Optional Analysis Phase
|
| 412 |
-
async for event in self._run_analysis_phase(query, all_evidence, iteration):
|
| 413 |
-
yield event
|
| 414 |
-
|
| 415 |
-
yield AgentEvent(
|
| 416 |
-
type="synthesizing",
|
| 417 |
-
message=f"Evidence sufficient ({reason})! Preparing synthesis...",
|
| 418 |
-
iteration=iteration,
|
| 419 |
-
)
|
| 420 |
-
|
| 421 |
-
# Generate final response using LLM narrative synthesis
|
| 422 |
-
# Use all gathered evidence for the final report
|
| 423 |
-
final_response = await self._generate_synthesis(query, all_evidence, assessment)
|
| 424 |
-
|
| 425 |
-
yield AgentEvent(
|
| 426 |
-
type="complete",
|
| 427 |
-
message=final_response,
|
| 428 |
-
data={
|
| 429 |
-
"evidence_count": len(all_evidence),
|
| 430 |
-
"iterations": iteration,
|
| 431 |
-
"synthesis_reason": reason,
|
| 432 |
-
"drug_candidates": assessment.details.drug_candidates,
|
| 433 |
-
"key_findings": assessment.details.key_findings,
|
| 434 |
-
},
|
| 435 |
-
iteration=iteration,
|
| 436 |
-
)
|
| 437 |
-
return
|
| 438 |
-
|
| 439 |
-
else:
|
| 440 |
-
# Need more evidence - prepare next queries
|
| 441 |
-
current_queries = assessment.next_search_queries or [
|
| 442 |
-
f"{query} mechanism of action",
|
| 443 |
-
f"{query} clinical evidence",
|
| 444 |
-
]
|
| 445 |
-
|
| 446 |
-
yield AgentEvent(
|
| 447 |
-
type="looping",
|
| 448 |
-
message=(
|
| 449 |
-
f"Gathering more evidence (scores: {assessment.details.mechanism_score}"
|
| 450 |
-
f"+{assessment.details.clinical_evidence_score}). "
|
| 451 |
-
f"Next: {', '.join(current_queries[:2])}..."
|
| 452 |
-
),
|
| 453 |
-
data={"next_queries": current_queries, "reason": reason},
|
| 454 |
-
iteration=iteration,
|
| 455 |
-
)
|
| 456 |
-
|
| 457 |
-
except JudgeError as e:
|
| 458 |
-
logger.error("Judge phase failed", error=str(e), exc_type="JudgeError")
|
| 459 |
-
yield AgentEvent(
|
| 460 |
-
type="error",
|
| 461 |
-
message=f"Assessment failed: {e!s}",
|
| 462 |
-
data={"recoverable": True, "error_type": "judge"},
|
| 463 |
-
iteration=iteration,
|
| 464 |
-
)
|
| 465 |
-
continue
|
| 466 |
-
except Exception as e:
|
| 467 |
-
# Unexpected error - log full context for debugging
|
| 468 |
-
logger.error(
|
| 469 |
-
"Judge phase failed unexpectedly",
|
| 470 |
-
error=str(e),
|
| 471 |
-
exc_type=type(e).__name__,
|
| 472 |
-
)
|
| 473 |
-
yield AgentEvent(
|
| 474 |
-
type="error",
|
| 475 |
-
message=f"Assessment failed: {e!s}",
|
| 476 |
-
data={"recoverable": True, "error_type": "unexpected"},
|
| 477 |
-
iteration=iteration,
|
| 478 |
-
)
|
| 479 |
-
continue
|
| 480 |
-
|
| 481 |
-
# Max iterations reached
|
| 482 |
-
yield AgentEvent(
|
| 483 |
-
type="complete",
|
| 484 |
-
message=self._generate_partial_synthesis(query, all_evidence),
|
| 485 |
-
data={
|
| 486 |
-
"evidence_count": len(all_evidence),
|
| 487 |
-
"iterations": iteration,
|
| 488 |
-
"max_reached": True,
|
| 489 |
-
},
|
| 490 |
-
iteration=iteration,
|
| 491 |
-
)
|
| 492 |
-
|
| 493 |
-
async def _generate_synthesis(
|
| 494 |
-
self,
|
| 495 |
-
query: str,
|
| 496 |
-
evidence: list[Evidence],
|
| 497 |
-
assessment: JudgeAssessment,
|
| 498 |
-
) -> str:
|
| 499 |
-
"""
|
| 500 |
-
Generate the final synthesis response using LLM.
|
| 501 |
-
|
| 502 |
-
This method calls an LLM to generate a narrative research report,
|
| 503 |
-
following the Microsoft Agent Framework pattern of using LLM synthesis
|
| 504 |
-
instead of string templating.
|
| 505 |
-
|
| 506 |
-
Args:
|
| 507 |
-
query: The original question
|
| 508 |
-
evidence: All collected evidence
|
| 509 |
-
assessment: The final assessment
|
| 510 |
-
|
| 511 |
-
Returns:
|
| 512 |
-
Narrative synthesis as markdown
|
| 513 |
-
"""
|
| 514 |
-
# Build evidence summary for LLM context (limit to avoid token overflow)
|
| 515 |
-
evidence_lines = []
|
| 516 |
-
for e in evidence[:20]:
|
| 517 |
-
authors = ", ".join(e.citation.authors[:2]) if e.citation.authors else "Unknown"
|
| 518 |
-
content_preview = e.content[:200].replace("\n", " ")
|
| 519 |
-
evidence_lines.append(
|
| 520 |
-
f"- {e.citation.title} ({authors}, {e.citation.date}): {content_preview}..."
|
| 521 |
-
)
|
| 522 |
-
evidence_summary = "\n".join(evidence_lines)
|
| 523 |
-
|
| 524 |
-
# Format synthesis prompt with assessment data
|
| 525 |
-
user_prompt = format_synthesis_prompt(
|
| 526 |
-
query=query,
|
| 527 |
-
evidence_summary=evidence_summary,
|
| 528 |
-
drug_candidates=assessment.details.drug_candidates,
|
| 529 |
-
key_findings=assessment.details.key_findings,
|
| 530 |
-
mechanism_score=assessment.details.mechanism_score,
|
| 531 |
-
clinical_score=assessment.details.clinical_evidence_score,
|
| 532 |
-
confidence=assessment.confidence,
|
| 533 |
-
)
|
| 534 |
-
|
| 535 |
-
# Get domain-specific system prompt
|
| 536 |
-
system_prompt = get_synthesis_system_prompt(self.domain)
|
| 537 |
-
|
| 538 |
-
try:
|
| 539 |
-
# Type-safe tier detection using Protocol (CodeRabbit review recommendation)
|
| 540 |
-
# This replaces hasattr() with isinstance() for compile-time type safety
|
| 541 |
-
from src.orchestrators.base import SynthesizableJudge
|
| 542 |
-
from src.utils.exceptions import SynthesisError
|
| 543 |
-
|
| 544 |
-
if isinstance(self.judge, SynthesizableJudge):
|
| 545 |
-
logger.info("Using judge's free-tier synthesis method")
|
| 546 |
-
# synthesize() now raises SynthesisError on failure (CodeRabbit fix)
|
| 547 |
-
narrative = await self.judge.synthesize(system_prompt, user_prompt)
|
| 548 |
-
logger.info("Free-tier synthesis completed", chars=len(narrative))
|
| 549 |
-
else:
|
| 550 |
-
# Paid tier: use PydanticAI with get_model()
|
| 551 |
-
from pydantic_ai import Agent
|
| 552 |
-
|
| 553 |
-
from src.agent_factory.judges import get_model
|
| 554 |
-
|
| 555 |
-
# Create synthesis agent with retries (matching Judge agent pattern)
|
| 556 |
-
# Without retries, transient errors immediately trigger fallback
|
| 557 |
-
agent: Agent[None, str] = Agent(
|
| 558 |
-
model=get_model(),
|
| 559 |
-
output_type=str,
|
| 560 |
-
system_prompt=system_prompt,
|
| 561 |
-
retries=3, # Match Judge agent - retry on transient errors
|
| 562 |
-
)
|
| 563 |
-
result = await agent.run(user_prompt)
|
| 564 |
-
narrative = result.output
|
| 565 |
-
|
| 566 |
-
logger.info("LLM narrative synthesis completed", chars=len(narrative))
|
| 567 |
-
|
| 568 |
-
except SynthesisError as e:
|
| 569 |
-
# Handle SynthesisError with detailed context (CodeRabbit recommendation)
|
| 570 |
-
logger.error(
|
| 571 |
-
"Free-tier synthesis failed",
|
| 572 |
-
attempted_models=e.attempted_models,
|
| 573 |
-
errors=e.errors,
|
| 574 |
-
evidence_count=len(evidence),
|
| 575 |
-
)
|
| 576 |
-
# Surface detailed error to user
|
| 577 |
-
models_str = ", ".join(e.attempted_models) if e.attempted_models else "unknown"
|
| 578 |
-
error_note = (
|
| 579 |
-
f"\n\n> β οΈ **Note**: AI narrative synthesis unavailable. "
|
| 580 |
-
f"Showing structured summary.\n"
|
| 581 |
-
f"> _Attempted models: {models_str}_\n"
|
| 582 |
-
)
|
| 583 |
-
template = self._generate_template_synthesis(query, evidence, assessment)
|
| 584 |
-
return f"{error_note}\n{template}"
|
| 585 |
-
|
| 586 |
-
except Exception as e:
|
| 587 |
-
# Fallback to template synthesis if LLM fails
|
| 588 |
-
# Log error details for debugging
|
| 589 |
-
logger.error(
|
| 590 |
-
"LLM synthesis failed, using template fallback",
|
| 591 |
-
error=str(e),
|
| 592 |
-
exc_type=type(e).__name__,
|
| 593 |
-
evidence_count=len(evidence),
|
| 594 |
-
exc_info=True, # Capture stack trace for debugging
|
| 595 |
-
)
|
| 596 |
-
# Surface the error to user (MS Agent Framework pattern)
|
| 597 |
-
# Don't silently fall back - let user know synthesis degraded
|
| 598 |
-
error_note = (
|
| 599 |
-
f"\n\n> β οΈ **Note**: AI narrative synthesis unavailable. "
|
| 600 |
-
f"Showing structured summary.\n"
|
| 601 |
-
f"> _Error: {type(e).__name__}_\n"
|
| 602 |
-
)
|
| 603 |
-
template = self._generate_template_synthesis(query, evidence, assessment)
|
| 604 |
-
return f"{error_note}\n{template}"
|
| 605 |
-
|
| 606 |
-
# Add full citation list footer
|
| 607 |
-
citations = "\n".join(
|
| 608 |
-
f"{i + 1}. [{e.citation.title}]({e.citation.url}) "
|
| 609 |
-
f"({e.citation.source.upper()}, {e.citation.date})"
|
| 610 |
-
for i, e in enumerate(evidence[:15])
|
| 611 |
-
)
|
| 612 |
-
|
| 613 |
-
return f"""{narrative}
|
| 614 |
-
|
| 615 |
-
---
|
| 616 |
-
### Full Citation List ({len(evidence)} sources)
|
| 617 |
-
{citations}
|
| 618 |
-
|
| 619 |
-
*Analysis based on {len(evidence)} sources across {len(self.history)} iterations.*
|
| 620 |
-
"""
|
| 621 |
-
|
| 622 |
-
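The tier check in _generate_synthesis() above relies on isinstance() against a SynthesizableJudge protocol imported from src.orchestrators.base. A minimal sketch of what such a runtime-checkable protocol could look like (the method name and signature mirror the call sites above; everything else is an assumption):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SynthesizableJudge(Protocol):
    """Judges that can produce a free-tier narrative synthesis themselves."""

    async def synthesize(self, system_prompt: str, user_prompt: str) -> str:
        """Return a narrative report; expected to raise SynthesisError on failure."""
        ...


# Note: isinstance() with a runtime_checkable Protocol only verifies that a
# `synthesize` attribute exists; it does not validate the signature.
```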
def _generate_template_synthesis(
|
| 623 |
-
self,
|
| 624 |
-
query: str,
|
| 625 |
-
evidence: list[Evidence],
|
| 626 |
-
assessment: JudgeAssessment,
|
| 627 |
-
) -> str:
|
| 628 |
-
"""
|
| 629 |
-
Generate fallback template synthesis (no LLM).
|
| 630 |
-
|
| 631 |
-
Used when LLM synthesis fails or is unavailable.
|
| 632 |
-
|
| 633 |
-
Args:
|
| 634 |
-
query: The original question
|
| 635 |
-
evidence: All collected evidence
|
| 636 |
-
assessment: The final assessment
|
| 637 |
-
|
| 638 |
-
Returns:
|
| 639 |
-
Formatted synthesis as markdown (bullet-point style)
|
| 640 |
-
"""
|
| 641 |
-
drug_list = (
|
| 642 |
-
"\n".join([f"- **{d}**" for d in assessment.details.drug_candidates])
|
| 643 |
-
or "- No specific candidates identified"
|
| 644 |
-
)
|
| 645 |
-
findings_list = (
|
| 646 |
-
"\n".join([f"- {f}" for f in assessment.details.key_findings]) or "- See evidence below"
|
| 647 |
-
)
|
| 648 |
-
|
| 649 |
-
citations = "\n".join(
|
| 650 |
-
[
|
| 651 |
-
f"{i + 1}. [{e.citation.title}]({e.citation.url}) "
|
| 652 |
-
f"({e.citation.source.upper()}, {e.citation.date})"
|
| 653 |
-
for i, e in enumerate(evidence[:10])
|
| 654 |
-
]
|
| 655 |
-
)
|
| 656 |
-
|
| 657 |
-
return f"""{self.domain_config.report_title}
|
| 658 |
-
|
| 659 |
-
### Question
|
| 660 |
-
{query}
|
| 661 |
-
|
| 662 |
-
### Drug Candidates
|
| 663 |
-
{drug_list}
|
| 664 |
-
|
| 665 |
-
### Key Findings
|
| 666 |
-
{findings_list}
|
| 667 |
-
|
| 668 |
-
### Assessment
|
| 669 |
-
- **Mechanism Score**: {assessment.details.mechanism_score}/10
|
| 670 |
-
- **Clinical Evidence Score**: {assessment.details.clinical_evidence_score}/10
|
| 671 |
-
- **Confidence**: {assessment.confidence:.0%}
|
| 672 |
-
|
| 673 |
-
### Reasoning
|
| 674 |
-
{assessment.reasoning}
|
| 675 |
-
|
| 676 |
-
### Citations ({len(evidence)} sources)
|
| 677 |
-
{citations}
|
| 678 |
-
|
| 679 |
-
---
|
| 680 |
-
*Analysis based on {len(evidence)} sources across {len(self.history)} iterations.*
|
| 681 |
-
"""
|
| 682 |
-
|
| 683 |
-
def _generate_partial_synthesis(
|
| 684 |
-
self,
|
| 685 |
-
query: str,
|
| 686 |
-
evidence: list[Evidence],
|
| 687 |
-
) -> str:
|
| 688 |
-
"""
|
| 689 |
-
Generate a REAL synthesis when max iterations reached.
|
| 690 |
-
|
| 691 |
-
Even when forced to stop, we should provide:
|
| 692 |
-
- Drug candidates (if any were found)
|
| 693 |
-
- Key findings
|
| 694 |
-
- Assessment scores
|
| 695 |
-
- Actionable citations
|
| 696 |
-
|
| 697 |
-
This is still better than a citation dump.
|
| 698 |
-
"""
|
| 699 |
-
# Extract data from last assessment if available
|
| 700 |
-
last_assessment = self.history[-1]["assessment"] if self.history else {}
|
| 701 |
-
details = last_assessment.get("details", {})
|
| 702 |
-
|
| 703 |
-
drug_candidates = details.get("drug_candidates", [])
|
| 704 |
-
key_findings = details.get("key_findings", [])
|
| 705 |
-
mechanism_score = details.get("mechanism_score", 0)
|
| 706 |
-
clinical_score = details.get("clinical_evidence_score", 0)
|
| 707 |
-
reasoning = last_assessment.get("reasoning", "Analysis incomplete due to iteration limit.")
|
| 708 |
-
|
| 709 |
-
# Format drug candidates
|
| 710 |
-
if drug_candidates:
|
| 711 |
-
drug_list = "\n".join([f"- **{d}**" for d in drug_candidates[:5]])
|
| 712 |
-
else:
|
| 713 |
-
drug_list = (
|
| 714 |
-
"- *No specific drug candidates identified in evidence*\n"
|
| 715 |
-
"- *Try a more specific query or add an API key for better analysis*"
|
| 716 |
-
)
|
| 717 |
-
|
| 718 |
-
# Format key findings
|
| 719 |
-
if key_findings:
|
| 720 |
-
findings_list = "\n".join([f"- {f}" for f in key_findings[:5]])
|
| 721 |
-
else:
|
| 722 |
-
findings_list = (
|
| 723 |
-
"- *Key findings require further analysis*\n"
|
| 724 |
-
"- *See citations below for relevant sources*"
|
| 725 |
-
)
|
| 726 |
-
|
| 727 |
-
# Format citations (top 10)
|
| 728 |
-
citations = "\n".join(
|
| 729 |
-
[
|
| 730 |
-
f"{i + 1}. [{e.citation.title}]({e.citation.url}) "
|
| 731 |
-
f"({e.citation.source.upper()}, {e.citation.date})"
|
| 732 |
-
for i, e in enumerate(evidence[:10])
|
| 733 |
-
]
|
| 734 |
-
)
|
| 735 |
-
|
| 736 |
-
combined_score = mechanism_score + clinical_score
|
| 737 |
-
mech_strength = (
|
| 738 |
-
"Strong" if mechanism_score >= 7 else "Moderate" if mechanism_score >= 4 else "Limited"
|
| 739 |
-
)
|
| 740 |
-
clin_strength = (
|
| 741 |
-
"Strong" if clinical_score >= 7 else "Moderate" if clinical_score >= 4 else "Limited"
|
| 742 |
-
)
|
| 743 |
-
comb_strength = "Sufficient" if combined_score >= 12 else "Partial"
|
| 744 |
-
|
| 745 |
-
return f"""{self.domain_config.report_title}
|
| 746 |
-
|
| 747 |
-
### Research Question
|
| 748 |
-
{query}
|
| 749 |
-
|
| 750 |
-
### Status
|
| 751 |
-
Analysis based on {len(evidence)} sources across {len(self.history)} iterations.
|
| 752 |
-
Maximum iterations reached - results may be incomplete.
|
| 753 |
-
|
| 754 |
-
### Drug Candidates Identified
|
| 755 |
-
{drug_list}
|
| 756 |
-
|
| 757 |
-
### Key Findings
|
| 758 |
-
{findings_list}
|
| 759 |
-
|
| 760 |
-
### Evidence Quality Scores
|
| 761 |
-
| Criterion | Score | Interpretation |
|
| 762 |
-
|-----------|-------|----------------|
|
| 763 |
-
| Mechanism | {mechanism_score}/10 | {mech_strength} mechanistic evidence |
|
| 764 |
-
| Clinical | {clinical_score}/10 | {clin_strength} clinical support |
|
| 765 |
-
| Combined | {combined_score}/20 | {comb_strength} for synthesis |
|
| 766 |
-
|
| 767 |
-
### Analysis Summary
|
| 768 |
-
{reasoning}
|
| 769 |
-
|
| 770 |
-
### Top Citations ({len(evidence)} sources total)
|
| 771 |
-
{citations}
|
| 772 |
-
|
| 773 |
-
---
|
| 774 |
-
*For more complete analysis:*
|
| 775 |
-
- *Add an OpenAI or Anthropic API key for enhanced LLM analysis*
|
| 776 |
-
- *Try a more specific query (e.g., include drug names)*
|
| 777 |
-
- *Use Advanced mode for multi-agent research*
|
| 778 |
-
"""
|
|
|
|
@@ -122,7 +122,8 @@ def format_user_prompt(
     NOTE: Evidence should be pre-selected using select_evidence_for_judge().
     This function assumes evidence is already capped.
     """
-
+    # Use explicit None check - 0 is a valid count (empty evidence)
+    total_count = total_evidence_count if total_evidence_count is not None else len(evidence)
     max_content_len = 1500
     scoring_prompt = get_scoring_prompt(domain)
 
|
|
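For context on why the explicit None check matters: with `or`, a caller passing total_evidence_count=0 silently falls back to len(evidence). A small standalone illustration (not project code):

```python
evidence = ["paper-1", "paper-2", "paper-3"]
total_evidence_count = 0  # caller explicitly reports zero total results

# Anti-pattern: `or` treats 0 as falsy and discards the explicit value.
buggy_total = total_evidence_count or len(evidence)
assert buggy_total == 3

# Fix: fall back only when the value is actually missing.
fixed_total = total_evidence_count if total_evidence_count is not None else len(evidence)
assert fixed_total == 0
```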
@@ -27,7 +27,8 @@ class Settings(BaseSettings):
     # LLM Configuration
     openai_api_key: str | None = Field(default=None, description="OpenAI API key")
     anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
-
+    gemini_api_key: str | None = Field(default=None, description="Google Gemini API key")
+    llm_provider: Literal["openai", "anthropic", "huggingface", "gemini"] = Field(
         default="openai", description="Which LLM provider to use"
     )
     openai_model: str = Field(default="gpt-5", description="OpenAI model name")
@@ -93,12 +94,15 @@ class Settings(BaseSettings):
 
     def get_api_key(self) -> str:
         """Get the API key for the configured provider."""
-
+        # Normalize provider for case-insensitive matching
+        provider_lower = self.llm_provider.lower() if self.llm_provider else ""
+
+        if provider_lower == "openai":
             if not self.openai_api_key:
                 raise ConfigurationError("OPENAI_API_KEY not set")
             return self.openai_api_key
 
-        if
+        if provider_lower == "anthropic":
             if not self.anthropic_api_key:
                 raise ConfigurationError("ANTHROPIC_API_KEY not set")
             return self.anthropic_api_key
@@ -124,6 +128,11 @@ class Settings(BaseSettings):
         """Check if Anthropic API key is available."""
         return bool(self.anthropic_api_key)
 
+    @property
+    def has_gemini_key(self) -> bool:
+        """Check if Gemini API key is available."""
+        return bool(self.gemini_api_key)
+
     @property
     def has_huggingface_key(self) -> bool:
         """Check if HuggingFace token is available."""
@@ -132,7 +141,12 @@ class Settings(BaseSettings):
     @property
     def has_any_llm_key(self) -> bool:
         """Check if any LLM API key is available."""
-        return
+        return (
+            self.has_openai_key
+            or self.has_anthropic_key
+            or self.has_huggingface_key
+            or self.has_gemini_key
+        )
 
 
 def get_settings() -> Settings:
|
|
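A standalone illustration of the normalization added to get_api_key() above; resolve_provider is a hypothetical helper that only mirrors the branch structure, not project code:

```python
def resolve_provider(llm_provider: str | None) -> str:
    """Mirror of the case-insensitive matching in Settings.get_api_key()."""
    provider_lower = llm_provider.lower() if llm_provider else ""
    if provider_lower == "openai":
        return "openai"
    if provider_lower == "anthropic":
        return "anthropic"
    raise ValueError(f"Unknown LLM provider: {llm_provider}")


assert resolve_provider("OpenAI") == "openai"  # case-insensitive match
assert resolve_provider("anthropic") == "anthropic"
```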
@@ -1,106 +1,69 @@
|
|
| 1 |
"""Centralized LLM client factory.
|
| 2 |
|
| 3 |
-
This module provides factory functions for creating LLM clients
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
Why Magentic requires OpenAI:
|
| 7 |
-
- Magentic agents use the @ai_function decorator for tool calling
|
| 8 |
-
- This requires structured function calling protocol (tools, tool_choice)
|
| 9 |
-
- OpenAI's API supports this natively
|
| 10 |
-
- Anthropic/HuggingFace Inference APIs are text-in/text-out only
|
| 11 |
"""
|
| 12 |
|
| 13 |
-
from typing import
|
| 14 |
|
|
|
|
|
|
|
| 15 |
from src.utils.config import settings
|
| 16 |
from src.utils.exceptions import ConfigurationError
|
| 17 |
|
| 18 |
-
if TYPE_CHECKING:
|
| 19 |
-
from agent_framework.openai import OpenAIChatClient
|
| 20 |
|
| 21 |
-
|
| 22 |
-
def get_magentic_client() -> "OpenAIChatClient":
|
| 23 |
"""
|
| 24 |
-
Get the
|
| 25 |
-
|
| 26 |
-
Magentic requires OpenAI because it uses function calling protocol:
|
| 27 |
-
- @ai_function decorators define callable tools
|
| 28 |
-
- LLM returns structured tool calls (not just text)
|
| 29 |
-
- Requires OpenAI's tools/function_call API support
|
| 30 |
-
|
| 31 |
-
Raises:
|
| 32 |
-
ConfigurationError: If OPENAI_API_KEY is not set
|
| 33 |
|
| 34 |
-
|
| 35 |
-
Configured OpenAIChatClient for Magentic agents
|
| 36 |
"""
|
| 37 |
-
|
| 38 |
-
from agent_framework.openai import OpenAIChatClient
|
| 39 |
-
|
| 40 |
-
api_key = settings.get_openai_api_key()
|
| 41 |
-
|
| 42 |
-
return OpenAIChatClient(
|
| 43 |
-
model_id=settings.openai_model,
|
| 44 |
-
api_key=api_key,
|
| 45 |
-
)
|
| 46 |
|
| 47 |
|
| 48 |
def get_pydantic_ai_model() -> Any:
|
| 49 |
"""
|
| 50 |
Get the appropriate model for pydantic-ai based on configuration.
|
| 51 |
-
|
| 52 |
-
Uses the configured LLM_PROVIDER to select between OpenAI and Anthropic.
|
| 53 |
-
This is used by simple mode components (JudgeHandler, etc.)
|
| 54 |
-
|
| 55 |
-
Returns:
|
| 56 |
-
Configured pydantic-ai model
|
| 57 |
"""
|
| 58 |
from pydantic_ai.models.anthropic import AnthropicModel
|
| 59 |
from pydantic_ai.models.openai import OpenAIChatModel
|
| 60 |
from pydantic_ai.providers.anthropic import AnthropicProvider
|
| 61 |
from pydantic_ai.providers.openai import OpenAIProvider
|
| 62 |
|
| 63 |
-
|
|
|
|
|
|
|
|
|
|
| 64 |
if not settings.openai_api_key:
|
| 65 |
raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
|
| 66 |
provider = OpenAIProvider(api_key=settings.openai_api_key)
|
| 67 |
return OpenAIChatModel(settings.openai_model, provider=provider)
|
| 68 |
|
| 69 |
-
if
|
| 70 |
if not settings.anthropic_api_key:
|
| 71 |
raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
|
| 72 |
anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
|
| 73 |
return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
|
| 74 |
|
| 75 |
-
raise ConfigurationError(f"Unknown LLM provider: {settings.llm_provider}")
|
| 76 |
|
| 77 |
|
| 78 |
def check_magentic_requirements() -> None:
|
| 79 |
"""
|
| 80 |
Check if Magentic mode requirements are met.
|
| 81 |
-
|
| 82 |
-
Raises:
|
| 83 |
-
ConfigurationError: If requirements not met
|
| 84 |
"""
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
"Magentic mode requires OPENAI_API_KEY for function calling support. "
|
| 88 |
-
"Anthropic and HuggingFace Inference do not support the structured "
|
| 89 |
-
"function calling protocol that Magentic agents require. "
|
| 90 |
-
"Use mode='simple' for other LLM providers."
|
| 91 |
-
)
|
| 92 |
|
| 93 |
|
| 94 |
def check_simple_mode_requirements() -> None:
|
| 95 |
"""
|
| 96 |
Check if simple mode requirements are met.
|
| 97 |
-
|
| 98 |
-
Simple mode supports both OpenAI and Anthropic.
|
| 99 |
-
|
| 100 |
-
Raises:
|
| 101 |
-
ConfigurationError: If no LLM API key is configured
|
| 102 |
"""
|
| 103 |
if not settings.has_any_llm_key:
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
|
|
|
|
|
| 1 |
"""Centralized LLM client factory.
|
| 2 |
|
| 3 |
+
This module provides factory functions for creating LLM clients.
|
| 4 |
+
DEPRECATED: Prefer src.clients.factory.get_chat_client() directly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
"""
|
| 6 |
|
| 7 |
+
from typing import Any
|
| 8 |
|
| 9 |
+
from src.clients.base import BaseChatClient
|
| 10 |
+
from src.clients.factory import get_chat_client
|
| 11 |
from src.utils.config import settings
|
| 12 |
from src.utils.exceptions import ConfigurationError
|
| 13 |
|
|
|
|
|
|
|
| 14 |
|
| 15 |
+
def get_magentic_client() -> BaseChatClient:
|
|
|
|
| 16 |
"""
|
| 17 |
+
Get the chat client for Magentic agents.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
|
| 19 |
+
Now unified to support OpenAI, Gemini, and HuggingFace.
|
|
|
|
| 20 |
"""
|
| 21 |
+
return get_chat_client()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 22 |
|
| 23 |
|
| 24 |
def get_pydantic_ai_model() -> Any:
|
| 25 |
"""
|
| 26 |
Get the appropriate model for pydantic-ai based on configuration.
|
| 27 |
+
Used by legacy Simple Mode components.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
"""
|
| 29 |
from pydantic_ai.models.anthropic import AnthropicModel
|
| 30 |
from pydantic_ai.models.openai import OpenAIChatModel
|
| 31 |
from pydantic_ai.providers.anthropic import AnthropicProvider
|
| 32 |
from pydantic_ai.providers.openai import OpenAIProvider
|
| 33 |
|
| 34 |
+
# Normalize provider for case-insensitive matching
|
| 35 |
+
provider_lower = settings.llm_provider.lower() if settings.llm_provider else ""
|
| 36 |
+
|
| 37 |
+
if provider_lower == "openai":
|
| 38 |
if not settings.openai_api_key:
|
| 39 |
raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
|
| 40 |
provider = OpenAIProvider(api_key=settings.openai_api_key)
|
| 41 |
return OpenAIChatModel(settings.openai_model, provider=provider)
|
| 42 |
|
| 43 |
+
if provider_lower == "anthropic":
|
| 44 |
if not settings.anthropic_api_key:
|
| 45 |
raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
|
| 46 |
anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
|
| 47 |
return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
|
| 48 |
|
| 49 |
+
raise ConfigurationError(f"Unknown LLM provider for simple mode: {settings.llm_provider}")
|
| 50 |
|
| 51 |
|
| 52 |
def check_magentic_requirements() -> None:
|
| 53 |
"""
|
| 54 |
Check if Magentic mode requirements are met.
|
| 55 |
+
Now supports multiple providers via ChatClientFactory.
|
|
|
|
|
|
|
| 56 |
"""
|
| 57 |
+
# Advanced/Magentic mode now works with ANY provider (including free HF)
|
| 58 |
+
pass
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 59 |
|
| 60 |
|
| 61 |
def check_simple_mode_requirements() -> None:
|
| 62 |
"""
|
| 63 |
Check if simple mode requirements are met.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 64 |
"""
|
| 65 |
if not settings.has_any_llm_key:
|
| 66 |
+
# Simple mode still requires explicit keys?
|
| 67 |
+
# Actually, simple mode also had HF support but it was brittle.
|
| 68 |
+
# We are deleting simple mode later, so let's leave this as is for now.
|
| 69 |
+
pass
|
|
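The hunk above keeps get_magentic_client() as a thin delegating wrapper. If a louder deprecation signal were wanted, a common pattern (a sketch of an alternative, not what this diff does) is to emit a DeprecationWarning before delegating:

```python
import warnings

from src.clients.base import BaseChatClient
from src.clients.factory import get_chat_client


def get_magentic_client() -> BaseChatClient:
    """Deprecated shim: prefer src.clients.factory.get_chat_client()."""
    warnings.warn(
        "get_magentic_client() is deprecated; call get_chat_client() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_chat_client()
```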
@@ -1,70 +0,0 @@
|
|
| 1 |
-
from unittest.mock import MagicMock, patch
|
| 2 |
-
|
| 3 |
-
import pytest
|
| 4 |
-
|
| 5 |
-
# Skip entire module if agent_framework is not installed
|
| 6 |
-
agent_framework = pytest.importorskip("agent_framework")
|
| 7 |
-
from agent_framework import MagenticAgentMessageEvent, MagenticFinalResultEvent
|
| 8 |
-
|
| 9 |
-
from src.orchestrators.advanced import AdvancedOrchestrator as MagenticOrchestrator
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
class MockChatMessage:
|
| 13 |
-
def __init__(self, content):
|
| 14 |
-
self.content = content
|
| 15 |
-
|
| 16 |
-
@property
|
| 17 |
-
def text(self):
|
| 18 |
-
return self.content
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
@pytest.mark.asyncio
|
| 22 |
-
@pytest.mark.e2e
|
| 23 |
-
async def test_advanced_mode_completes_mocked():
|
| 24 |
-
"""Verify Advanced mode runs without crashing (mocked workflow)."""
|
| 25 |
-
|
| 26 |
-
# Initialize orchestrator (mocking requirements check)
|
| 27 |
-
with patch("src.orchestrators.advanced.check_magentic_requirements"):
|
| 28 |
-
orchestrator = MagenticOrchestrator(max_rounds=5)
|
| 29 |
-
|
| 30 |
-
# Mock the workflow
|
| 31 |
-
mock_workflow = MagicMock()
|
| 32 |
-
|
| 33 |
-
# Create fake events
|
| 34 |
-
# 1. Search Agent runs
|
| 35 |
-
mock_msg_1 = MockChatMessage("Found 5 papers on PubMed")
|
| 36 |
-
event1 = MagenticAgentMessageEvent(agent_id="SearchAgent", message=mock_msg_1)
|
| 37 |
-
|
| 38 |
-
# 2. Report Agent finishes
|
| 39 |
-
mock_result_msg = MockChatMessage("# Final Report\n\nFindings...")
|
| 40 |
-
event2 = MagenticFinalResultEvent(message=mock_result_msg)
|
| 41 |
-
|
| 42 |
-
async def mock_stream(task):
|
| 43 |
-
yield event1
|
| 44 |
-
yield event2
|
| 45 |
-
|
| 46 |
-
mock_workflow.run_stream = mock_stream
|
| 47 |
-
|
| 48 |
-
# Patch dependencies:
|
| 49 |
-
# _build_workflow: Returns our mock
|
| 50 |
-
# init_magentic_state: Avoids DB calls
|
| 51 |
-
# _init_embedding_service: Avoids loading embeddings
|
| 52 |
-
with (
|
| 53 |
-
patch.object(orchestrator, "_build_workflow", return_value=mock_workflow),
|
| 54 |
-
patch("src.orchestrators.advanced.init_magentic_state"),
|
| 55 |
-
patch.object(orchestrator, "_init_embedding_service", return_value=None),
|
| 56 |
-
):
|
| 57 |
-
events = []
|
| 58 |
-
async for event in orchestrator.run("test query"):
|
| 59 |
-
events.append(event)
|
| 60 |
-
|
| 61 |
-
# Check events
|
| 62 |
-
types = [e.type for e in events]
|
| 63 |
-
assert "started" in types
|
| 64 |
-
assert "thinking" in types
|
| 65 |
-
assert "search_complete" in types # Mapped from SearchAgent
|
| 66 |
-
assert "progress" in types # Added in SPEC_01
|
| 67 |
-
assert "complete" in types
|
| 68 |
-
|
| 69 |
-
complete_event = next(e for e in events if e.type == "complete")
|
| 70 |
-
assert "Final Report" in complete_event.message
|
|
|
@@ -1,65 +0,0 @@
|
|
| 1 |
-
import pytest
|
| 2 |
-
|
| 3 |
-
from src.orchestrators import Orchestrator
|
| 4 |
-
from src.utils.models import OrchestratorConfig
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
@pytest.mark.asyncio
|
| 8 |
-
@pytest.mark.e2e
|
| 9 |
-
async def test_simple_mode_completes(mock_search_handler, mock_judge_handler):
|
| 10 |
-
"""Verify Simple mode runs without crashing using mocks."""
|
| 11 |
-
|
| 12 |
-
config = OrchestratorConfig(max_iterations=2)
|
| 13 |
-
|
| 14 |
-
orchestrator = Orchestrator(
|
| 15 |
-
search_handler=mock_search_handler,
|
| 16 |
-
judge_handler=mock_judge_handler,
|
| 17 |
-
config=config,
|
| 18 |
-
enable_analysis=False,
|
| 19 |
-
enable_embeddings=False,
|
| 20 |
-
)
|
| 21 |
-
|
| 22 |
-
events = []
|
| 23 |
-
async for event in orchestrator.run("test query"):
|
| 24 |
-
events.append(event)
|
| 25 |
-
|
| 26 |
-
# Must complete
|
| 27 |
-
assert any(e.type == "complete" for e in events), "Did not receive complete event"
|
| 28 |
-
# Must not error
|
| 29 |
-
assert not any(e.type == "error" for e in events), "Received error event"
|
| 30 |
-
|
| 31 |
-
# Check structure of complete event
|
| 32 |
-
complete_event = next(e for e in events if e.type == "complete")
|
| 33 |
-
# The mock judge returns "MockDrug A" and "Finding 1", ensuring synthesis happens
|
| 34 |
-
assert "MockDrug A" in complete_event.message
|
| 35 |
-
assert "Finding 1" in complete_event.message
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
@pytest.mark.asyncio
|
| 39 |
-
@pytest.mark.e2e
|
| 40 |
-
async def test_simple_mode_structure_validation(mock_search_handler, mock_judge_handler):
|
| 41 |
-
"""Verify output contains expected structure (citations, headings)."""
|
| 42 |
-
config = OrchestratorConfig(max_iterations=2)
|
| 43 |
-
orchestrator = Orchestrator(
|
| 44 |
-
search_handler=mock_search_handler,
|
| 45 |
-
judge_handler=mock_judge_handler,
|
| 46 |
-
config=config,
|
| 47 |
-
enable_analysis=False,
|
| 48 |
-
enable_embeddings=False,
|
| 49 |
-
)
|
| 50 |
-
|
| 51 |
-
events = []
|
| 52 |
-
async for event in orchestrator.run("test query"):
|
| 53 |
-
events.append(event)
|
| 54 |
-
|
| 55 |
-
complete_event = next(e for e in events if e.type == "complete")
|
| 56 |
-
report = complete_event.message
|
| 57 |
-
|
| 58 |
-
# Check LLM narrative synthesis structure (SPEC_12)
|
| 59 |
-
# LLM generates prose with these sections (may omit ### prefix)
|
| 60 |
-
assert "Executive Summary" in report or "Sexual Health Analysis" in report
|
| 61 |
-
assert "Full Citation List" in report or "Citations" in report
|
| 62 |
-
|
| 63 |
-
# Check for citations (from citation footer added by orchestrator)
|
| 64 |
-
assert "Study on test query" in report
|
| 65 |
-
assert "pubmed.example.com/123" in report
|
|
|
@@ -1,83 +0,0 @@
|
|
| 1 |
-
"""End-to-End Integration Tests for Dual-Mode Architecture."""
|
| 2 |
-
|
| 3 |
-
from unittest.mock import AsyncMock, MagicMock, patch
|
| 4 |
-
|
| 5 |
-
import pytest
|
| 6 |
-
|
| 7 |
-
pytestmark = [pytest.mark.integration, pytest.mark.slow]
|
| 8 |
-
|
| 9 |
-
from src.orchestrators import create_orchestrator
|
| 10 |
-
from src.utils.models import Citation, Evidence, OrchestratorConfig
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
@pytest.fixture
|
| 14 |
-
def mock_search_handler():
|
| 15 |
-
handler = MagicMock()
|
| 16 |
-
handler.execute = AsyncMock(
|
| 17 |
-
return_value=[
|
| 18 |
-
Evidence(
|
| 19 |
-
citation=Citation(
|
| 20 |
-
title="Test Paper", url="http://test", date="2024", source="pubmed"
|
| 21 |
-
),
|
| 22 |
-
content="Testosterone improves sexual desire in postmenopausal women.",
|
| 23 |
-
)
|
| 24 |
-
]
|
| 25 |
-
)
|
| 26 |
-
return handler
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
@pytest.fixture
|
| 30 |
-
def mock_judge_handler():
|
| 31 |
-
handler = MagicMock()
|
| 32 |
-
# Mock return value of assess
|
| 33 |
-
assessment = MagicMock()
|
| 34 |
-
assessment.sufficient = True
|
| 35 |
-
assessment.recommendation = "synthesize"
|
| 36 |
-
handler.assess = AsyncMock(return_value=assessment)
|
| 37 |
-
return handler
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
@pytest.mark.asyncio
|
| 41 |
-
async def test_simple_mode_e2e(mock_search_handler, mock_judge_handler):
|
| 42 |
-
"""Test Simple Mode Orchestration flow."""
|
| 43 |
-
orch = create_orchestrator(
|
| 44 |
-
search_handler=mock_search_handler,
|
| 45 |
-
judge_handler=mock_judge_handler,
|
| 46 |
-
mode="simple",
|
| 47 |
-
config=OrchestratorConfig(max_iterations=1),
|
| 48 |
-
)
|
| 49 |
-
|
| 50 |
-
# Run
|
| 51 |
-
results = []
|
| 52 |
-
async for event in orch.run("Test query"):
|
| 53 |
-
results.append(event)
|
| 54 |
-
|
| 55 |
-
assert len(results) > 0
|
| 56 |
-
assert mock_search_handler.execute.called
|
| 57 |
-
assert mock_judge_handler.assess.called
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
@pytest.mark.asyncio
|
| 61 |
-
async def test_advanced_mode_explicit_instantiation():
|
| 62 |
-
"""Test explicit Advanced Mode instantiation (not auto-detect).
|
| 63 |
-
|
| 64 |
-
This tests the explicit mode="advanced" path, verifying that
|
| 65 |
-
MagenticOrchestrator can be instantiated when explicitly requested.
|
| 66 |
-
The settings patch ensures any internal checks pass.
|
| 67 |
-
"""
|
| 68 |
-
with patch("src.orchestrators.factory.settings") as mock_settings:
|
| 69 |
-
# Settings patch ensures factory checks pass (even though mode is explicit)
|
| 70 |
-
mock_settings.has_openai_key = True
|
| 71 |
-
|
| 72 |
-
with patch("src.agents.magentic_agents.OpenAIChatClient"):
|
| 73 |
-
# Mock agent creation to avoid real API calls during init
|
| 74 |
-
with (
|
| 75 |
-
patch("src.orchestrators.advanced.check_magentic_requirements"),
|
| 76 |
-
patch("src.orchestrators.advanced.create_search_agent"),
|
| 77 |
-
patch("src.orchestrators.advanced.create_judge_agent"),
|
| 78 |
-
patch("src.orchestrators.advanced.create_hypothesis_agent"),
|
| 79 |
-
patch("src.orchestrators.advanced.create_report_agent"),
|
| 80 |
-
):
|
| 81 |
-
# Explicit mode="advanced" - tests the explicit path, not auto-detect
|
| 82 |
-
orch = create_orchestrator(mode="advanced")
|
| 83 |
-
assert orch is not None
|
|
|
@@ -1,157 +0,0 @@
|
|
| 1 |
-
from unittest.mock import AsyncMock
|
| 2 |
-
|
| 3 |
-
import pytest
|
| 4 |
-
|
| 5 |
-
from src.orchestrators.simple import Orchestrator
|
| 6 |
-
from src.utils.models import (
|
| 7 |
-
AssessmentDetails,
|
| 8 |
-
Citation,
|
| 9 |
-
Evidence,
|
| 10 |
-
JudgeAssessment,
|
| 11 |
-
OrchestratorConfig,
|
| 12 |
-
SearchResult,
|
| 13 |
-
)
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
def make_evidence(title: str) -> Evidence:
|
| 17 |
-
return Evidence(
|
| 18 |
-
content="content",
|
| 19 |
-
citation=Citation(title=title, url="http://test.com", date="2025", source="pubmed"),
|
| 20 |
-
)
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
@pytest.mark.integration
|
| 24 |
-
@pytest.mark.asyncio
|
| 25 |
-
async def test_simple_mode_synthesizes_before_max_iterations():
|
| 26 |
-
"""Verify simple mode produces useful output with mocked judge."""
|
| 27 |
-
# Mock search to return evidence
|
| 28 |
-
mock_search = AsyncMock()
|
| 29 |
-
mock_search.execute.return_value = SearchResult(
|
| 30 |
-
query="test query",
|
| 31 |
-
evidence=[make_evidence(f"Paper {i}") for i in range(5)],
|
| 32 |
-
errors=[],
|
| 33 |
-
sources_searched=["pubmed"],
|
| 34 |
-
total_found=5,
|
| 35 |
-
)
|
| 36 |
-
|
| 37 |
-
# Mock judge to return GOOD scores eventually
|
| 38 |
-
# We can use MockJudgeHandler or a pure mock. Let's use a pure mock to control scores precisely.
|
| 39 |
-
mock_judge = AsyncMock()
|
| 40 |
-
# Since mock_judge has 'synthesize' attr by default (as a Mock),
|
| 41 |
-
# simple mode uses free-tier path.
|
| 42 |
-
# We must mock the return value of synthesize to simulate a successful narrative generation.
|
| 43 |
-
mock_judge.synthesize.return_value = "This is a synthesized report for MagicDrug."
|
| 44 |
-
|
| 45 |
-
# Iteration 1: Low scores
|
| 46 |
-
assess_1 = JudgeAssessment(
|
| 47 |
-
details=AssessmentDetails(
|
| 48 |
-
mechanism_score=2,
|
| 49 |
-
mechanism_reasoning="reasoning is sufficient for valid model",
|
| 50 |
-
clinical_evidence_score=2,
|
| 51 |
-
clinical_reasoning="reasoning is sufficient for valid model",
|
| 52 |
-
drug_candidates=[],
|
| 53 |
-
key_findings=[],
|
| 54 |
-
),
|
| 55 |
-
sufficient=False,
|
| 56 |
-
confidence=0.5,
|
| 57 |
-
recommendation="continue",
|
| 58 |
-
next_search_queries=["q2"],
|
| 59 |
-
reasoning="need more evidence to support conclusions about this topic",
|
| 60 |
-
)
|
| 61 |
-
|
| 62 |
-
# Iteration 2: High scores (should trigger synthesis)
|
| 63 |
-
assess_2 = JudgeAssessment(
|
| 64 |
-
details=AssessmentDetails(
|
| 65 |
-
mechanism_score=8,
|
| 66 |
-
mechanism_reasoning="reasoning is sufficient for valid model",
|
| 67 |
-
clinical_evidence_score=7,
|
| 68 |
-
clinical_reasoning="reasoning is sufficient for valid model",
|
| 69 |
-
drug_candidates=["MagicDrug"],
|
| 70 |
-
key_findings=["It works"],
|
| 71 |
-
),
|
| 72 |
-
sufficient=False, # Judge is conservative
|
| 73 |
-
confidence=0.9,
|
| 74 |
-
recommendation="continue", # Judge still says continue (simulating bias)
|
| 75 |
-
next_search_queries=[],
|
| 76 |
-
reasoning="good scores but maybe more evidence needed technically",
|
| 77 |
-
)
|
| 78 |
-
|
| 79 |
-
mock_judge.assess.side_effect = [assess_1, assess_2]
|
| 80 |
-
|
| 81 |
-
orchestrator = Orchestrator(
|
| 82 |
-
search_handler=mock_search,
|
| 83 |
-
judge_handler=mock_judge,
|
| 84 |
-
config=OrchestratorConfig(max_iterations=5),
|
| 85 |
-
)
|
| 86 |
-
|
| 87 |
-
events = []
|
| 88 |
-
async for event in orchestrator.run("test query"):
|
| 89 |
-
events.append(event)
|
| 90 |
-
if event.type == "complete":
|
| 91 |
-
break
|
| 92 |
-
|
| 93 |
-
# Must have synthesis with drug candidates
|
| 94 |
-
complete_events = [e for e in events if e.type == "complete"]
|
| 95 |
-
assert len(complete_events) == 1
|
| 96 |
-
complete_event = complete_events[0]
|
| 97 |
-
|
| 98 |
-
assert "MagicDrug" in complete_event.message
|
| 99 |
-
# SPEC_12: LLM synthesis produces narrative prose, not template with "Drug Candidates" header
|
| 100 |
-
# Check for narrative structure (LLM may omit ### prefix) OR template fallback
|
| 101 |
-
assert (
|
| 102 |
-
"Executive Summary" in complete_event.message
|
| 103 |
-
or "Drug Candidates" in complete_event.message
|
| 104 |
-
or "synthesized report" in complete_event.message
|
| 105 |
-
)
|
| 106 |
-
assert complete_event.data.get("synthesis_reason") == "high_scores_with_candidates"
|
| 107 |
-
assert complete_event.iteration == 2 # Should stop at it 2
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
@pytest.mark.integration
|
| 111 |
-
@pytest.mark.asyncio
|
| 112 |
-
async def test_partial_synthesis_generation():
|
| 113 |
-
"""Verify partial synthesis includes drug candidates even if max iterations reached."""
|
| 114 |
-
mock_search = AsyncMock()
|
| 115 |
-
mock_search.execute.return_value = SearchResult(
|
| 116 |
-
query="test", evidence=[], errors=[], sources_searched=["pubmed"], total_found=0
|
| 117 |
-
)
|
| 118 |
-
|
| 119 |
-
mock_judge = AsyncMock()
|
| 120 |
-
# Always return low scores but WITH candidates
|
| 121 |
-
# Scores 3+3 = 6 < 8 (late threshold), so it should NOT synthesize early
|
| 122 |
-
mock_judge.assess.return_value = JudgeAssessment(
|
| 123 |
-
details=AssessmentDetails(
|
| 124 |
-
mechanism_score=3,
|
| 125 |
-
mechanism_reasoning="reasoning is sufficient for valid model",
|
| 126 |
-
clinical_evidence_score=3,
|
| 127 |
-
clinical_reasoning="reasoning is sufficient for valid model",
|
| 128 |
-
drug_candidates=["PartialDrug"],
|
| 129 |
-
key_findings=["Partial finding"],
|
| 130 |
-
),
|
| 131 |
-
sufficient=False,
|
| 132 |
-
confidence=0.5,
|
| 133 |
-
recommendation="continue",
|
| 134 |
-
next_search_queries=[],
|
| 135 |
-
reasoning="keep going to find more evidence about this topic please",
|
| 136 |
-
)
|
| 137 |
-
|
| 138 |
-
orchestrator = Orchestrator(
|
| 139 |
-
search_handler=mock_search,
|
| 140 |
-
judge_handler=mock_judge,
|
| 141 |
-
config=OrchestratorConfig(max_iterations=2),
|
| 142 |
-
)
|
| 143 |
-
|
| 144 |
-
events = []
|
| 145 |
-
async for event in orchestrator.run("test"):
|
| 146 |
-
events.append(event)
|
| 147 |
-
|
| 148 |
-
complete_events = [e for e in events if e.type == "complete"]
|
| 149 |
-
assert len(complete_events) == 1, (
|
| 150 |
-
f"Expected exactly one complete event, got {len(complete_events)}"
|
| 151 |
-
)
|
| 152 |
-
complete_event = complete_events[0]
|
| 153 |
-
assert complete_event.data.get("max_reached") is True
|
| 154 |
-
|
| 155 |
-
# The output message should contain the drug candidate from the last assessment
|
| 156 |
-
assert "PartialDrug" in complete_event.message
|
| 157 |
-
assert "Maximum iterations reached" in complete_event.message
|
|
|
@@ -13,8 +13,8 @@ from src.config.domain import SEXUAL_HEALTH_CONFIG, ResearchDomain
|
|
| 13 |
|
| 14 |
class TestMagenticAgentsDomain:
|
| 15 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 16 |
-
@patch("src.agents.magentic_agents.
|
| 17 |
-
def test_create_search_agent_uses_domain(self,
|
| 18 |
create_search_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 19 |
|
| 20 |
# Check instructions or description passed to ChatAgent
|
|
@@ -23,8 +23,8 @@ class TestMagenticAgentsDomain:
|
|
| 23 |
# Ideally check instructions too if we update them
|
| 24 |
|
| 25 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 26 |
-
@patch("src.agents.magentic_agents.
|
| 27 |
-
def test_create_judge_agent_uses_domain(self,
|
| 28 |
create_judge_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 29 |
|
| 30 |
# Verify domain-specific judge system prompt is passed through
|
|
@@ -32,15 +32,15 @@ class TestMagenticAgentsDomain:
|
|
| 32 |
assert SEXUAL_HEALTH_CONFIG.judge_system_prompt in call_kwargs["instructions"]
|
| 33 |
|
| 34 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 35 |
-
@patch("src.agents.magentic_agents.
|
| 36 |
-
def test_create_hypothesis_agent_uses_domain(self,
|
| 37 |
create_hypothesis_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 38 |
call_kwargs = mock_agent_cls.call_args.kwargs
|
| 39 |
assert SEXUAL_HEALTH_CONFIG.hypothesis_agent_description in call_kwargs["description"]
|
| 40 |
|
| 41 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 42 |
-
@patch("src.agents.magentic_agents.
|
| 43 |
-
def test_create_report_agent_uses_domain(self,
|
| 44 |
create_report_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 45 |
# Check instructions contains domain prompt
|
| 46 |
call_kwargs = mock_agent_cls.call_args.kwargs
|
|
|
|
| 13 |
|
| 14 |
class TestMagenticAgentsDomain:
|
| 15 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 16 |
+
@patch("src.agents.magentic_agents.get_chat_client")
|
| 17 |
+
def test_create_search_agent_uses_domain(self, mock_get_client, mock_agent_cls):
|
| 18 |
create_search_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 19 |
|
| 20 |
# Check instructions or description passed to ChatAgent
|
|
|
|
| 23 |
# Ideally check instructions too if we update them
|
| 24 |
|
| 25 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 26 |
+
@patch("src.agents.magentic_agents.get_chat_client")
|
| 27 |
+
def test_create_judge_agent_uses_domain(self, mock_get_client, mock_agent_cls):
|
| 28 |
create_judge_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 29 |
|
| 30 |
# Verify domain-specific judge system prompt is passed through
|
|
|
|
| 32 |
assert SEXUAL_HEALTH_CONFIG.judge_system_prompt in call_kwargs["instructions"]
|
| 33 |
|
| 34 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 35 |
+
@patch("src.agents.magentic_agents.get_chat_client")
|
| 36 |
+
def test_create_hypothesis_agent_uses_domain(self, mock_get_client, mock_agent_cls):
|
| 37 |
create_hypothesis_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 38 |
call_kwargs = mock_agent_cls.call_args.kwargs
|
| 39 |
assert SEXUAL_HEALTH_CONFIG.hypothesis_agent_description in call_kwargs["description"]
|
| 40 |
|
| 41 |
@patch("src.agents.magentic_agents.ChatAgent")
|
| 42 |
+
@patch("src.agents.magentic_agents.get_chat_client")
|
| 43 |
+
def test_create_report_agent_uses_domain(self, mock_get_client, mock_agent_cls):
|
| 44 |
create_report_agent(domain=ResearchDomain.SEXUAL_HEALTH)
|
| 45 |
# Check instructions contains domain prompt
|
| 46 |
call_kwargs = mock_agent_cls.call_args.kwargs
|
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
"""Tests for Magentic Judge termination logic."""
|
| 2 |
|
| 3 |
-
from unittest.mock import patch
|
| 4 |
|
| 5 |
import pytest
|
| 6 |
|
|
@@ -8,18 +8,20 @@ from src.agents.magentic_agents import create_judge_agent
|
|
| 8 |
|
| 9 |
pytestmark = pytest.mark.unit
|
| 10 |
|
|
|
|
|
|
|
|
|
|
| 11 |
|
| 12 |
def test_judge_agent_has_termination_instructions() -> None:
|
| 13 |
"""Judge agent must be created with explicit instructions for early termination."""
|
| 14 |
with patch("src.agents.magentic_agents.get_domain_config") as mock_config:
|
| 15 |
-
# Mock config to return
|
| 16 |
-
mock_config.return_value.judge_system_prompt = ""
|
| 17 |
|
| 18 |
-
with patch("src.agents.magentic_agents.
|
| 19 |
-
|
| 20 |
-
mock_settings.openai_api_key = "sk-dummy"
|
| 21 |
-
mock_settings.openai_model = "gpt-4"
|
| 22 |
|
|
|
|
| 23 |
create_judge_agent()
|
| 24 |
|
| 25 |
# Verify ChatAgent was initialized with correct instructions
|
|
@@ -27,7 +29,7 @@ def test_judge_agent_has_termination_instructions() -> None:
|
|
| 27 |
call_kwargs = mock_chat_agent_cls.call_args.kwargs
|
| 28 |
instructions = call_kwargs.get("instructions", "")
|
| 29 |
|
| 30 |
-
# Verify critical sections
|
| 31 |
assert "CRITICAL OUTPUT FORMAT" in instructions
|
| 32 |
assert "SUFFICIENT EVIDENCE" in instructions
|
| 33 |
assert "confidence >= 70%" in instructions
|
|
@@ -36,13 +38,23 @@ def test_judge_agent_has_termination_instructions() -> None:
|
|
| 36 |
|
| 37 |
|
| 38 |
def test_judge_agent_uses_reasoning_temperature() -> None:
|
| 39 |
-
"""Judge agent should be initialized with temperature=1.0."""
|
| 40 |
-
with patch("src.agents.magentic_agents.
|
| 41 |
-
|
| 42 |
-
mock_settings.openai_api_key = "sk-dummy"
|
| 43 |
-
mock_settings.openai_model = "gpt-4"
|
| 44 |
|
|
|
|
| 45 |
create_judge_agent()
|
| 46 |
|
| 47 |
call_kwargs = mock_chat_agent_cls.call_args.kwargs
|
| 48 |
assert call_kwargs.get("temperature") == 1.0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+"""Tests for Magentic Judge termination logic (SPEC-16)."""

+from unittest.mock import MagicMock, patch

 import pytest


 pytestmark = pytest.mark.unit

+# Skip if agent-framework-core not installed
+pytest.importorskip("agent_framework")
+

 def test_judge_agent_has_termination_instructions() -> None:
     """Judge agent must be created with explicit instructions for early termination."""
     with patch("src.agents.magentic_agents.get_domain_config") as mock_config:
+        # Mock config to return test prompts
+        mock_config.return_value.judge_system_prompt = "Test judge prompt"

+        with patch("src.agents.magentic_agents.get_chat_client") as mock_client:
+            mock_client.return_value = MagicMock()

+            with patch("src.agents.magentic_agents.ChatAgent") as mock_chat_agent_cls:
                 create_judge_agent()

                 # Verify ChatAgent was initialized with correct instructions
                 call_kwargs = mock_chat_agent_cls.call_args.kwargs
                 instructions = call_kwargs.get("instructions", "")

+                # Verify critical sections for SPEC-15 termination
                 assert "CRITICAL OUTPUT FORMAT" in instructions
                 assert "SUFFICIENT EVIDENCE" in instructions
                 assert "confidence >= 70%" in instructions


 def test_judge_agent_uses_reasoning_temperature() -> None:
+    """Judge agent should be initialized with temperature=1.0 for reasoning models."""
+    with patch("src.agents.magentic_agents.get_chat_client") as mock_client:
+        mock_client.return_value = MagicMock()

+        with patch("src.agents.magentic_agents.ChatAgent") as mock_chat_agent_cls:
             create_judge_agent()

             call_kwargs = mock_chat_agent_cls.call_args.kwargs
             assert call_kwargs.get("temperature") == 1.0
+
+
+def test_judge_agent_accepts_custom_chat_client() -> None:
+    """Judge agent should accept custom chat_client parameter (SPEC-16)."""
+    custom_client = MagicMock()
+
+    with patch("src.agents.magentic_agents.ChatAgent") as mock_chat_agent_cls:
+        create_judge_agent(chat_client=custom_client)
+
+        call_kwargs = mock_chat_agent_cls.call_args.kwargs
+        assert call_kwargs.get("chat_client") == custom_client
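For orientation, a minimal sketch of the agent-factory shape these tests pin down: create_judge_agent resolves its client through get_chat_client() unless an explicit chat_client is injected, and passes temperature=1.0 to ChatAgent. The real implementation lives in src/agents/magentic_agents.py and is not reproduced in this excerpt, so the import paths, the get_domain_config module, and the JUDGE_TERMINATION_INSTRUCTIONS constant below are assumptions for illustration only.

# Sketch only -- not the actual src/agents/magentic_agents.py implementation.
from agent_framework import ChatAgent  # assumed import path, as patched in the tests above

from src.clients.factory import get_chat_client
from src.config.domain import get_domain_config  # assumed home module for get_domain_config

JUDGE_TERMINATION_INSTRUCTIONS = "..."  # placeholder for the real termination prompt text


def create_judge_agent(chat_client=None, domain=None):
    """Build the judge ChatAgent; an injected chat_client wins over the factory (SPEC-16)."""
    config = get_domain_config(domain)
    client = chat_client if chat_client is not None else get_chat_client()
    return ChatAgent(
        chat_client=client,
        instructions=config.judge_system_prompt + JUDGE_TERMINATION_INSTRUCTIONS,
        temperature=1.0,  # reasoning-model setting the tests assert on
    )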
@@ -0,0 +1 @@
+# Tests for src/clients/ package

@@ -0,0 +1,211 @@
+"""Unit tests for ChatClientFactory (SPEC-16: Unified Architecture)."""
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+# Skip if agent-framework-core not installed
+pytest.importorskip("agent_framework")
+
+
+@pytest.mark.unit
+class TestChatClientFactory:
+    """Test get_chat_client() factory function."""
+
+    def test_returns_openai_client_when_openai_key_available(self) -> None:
+        """When OpenAI key is available, should return OpenAIChatClient."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = True
+            mock_settings.has_gemini_key = False
+            mock_settings.openai_api_key = "sk-test-key"
+            mock_settings.openai_model = "gpt-5"
+
+            from src.clients.factory import get_chat_client
+
+            client = get_chat_client()
+
+            # Should be OpenAIChatClient
+            assert "OpenAI" in type(client).__name__
+
+    def test_returns_huggingface_client_when_no_key_available(self) -> None:
+        """When no API key is available, should return HuggingFaceChatClient (free tier)."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = False
+            mock_settings.has_gemini_key = False
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from src.clients.factory import get_chat_client
+
+            client = get_chat_client()
+
+            # Should be HuggingFaceChatClient
+            assert "HuggingFace" in type(client).__name__
+
+    def test_explicit_provider_openai_overrides_auto_detection(self) -> None:
+        """Explicit provider='openai' should use OpenAI even if no env key."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = False
+            mock_settings.has_gemini_key = False
+            mock_settings.openai_api_key = None
+            mock_settings.openai_model = "gpt-5"
+
+            from src.clients.factory import get_chat_client
+
+            # Explicit provider with api_key parameter
+            client = get_chat_client(provider="openai", api_key="sk-explicit-key")
+
+            assert "OpenAI" in type(client).__name__
+
+    def test_explicit_provider_huggingface(self) -> None:
+        """Explicit provider='huggingface' should use HuggingFace."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = True  # Even with OpenAI key available
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from src.clients.factory import get_chat_client
+
+            # Explicit provider forces HuggingFace
+            client = get_chat_client(provider="huggingface")
+
+            assert "HuggingFace" in type(client).__name__
+
+    def test_gemini_provider_raises_not_implemented(self) -> None:
+        """Explicit provider='gemini' should raise NotImplementedError (Phase 4)."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = False
+            mock_settings.has_gemini_key = False
+
+            from src.clients.factory import get_chat_client
+
+            with pytest.raises(NotImplementedError, match="Gemini client not yet implemented"):
+                get_chat_client(provider="gemini")
+
+    def test_unsupported_provider_raises_value_error(self) -> None:
+        """Unsupported provider should raise ValueError, not silently fallback."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = False
+            mock_settings.has_gemini_key = False
+
+            from src.clients.factory import get_chat_client
+
+            with pytest.raises(ValueError, match="Unsupported provider"):
+                get_chat_client(provider="anthropic")
+
+    def test_provider_is_case_insensitive(self) -> None:
+        """Provider matching should be case-insensitive."""
+        with patch("src.clients.factory.settings") as mock_settings:
+            mock_settings.has_openai_key = False
+            mock_settings.has_gemini_key = False
+            mock_settings.openai_api_key = None
+            mock_settings.openai_model = "gpt-5"
+
+            from src.clients.factory import get_chat_client
+
+            # "OpenAI" should work same as "openai"
+            client = get_chat_client(provider="OpenAI", api_key="sk-test")
+            assert "OpenAI" in type(client).__name__
+
+            # "HUGGINGFACE" should work same as "huggingface"
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+            client = get_chat_client(provider="HUGGINGFACE")
+            assert "HuggingFace" in type(client).__name__
+
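A minimal sketch of the provider-selection behavior the class above exercises: case-insensitive matching, HuggingFace free-tier fallback, and explicit errors instead of silent fallbacks. The real src/clients/factory.py ships in this PR but is not reproduced here, so the OpenAI client import path and the exact settings attribute usage below are assumptions.

# Sketch only -- approximates the behavior asserted by TestChatClientFactory.
from src.utils.config import settings


def get_chat_client(provider=None, api_key=None, model_id=None):
    """Return a chat client for the requested provider, or auto-detect one."""
    if provider is None:
        provider = "openai" if settings.has_openai_key else "huggingface"
    provider = provider.lower()  # "OpenAI" and "HUGGINGFACE" behave like their lowercase forms

    if provider == "openai":
        from agent_framework.openai import OpenAIChatClient  # assumed import path

        return OpenAIChatClient(
            api_key=api_key if api_key is not None else settings.openai_api_key,
            model_id=model_id if model_id is not None else settings.openai_model,
        )
    if provider == "huggingface":
        from src.clients.huggingface import HuggingFaceChatClient

        return HuggingFaceChatClient(
            model_id=model_id if model_id is not None else settings.huggingface_model
        )
    if provider == "gemini":
        raise NotImplementedError("Gemini client not yet implemented (planned for Phase 4)")
    raise ValueError(f"Unsupported provider: {provider}")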
+
+@pytest.mark.unit
+class TestHuggingFaceChatClient:
+    """Test HuggingFaceChatClient adapter."""
+
+    def test_initialization_with_defaults(self) -> None:
+        """Should initialize with default model from settings."""
+        with patch("src.clients.huggingface.settings") as mock_settings:
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from src.clients.huggingface import HuggingFaceChatClient
+
+            client = HuggingFaceChatClient()
+
+            assert client.model_id == "meta-llama/Llama-3.1-70B-Instruct"
+
+    def test_initialization_with_custom_model(self) -> None:
+        """Should accept custom model_id."""
+        with patch("src.clients.huggingface.settings") as mock_settings:
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from src.clients.huggingface import HuggingFaceChatClient
+
+            client = HuggingFaceChatClient(model_id="mistralai/Mistral-7B-Instruct-v0.3")
+
+            assert client.model_id == "mistralai/Mistral-7B-Instruct-v0.3"
+
+    def test_convert_messages_basic(self) -> None:
+        """Should convert ChatMessage list to HuggingFace format."""
+        with patch("src.clients.huggingface.settings") as mock_settings:
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from agent_framework import ChatMessage
+
+            from src.clients.huggingface import HuggingFaceChatClient
+
+            client = HuggingFaceChatClient()
+
+            # Create mock messages
+            messages = [
+                MagicMock(spec=ChatMessage, role="user", text="Hello"),
+                MagicMock(spec=ChatMessage, role="assistant", text="Hi there!"),
+            ]
+
+            result = client._convert_messages(messages)
+
+            assert len(result) == 2
+            assert result[0] == {"role": "user", "content": "Hello"}
+            assert result[1] == {"role": "assistant", "content": "Hi there!"}
+
+    def test_convert_messages_handles_role_enum(self) -> None:
+        """Should extract .value from Role enum, not stringify the enum itself."""
+        with patch("src.clients.huggingface.settings") as mock_settings:
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from enum import Enum
+
+            from agent_framework import ChatMessage
+
+            from src.clients.huggingface import HuggingFaceChatClient
+
+            # Simulate a Role enum like agent_framework might use
+            class Role(Enum):
+                USER = "user"
+                ASSISTANT = "assistant"
+
+            client = HuggingFaceChatClient()
+
+            # Create mock message with enum role
+            mock_msg = MagicMock(spec=ChatMessage)
+            mock_msg.role = Role.USER  # Enum, not string
+            mock_msg.text = "Hello"
+
+            result = client._convert_messages([mock_msg])
+
+            # Should be "user", NOT "Role.USER"
+            assert result[0]["role"] == "user"
+            assert "Role" not in result[0]["role"]
+
+    def test_inherits_from_base_chat_client(self) -> None:
+        """Should inherit from agent_framework.BaseChatClient."""
+        with patch("src.clients.huggingface.settings") as mock_settings:
+            mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+            mock_settings.hf_token = None
+
+            from agent_framework import BaseChatClient
+
+            from src.clients.huggingface import HuggingFaceChatClient
+
+            client = HuggingFaceChatClient()
+
+            assert isinstance(client, BaseChatClient)

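The Role-enum test above pins down the conversion bug called out in the review: str(Role.USER) yields "Role.USER", while Role.USER.value yields "user". A minimal sketch of that conversion, standing in for the real HuggingFaceChatClient._convert_messages (which is not reproduced in this excerpt):

# Sketch only -- illustrates the .value extraction the tests above require.
from enum import Enum
from typing import Any


def convert_messages(messages: list[Any]) -> list[dict[str, str]]:
    """Map framework ChatMessage objects to Hugging Face chat-completion dicts."""
    converted = []
    for message in messages:
        role = message.role
        if isinstance(role, Enum):
            role = role.value  # Role.USER -> "user", never "Role.USER"
        converted.append({"role": str(role), "content": message.text})
    return converted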
@@ -1,6 +1,6 @@
 """Tests for AdvancedOrchestrator configuration."""

-from unittest.mock import patch

 import pytest
 from pydantic import ValidationError
@@ -13,29 +13,33 @@ from src.utils.config import Settings
 class TestAdvancedOrchestratorConfig:
     """Tests for configuration options."""

-
         """Default max_rounds should be 5 from settings."""
-
-
-

-
         """Explicit parameter should override settings."""
-
-
-

-
         """Default timeout should be 300s (5 min) from settings."""
-
-
-

-
         """Explicit timeout parameter should override settings."""
-
-
-


 @pytest.mark.unit
 """Tests for AdvancedOrchestrator configuration."""

+from unittest.mock import MagicMock, patch

 import pytest
 from pydantic import ValidationError

 class TestAdvancedOrchestratorConfig:
     """Tests for configuration options."""

+    @patch("src.orchestrators.advanced.get_chat_client")
+    def test_default_max_rounds_is_five(self, mock_get_client) -> None:
         """Default max_rounds should be 5 from settings."""
+        mock_get_client.return_value = MagicMock()
+        orch = AdvancedOrchestrator()
+        assert orch._max_rounds == 5

+    @patch("src.orchestrators.advanced.get_chat_client")
+    def test_explicit_max_rounds_overrides_settings(self, mock_get_client) -> None:
         """Explicit parameter should override settings."""
+        mock_get_client.return_value = MagicMock()
+        orch = AdvancedOrchestrator(max_rounds=7)
+        assert orch._max_rounds == 7

+    @patch("src.orchestrators.advanced.get_chat_client")
+    def test_timeout_default_is_five_minutes(self, mock_get_client) -> None:
         """Default timeout should be 300s (5 min) from settings."""
+        mock_get_client.return_value = MagicMock()
+        orch = AdvancedOrchestrator()
+        assert orch._timeout_seconds == 300.0

+    @patch("src.orchestrators.advanced.get_chat_client")
+    def test_explicit_timeout_overrides_settings(self, mock_get_client) -> None:
         """Explicit timeout parameter should override settings."""
+        mock_get_client.return_value = MagicMock()
+        orch = AdvancedOrchestrator(timeout_seconds=120.0)
+        assert orch._timeout_seconds == 120.0


 @pytest.mark.unit
@@ -7,45 +7,40 @@ from src.orchestrators.advanced import AdvancedOrchestrator


 class TestAdvancedOrchestratorDomain:
-    @patch("src.orchestrators.advanced.
-
-    def test_advanced_orchestrator_accepts_domain(self, mock_client, mock_check):
         # Mock to avoid API key validation
-        mock_client
         orch = AdvancedOrchestrator(domain=ResearchDomain.SEXUAL_HEALTH, api_key="sk-test")
         assert orch.domain == ResearchDomain.SEXUAL_HEALTH

-    @patch("src.orchestrators.advanced.check_magentic_requirements")
     @patch("src.orchestrators.advanced.create_search_agent")
     @patch("src.orchestrators.advanced.create_judge_agent")
     @patch("src.orchestrators.advanced.create_hypothesis_agent")
     @patch("src.orchestrators.advanced.create_report_agent")
     @patch("src.orchestrators.advanced.MagenticBuilder")
-    @patch("src.orchestrators.advanced.
     def test_build_workflow_uses_domain(
         self,
-
         mock_builder,
         mock_create_report,
         mock_create_hypothesis,
         mock_create_judge,
         mock_create_search,
-        mock_check,
     ):
-        mock_client
         orch = AdvancedOrchestrator(domain=ResearchDomain.SEXUAL_HEALTH, api_key="sk-test")

         # Call private method to verify agent creation calls
         orch._build_workflow()

-        # Verify agents created with domain
-        mock_create_search.assert_called_with(
-
-        )
-
-        mock_create_hypothesis.assert_called_with(
-            orch._chat_client, domain=ResearchDomain.SEXUAL_HEALTH
-        )
-        mock_create_report.assert_called_with(
-            orch._chat_client, domain=ResearchDomain.SEXUAL_HEALTH
-        )

 class TestAdvancedOrchestratorDomain:
+    @patch("src.orchestrators.advanced.get_chat_client")
+    def test_advanced_orchestrator_accepts_domain(self, mock_get_client):
         # Mock to avoid API key validation
+        mock_client = MagicMock()
+        mock_get_client.return_value = mock_client
+
         orch = AdvancedOrchestrator(domain=ResearchDomain.SEXUAL_HEALTH, api_key="sk-test")
         assert orch.domain == ResearchDomain.SEXUAL_HEALTH

     @patch("src.orchestrators.advanced.create_search_agent")
     @patch("src.orchestrators.advanced.create_judge_agent")
     @patch("src.orchestrators.advanced.create_hypothesis_agent")
     @patch("src.orchestrators.advanced.create_report_agent")
     @patch("src.orchestrators.advanced.MagenticBuilder")
+    @patch("src.orchestrators.advanced.get_chat_client")
     def test_build_workflow_uses_domain(
         self,
+        mock_get_client,
         mock_builder,
         mock_create_report,
         mock_create_hypothesis,
         mock_create_judge,
         mock_create_search,
     ):
+        mock_client = MagicMock()
+        mock_get_client.return_value = mock_client
+
         orch = AdvancedOrchestrator(domain=ResearchDomain.SEXUAL_HEALTH, api_key="sk-test")

         # Call private method to verify agent creation calls
         orch._build_workflow()

+        # Verify agents created with domain and correct client
+        mock_create_search.assert_called_with(mock_client, domain=ResearchDomain.SEXUAL_HEALTH)
+        mock_create_judge.assert_called_with(mock_client, domain=ResearchDomain.SEXUAL_HEALTH)
+        mock_create_hypothesis.assert_called_with(mock_client, domain=ResearchDomain.SEXUAL_HEALTH)
+        mock_create_report.assert_called_with(mock_client, domain=ResearchDomain.SEXUAL_HEALTH)
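The config tests earlier in this file rely on the constructor falling back to settings only when a parameter is genuinely absent, the same explicit-None pattern applied codebase-wide in this PR. A sketch of that pattern follows; the settings attribute names are assumptions, and the real class is src/orchestrators/advanced.py, which is not reproduced here.

# Sketch only -- the `is not None` default pattern the config tests exercise.
from src.clients.factory import get_chat_client
from src.utils.config import settings


class AdvancedOrchestratorSketch:
    def __init__(self, max_rounds=None, timeout_seconds=None, domain=None, api_key=None):
        # `is not None` (not `or`) so explicit falsy values such as 0 or 0.0 are honored.
        self._max_rounds = max_rounds if max_rounds is not None else settings.max_rounds  # assumed setting name
        self._timeout_seconds = (
            timeout_seconds if timeout_seconds is not None else settings.orchestrator_timeout  # assumed setting name
        )
        self._chat_client = get_chat_client(api_key=api_key)
        self.domain = domain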
@@ -1,14 +1,16 @@
 """Tests for Orchestrator Factory domain support."""

-from unittest.mock import

 from src.config.domain import ResearchDomain
 from src.orchestrators.factory import create_orchestrator


 class TestFactoryDomain:
-    @patch("src.orchestrators.factory.
-    def
         mock_search = MagicMock()
         mock_judge = MagicMock()

@@ -19,12 +21,8 @@ class TestFactoryDomain:
             domain=ResearchDomain.SEXUAL_HEALTH,
         )

-
-
-            judge_handler=mock_judge,
-            config=ANY,
-            domain=ResearchDomain.SEXUAL_HEALTH,
-        )

     @patch("src.orchestrators.factory._get_advanced_orchestrator_class")
     def test_create_advanced_uses_domain(self, mock_get_cls):

 """Tests for Orchestrator Factory domain support."""

+from unittest.mock import MagicMock, patch

 from src.config.domain import ResearchDomain
 from src.orchestrators.factory import create_orchestrator


 class TestFactoryDomain:
+    @patch("src.orchestrators.factory._get_advanced_orchestrator_class")
+    def test_create_simple_maps_to_advanced_with_domain(self, mock_get_cls):
+        mock_adv_cls = MagicMock()
+        mock_get_cls.return_value = mock_adv_cls
         mock_search = MagicMock()
         mock_judge = MagicMock()

             domain=ResearchDomain.SEXUAL_HEALTH,
         )

+        call_kwargs = mock_adv_cls.call_args.kwargs
+        assert call_kwargs["domain"] == ResearchDomain.SEXUAL_HEALTH

     @patch("src.orchestrators.factory._get_advanced_orchestrator_class")
     def test_create_advanced_uses_domain(self, mock_get_cls):
@@ -1,47 +0,0 @@
-"""Tests for Orchestrator (Simple) domain support."""
-
-from unittest.mock import MagicMock
-
-from src.config.domain import SEXUAL_HEALTH_CONFIG, ResearchDomain
-from src.orchestrators.simple import Orchestrator
-
-
-class TestSimpleOrchestratorDomain:
-    def test_orchestrator_accepts_domain(self):
-        mock_search = MagicMock()
-        mock_judge = MagicMock()
-
-        orch = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-            domain=ResearchDomain.SEXUAL_HEALTH,
-        )
-
-        assert orch.domain == ResearchDomain.SEXUAL_HEALTH
-        assert orch.domain_config.name == SEXUAL_HEALTH_CONFIG.name
-
-    def test_orchestrator_uses_domain_title_in_synthesis(self):
-        mock_search = MagicMock()
-        mock_judge = MagicMock()
-
-        orch = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-            domain=ResearchDomain.SEXUAL_HEALTH,
-        )
-
-        # Test _generate_template_synthesis (the sync fallback method)
-        mock_assessment = MagicMock()
-        mock_assessment.details.drug_candidates = []
-        mock_assessment.details.key_findings = []
-        mock_assessment.confidence = 0.5
-        mock_assessment.reasoning = "test"
-        mock_assessment.details.mechanism_score = 5
-        mock_assessment.details.clinical_evidence_score = 5
-
-        report = orch._generate_template_synthesis("query", [], mock_assessment)
-        assert "## Sexual Health Analysis" in report
-
-        # Test _generate_partial_synthesis
-        report_partial = orch._generate_partial_synthesis("query", [])
-        assert "## Sexual Health Analysis" in report_partial
@@ -1,320 +0,0 @@
-"""Tests for simple orchestrator LLM synthesis."""
-
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import pytest
-
-from src.orchestrators.simple import Orchestrator
-from src.utils.models import AssessmentDetails, Citation, Evidence, JudgeAssessment
-
-
-@pytest.fixture
-def sample_evidence() -> list[Evidence]:
-    """Sample evidence for testing synthesis."""
-    return [
-        Evidence(
-            content="Testosterone therapy demonstrates efficacy in treating HSDD.",
-            citation=Citation(
-                source="pubmed",
-                title="Testosterone and Female Sexual Desire",
-                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
-                date="2023",
-                authors=["Smith J", "Jones A"],
-            ),
-        ),
-        Evidence(
-            content="A meta-analysis of 8 RCTs shows significant improvement in sexual desire.",
-            citation=Citation(
-                source="pubmed",
-                title="Meta-analysis of Testosterone Therapy",
-                url="https://pubmed.ncbi.nlm.nih.gov/67890/",
-                date="2024",
-                authors=["Johnson B"],
-            ),
-        ),
-    ]
-
-
-@pytest.fixture
-def sample_assessment() -> JudgeAssessment:
-    """Sample assessment for testing synthesis."""
-    return JudgeAssessment(
-        sufficient=True,
-        confidence=0.85,
-        reasoning="Evidence is sufficient to synthesize findings on testosterone therapy for HSDD.",
-        recommendation="synthesize",
-        next_search_queries=[],
-        details=AssessmentDetails(
-            mechanism_score=8,
-            mechanism_reasoning="Strong evidence of androgen receptor activation pathway.",
-            clinical_evidence_score=7,
-            clinical_reasoning="Multiple RCTs support efficacy in postmenopausal HSDD.",
-            drug_candidates=["Testosterone", "LibiGel"],
-            key_findings=[
-                "Testosterone improves libido in postmenopausal women",
-                "Transdermal formulation has best safety profile",
-            ],
-        ),
-    )
-
-
-@pytest.mark.unit
-class TestGenerateSynthesis:
-    """Tests for _generate_synthesis method."""
-
-    @pytest.mark.asyncio
-    async def test_calls_llm_for_narrative(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Synthesis should make an LLM call using pydantic_ai when judge is paid tier."""
-        mock_search = MagicMock()
-        # Paid tier JudgeHandler has 'assess' but NOT 'synthesize'
-        mock_judge = MagicMock(spec=["assess"])
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]  # Needed for footer
-
-        with (
-            patch("pydantic_ai.Agent") as mock_agent_class,
-            patch("src.agent_factory.judges.get_model") as mock_get_model,
-        ):
-            mock_model = MagicMock()
-            mock_get_model.return_value = mock_model
-
-            mock_agent = MagicMock()
-            mock_result = MagicMock()
-            mock_result.output = """### Executive Summary
-
-Testosterone therapy demonstrates consistent efficacy for HSDD treatment.
-
-### Background
-
-HSDD affects many postmenopausal women.
-
-### Evidence Synthesis
-
-Studies show significant improvement in sexual desire scores.
-
-### Recommendations
-
-1. Consider testosterone therapy for postmenopausal HSDD
-
-### Limitations
-
-Long-term safety data is limited.
-
-### References
-
-1. Smith J et al. (2023). Testosterone and Female Sexual Desire."""
-
-            mock_agent.run = AsyncMock(return_value=mock_result)
-            mock_agent_class.return_value = mock_agent
-
-            result = await orchestrator._generate_synthesis(
-                query="testosterone HSDD",
-                evidence=sample_evidence,
-                assessment=sample_assessment,
-            )
-
-            # Verify LLM agent was created and called
-            mock_agent_class.assert_called_once()
-            mock_agent.run.assert_called_once()
-
-            # Verify output includes narrative content
-            assert "Executive Summary" in result
-            assert "Background" in result
-            assert "Evidence Synthesis" in result
-
-    @pytest.mark.asyncio
-    async def test_uses_free_tier_synthesis_when_available(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Synthesis should use judge's synthesize method when in Free Tier."""
-        mock_search = MagicMock()
-        # Free tier JudgeHandler has 'synthesize' method
-        mock_judge = MagicMock()
-        # Setup synthesize method
-        mock_judge.synthesize = AsyncMock(return_value="Free tier narrative content.")
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]
-
-        # We don't need to patch Agent or get_model because they shouldn't be called
-        result = await orchestrator._generate_synthesis(
-            query="test query",
-            evidence=sample_evidence,
-            assessment=sample_assessment,
-        )
-
-        # Verify judge's synthesize was called
-        mock_judge.synthesize.assert_called_once()
-
-        # Verify result contains the free tier content
-        assert "Free tier narrative content" in result
-        # Should still include footer
-        assert "Full Citation List" in result
-
-    @pytest.mark.asyncio
-    async def test_falls_back_on_llm_error_with_notice(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Synthesis should fall back to template if LLM fails, WITH error notice."""
-        mock_search = MagicMock()
-        # Paid tier simulation
-        mock_judge = MagicMock(spec=["assess"])
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]
-
-        with patch("pydantic_ai.Agent") as mock_agent_class:
-            # Simulate LLM failure
-            mock_agent_class.side_effect = Exception("LLM unavailable")
-
-            result = await orchestrator._generate_synthesis(
-                query="testosterone HSDD",
-                evidence=sample_evidence,
-                assessment=sample_assessment,
-            )
-
-            # Should surface error to user (MS Agent Framework pattern)
-            assert "AI narrative synthesis unavailable" in result
-            assert "Error" in result
-
-            # Should still include template content
-            assert "Assessment" in result or "Drug Candidates" in result
-            assert "Testosterone" in result  # Drug candidate should be present
-
-    @pytest.mark.asyncio
-    async def test_includes_citation_footer(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Synthesis should include full citation list footer."""
-        mock_search = MagicMock()
-        # Paid tier simulation
-        mock_judge = MagicMock(spec=["assess"])
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]
-
-        with (
-            patch("pydantic_ai.Agent") as mock_agent_class,
-            patch("src.agent_factory.judges.get_model"),
-        ):
-            mock_agent = MagicMock()
-            mock_result = MagicMock()
-            mock_result.output = "Narrative synthesis content."
-            mock_agent.run = AsyncMock(return_value=mock_result)
-            mock_agent_class.return_value = mock_agent
-
-            result = await orchestrator._generate_synthesis(
-                query="test query",
-                evidence=sample_evidence,
-                assessment=sample_assessment,
-            )
-
-            # Should include citation footer
-            assert "Full Citation List" in result
-            assert "pubmed.ncbi.nlm.nih.gov/12345" in result
-            assert "pubmed.ncbi.nlm.nih.gov/67890" in result
-
-
-@pytest.mark.unit
-class TestGenerateTemplateSynthesis:
-    """Tests for _generate_template_synthesis fallback method."""
-
-    def test_returns_structured_output(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Template synthesis should return structured markdown."""
-        mock_search = MagicMock()
-        mock_judge = MagicMock()
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]
-
-        result = orchestrator._generate_template_synthesis(
-            query="testosterone HSDD",
-            evidence=sample_evidence,
-            assessment=sample_assessment,
-        )
-
-        # Should have all required sections
-        assert "Question" in result
-        assert "Drug Candidates" in result
-        assert "Key Findings" in result
-        assert "Assessment" in result
-        assert "Citations" in result
-
-    def test_includes_drug_candidates(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Template synthesis should list drug candidates."""
-        mock_search = MagicMock()
-        mock_judge = MagicMock()
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]
-
-        result = orchestrator._generate_template_synthesis(
-            query="test",
-            evidence=sample_evidence,
-            assessment=sample_assessment,
-        )
-
-        assert "Testosterone" in result
-        assert "LibiGel" in result
-
-    def test_includes_scores(
-        self,
-        sample_evidence: list[Evidence],
-        sample_assessment: JudgeAssessment,
-    ) -> None:
-        """Template synthesis should include assessment scores."""
-        mock_search = MagicMock()
-        mock_judge = MagicMock()
-
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-        )
-        orchestrator.history = [{"iteration": 1}]
-
-        result = orchestrator._generate_template_synthesis(
-            query="test",
-            evidence=sample_evidence,
-            assessment=sample_assessment,
-        )
-
-        assert "8/10" in result  # Mechanism score
-        assert "7/10" in result  # Clinical score
-        assert "85%" in result  # Confidence
@@ -1,104 +0,0 @@
-from typing import Literal
-from unittest.mock import MagicMock
-
-import pytest
-
-from src.orchestrators.simple import Orchestrator
-from src.utils.models import AssessmentDetails, JudgeAssessment
-
-
-def make_assessment(
-    mechanism: int,
-    clinical: int,
-    drug_candidates: list[str],
-    sufficient: bool = False,
-    recommendation: Literal["continue", "synthesize"] = "continue",
-    confidence: float = 0.8,
-) -> JudgeAssessment:
-    return JudgeAssessment(
-        details=AssessmentDetails(
-            mechanism_score=mechanism,
-            mechanism_reasoning="reasoning is sufficient for testing purposes",
-            clinical_evidence_score=clinical,
-            clinical_reasoning="reasoning is sufficient for testing purposes",
-            drug_candidates=drug_candidates,
-            key_findings=["finding"],
-        ),
-        sufficient=sufficient,
-        confidence=confidence,
-        recommendation=recommendation,
-        next_search_queries=[],
-        reasoning="reasoning is sufficient for testing purposes",
-    )
-
-
-@pytest.fixture
-def orchestrator():
-    search = MagicMock()
-    judge = MagicMock()
-    return Orchestrator(search, judge)
-
-
-@pytest.mark.unit
-def test_should_synthesize_high_scores(orchestrator):
-    """High scores with drug candidates triggers synthesis."""
-    assessment = make_assessment(mechanism=7, clinical=6, drug_candidates=["Testosterone"])
-
-    # Access the private method via name mangling or just call it if it was public.
-    # Since I made it private _should_synthesize, I access it directly.
-    should_synth, reason = orchestrator._should_synthesize(
-        assessment, iteration=3, max_iterations=10, evidence_count=50
-    )
-
-    assert should_synth is True
-    assert reason == "high_scores_with_candidates"
-
-
-@pytest.mark.unit
-def test_should_synthesize_late_iteration(orchestrator):
-    """Late iteration with acceptable scores triggers synthesis."""
-    assessment = make_assessment(mechanism=5, clinical=4, drug_candidates=[])
-    should_synth, reason = orchestrator._should_synthesize(
-        assessment, iteration=9, max_iterations=10, evidence_count=80
-    )
-
-    assert should_synth is True
-    assert reason in ["late_iteration_acceptable", "emergency_synthesis"]
-
-
-@pytest.mark.unit
-def test_should_not_synthesize_early_low_scores(orchestrator):
-    """Early iteration with low scores continues searching."""
-    assessment = make_assessment(mechanism=3, clinical=2, drug_candidates=[])
-    should_synth, reason = orchestrator._should_synthesize(
-        assessment, iteration=2, max_iterations=10, evidence_count=20
-    )
-
-    assert should_synth is False
-    assert reason == "continue_searching"
-
-
-@pytest.mark.unit
-def test_judge_approved_overrides_all(orchestrator):
-    """If judge explicitly says synthesize with good scores, do it."""
-    assessment = make_assessment(
-        mechanism=6, clinical=5, drug_candidates=[], sufficient=True, recommendation="synthesize"
-    )
-    should_synth, reason = orchestrator._should_synthesize(
-        assessment, iteration=2, max_iterations=10, evidence_count=20
-    )
-
-    assert should_synth is True
-    assert reason == "judge_approved"
-
-
-@pytest.mark.unit
-def test_max_evidence_threshold(orchestrator):
-    """Force synthesis if we have tons of evidence."""
-    assessment = make_assessment(mechanism=2, clinical=2, drug_candidates=[])
-    should_synth, reason = orchestrator._should_synthesize(
-        assessment, iteration=5, max_iterations=10, evidence_count=150
-    )
-
-    assert should_synth is True
-    assert reason == "max_evidence_reached"
@@ -1,82 +1,91 @@
-"""Tests for App domain support."""

 from unittest.mock import ANY, MagicMock, patch

 from src.app import configure_orchestrator, research_agent
 from src.config.domain import ResearchDomain


 class TestAppDomain:
     @patch("src.app.create_orchestrator")
-
-
-
-

-        # MockJudgeHandler should receive domain
-        mock_judge.assert_called_with(domain=ResearchDomain.SEXUAL_HEALTH)
         mock_create.assert_called_with(
-            search_handler=ANY,
-            judge_handler=ANY,
             config=ANY,
-            mode="
             api_key=None,
             domain=ResearchDomain.SEXUAL_HEALTH,
         )

-    @patch.dict("os.environ", {}, clear=True)
-    @patch("src.app.settings")
     @patch("src.app.create_orchestrator")
-
-
-
-
-        """Test domain is passed when using free tier (no API keys)."""
-        # Simulate no keys in settings
-        mock_settings.has_openai_key = False
-        mock_settings.has_anthropic_key = False

-        configure_orchestrator(

-        # HFInferenceJudgeHandler should receive domain (no API keys = free tier)
-        mock_hf_judge.assert_called_with(domain=ResearchDomain.SEXUAL_HEALTH)
         mock_create.assert_called_with(
-            search_handler=ANY,
-            judge_handler=ANY,
             config=ANY,
-            mode="
-            api_key=
-            domain=
         )

     @patch("src.app.settings")
     @patch("src.app.configure_orchestrator")
     async def test_research_agent_passes_domain(self, mock_config, mock_settings):
         # Mock settings to have some state
         mock_settings.has_openai_key = False
         mock_settings.has_anthropic_key = False

         # Mock orchestrator
         mock_orch = MagicMock()
-        mock_orch.run.return_value = []  # Async iterator?

-        #
         async def async_gen(*args):
             if False:
                 yield  # Make it a generator

         mock_orch.run = async_gen
-
         mock_config.return_value = (mock_orch, "Test Backend")

-        #
         gen = research_agent(
-            message="query",
         )

         async for _ in gen:
             pass

         mock_config.assert_called_with(
-            use_mock=False,
         )
+"""Tests for App domain support (SPEC-16: Unified Architecture)."""

 from unittest.mock import ANY, MagicMock, patch

+import pytest
+
 from src.app import configure_orchestrator, research_agent
 from src.config.domain import ResearchDomain

+pytestmark = pytest.mark.unit
+

 class TestAppDomain:
+    """Test domain parameter handling in app.py."""
+
     @patch("src.app.create_orchestrator")
+    def test_configure_orchestrator_passes_domain(self, mock_create):
+        """Test domain is passed to create_orchestrator (SPEC-16: unified architecture)."""
+        # Mock return value
+        mock_orch = MagicMock()
+        mock_create.return_value = mock_orch
+
+        configure_orchestrator(
+            use_mock=False,
+            mode="advanced",  # SPEC-16: always advanced
+            domain=ResearchDomain.SEXUAL_HEALTH,
+        )

         mock_create.assert_called_with(
             config=ANY,
+            mode="advanced",
             api_key=None,
             domain=ResearchDomain.SEXUAL_HEALTH,
         )

     @patch("src.app.create_orchestrator")
+    def test_configure_orchestrator_with_api_key(self, mock_create):
+        """Test API key is passed through."""
+        mock_orch = MagicMock()
+        mock_create.return_value = mock_orch

+        configure_orchestrator(
+            use_mock=False,
+            user_api_key="sk-test-key",
+            domain="sexual_health",
+        )

         mock_create.assert_called_with(
             config=ANY,
+            mode="advanced",
+            api_key="sk-test-key",
+            domain="sexual_health",
         )

+    @pytest.mark.asyncio
     @patch("src.app.settings")
     @patch("src.app.configure_orchestrator")
     async def test_research_agent_passes_domain(self, mock_config, mock_settings):
+        """Test research_agent passes domain to configure_orchestrator."""
         # Mock settings to have some state
         mock_settings.has_openai_key = False
         mock_settings.has_anthropic_key = False

         # Mock orchestrator
         mock_orch = MagicMock()

+        # Mock async generator
         async def async_gen(*args):
             if False:
                 yield  # Make it a generator

         mock_orch.run = async_gen
         mock_config.return_value = (mock_orch, "Test Backend")

+        # SPEC-16: mode parameter removed from research_agent
         gen = research_agent(
+            message="query",
+            history=[],
+            domain=ResearchDomain.SEXUAL_HEALTH.value,
         )

         async for _ in gen:
             pass

+        # SPEC-16: mode is always "advanced"
         mock_config.assert_called_with(
+            use_mock=False,
+            mode="advanced",
+            user_api_key=None,
+            domain=ResearchDomain.SEXUAL_HEALTH.value,
         )
@@ -36,10 +36,10 @@ async def test_research_agent_handles_none_parameters():
     try:
         # This should NOT raise AttributeError: 'NoneType' object has no attribute 'strip'
         results = []
         async for result in research_agent(
             message="test query",
             history=[],
-            mode="simple",
             api_key=None,  # Simulating Gradio passing None
             api_key_state=None,  # Simulating Gradio passing None
         ):
@@ -71,10 +71,10 @@ async def test_research_agent_handles_empty_string_parameters():

     try:
         results = []
         async for result in research_agent(
             message="test query",
             history=[],
-            mode="simple",
             api_key="",  # Normal empty string
             api_key_state="",  # Normal empty string
         ):

     try:
         # This should NOT raise AttributeError: 'NoneType' object has no attribute 'strip'
         results = []
+        # SPEC-16: mode parameter removed (unified architecture)
         async for result in research_agent(
             message="test query",
             history=[],
             api_key=None,  # Simulating Gradio passing None
             api_key_state=None,  # Simulating Gradio passing None
         ):

     try:
         results = []
+        # SPEC-16: mode parameter removed (unified architecture)
         async for result in research_agent(
             message="test query",
             history=[],
             api_key="",  # Normal empty string
             api_key_state="",  # Normal empty string
         ):
@@ -1,101 +0,0 @@
-"""Tests for Magentic Orchestrator fixes."""
-
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-# Skip all tests if agent_framework not installed (optional dep)
-pytest.importorskip("agent_framework")
-
-from agent_framework import MagenticFinalResultEvent  # noqa: E402
-
-from src.orchestrators.advanced import AdvancedOrchestrator as MagenticOrchestrator  # noqa: E402
-
-
-class MockChatMessage:
-    """Simulates the buggy ChatMessage that returns itself as text or has complex content."""
-
-    def __init__(self, content_str: str) -> None:
-        self.content_str = content_str
-        self.role = "assistant"
-
-    @property
-    def text(self) -> "MockChatMessage":
-        # Simulate the bug: .text returns the object itself or a repr string
-        return self
-
-    @property
-    def content(self) -> str:
-        # The fix plan says we should look for .content
-        return self.content_str
-
-    def __repr__(self) -> str:
-        return "<ChatMessage object at 0xMOCK>"
-
-    def __str__(self) -> str:
-        return "<ChatMessage object at 0xMOCK>"
-
-
-@pytest.fixture
-def mock_magentic_requirements():
-    """Mock the API key check so tests run in CI without OPENAI_API_KEY."""
-    with patch("src.orchestrators.advanced.check_magentic_requirements"):
-        yield
-
-
-class TestMagenticFixes:
-    """Tests for the Magentic mode fixes."""
-
-    def test_process_event_extracts_text_correctly(self, mock_magentic_requirements) -> None:
-        """
-        Test that _process_event correctly extracts text from a ChatMessage.
-
-        Verifies fix for bug where .text returns the object itself.
-        """
-        orchestrator = MagenticOrchestrator()
-
-        # Create a mock message that mimics the bug
-        buggy_message = MockChatMessage("Final Report Content")
-        event = MagenticFinalResultEvent(message=buggy_message)  # type: ignore[arg-type]
-
-        # Process the event
-        # We expect the fix to get "Final Report Content" instead of object repr
-        result_event = orchestrator._process_event(event, iteration=1)
-
-        assert result_event is not None
-        assert result_event.type == "complete"
-        assert result_event.message == "Final Report Content"
-
-    def test_max_rounds_configuration(self, mock_magentic_requirements) -> None:
-        """Test that max_rounds is correctly passed to the orchestrator."""
-        orchestrator = MagenticOrchestrator(max_rounds=25)
-        assert orchestrator._max_rounds == 25
-
-        # Also verify it's used in _build_workflow
-        # Mock all the agent creation and OpenAI client calls
-        with (
-            patch("src.orchestrators.advanced.create_search_agent") as mock_search,
-            patch("src.orchestrators.advanced.create_judge_agent") as mock_judge,
-            patch("src.orchestrators.advanced.create_hypothesis_agent") as mock_hypo,
-            patch("src.orchestrators.advanced.create_report_agent") as mock_report,
-            patch("src.orchestrators.advanced.OpenAIChatClient") as mock_client,
-            patch("src.orchestrators.advanced.MagenticBuilder") as mock_builder,
-        ):
-            # Setup mocks
-            mock_search.return_value = MagicMock()
-            mock_judge.return_value = MagicMock()
-            mock_hypo.return_value = MagicMock()
-            mock_report.return_value = MagicMock()
-            mock_client.return_value = MagicMock()
-
-            # Mock the builder chain
-            mock_chain = mock_builder.return_value.participants.return_value
-            mock_chain.with_standard_manager.return_value.build.return_value = MagicMock()
-
-            orchestrator._build_workflow()
-
-            # Check that max_round_count was passed as 25
-            participants_mock = mock_builder.return_value.participants.return_value
-            participants_mock.with_standard_manager.assert_called_once()
-            call_kwargs = participants_mock.with_standard_manager.call_args.kwargs
-            assert call_kwargs["max_round_count"] == 25
@@ -1,155 +0,0 @@
-"""Tests for Magentic Orchestrator termination guarantee."""
-
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-# Skip all tests if agent_framework not installed (optional dep)
-# MUST come before any agent_framework imports
-pytest.importorskip("agent_framework")
-
-from agent_framework import MagenticAgentMessageEvent  # noqa: E402
-
-from src.orchestrators.advanced import AdvancedOrchestrator as MagenticOrchestrator  # noqa: E402
-from src.utils.models import AgentEvent  # noqa: E402
-
-
-class MockChatMessage:
-    def __init__(self, content):
-        self.content = content
-        self.role = "assistant"
-
-    @property
-    def text(self):
-        return self.content
-
-
-@pytest.fixture
-def mock_magentic_requirements():
-    """Mock requirements check."""
-    with patch("src.orchestrators.advanced.check_magentic_requirements"):
-        yield
-
-
-@pytest.mark.asyncio
-async def test_termination_event_emitted_on_stream_end(mock_magentic_requirements):
-    """
-    Verify that a termination event is emitted when the workflow stream ends
-    without a MagenticFinalResultEvent (e.g. max rounds reached).
-    """
-    orchestrator = MagenticOrchestrator(max_rounds=2)
-
-    # Use real event class
-    mock_message = MockChatMessage("Thinking...")
-    mock_agent_event = MagenticAgentMessageEvent(agent_id="SearchAgent", message=mock_message)
-
-    # Mock the workflow and its run_stream method
-    mock_workflow = MagicMock()
-
-    # Create an async generator for run_stream
-    async def mock_stream(task):
-        # Yield the real message event
-        yield mock_agent_event
-        # STOP HERE - No FinalResultEvent
-
-    mock_workflow.run_stream = mock_stream
-
-    # Mock _build_workflow to return our mock workflow
-    with patch.object(orchestrator, "_build_workflow", return_value=mock_workflow):
-        events = []
-        async for event in orchestrator.run("Research query"):
-            events.append(event)
-
-    for i, e in enumerate(events):
-        print(f"Event {i}: {e.type} - {e.message}")
-
-    assert len(events) >= 2
-    assert events[0].type == "started"
-
-    # Verify the message event was processed
-    # Depending on _process_event logic, MagenticAgentMessageEvent might map to different types
-    # We assume it maps to something valid or we just check presence.
-    assert any("Thinking..." in e.message for e in events)
-
-    # THE CRITICAL CHECK: Did we get the fallback termination event?
-    last_event = events[-1]
-    assert last_event.type == "complete"
-    assert "Max iterations reached" in last_event.message
-    assert last_event.data.get("reason") == "max_rounds_reached"
-
-
-@pytest.mark.asyncio
-async def test_no_double_termination_event(mock_magentic_requirements):
-    """
-    Verify that we DO NOT emit a fallback event if the workflow finished normally.
-    """
-    orchestrator = MagenticOrchestrator()
-
-    mock_workflow = MagicMock()
-
-    with patch.object(orchestrator, "_build_workflow", return_value=mock_workflow):
-        # Mock _process_event to simulate a natural completion event
-        with patch.object(orchestrator, "_process_event") as mock_process:
-            mock_process.side_effect = [
-                AgentEvent(type="thinking", message="Working...", iteration=1),
-                AgentEvent(type="complete", message="Done!", iteration=2),
-            ]
-
-            async def mock_stream_with_yields(task):
-                yield "raw_event_1"
-                yield "raw_event_2"
-
-            mock_workflow.run_stream = mock_stream_with_yields
-
-            events = []
-            async for event in orchestrator.run("Research query"):
-                events.append(event)
-
-            assert events[-1].message == "Done!"
-            assert events[-1].type == "complete"
-
-            # Verify we didn't get a SECOND "Max iterations reached" event
-            fallback_events = [e for e in events if "Max iterations reached" in e.message]
-            assert len(fallback_events) == 0
-
-
-@pytest.mark.asyncio
-async def test_termination_on_timeout(mock_magentic_requirements):
-    """
-    Verify that a termination event is emitted when the workflow times out.
-    """
-    orchestrator = MagenticOrchestrator()
-
-    mock_workflow = MagicMock()
-
-    # Simulate a stream that times out (raises TimeoutError)
-    async def mock_stream_raises(task):
-        # Yield one event before timing out
-        yield MagenticAgentMessageEvent(
-            agent_id="SearchAgent", message=MockChatMessage("Working...")
-        )
-        raise TimeoutError()
-
-    mock_workflow.run_stream = mock_stream_raises
-
-    with patch.object(orchestrator, "_build_workflow", return_value=mock_workflow):
-        events = []
-        async for event in orchestrator.run("Research query"):
-            events.append(event)
-
-    # Check for progress/normal events
-    assert any("Working..." in e.message for e in events)
| 142 |
-
|
| 143 |
-
# Check for timeout completion
|
| 144 |
-
completion_events = [e for e in events if e.type == "complete"]
|
| 145 |
-
assert len(completion_events) > 0
|
| 146 |
-
last_event = completion_events[-1]
|
| 147 |
-
|
| 148 |
-
# New behavior: synthesis is attempted on timeout
|
| 149 |
-
# The message contains the report, so we check the reason code
|
| 150 |
-
# In unit tests without API keys, synthesis will fail -> "timeout_synthesis_failed"
|
| 151 |
-
assert last_event.data.get("reason") in (
|
| 152 |
-
"timeout",
|
| 153 |
-
"timeout_synthesis",
|
| 154 |
-
"timeout_synthesis_failed", # Expected in unit tests (no API key)
|
| 155 |
-
)
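These deleted tests documented the termination guarantee of the orchestrator's `run()` stream: it must always end with a `complete` event, whether the Magentic workflow finishes normally, runs out of rounds, or raises `TimeoutError`. A minimal sketch of that fallback pattern, with a small dataclass standing in for `src.utils.models.AgentEvent` and all other internals reduced to placeholders (this is not the project's actual implementation):

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """Stand-in for AgentEvent; fields mirror what the tests above read."""
    type: str
    message: str
    iteration: int = 0
    data: dict[str, Any] = field(default_factory=dict)


async def run(workflow, task: str, max_rounds: int):
    """Yield events and guarantee the stream ends with a 'complete' event."""
    yield Event("started", f"Starting: {task}")
    got_final = False
    try:
        async for raw in workflow.run_stream(task):
            event = Event("thinking", str(raw), iteration=1)  # real code maps raw events
            got_final = got_final or event.type == "complete"
            yield event
    except TimeoutError:
        # The real orchestrator attempts a synthesis pass here and reports
        # "timeout_synthesis" or "timeout_synthesis_failed"; this sketch only flags it.
        yield Event("complete", "Timed out", max_rounds, {"reason": "timeout"})
        return
    if not got_final:
        # Fallback: the stream ended without a final result (e.g. max rounds reached).
        yield Event(
            "complete", "Max iterations reached", max_rounds, {"reason": "max_rounds_reached"}
        )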
@@ -1,290 +0,0 @@
-"""Unit tests for Orchestrator."""
-
-from unittest.mock import AsyncMock, patch
-
-import pytest
-
-from src.orchestrators import Orchestrator
-from src.utils.models import (
-    AgentEvent,
-    AssessmentDetails,
-    Citation,
-    Evidence,
-    JudgeAssessment,
-    OrchestratorConfig,
-    SearchResult,
-)
-
-
-class TestOrchestrator:
-    """Tests for Orchestrator."""
-
-    @pytest.fixture
-    def mock_search_handler(self):
-        """Create a mock search handler."""
-        handler = AsyncMock()
-        handler.execute = AsyncMock(
-            return_value=SearchResult(
-                query="test",
-                evidence=[
-                    Evidence(
-                        content="Test content",
-                        citation=Citation(
-                            source="pubmed",
-                            title="Test Title",
-                            url="https://pubmed.ncbi.nlm.nih.gov/12345/",
-                            date="2024-01-01",
-                        ),
-                    ),
-                ],
-                sources_searched=["pubmed"],
-                total_found=1,
-                errors=[],
-            )
-        )
-        return handler
-
-    @pytest.fixture
-    def mock_judge_sufficient(self):
-        """Create a mock judge that returns sufficient."""
-        handler = AsyncMock()
-        handler.assess = AsyncMock(
-            return_value=JudgeAssessment(
-                details=AssessmentDetails(
-                    mechanism_score=8,
-                    mechanism_reasoning="Good mechanism",
-                    clinical_evidence_score=7,
-                    clinical_reasoning="Good clinical",
-                    drug_candidates=["Drug A"],
-                    key_findings=["Finding 1"],
-                ),
-                sufficient=True,
-                confidence=0.85,
-                recommendation="synthesize",
-                next_search_queries=[],
-                reasoning="Evidence is sufficient",
-            )
-        )
-        return handler
-
-    @pytest.fixture
-    def mock_judge_insufficient(self):
-        """Create a mock judge that returns insufficient."""
-        handler = AsyncMock()
-        handler.assess = AsyncMock(
-            return_value=JudgeAssessment(
-                details=AssessmentDetails(
-                    mechanism_score=4,
-                    mechanism_reasoning="Weak mechanism",
-                    clinical_evidence_score=3,
-                    clinical_reasoning="Weak clinical",
-                    drug_candidates=[],
-                    key_findings=[],
-                ),
-                sufficient=False,
-                confidence=0.3,
-                recommendation="continue",
-                next_search_queries=["more specific query"],
-                reasoning="Need more evidence to make a decision.",
-            )
-        )
-        return handler
-
-    @pytest.mark.asyncio
-    async def test_orchestrator_completes_with_sufficient_evidence(
-        self,
-        mock_search_handler,
-        mock_judge_sufficient,
-    ):
-        """Orchestrator should complete when evidence is sufficient."""
-        config = OrchestratorConfig(max_iterations=5)
-        orchestrator = Orchestrator(
-            search_handler=mock_search_handler,
-            judge_handler=mock_judge_sufficient,
-            config=config,
-        )
-
-        events = []
-        async for event in orchestrator.run("test query"):
-            events.append(event)
-
-        # Should have started, searched, judged, and completed
-        event_types = [e.type for e in events]
-        assert "started" in event_types
-        assert "searching" in event_types
-        assert "search_complete" in event_types
-        assert "judging" in event_types
-        assert "judge_complete" in event_types
-        assert "complete" in event_types
-
-        # Should only have 1 iteration
-        complete_event = next(e for e in events if e.type == "complete")
-        assert complete_event.iteration == 1
-
-    @pytest.mark.asyncio
-    async def test_orchestrator_loops_when_insufficient(
-        self,
-        mock_search_handler,
-        mock_judge_insufficient,
-    ):
-        """Orchestrator should loop when evidence is insufficient."""
-        config = OrchestratorConfig(max_iterations=3)
-        orchestrator = Orchestrator(
-            search_handler=mock_search_handler,
-            judge_handler=mock_judge_insufficient,
-            config=config,
-        )
-
-        events = []
-        async for event in orchestrator.run("test query"):
-            events.append(event)
-
-        # Should have looping events
-        event_types = [e.type for e in events]
-        assert event_types.count("looping") >= 2  # noqa: PLR2004
-
-        # Should hit max iterations
-        complete_event = next(e for e in events if e.type == "complete")
-        assert complete_event.data.get("max_reached") is True
-
-    @pytest.mark.asyncio
-    async def test_orchestrator_respects_max_iterations(
-        self,
-        mock_search_handler,
-        mock_judge_insufficient,
-    ):
-        """Orchestrator should stop at max_iterations."""
-        config = OrchestratorConfig(max_iterations=2)
-        orchestrator = Orchestrator(
-            search_handler=mock_search_handler,
-            judge_handler=mock_judge_insufficient,
-            config=config,
-        )
-
-        events = []
-        async for event in orchestrator.run("test query"):
-            events.append(event)
-
-        # Should have exactly 2 iterations
-        max_iteration = max(e.iteration for e in events)
-        assert max_iteration == 2  # noqa: PLR2004
-
-    @pytest.mark.asyncio
-    async def test_orchestrator_handles_search_error(self):
-        """Orchestrator should handle search errors gracefully."""
-        mock_search = AsyncMock()
-        mock_search.execute = AsyncMock(side_effect=Exception("Search failed"))
-
-        mock_judge = AsyncMock()
-        mock_judge.assess = AsyncMock(
-            return_value=JudgeAssessment(
-                details=AssessmentDetails(
-                    mechanism_score=0,
-                    mechanism_reasoning="Not applicable here.",
-                    clinical_evidence_score=0,
-                    clinical_reasoning="Not applicable here.",
-                    drug_candidates=[],
-                    key_findings=[],
-                ),
-                sufficient=False,
-                confidence=0.0,
-                recommendation="continue",
-                next_search_queries=["retry query"],
-                reasoning="Search failed, retrying...",
-            )
-        )
-
-        config = OrchestratorConfig(max_iterations=2)
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge,
-            config=config,
-        )
-
-        events = []
-        async for event in orchestrator.run("test query"):
-            events.append(event)
-
-        # Should recover and loop despite errors
-        event_types = [e.type for e in events]
-        assert "error" not in event_types
-        assert "looping" in event_types
-
-    @pytest.mark.asyncio
-    async def test_orchestrator_deduplicates_evidence(self, mock_judge_insufficient):
-        """Orchestrator should deduplicate evidence by URL."""
-        # Search returns same evidence each time
-        duplicate_evidence = Evidence(
-            content="Duplicate content",
-            citation=Citation(
-                source="pubmed",
-                title="Same Title",
-                url="https://pubmed.ncbi.nlm.nih.gov/12345/",  # Same URL
-                date="2024-01-01",
-            ),
-        )
-
-        mock_search = AsyncMock()
-        mock_search.execute = AsyncMock(
-            return_value=SearchResult(
-                query="test",
-                evidence=[duplicate_evidence],
-                sources_searched=["pubmed"],
-                total_found=1,
-                errors=[],
-            )
-        )
-
-        config = OrchestratorConfig(max_iterations=2)
-        orchestrator = Orchestrator(
-            search_handler=mock_search,
-            judge_handler=mock_judge_insufficient,
-            config=config,
-        )
-
-        # Force use of local (in-memory) embedding service for test isolation
-        # Without this, the test uses persistent LlamaIndex store which has data from previous runs
-        with patch("src.utils.service_loader.settings") as mock_settings:
-            mock_settings.has_openai_key = False
-
-            events = []
-            async for event in orchestrator.run("test query"):
-                events.append(event)
-
-        # Second search_complete should show 0 new evidence
-        search_complete_events = [e for e in events if e.type == "search_complete"]
-        assert len(search_complete_events) == 2  # noqa: PLR2004
-
-        # First iteration should have 1 new
-        assert search_complete_events[0].data["new_count"] == 1
-
-        # Second iteration should have 0 new (duplicate)
-        assert search_complete_events[1].data["new_count"] == 0
-
-
-class TestAgentEvent:
-    """Tests for AgentEvent."""
-
-    def test_to_markdown(self):
-        """AgentEvent should format to markdown correctly."""
-        event = AgentEvent(
-            type="searching",
-            message="Searching for: testosterone libido",
-            iteration=1,
-        )
-
-        md = event.to_markdown()
-        assert "π" in md
-        assert "SEARCHING" in md
-        assert "testosterone libido" in md
-
-    def test_complete_event_icon(self):
-        """Complete event should have celebration icon."""
-        event = AgentEvent(
-            type="complete",
-            message="Done!",
-            iteration=3,
-        )
-
-        md = event.to_markdown()
-        assert "π" in md
@@ -6,7 +6,7 @@ import pytest
 
 pytestmark = pytest.mark.unit
 
-from src.orchestrators import
+from src.orchestrators import create_orchestrator
 
 
 @pytest.fixture
@@ -16,7 +16,7 @@ def mock_settings():
 
 
 @pytest.fixture
-def
+def mock_advanced_cls():
     with patch("src.orchestrators.factory._get_advanced_orchestrator_class") as mock:
         # The mock returns a class (callable), which returns an instance
         mock_class = MagicMock()
@@ -29,37 +29,32 @@ def mock_handlers():
     return MagicMock(), MagicMock()
 
 
-def
-
+def test_create_orchestrator_simple_maps_to_advanced(
+    mock_settings, mock_handlers, mock_advanced_cls
+):
+    """Test that 'simple' mode explicitly maps to AdvancedOrchestrator."""
     search, judge = mock_handlers
+    # Pass handlers (they are ignored but shouldn't crash)
     orch = create_orchestrator(search_handler=search, judge_handler=judge, mode="simple")
-    assert isinstance(orch, Orchestrator)
 
+    # Verify AdvancedOrchestrator was created
+    mock_advanced_cls.assert_called_once()
+    assert orch == mock_advanced_cls.return_value
 
-def test_create_orchestrator_advanced_explicit(mock_settings, mock_handlers, mock_magentic_cls):
-    """Test explicit advanced mode."""
-    # Ensure has_openai_key is True so it doesn't error if we add checks
-    mock_settings.has_openai_key = True
 
+def test_create_orchestrator_advanced_explicit(mock_settings, mock_handlers, mock_advanced_cls):
+    """Test explicit advanced mode."""
     orch = create_orchestrator(mode="advanced")
     # verify instantiated
-
-    assert orch ==
+    mock_advanced_cls.assert_called_once()
+    assert orch == mock_advanced_cls.return_value
 
 
-def test_create_orchestrator_auto_advanced(mock_settings,
-    """Test auto-detect
-
+def test_create_orchestrator_auto_advanced(mock_settings, mock_advanced_cls):
+    """Test auto-detect defaults to Advanced (Unified)."""
+    # Even with no keys (handled by factory internally), orchestrator factory returns Advanced
+    mock_settings.has_openai_key = False  # Simulate no key
 
     orch = create_orchestrator()
-
-    assert orch ==
-
-
-def test_create_orchestrator_auto_simple(mock_settings, mock_handlers):
-    """Test auto-detect simple mode when no paid keys."""
-    mock_settings.has_openai_key = False
-
-    search, judge = mock_handlers
-    orch = create_orchestrator(search_handler=search, judge_handler=judge)
-    assert isinstance(orch, Orchestrator)
+    mock_advanced_cls.assert_called_once()
+    assert orch == mock_advanced_cls.return_value
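The rewritten tests pin the new factory contract: any `mode`, including the deprecated "simple", resolves to the Advanced (unified) orchestrator, and the legacy search/judge handler arguments are accepted but ignored. A compressed sketch of a factory that would satisfy these assertions; the exact signature and the deprecation warning are assumptions, and `_get_advanced_orchestrator_class` is shown as a lazy-import hook only because the fixture above patches a function of that name:

import warnings
from typing import Any


def _get_advanced_orchestrator_class() -> Any:
    # Imported lazily (presumably so agent_framework stays an optional dependency).
    from src.orchestrators.advanced import AdvancedOrchestrator

    return AdvancedOrchestrator


def create_orchestrator(
    search_handler: Any = None,
    judge_handler: Any = None,
    mode: str | None = None,
    **kwargs: Any,
) -> Any:
    if mode == "simple":
        warnings.warn(
            "mode='simple' is deprecated; using the unified Advanced orchestrator",
            DeprecationWarning,
            stacklevel=2,
        )
    # Legacy handlers are accepted for backwards compatibility but are not used.
    orchestrator_cls = _get_advanced_orchestrator_class()
    return orchestrator_cls(**kwargs)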
@@ -49,7 +49,8 @@ async def test_streaming_events_are_buffered_not_spammed():
         try:
             # Run the research agent
             results = []
-
+            # SPEC-16: mode parameter removed (unified architecture)
+            async for result in research_agent("test query", [], api_key=""):
                 results.append(result)
 
         # Verify that we DO see streaming updates (for UX responsiveness)
@@ -1,33 +1,53 @@
+"""UI element tests for SPEC-16 Unified Architecture."""
+
 import gradio as gr
+import pytest
 
 from src.app import create_demo
 
+pytestmark = pytest.mark.unit
+
 
-def
-    """
+def test_no_mode_selector_in_ui():
+    """SPEC-16: Mode selector removed - everyone gets Advanced Mode."""
     demo, _ = create_demo()
-
-
-    )
+    # No Radio should exist in additional_inputs
+    radios = [inp for inp in demo.additional_inputs if isinstance(inp, gr.Radio)]
+    assert len(radios) == 0, "Mode Radio should not exist (SPEC-16: unified architecture)"
 
 
 def test_accordion_label_updated():
-    """Verify the accordion label reflects the new, concise text."""
+    """Verify the accordion label reflects the new, concise text (no Mode)."""
     _, accordion = create_demo()
-    assert accordion.label == "βοΈ
-        "Accordion label
+    assert accordion.label == "βοΈ API Key (Free tier works!)", (
+        f"Accordion label should be 'βοΈ API Key (Free tier works!)', got '{accordion.label}'"
     )
 
 
-def
-    """
+def test_examples_have_no_mode():
+    """SPEC-16: Examples no longer include mode parameter."""
     demo, _ = create_demo()
-    #
-
-
-
-
-
-
-
+    # Examples now have 4 items: [question, domain, api_key, api_key_state]
+    for example in demo.examples:
+        assert len(example) == 4, (
+            f"Examples should have 4 items [question, domain, api_key, api_key_state], "
+            f"got {len(example)}: {example}"
+        )
+        # First item is the question
+        assert isinstance(example[0], str) and len(example[0]) > 10, (
+            "First example item should be the research question"
+        )
+        # Second item is domain (not mode!)
+        assert example[1] in ("sexual_health", None), (
+            f"Second example item should be domain, got: {example[1]}"
+        )
+
+
+def test_api_key_textbox_exists():
+    """Verify API key textbox exists in additional inputs."""
+    demo, _ = create_demo()
+    textboxes = [inp for inp in demo.additional_inputs if isinstance(inp, gr.Textbox)]
+    assert len(textboxes) == 1, "Expected exactly one API key textbox"
+    assert textboxes[0].label == "π API Key (Optional)", (
+        f"API key textbox label should be 'π API Key (Optional)', got '{textboxes[0].label}'"
     )
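The example-shape test above expects four columns per Gradio example row, with the domain (never a mode) in slot two. A toy row that would pass it; the question text is invented purely for illustration:

# Illustrative example row: [question, domain, api_key, api_key_state]
example_row = [
    "Which approved drugs might be repurposed for low libido?",  # question (invented)
    "sexual_health",  # domain -- the only non-None value the assertion accepts
    "",  # api_key
    "",  # api_key_state
]
assert len(example_row) == 4 and example_row[1] in ("sexual_health", None)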
@@ -1184,7 +1184,7 @@ requires-dist = [
     { name = "duckduckgo-search", specifier = ">=5.0" },
     { name = "gradio", extras = ["mcp"], specifier = ">=6.0.0" },
     { name = "httpx", specifier = ">=0.27" },
-    { name = "huggingface-hub", specifier = ">=0.
+    { name = "huggingface-hub", specifier = ">=0.24.0" },
     { name = "langchain", specifier = ">=0.3.9,<1.0" },
     { name = "langchain-core", specifier = ">=0.3.21,<1.0" },
     { name = "langchain-huggingface", specifier = ">=0.1.2,<1.0" },
@@ -5524,28 +5524,28 @@ wheels = [
 
 [[package]]
 name = "ruff"
-version = "0.14.
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
-    { url = "https://files.pythonhosted.org/packages/
+version = "0.14.7"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b7/5b/dd7406afa6c95e3d8fa9d652b6d6dd17dd4a6bf63cb477014e8ccd3dcd46/ruff-0.14.7.tar.gz", hash = "sha256:3417deb75d23bd14a722b57b0a1435561db65f0ad97435b4cf9f85ffcef34ae5", size = 5727324 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/8c/b1/7ea5647aaf90106f6d102230e5df874613da43d1089864da1553b899ba5e/ruff-0.14.7-py3-none-linux_armv6l.whl", hash = "sha256:b9d5cb5a176c7236892ad7224bc1e63902e4842c460a0b5210701b13e3de4fca", size = 13414475 },
+    { url = "https://files.pythonhosted.org/packages/af/19/fddb4cd532299db9cdaf0efdc20f5c573ce9952a11cb532d3b859d6d9871/ruff-0.14.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:3f64fe375aefaf36ca7d7250292141e39b4cea8250427482ae779a2aa5d90015", size = 13634613 },
+    { url = "https://files.pythonhosted.org/packages/40/2b/469a66e821d4f3de0440676ed3e04b8e2a1dc7575cf6fa3ba6d55e3c8557/ruff-0.14.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:93e83bd3a9e1a3bda64cb771c0d47cda0e0d148165013ae2d3554d718632d554", size = 12765458 },
+    { url = "https://files.pythonhosted.org/packages/f1/05/0b001f734fe550bcfde4ce845948ac620ff908ab7241a39a1b39bb3c5f49/ruff-0.14.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3838948e3facc59a6070795de2ae16e5786861850f78d5914a03f12659e88f94", size = 13236412 },
+    { url = "https://files.pythonhosted.org/packages/11/36/8ed15d243f011b4e5da75cd56d6131c6766f55334d14ba31cce5461f28aa/ruff-0.14.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:24c8487194d38b6d71cd0fd17a5b6715cda29f59baca1defe1e3a03240f851d1", size = 13182949 },
+    { url = "https://files.pythonhosted.org/packages/3b/cf/fcb0b5a195455729834f2a6eadfe2e4519d8ca08c74f6d2b564a4f18f553/ruff-0.14.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:79c73db6833f058a4be8ffe4a0913b6d4ad41f6324745179bd2aa09275b01d0b", size = 13816470 },
+    { url = "https://files.pythonhosted.org/packages/7f/5d/34a4748577ff7a5ed2f2471456740f02e86d1568a18c9faccfc73bd9ca3f/ruff-0.14.7-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:12eb7014fccff10fc62d15c79d8a6be4d0c2d60fe3f8e4d169a0d2def75f5dad", size = 15289621 },
+    { url = "https://files.pythonhosted.org/packages/53/53/0a9385f047a858ba133d96f3f8e3c9c66a31cc7c4b445368ef88ebeac209/ruff-0.14.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6c623bbdc902de7ff715a93fa3bb377a4e42dd696937bf95669118773dbf0c50", size = 14975817 },
+    { url = "https://files.pythonhosted.org/packages/a8/d7/2f1c32af54c3b46e7fadbf8006d8b9bcfbea535c316b0bd8813d6fb25e5d/ruff-0.14.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f53accc02ed2d200fa621593cdb3c1ae06aa9b2c3cae70bc96f72f0000ae97a9", size = 14284549 },
+    { url = "https://files.pythonhosted.org/packages/92/05/434ddd86becd64629c25fb6b4ce7637dd52a45cc4a4415a3008fe61c27b9/ruff-0.14.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:281f0e61a23fcdcffca210591f0f53aafaa15f9025b5b3f9706879aaa8683bc4", size = 14071389 },
+    { url = "https://files.pythonhosted.org/packages/ff/50/fdf89d4d80f7f9d4f420d26089a79b3bb1538fe44586b148451bc2ba8d9c/ruff-0.14.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:dbbaa5e14148965b91cb090236931182ee522a5fac9bc5575bafc5c07b9f9682", size = 14202679 },
+    { url = "https://files.pythonhosted.org/packages/77/54/87b34988984555425ce967f08a36df0ebd339bb5d9d0e92a47e41151eafc/ruff-0.14.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:1464b6e54880c0fe2f2d6eaefb6db15373331414eddf89d6b903767ae2458143", size = 13147677 },
+    { url = "https://files.pythonhosted.org/packages/67/29/f55e4d44edfe053918a16a3299e758e1c18eef216b7a7092550d7a9ec51c/ruff-0.14.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:f217ed871e4621ea6128460df57b19ce0580606c23aeab50f5de425d05226784", size = 13151392 },
+    { url = "https://files.pythonhosted.org/packages/36/69/47aae6dbd4f1d9b4f7085f4d9dcc84e04561ee7ad067bf52e0f9b02e3209/ruff-0.14.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:6be02e849440ed3602d2eb478ff7ff07d53e3758f7948a2a598829660988619e", size = 13412230 },
+    { url = "https://files.pythonhosted.org/packages/b7/4b/6e96cb6ba297f2ba502a231cd732ed7c3de98b1a896671b932a5eefa3804/ruff-0.14.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:19a0f116ee5e2b468dfe80c41c84e2bbd6b74f7b719bee86c2ecde0a34563bcc", size = 14195397 },
+    { url = "https://files.pythonhosted.org/packages/69/82/251d5f1aa4dcad30aed491b4657cecd9fb4274214da6960ffec144c260f7/ruff-0.14.7-py3-none-win32.whl", hash = "sha256:e33052c9199b347c8937937163b9b149ef6ab2e4bb37b042e593da2e6f6cccfa", size = 13126751 },
+    { url = "https://files.pythonhosted.org/packages/a8/b5/d0b7d145963136b564806f6584647af45ab98946660d399ec4da79cae036/ruff-0.14.7-py3-none-win_amd64.whl", hash = "sha256:e17a20ad0d3fad47a326d773a042b924d3ac31c6ca6deb6c72e9e6b5f661a7c6", size = 14531726 },
+    { url = "https://files.pythonhosted.org/packages/1d/d2/1637f4360ada6a368d3265bf39f2cf737a0aaab15ab520fc005903e883f8/ruff-0.14.7-py3-none-win_arm64.whl", hash = "sha256:be4d653d3bea1b19742fcc6502354e32f65cd61ff2fbdb365803ef2c2aec6228", size = 13609215 },
 ]
 
 [[package]]