# P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)
**Status**: Open
**Priority**: P1 - Major UX degradation
**Affects**: Simple mode, all deployments
**Root Cause**: LLM synthesis silently failing → template fallback
**Related**: SPEC_12 (implemented but not functioning)
---
## Problem Statement
SPEC_12 implemented LLM-based narrative synthesis, but users still see **template-formatted bullet points** instead of **prose paragraphs**:
### What Users See (Template Fallback)
```markdown
## Sexual Health Analysis
### Question
what medication for the best boners?
### Drug Candidates
- **tadalafil**
- **sildenafil**
### Key Findings
- Tadalafil improves erectile function
### Assessment
- **Mechanism Score**: 4/10
- **Clinical Evidence Score**: 6/10
```
### What They Should See (LLM Synthesis)
```markdown
### Executive Summary
Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with strong evidence from multiple RCTs demonstrating improved erectile function...
### Background
Erectile dysfunction (ED) is a common male sexual health disorder...
### Evidence Synthesis
**Mechanism of Action**
Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
```
---
## Root Cause Analysis
### Location: `src/orchestrators/simple.py:555-564`
```python
try:
    agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
    result = await agent.run(user_prompt)
    narrative = result.output
except Exception as e:  # ← SILENT FALLBACK
    logger.warning("LLM synthesis failed, using template fallback", error=str(e))
    return self._generate_template_synthesis(query, evidence, assessment)
```
**The Problem**: When ANY exception occurs during LLM synthesis, the handler logs a warning and returns the template output. The failure is recorded only in the logs, so users see bare bullet points with no indication that the LLM call failed.
### Why Synthesis Fails
| Cause | Where / when it occurs | Frequency |
|-------|---------|-----------|
| No API key in deployment | HuggingFace Spaces | HIGH |
| API rate limiting | Heavy usage | MEDIUM |
| Token overflow | Long evidence lists | MEDIUM |
| Model mismatch | Wrong model ID | LOW |
| Network timeout | Slow connections | LOW |
---
## Evidence: LLM Synthesis WORKS When Configured
Local test with API key:
```python
# This works perfectly:
agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
result = await agent.run(user_prompt)
print(result.output)  # → Beautiful narrative prose!
```
Output:
```
### Executive Summary
Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with one study (Smith, 2020; N=100) reporting improved erectile function...
```
---
## Impact
| Metric | Current | Expected |
|--------|---------|----------|
| Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
| User satisfaction | Low | High |
| Clinical utility | Limited | High |
The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.
---
## Fix Options
### Option A: Surface Error to User (RECOMMENDED)
When LLM synthesis fails, don't silently fall back. Show the user what went wrong:
```python
except Exception as e:
    logger.error("LLM synthesis failed", error=str(e), exc_info=True)
    # Show error in report instead of silent fallback
    error_note = f"""
⚠️ **Note**: AI narrative synthesis unavailable.
Showing structured summary instead.
_Technical: {type(e).__name__}: {str(e)[:100]}_
"""
    template = self._generate_template_synthesis(query, evidence, assessment)
    return f"{error_note}\n\n{template}"
```
### Option B: HuggingFace Secrets Configuration
For HuggingFace Spaces deployment, add secrets:
- `OPENAI_API_KEY`: Required for synthesis
- `ANTHROPIC_API_KEY`: Alternative provider
### Option C: Graceful Degradation with Explanation
Add a banner explaining synthesis status:
- ✅ "AI-synthesized narrative report" (when LLM works)
- ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
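One way to attach the banner is a small helper that prepends the matching status line to the finished report. This is a minimal sketch: `add_status_banner` and its `synthesized` flag are hypothetical names, not existing code in `src/app.py`, and rendering the banner as a Markdown blockquote is an assumption.

```python
# Hypothetical helper: name, signature, and blockquote styling are assumptions.
SYNTHESIZED_BANNER = "✅ AI-synthesized narrative report"
FALLBACK_BANNER = "⚠️ Structured summary (AI synthesis unavailable)"


def add_status_banner(report: str, synthesized: bool) -> str:
    """Prepend a one-line banner stating which path produced the report."""
    banner = SYNTHESIZED_BANNER if synthesized else FALLBACK_BANNER
    return f"> {banner}\n\n{report}"
```

The caller would pass `synthesized=False` from the exception handler and `synthesized=True` on the success path, so the banner always reflects what actually ran.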
---
## Diagnostic Steps
To determine why synthesis is failing in production:
1. **Review logs** for warning: `"LLM synthesis failed, using template fallback"`
2. **Verify API key**: Is `OPENAI_API_KEY` set in environment?
3. **Confirm model access**: Is `gpt-5` accessible with current API tier?
4. **Inspect rate limits**: Is the account quota exhausted?
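Step 2 can be scripted so it runs inside the deployed environment. The sketch below only checks whether the key variables named in this report are set to a non-empty value; `configured_providers` is a hypothetical helper, and treating either key as sufficient is an assumption about how `get_model()` picks a provider.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Key names are taken from this report; which one the app actually reads is an assumption.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_providers(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        # Print names only, never the secret values.
        print("API keys present:", ", ".join(found))
    else:
        print("No API key set; synthesis will fall back to the template")
```

A present key does not prove the key is valid or that the model is accessible, so steps 3 and 4 still need to be checked separately.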
---
## Acceptance Criteria
- [ ] Users see narrative prose reports (not bullet points) when API key is configured
- [ ] When synthesis fails, user sees clear indication (not silent fallback)
- [ ] HuggingFace Spaces deployment has proper secrets configured
- [ ] Logging captures the specific exception for debugging
---
## Files to Modify
| File | Change |
|------|--------|
| `src/orchestrators/simple.py:555-580` | Add error surfacing in fallback |
| `src/app.py` | Add synthesis status indicator to UI |
| HuggingFace Spaces Settings | Add `OPENAI_API_KEY` secret |
---
## Test Plan
1. Run locally with API key → Should get narrative prose
2. Run locally WITHOUT API key → Should get template WITH error message
3. Deploy to HuggingFace with secrets → Should get narrative prose
4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
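Cases 1 and 2 can also be exercised without a real API call by stubbing the LLM. The following is a sketch under stated assumptions: `synthesize_or_annotate` is a hypothetical stand-in for the handler in `src/orchestrators/simple.py`, and the stub functions simulate a working and a failing provider.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def synthesize_or_annotate(
    run_llm: Callable[[], Awaitable[str]],
    render_template: Callable[[], str],
) -> str:
    """Return the LLM narrative, or the template prefixed with a visible error note."""
    try:
        return await run_llm()
    except Exception as e:  # broad on purpose: mirrors the existing handler
        note = (
            "⚠️ **Note**: AI narrative synthesis unavailable. "
            "Showing structured summary instead.\n"
            f"_Technical: {type(e).__name__}: {str(e)[:100]}_"
        )
        return f"{note}\n\n{render_template()}"


async def working_llm() -> str:
    return "### Executive Summary\nNarrative prose."  # simulates case 1


async def failing_llm() -> str:
    raise RuntimeError("no API key configured")  # simulates case 2


def template() -> str:
    return "### Key Findings\n- finding"


narrative_report = asyncio.run(synthesize_or_annotate(working_llm, template))
fallback_report = asyncio.run(synthesize_or_annotate(failing_llm, template))
```

In a real test suite the two final calls would become assertions: the narrative report should contain no error note, and the fallback report should contain both the exception name and the template body.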