# P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)

**Status**: Open
**Priority**: P1 - Major UX degradation
**Affects**: Simple mode, all deployments
**Root Cause**: LLM synthesis silently failing → template fallback
**Related**: SPEC_12 (implemented but not functioning)

---

## Problem Statement

SPEC_12 implemented LLM-based narrative synthesis, but users still see **template-formatted bullet points** instead of **prose paragraphs**:

### What Users See (Template Fallback)

```markdown
## Sexual Health Analysis

### Question
what medication for the best boners?

### Drug Candidates
- **tadalafil**
- **sildenafil**

### Key Findings
- Tadalafil improves erectile function

### Assessment
- **Mechanism Score**: 4/10
- **Clinical Evidence Score**: 6/10
```

### What They Should See (LLM Synthesis)

```markdown
### Executive Summary

Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with strong evidence from multiple RCTs demonstrating improved erectile function...

### Background

Erectile dysfunction (ED) is a common male sexual health disorder...

### Evidence Synthesis

**Mechanism of Action**
Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
```

---

## Root Cause Analysis

### Location: `src/orchestrators/simple.py:555-564`

```python
try:
    agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
    result = await agent.run(user_prompt)
    narrative = result.output
except Exception as e:  # ← SILENT FALLBACK
    logger.warning("LLM synthesis failed, using template fallback", error=str(e))
    return self._generate_template_synthesis(query, evidence, assessment)
```

**The Problem**: When ANY exception occurs during LLM synthesis, the code logs a warning and returns the template output. The failure is silent from the user's side: they see bullet points with no indication that the LLM call failed.

### Why Synthesis Fails

| Cause | Typical Context | Frequency |
|-------|---------|-----------|
| No API key in deployment | HuggingFace Spaces | HIGH |
| API rate limiting | Heavy usage | MEDIUM |
| Token overflow | Long evidence lists | MEDIUM |
| Model mismatch | Wrong model ID | LOW |
| Network timeout | Slow connections | LOW |
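
Because the `except` block catches every exception the same way, the causes in the table above are indistinguishable in the logs. A minimal sketch of how the caught exception could be bucketed into those causes follows. The helper name `classify_synthesis_failure` and the substring heuristics are assumptions for illustration; they are not part of the current codebase, and real provider error messages may differ.

```python
import asyncio


def classify_synthesis_failure(exc: Exception) -> str:
    """Heuristically map an exception to one of the failure causes listed above.

    Hypothetical helper: the matched substrings are guesses at typical
    provider error text, not a documented contract.
    """
    text = f"{type(exc).__name__}: {exc}".lower()

    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)) or "timed out" in text:
        return "network_timeout"
    if "api key" in text or "api_key" in text or "unauthorized" in text:
        return "missing_api_key"
    if "rate limit" in text or "429" in text:
        return "rate_limited"
    if "context length" in text or "too many tokens" in text:
        return "token_overflow"
    if "model" in text and ("not found" in text or "does not exist" in text):
        return "model_mismatch"
    return "unknown"
```

Logging the returned label alongside the exception would make it possible to confirm or correct the frequency estimates in the table.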

---

## Evidence: LLM Synthesis WORKS When Configured

Local test with API key:
```python
# This works perfectly:
agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
result = await agent.run(user_prompt)
print(result.output)  # → Beautiful narrative prose!
```

Output:
```
### Executive Summary

Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with one study (Smith, 2020; N=100) reporting improved erectile function...
```

---

## Impact

| Metric | Current | Expected |
|--------|---------|----------|
| Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
| User satisfaction | Low | High |
| Clinical utility | Limited | High |

The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.

---

## Fix Options

### Option A: Surface Error to User (RECOMMENDED)

When LLM synthesis fails, don't silently fall back. Show the user what went wrong:

```python
except Exception as e:
    logger.error("LLM synthesis failed", error=str(e), exc_info=True)

    # Show error in report instead of silent fallback
    error_note = f"""
⚠️ **Note**: AI narrative synthesis unavailable.
Showing structured summary instead.

_Technical: {type(e).__name__}: {str(e)[:100]}_
"""
    template = self._generate_template_synthesis(query, evidence, assessment)
    return f"{error_note}\n\n{template}"
```

### Option B: HuggingFace Secrets Configuration

For HuggingFace Spaces deployment, add secrets:
- `OPENAI_API_KEY` → Required for synthesis
- `ANTHROPIC_API_KEY` → Alternative provider
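
A missing secret could also be detected at startup rather than at the first failed synthesis. The sketch below assumes only the two environment variable names listed above; `configured_providers` and `synthesis_available` are hypothetical helpers, not existing functions in the project.

```python
import os
from typing import Mapping, Optional

# Secret names taken from the list above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the provider key names that are set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


def synthesis_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if at least one provider key is configured."""
    return bool(configured_providers(env))
```

Calling `synthesis_available()` once during app startup and logging the result would show immediately whether a Space was deployed without secrets.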

### Option C: Graceful Degradation with Explanation

Add a banner explaining synthesis status:
- ✅ "AI-synthesized narrative report" (when LLM works)
- ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
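
One way the banner could be attached is sketched below. The function name `with_status_banner` and the blockquote formatting are assumptions; the banner wording is taken from the two bullets above.

```python
AI_BANNER = "✅ AI-synthesized narrative report"
FALLBACK_BANNER = "⚠️ Structured summary (AI synthesis unavailable)"


def with_status_banner(report: str, llm_succeeded: bool) -> str:
    """Prefix a Markdown report with a one-line synthesis status banner."""
    banner = AI_BANNER if llm_succeeded else FALLBACK_BANNER
    # Render the banner as a blockquote so it stands apart from the report body.
    return f"> {banner}\n\n{report}"
```

Keeping the banner text in module-level constants lets the UI in `src/app.py` reuse the same strings if it renders the status separately.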

---

## Diagnostic Steps

To determine why synthesis is failing in production:

1. **Review logs** for warning: `"LLM synthesis failed, using template fallback"`
2. **Verify API key**: Is `OPENAI_API_KEY` set in environment?
3. **Confirm model access**: Is `gpt-5` accessible with current API tier?
4. **Inspect rate limits**: Is the account quota exhausted?

---

## Acceptance Criteria

- [ ] Users see narrative prose reports (not bullet points) when API key is configured
- [ ] When synthesis fails, user sees clear indication (not silent fallback)
- [ ] HuggingFace Spaces deployment has proper secrets configured
- [ ] Logging captures the specific exception for debugging

---

## Files to Modify

| File | Change |
|------|--------|
| `src/orchestrators/simple.py:555-580` | Add error surfacing in fallback |
| `src/app.py` | Add synthesis status indicator to UI |
| HuggingFace Spaces Settings | Add `OPENAI_API_KEY` secret |

---

## Test Plan

1. Run locally with API key → Should get narrative prose
2. Run locally WITHOUT API key → Should get template WITH error message
3. Deploy to HuggingFace with secrets → Should get narrative prose
4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
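
The two local cases could also be covered by an automated test that does not call a real LLM. The sketch below uses a hypothetical `FakeOrchestrator` stand-in that mirrors the Option A fallback; the class, its `synthesize` method, and the stubbed LLM callables are illustrative assumptions, not the actual orchestrator API.

```python
import asyncio


class FakeOrchestrator:
    """Hypothetical stand-in for the orchestrator's synthesis step."""

    def __init__(self, llm_call):
        self._llm_call = llm_call  # async callable returning narrative text

    def _generate_template_synthesis(self) -> str:
        return "### Key Findings\n- Tadalafil improves erectile function"

    async def synthesize(self) -> str:
        try:
            return await self._llm_call()
        except Exception as e:
            # Mirrors Option A: surface the failure above the template output.
            note = (
                "⚠️ **Note**: AI narrative synthesis unavailable.\n"
                f"_Technical: {type(e).__name__}: {str(e)[:100]}_"
            )
            return f"{note}\n\n{self._generate_template_synthesis()}"


async def working_llm() -> str:
    return "### Executive Summary\n\nSildenafil demonstrates..."


async def failing_llm() -> str:
    raise RuntimeError("no API key configured")


def test_success_returns_narrative():
    report = asyncio.run(FakeOrchestrator(working_llm).synthesize())
    assert report.startswith("### Executive Summary")
    assert "synthesis unavailable" not in report


def test_failure_returns_template_with_note():
    report = asyncio.run(FakeOrchestrator(failing_llm).synthesize())
    assert "AI narrative synthesis unavailable" in report
    assert "RuntimeError: no API key configured" in report
    assert "### Key Findings" in report
```

Both functions follow pytest naming, so they would be collected automatically if placed in a test module.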