# P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)

**Status**: Open
**Priority**: P1 - Major UX degradation
**Affects**: Simple mode, all deployments
**Root Cause**: LLM synthesis silently failing → template fallback
**Related**: SPEC_12 (implemented but not functioning)

---

## Problem Statement

SPEC_12 implemented LLM-based narrative synthesis, but users still see **template-formatted bullet points** instead of **prose paragraphs**:

### What Users See (Template Fallback)

```markdown
## Sexual Health Analysis

### Question
what medication for the best boners?

### Drug Candidates
- **tadalafil**
- **sildenafil**

### Key Findings
- Tadalafil improves erectile function

### Assessment
- **Mechanism Score**: 4/10
- **Clinical Evidence Score**: 6/10
```

### What They Should See (LLM Synthesis)

```markdown
### Executive Summary

Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with strong evidence from multiple RCTs demonstrating improved erectile function...

### Background

Erectile dysfunction (ED) is a common male sexual health disorder...

### Evidence Synthesis

**Mechanism of Action**
Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
```

---

## Root Cause Analysis

### Location: `src/orchestrators/simple.py:555-564`

```python
try:
    agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
    result = await agent.run(user_prompt)
    narrative = result.output
except Exception as e:  # ← SILENT FALLBACK
    logger.warning("LLM synthesis failed, using template fallback", error=str(e))
    return self._generate_template_synthesis(query, evidence, assessment)
```

**The Problem**: Any exception raised during LLM synthesis is caught, logged at warning level, and replaced with the template output. The log entry is the only trace of the failure, so users see bullet points with no indication that the LLM call failed.

### Why Synthesis Fails

| Cause | Where / when it occurs | Frequency |
|-------|---------|-----------|
| No API key in deployment | HuggingFace Spaces | HIGH |
| API rate limiting | Heavy usage | MEDIUM |
| Token overflow | Long evidence lists | MEDIUM |
| Model mismatch | Wrong model ID | LOW |
| Network timeout | Slow connections | LOW |
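
Because the handler catches every exception the same way, it can help to label a failure with one of the causes above before logging it. The following is a minimal sketch, not part of the codebase: `classify_synthesis_failure` is a hypothetical helper, and the substrings it matches are assumptions about what provider errors tend to contain. Real SDKs raise their own exception classes, which would be a more reliable signal.

```python
def classify_synthesis_failure(exc: Exception) -> str:
    """Heuristically map an exception to one of the causes in the table above.

    The matched substrings are assumptions for illustration, not a contract
    of any provider SDK.
    """
    name = type(exc).__name__.lower()
    message = str(exc).lower()

    if isinstance(exc, TimeoutError) or "timeout" in name or "timed out" in message:
        return "network timeout"
    if "401" in message or "api key" in message or "authentication" in name:
        return "missing or invalid API key"
    if "429" in message or "rate limit" in message or "ratelimit" in name:
        return "rate limiting"
    if (
        "context length" in message
        or "maximum context" in message
        or "too many tokens" in message
    ):
        return "token overflow"
    if "404" in message or ("model" in message and "not found" in message):
        return "model mismatch"
    return "unknown"
```

The returned label could be attached to the existing log call as an extra field, which would make the frequency column above measurable rather than estimated.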

---

## Evidence: LLM Synthesis WORKS When Configured

Local test with API key:
```python
# This works perfectly:
agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
result = await agent.run(user_prompt)
print(result.output)  # → Beautiful narrative prose!
```

Output:
```
### Executive Summary

Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with one study (Smith, 2020; N=100) reporting improved erectile function...
```

---

## Impact

| Metric | Current | Expected |
|--------|---------|----------|
| Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
| User satisfaction | Low | High |
| Clinical utility | Limited | High |

The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.

---

## Fix Options

### Option A: Surface Error to User (RECOMMENDED)

When LLM synthesis fails, don't silently fall back. Show the user what went wrong:

```python
except Exception as e:
    logger.error("LLM synthesis failed", error=str(e), exc_info=True)

    # Show error in report instead of silent fallback
    error_note = f"""
⚠️ **Note**: AI narrative synthesis unavailable.
Showing structured summary instead.

_Technical: {type(e).__name__}: {str(e)[:100]}_
"""
    template = self._generate_template_synthesis(query, evidence, assessment)
    return f"{error_note}\n\n{template}"
```

### Option B: HuggingFace Secrets Configuration

For HuggingFace Spaces deployment, add secrets:
- `OPENAI_API_KEY` → Required for synthesis
- `ANTHROPIC_API_KEY` → Alternative provider

### Option C: Graceful Degradation with Explanation

Add a banner explaining synthesis status:
- ✅ "AI-synthesized narrative report" (when LLM works)
- ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
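
One way to sketch the banner is a small pure function that prefixes the report with the matching status line. The name `with_status_banner` and the blockquote formatting are assumptions for illustration. The banner texts are taken from the two bullets above.

```python
SYNTHESIZED_BANNER = "✅ AI-synthesized narrative report"
FALLBACK_BANNER = "⚠️ Structured summary (AI synthesis unavailable)"


def with_status_banner(report: str, synthesized: bool) -> str:
    """Prefix a report with a one-line banner describing how it was produced."""
    banner = SYNTHESIZED_BANNER if synthesized else FALLBACK_BANNER
    # Render the banner as a Markdown blockquote so it stands apart from the report body.
    return f"> {banner}\n\n{report}"
```

Keeping the banner separate from report generation means the same function can wrap both the LLM narrative and the template output.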

---

## Diagnostic Steps

To determine why synthesis is failing in production:

1. **Review logs** for the warning `"LLM synthesis failed, using template fallback"`
2. **Verify API key**: Is `OPENAI_API_KEY` set in environment?
3. **Confirm model access**: Is `gpt-5` accessible with current API tier?
4. **Inspect rate limits**: Is the account quota exhausted?
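
Step 2 can be automated with a small preflight check at startup. This is a minimal sketch: `configured_providers` is a hypothetical helper, and it only reports whether the variables named in this document are set and non-empty. It does not verify that a key is valid or that the model in step 3 is accessible.

```python
import os
from typing import Mapping, Optional

# Environment variables named in this report; extend if other providers are used.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the provider key names that are set to a non-blank value."""
    env = os.environ if env is None else env
    return [key for key in PROVIDER_KEYS if env.get(key, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Provider keys present:", ", ".join(found))
    else:
        print("No provider key set; synthesis will fall back to the template.")
```

Passing `env` explicitly keeps the check testable without touching the real environment.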

---

## Acceptance Criteria

- [ ] Users see narrative prose reports (not bullet points) when API key is configured
- [ ] When synthesis fails, user sees clear indication (not silent fallback)
- [ ] HuggingFace Spaces deployment has proper secrets configured
- [ ] Logging captures the specific exception for debugging

---

## Files to Modify

| File | Change |
|------|--------|
| `src/orchestrators/simple.py:555-580` | Add error surfacing in fallback |
| `src/app.py` | Add synthesis status indicator to UI |
| HuggingFace Spaces Settings | Add `OPENAI_API_KEY` secret |

---

## Test Plan

1. Run locally with API key → Should get narrative prose
2. Run locally WITHOUT API key → Should get template WITH error message
3. Deploy to HuggingFace with secrets → Should get narrative prose
4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
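
The "template WITH error message" expectation in steps 2 and 4 can be checked without a real API call by stubbing the LLM. The sketch below is an assumption-laden stand-in, not the code in `simple.py`: `synthesize_with_notice` is a hypothetical function that mirrors the Option A handler, and the two stubs replace the real agent and template renderer.

```python
import asyncio
from typing import Awaitable, Callable


async def synthesize_with_notice(
    run_llm: Callable[[], Awaitable[str]],
    render_template: Callable[[], str],
) -> str:
    """Return the LLM narrative, or the template prefixed with a visible notice."""
    try:
        return await run_llm()
    except Exception as exc:  # broad on purpose: mirrors the handler in this report
        notice = (
            "⚠️ **Note**: AI narrative synthesis unavailable.\n"
            f"_Technical: {type(exc).__name__}: {str(exc)[:100]}_"
        )
        return f"{notice}\n\n{render_template()}"


async def failing_llm() -> str:
    # Stub simulating a deployment without an API key.
    raise RuntimeError("no API key configured")


async def working_llm() -> str:
    # Stub simulating a successful synthesis call.
    return "### Executive Summary\n\nNarrative prose."


def template() -> str:
    return "### Key Findings\n- Tadalafil improves erectile function"


fallback_report = asyncio.run(synthesize_with_notice(failing_llm, template))
narrative_report = asyncio.run(synthesize_with_notice(working_llm, template))

# The fallback must carry the notice and still contain the template content.
assert "synthesis unavailable" in fallback_report
assert "Key Findings" in fallback_report
# The success path must return the narrative untouched.
assert narrative_report.startswith("### Executive Summary")
```

The deployment checks in steps 3 and 4 still need a manual run on HuggingFace Spaces, since they depend on the Space's secrets.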