# Phase 2: Parallel Subagents via Prompt Engineering

**Created**: 2025-10-04
**Status**: ✅ READY TO EXECUTE
**Timeline**: 1-2 days
**Expected Impact**: Roughly 3-4x faster (8 min → 2-3 min)

---

## Executive Summary

**Discovery**: Markdown agents in `.claude/agents/` ARE subagents that can run in parallel.

**No code changes needed**: the SDK already supports parallel execution, so we only need to update the command prompts to request parallel coordination.

---

## 💡 Key SDK Insight

From [building-agents.md](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk):

> "Claude Agent SDK supports subagents by default. Subagents enable parallelization: you can spin up multiple subagents to work on different tasks simultaneously. Second, they help manage context: subagents use their own isolated context windows, and only send relevant information back to the orchestrator."

### What This Means

1. **Markdown agents = Subagents** (automatic)
2. **Parallel execution = Built-in** (SDK native)
3. **Context isolation = Automatic** (each agent has its own window)
4. **No configuration needed** (just prompt correctly)

---

## Implementation Plan

### Step 1: Update Command Prompts (30 min)

**File**: `.claude/commands/invest/research-stock.md`

**Current prompt** (sequential):
```markdown
Research {symbol} stock comprehensively using available tools.
Analyze fundamentals, technicals, and news sentiment.
```

**New prompt** (parallel coordination):
```markdown
# Stock Research: {symbol}

Coordinate these specialist SUBAGENTS to work IN PARALLEL:

## 1. fundamental-analyst
- Tools: company-research MCP server
- Task: Analyze company profile, financials, filings, analyst ratings
- Output: Fundamental analysis section

## 2. technical-analyst
- Tools: stock-analyzer MCP server
- Task: Analyze price action, technical indicators, moving averages
- Output: Technical analysis section

## 3. news-analyst
- Tools: news-analyzer MCP server
- Task: Analyze recent news, sentiment, market narrative
- Output: News & sentiment section

## Execution Instructions

**CRITICAL**:
1. Launch ALL THREE subagents SIMULTANEOUSLY (not sequentially)
2. Each subagent works independently with isolated context
3. Wait for ALL subagents to complete
4. Synthesize their findings into a comprehensive report
5. Use cache hooks to eliminate duplicate API calls

**Expected Time**: 2-3 minutes total (parallel execution)

## Report Structure

1. Executive Summary (synthesized from all agents)
2. Fundamental Analysis (from fundamental-analyst)
3. Technical Analysis (from technical-analyst)
4. News & Sentiment (from news-analyst)
5. Investment Thesis (synthesized conclusion)
```
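
Because the prompt refers to the subagents by name, a missing or misnamed agent file may go unnoticed until a run is already under way. A minimal pre-flight sketch, assuming each subagent is defined in `.claude/agents/<name>.md` with the file named after the agent (the helper `missing_agents` is hypothetical, not part of navam or the SDK):

```python
from pathlib import Path

# Subagent names used in the research-stock prompt.
REQUIRED_AGENTS = ("fundamental-analyst", "technical-analyst", "news-analyst")


def missing_agents(agents_dir, names=REQUIRED_AGENTS):
    """Return the names that have no matching <name>.md file in agents_dir.

    ASSUMPTION: one markdown file per agent, named after the agent.
    """
    root = Path(agents_dir)
    return [name for name in names if not (root / f"{name}.md").is_file()]
```

Calling `missing_agents(".claude/agents")` from the repository root should return an empty list before the tests in Step 2 are run.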

### Step 2: Test Parallel Execution (1 hour)

**Test Commands**:
```bash
# Test 1: Fresh run (no cache)
navam chat --prompt "/invest:research-stock AAPL"

# Test 2: Cached run (should be faster)
navam chat --prompt "/invest:research-stock AAPL"

# Test 3: Different stock
navam chat --prompt "/invest:research-stock NVDA"
```

**What to Monitor**:
- Total execution time (target: 2-3 min)
- Cache hit rate (target: 70% on 2nd run)
- Evidence of parallel agent execution in logs
- Report quality and completeness
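
Total execution time can be captured with a small wall-clock harness. The sketch below assumes that `navam chat --prompt` exits once the response is complete; the helper names `time_command` and `time_research_run` are hypothetical.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, time.perf_counter() - start


def time_research_run(symbol):
    """Time one research-stock run for the given ticker symbol."""
    # ASSUMPTION: the CLI runs non-interactively and exits when finished.
    cmd = ["navam", "chat", "--prompt", f"/invest:research-stock {symbol}"]
    return time_command(cmd)
```

Calling `time_research_run("AAPL")` twice and then `time_research_run("NVDA")` mirrors the fresh, cached, and different-stock tests above.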

### Step 3: Measure & Validate (30 min)

**Metrics to Track**:
```python
# Target values expected from the /perf command after Phase 2
expected_perf = {
    "workflow_time": "2m 45s",      # vs 8 min baseline
    "cache_hit_rate": "72%",        # maintained from Phase 1
    "agents_used": 3,               # vs 1 main agent
    "total_cost": "$0.28",          # vs $0.40 current
    "parallel_execution": True,     # NEW metric to add
}
```
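
To compare a measured run against the baseline, the speedup and cost saving can be computed directly from these figures. A small sketch (the helper names are illustrative only):

```python
import re


def parse_duration(text):
    """Convert strings like '2m 45s', '8m', or '50s' to seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def speedup(baseline, current):
    """How many times faster `current` is than `baseline`."""
    return parse_duration(baseline) / parse_duration(current)


def cost_reduction(baseline_usd, current_usd):
    """Fractional cost saving relative to the baseline."""
    return (baseline_usd - current_usd) / baseline_usd


print(f"speedup: {speedup('8m', '2m 45s'):.1f}x")        # 480 s / 165 s -> 2.9x
print(f"cost saving: {cost_reduction(0.40, 0.28):.0%}")  # -> 30%
```

With the example values, 2m 45s works out to about 2.9x, just under the 3-4x headline; reaching 4x requires a 2-minute run.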

---

## Expected Behavior

### How SDK Orchestrates Parallel Subagents

```
User → /invest:research-stock AAPL
  ↓
Main Agent receives prompt with parallel instructions
  ↓
SDK launches 3 subagents SIMULTANEOUSLY:
  ├─ fundamental-analyst → get_company_profile, get_financials, etc.
  ├─ technical-analyst → analyze_stock, get_moving_averages, etc.
  └─ news-analyst → get_company_news, analyze_sentiment, etc.
  ↓
Each subagent:
  - Has isolated 200k context window
  - Calls tools independently
  - Benefits from cache hooks (70% hit rate)
  - Works in parallel with others
  ↓
All 3 complete ~simultaneously (2-3 min)
  ↓
Results sent back to main agent (context-compressed)
  ↓
Main agent synthesizes comprehensive report
  ↓
User receives final report
```
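
The fan-out/fan-in flow above can be illustrated with a small simulation. This is not SDK code: `fake_subagent` is a hypothetical stand-in that sleeps instead of calling tools, and the durations are made up and scaled down to milliseconds. It only shows why wall-clock time approaches that of the slowest subagent rather than the sum of all three.

```python
import asyncio
import time


async def fake_subagent(name, seconds):
    """Stand-in for a subagent: sleeps instead of calling tools."""
    await asyncio.sleep(seconds)
    return f"{name} section"


async def run_parallel(tasks):
    """Fan out all subagents at once, then wait for every result (fan-in)."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_subagent(name, secs) for name, secs in tasks)
    )
    return results, time.perf_counter() - start


# Illustrative, scaled-down durations (not measurements).
TASKS = [
    ("fundamental-analyst", 0.083),
    ("technical-analyst", 0.078),
    ("news-analyst", 0.091),
]

results, elapsed = asyncio.run(run_parallel(TASKS))
sequential = sum(secs for _, secs in TASKS)
print(results)
print(f"parallel: {elapsed:.3f}s, sequential would be about {sequential:.3f}s")
```

If the real run behaves like this, total time should track the slowest subagent plus the synthesis step.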

### Context Management

**Before (Sequential)**:
- Main agent: 200k tokens
- All tool calls in same context
- Context grows large and slows the run
- Total time: 8+ min

**After (Parallel Subagents)**:
- Main agent: 200k tokens
- fundamental-analyst: 200k tokens (isolated)
- technical-analyst: 200k tokens (isolated)
- news-analyst: 200k tokens (isolated)
- Only each subagent's summary is returned to the main agent (compressed)
- Total time: 2-3 min

**Context Efficiency Gain**: Up to 4x the total context capacity (four 200k windows instead of one), since each subagent's tool output stays in its own window

---

## Success Criteria

### Minimum (Must Have)
- ✅ Workflow completes successfully
- ✅ All 3 analysis sections present in report
- ✅ Cache hit rate maintained at ~70%
- ✅ No regression in report quality
- ✅ Execution time <4 minutes

### Target (Should Have)
- ⭐ Execution time 2-3 minutes (3-4x speedup)
- ⭐ Evidence of parallel agent execution
- ⭐ Cache working across subagents
- ⭐ Cost reduction to ~$0.30 per query

### Stretch (Nice to Have)
- 🚀 Per-subagent timing visible to user
- 🚀 Progressive results as each agent completes
- 🚀 Execution time <2 minutes with full cache
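
The time and completeness criteria can be checked mechanically. The sketch below is an assumption-laden helper, not part of navam: it covers only execution time and section presence, and it assumes the report uses the section titles from the prompt's Report Structure. Cache hit rate, cost, and quality still need separate review.

```python
# Section titles taken from the Report Structure in the new prompt.
REQUIRED_SECTIONS = (
    "Fundamental Analysis",
    "Technical Analysis",
    "News & Sentiment",
)


def classify_run(elapsed_seconds, report_text):
    """Return 'stretch', 'target', 'minimum', or 'fail' for one run."""
    if any(section not in report_text for section in REQUIRED_SECTIONS):
        return "fail"      # minimum: all 3 analysis sections present
    if elapsed_seconds < 120:
        return "stretch"   # under 2 minutes
    if elapsed_seconds <= 180:
        return "target"    # 2-3 minutes
    if elapsed_seconds < 240:
        return "minimum"   # under 4 minutes
    return "fail"
```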

---

## Risk Assessment

### Risk 1: SDK May Not Actually Parallelize
**Likelihood**: Low (SDK docs explicitly mention parallel execution)
**Impact**: High (no speedup achieved)
**Mitigation**:
- Test with timing logs to verify
- If execution turns out to be sequential, file an issue with Anthropic
- Fallback: Keep current behavior, no regression
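
One way to turn timing logs into a yes/no answer is to check whether the subagents' run intervals overlap. A minimal sketch, assuming a start and end timestamp can be recorded for each subagent (how those are captured is left open; `max_concurrency` is a hypothetical helper):

```python
def max_concurrency(intervals):
    """Largest number of intervals active at the same instant.

    intervals: iterable of (start, end) timestamps in seconds.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Sort ends before starts at equal timestamps so back-to-back runs
    # (one ends exactly when the next begins) do not count as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

A result of 3 indicates all three subagents ran concurrently at some point; a result of 1 means execution was effectively sequential.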

### Risk 2: Prompt Engineering May Not Trigger Parallelization
**Likelihood**: Medium (depends on Claude's interpretation)
**Impact**: Medium (sequential execution, moderate speedup from cache)
**Mitigation**:
- Try multiple prompt variations
- Explicitly request "PARALLEL", "SIMULTANEOUS"
- Document what works in active.md

### Risk 3: Context Isolation May Cause Incomplete Analysis
**Likelihood**: Low (SDK designed for this)
**Impact**: Medium (quality regression)
**Mitigation**:
- Compare reports before/after
- Ensure synthesis step has all info
- Adjust if needed

---

## Testing Checklist

- [ ] Update research-stock.md with parallel prompt
- [ ] Test with AAPL (fresh, no cache)
- [ ] Verify 3 subagents mentioned in logs/output
- [ ] Measure execution time (<3 min target)
- [ ] Test with AAPL (cached, should be faster)
- [ ] Verify cache hit rate (~70%)
- [ ] Test with NVDA (different stock)
- [ ] Compare report quality vs baseline
- [ ] Document findings in active.md
- [ ] Update /perf command if needed

---

## Files to Modify

### 1. `.claude/commands/invest/research-stock.md`
**Change**: Add parallel subagent coordination instructions
**Impact**: Primary workflow for investment research

### 2. `.claude/commands/invest/compare-stocks.md` (Optional)
**Change**: Similar parallel coordination for comparison
**Impact**: Stock comparison workflow

### 3. `src/navam/chat.py` (Optional - monitoring only)
**Change**: Add logging to detect subagent launches
**Impact**: Better visibility into parallel execution
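
A possible shape for that logging is sketched below. It rests on an assumption that should be verified against the SDK before use: that subagent launches surface as tool-use blocks named `Task` whose input carries a `subagent_type` field. Blocks are modelled as plain dicts for illustration, so the field access would need adapting to whatever message objects `chat.py` actually receives.

```python
import logging

logger = logging.getLogger("navam.subagents")


def subagent_launches(content_blocks):
    """Return the subagent names requested in one assistant message.

    ASSUMPTION: a launch is a tool-use block named "Task" with a
    "subagent_type" entry in its input. Verify against the SDK.
    """
    return [
        block.get("input", {}).get("subagent_type", "unknown")
        for block in content_blocks
        if block.get("type") == "tool_use" and block.get("name") == "Task"
    ]


def log_launches(content_blocks):
    """Log each launch; several in one message suggests a parallel launch."""
    names = subagent_launches(content_blocks)
    for name in names:
        logger.info("Launching subagent: %s", name)
    if len(names) > 1:
        logger.info("%d subagents requested in one turn", len(names))
    return names
```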

---

## Monitoring & Validation

### What to Look For

**In Logs** (illustrative format; actual output depends on the logging added in `chat.py`):
```
[SDK] Launching subagent: fundamental-analyst
[SDK] Launching subagent: technical-analyst
[SDK] Launching subagent: news-analyst
[Cache] Hit: mcp__company-research__get_company_profile
[Cache] Miss: mcp__stock-analyzer__analyze_stock
[SDK] Subagent fundamental-analyst complete (1m 23s)
[SDK] Subagent technical-analyst complete (1m 18s)
[SDK] Subagent news-analyst complete (1m 31s)
[SDK] Synthesizing results...
```

**In /perf output**:
```
Workflow Performance:
- Total time: 2m 45s (vs 8m baseline) ✅
- Cache hits: 14/20 (70%) ✅
- Subagents used: 3 (fundamental, technical, news) ✅
- Parallel execution: Yes ✅
- Cost: $0.28 (vs $0.40 current) ✅
```

---

## Next Steps After Phase 2

### If Successful (Parallel Execution Works)
1. Update all invest commands with parallel coordination
2. Add per-subagent timing to /perf command
3. Release v1.6.0 with parallel subagents
4. Move to Phase 3: Enhanced cost tracking

### If Partial Success (Sequential but Faster)
1. Document what works, what doesn't
2. File SDK issue if needed
3. Keep improvements from better prompting
4. Release v1.5.12 with enhanced prompts

### If Unsuccessful (No Improvement)
1. Document findings thoroughly
2. Keep v1.5.11 as stable
3. Research alternative approaches
4. Consider reaching out to Anthropic support

---

## Reference Documents

### Primary Sources
- **SDK Docs**: `artifacts/refer/claude-agent-sdk/building-agents.md` (lines 53-56)
- **Python API**: `artifacts/refer/claude-agent-sdk/PYTHON-SDK-API-REFERENCE.md`
- **Active Backlog**: `artifacts/backlog/active.md`

### Historical Context
- **Failed Attempts**: `artifacts/backlog/archive-003-phase2-failed-attempts.md`
- **Research Questions**: `artifacts/backlog/archive-004-phase2-research-questions.md`
- **Phase 1 Success**: `artifacts/backlog/archive-002.md`

---

## Key Takeaways

### What We Learned

1. **The programmatic `agents` parameter is broken** in the Python SDK
2. **Markdown file agents work perfectly** and ARE subagents
3. **SDK already supports parallelization** - we just need to use it
4. **Prompt engineering > Code configuration** for this use case
5. **No code changes needed** - just better prompts

### Strategic Insight

> "We don't need to fix the SDK - we just need to use it correctly"

The SDK provides exactly what we need through markdown file subagents. We were overthinking the solution by trying to use programmatic configuration.

---

**Status**: Ready to execute ✅
**Next Action**: Update research-stock.md command prompt
**Expected Completion**: 2025-10-05
