# Agent Selection Fix Plan (v1.7.2)

**Date**: 2025-10-04
**Severity**: 🔴 CRITICAL
**Status**: ✅ FIX COMPLETE - Ready for Testing

---

## 🔍 Root Cause Analysis

### Problem
Claude is using "general-purpose" agent for ALL Task tool invocations instead of specialized agents (ledger-performance-analyst, risk-shield-manager, etc.).

### Impact
- **No streaming reports** (requires specialized agent names)
- **4-5x slower execution** (13min vs 2-3min)
- **4-5x higher cost** ($1.77 vs $0.30-0.40)
- **Missing domain expertise** (generic responses vs specialized analysis)

### Root Cause
**Command prompts don't explicitly show Claude the Task tool parameter format!**

Our commands say things like:
```markdown
**Agent 1: ledger-performance-analyst** (Performance Analysis)
- Task: "Analyze portfolio performance..."
```

But they DON'T show Claude that this should be invoked as:
```python
Task(subagent_type="ledger-performance-analyst",
     description="Portfolio performance analysis",
     prompt="Analyze portfolio performance...")
```

Claude interprets "use the ledger-performance-analyst agent" as **guidance for what to do**, not as **how to invoke the Task tool**. So it uses the default "general-purpose" agent and does the work itself.

---

## ✅ Verification: Agent Discovery Works

Agent files are correctly present and formatted:
- ✅ 18 agents in `src/navam/.claude/agents/`
- ✅ Correct YAML frontmatter with `name:` field
- ✅ `setting_sources=["project"]` configured
- ✅ `add_dirs=[Path(__file__).parent]` configured
- ✅ Agents discoverable by SDK

**The SDK agent discovery mechanism is working correctly.** The issue is purely in how commands instruct Claude to invoke agents.

---

## 🔧 Solution

Update ALL investment command prompts with **explicit Task tool invocation examples** showing the exact parameter format.

### Before (Vague):
```markdown
**Agent 1: ledger-performance-analyst** (Performance Analysis)
- Task: "Analyze portfolio performance..."
```

### After (Explicit):
```markdown
**Agent 1: ledger-performance-analyst** (Performance Analysis)
- Use Task tool: `Task(subagent_type="ledger-performance-analyst", description="Portfolio performance analysis", prompt="Analyze portfolio performance with comprehensive return analysis, benchmark comparison, attribution analysis, risk-adjusted metrics, and identify top/bottom performers across different time periods.")`
```

---

## 📋 Implementation Checklist

### Phase 1: Immediate Fix (v1.7.2-alpha) ✅ COMPLETE
- [x] **review-portfolio.md** - Fixed (6 agents: 4 parallel + 2 sequential)
- [x] **research-stock.md** - Fixed (3 parallel agents)
- [x] **plan-goals.md** - Fixed (3 parallel agents)
- [x] **optimize-taxes.md** - Fixed (6 agents: 3 core + 3 execution)
- [x] **monitor-holdings.md** - Fixed (6 agents: 3 core + 3 secondary)
- [x] **execute-rebalance.md** - Fixed (6 agents: 3 core + 3 execution)
- [x] **screen-opportunities.md** - Fixed (4 agents: 1 screening + 3 parallel analysis)

### Phase 2: Sync and Test (v1.7.2-beta)
- [x] Sync all updated commands to package (`uv run python src/navam/sync.py`) ✅ COMPLETE
- [ ] Test `/invest:review-portfolio user/portfolio.csv`
  - Verify specialized agent names appear (not "general-purpose")
  - Verify streaming reports appear
  - Verify execution time ~2-3 minutes
  - Verify cost ~$0.30-0.40
- [ ] Test `/invest:research-stock AAPL`
- [ ] Test `/invest:plan-goals`

### Phase 3: Release (v1.7.2)
- [ ] Update version to 1.7.2
- [ ] Update CHANGELOG.md
- [ ] Build and test package
- [ ] Publish to PyPI
- [ ] Create git tag v1.7.2
- [ ] Update backlog/active.md with resolution

---

## 📝 Command Update Template

For each command with parallel agents, use this template:

```markdown
**CRITICAL**: Use the Task tool with the EXACT subagent_type parameter as shown below:

**Agent 1: <agent-name>** (<Role>)
- Use Task tool: `Task(subagent_type="<agent-name>", description="<short-description>", prompt="<detailed-task-description>")`
```

### Required Changes Per Command

#### research-stock.md (3 agents)
```markdown
Agent 1: Task(subagent_type="quill-equity-analyst", ...)
Agent 2: Task(subagent_type="news-sentry-market-watch", ...)
Agent 3: Task(subagent_type="risk-shield-manager", ...)
```

#### plan-goals.md (3 agents)
```markdown
Agent 1: Task(subagent_type="compass-goal-planner", ...)
Agent 2: Task(subagent_type="atlas-investment-strategist", ...)
Agent 3: Task(subagent_type="risk-shield-manager", ...)
```

#### optimize-taxes.md (6 agents)
```markdown
# Parallel
Agent 1: Task(subagent_type="tax-scout", ...)
Agent 2: Task(subagent_type="risk-shield-manager", ...)
Agent 3: Task(subagent_type="rebalance-bot", ...)

# Sequential
Agent 4: Task(subagent_type="trader-jane-execution", ...)
Agent 5: Task(subagent_type="cash-treasury-steward", ...)
Agent 6: Task(subagent_type="quill-equity-analyst", ...)
```

#### monitor-holdings.md (6 agents)
```markdown
# Parallel
Agent 1: Task(subagent_type="news-sentry-market-watch", ...)
Agent 2: Task(subagent_type="ledger-performance-analyst", ...)
Agent 3: Task(subagent_type="risk-shield-manager", ...)

# Sequential
Agent 4: Task(subagent_type="earnings-whisperer", ...)
Agent 5: Task(subagent_type="factor-scout", ...)
Agent 6: Task(subagent_type="quill-equity-analyst", ...)
```

#### execute-rebalance.md (6 agents)
```markdown
# Parallel
Agent 1: Task(subagent_type="rebalance-bot", ...)
Agent 2: Task(subagent_type="risk-shield-manager", ...)
Agent 3: Task(subagent_type="tax-scout", ...)

# Sequential
Agent 4: Task(subagent_type="trader-jane-execution", ...)
Agent 5: Task(subagent_type="cash-treasury-steward", ...)
Agent 6: Task(subagent_type="compliance-sentinel", ...)
```

#### screen-opportunities.md (3 agents)
```markdown
# After screening phase
Agent 1: Task(subagent_type="quill-equity-analyst", ...) # For top 3 candidates
```

---

## 🎯 Expected Outcomes

After applying this fix to all commands:

### Performance Metrics
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Agent Selection | "general-purpose" | Specialized names | ✅ Fixed |
| Streaming Reports | None | Progressive display | ✅ Working |
| Execution Time | 10-13 min | 2-3 min | 70% faster |
| Cost per Workflow | $1.50-1.80 | $0.30-0.40 | 75% cheaper |
| User Experience | Long wait | Real-time feedback | 5x better |

### Quality Improvements
- ✅ Domain-specific prompts and expertise applied
- ✅ Specialized tool configurations used
- ✅ Consistent, high-quality analysis
- ✅ Better adherence to agent role definitions

---

## 🧪 Testing Protocol

For each updated command:

1. **Invoke Command**
   ```bash
   navam chat
   /invest:<command-name> <args>
   ```

2. **Verify Specialized Agents**
   - Check UI shows agent names (not "general-purpose")
   - Example: "🤖 Launching Agent: ledger-performance-analyst"

3. **Verify Streaming Reports**
   - Look for cyan-colored streaming panels
   - Each agent should show results immediately upon completion
   - Format: "📊 Streaming Report: <agent-name>"

4. **Measure Performance**
   - Execution time should be 2-3 minutes
   - Cost should be $0.30-0.40
   - Check `/perf` command for metrics

5. **Verify Quality**
   - Review final report for specialized insights
   - Check that domain expertise is evident
   - Confirm actionable recommendations

---

## 📚 Related Documentation

- **Backlog Issue**: `artifacts/backlog/active.md` (Specialized Agents Not Being Used section)
- **Agent Files**: `src/navam/.claude/agents/*.md` (18 agent definitions)
- **Command Files**: `.claude/commands/invest/*.md` (7 investment workflows)
- **SDK Documentation**: `artifacts/refer/claude-agent-sdk/PYTHON-SDK-API-REFERENCE.md`
- **Building Agents**: `artifacts/refer/claude-agent-sdk/building-agents.md`

---

## 💡 Key Learnings

1. **Explicit > Implicit**: SDK needs EXPLICIT parameter examples, not just guidance
2. **Agent Discovery Works**: The SDK correctly finds and loads agents from `.claude/agents/`
3. **Prompt Engineering Matters**: How we write command prompts directly affects agent selection
4. **Testing is Critical**: Production testing revealed what development testing missed
5. **Progressive Enhancement**: Start with one fixed command, verify it works, then apply to others

---

## 🚀 Next Actions

1. **Immediate** (Today): Fix remaining 6 commands with explicit Task tool examples
2. **Urgent** (Today): Sync, test, and verify fix works in production
3. **High** (Tomorrow): Release v1.7.2 with full fix
4. **Medium** (This Week): Monitor production usage and gather feedback
5. **Low** (Next Week): Consider adding validation to detect generic agent usage

---

**Status**: ✅ Fix Complete - Ready for Production Testing
**Priority**: 🔴 CRITICAL - Testing Required
**Target Release**: v1.7.2 (pending production validation)
