# Runtime Issues Analysis - Navam v1.6.0

**Date**: 2025-10-04
**Context**: Analysis of `/invest:research-stock GOOG` runtime behavior
**Status**: 🔴 CRITICAL - Multiple agent discovery and UX issues identified

---

## 🚨 Critical Issues Discovered

### 1. Agent Discovery Failure ⚠️ **CRITICAL**

**Problem**: The SDK falls back to the built-in "general-purpose" agent instead of dispatching to the specialized agents

**Symptom**:
```
🤖 Agent Execution
┃ Agent: general-purpose
┃ Task: Research Google (Alphabet Inc) stock comprehensively
```

**Expected**:
```
🤖 Agent Execution
┃ Agent: quill-equity-analyst
┃ Task: Analyze Google (Alphabet Inc) fundamentals
```

**Root Cause**: **MISSING `setting_sources` parameter in ClaudeAgentOptions**

**Evidence from Documentation** (`artifacts/refer/claude-agent-sdk/PYTHON-SDK-API-REFERENCE.md:389-412`):

```python
#### setting_sources
Control which filesystem settings to load.

Type: list[str] | None

Options:
- "user" - Load from user directory (~/.claude/)
- "project" - Load from project directory (.claude/)

# Load only project settings
options = ClaudeAgentOptions(
    setting_sources=["project"]  # ← REQUIRED for project-level agents!
)
```

**Current Code** (`src/navam/chat.py:246-256`):
```python
self.claude_options = ClaudeAgentOptions(
    allowed_tools=allowed_tools or self._get_default_tools(),
    permission_mode=self.permission_mode,
    system_prompt=self._get_system_prompt(),
    mcp_servers=self.mcp_servers,
    add_dirs=agent_dirs,  # ✅ Has add_dirs
    # ❌ MISSING: setting_sources=["project"]
    can_use_tool=self._handle_tool_permission if should_use_permission_callback else None,
)
```

**Impact**:
- ❌ SDK cannot discover bundled agents in `src/navam/.claude/agents/`
- ❌ Falls back to built-in "general-purpose" agent
- ❌ Specialized agents (quill-equity-analyst, news-sentry-market-watch, etc.) never called
- ❌ Zero performance benefit from specialized agent design

**Fix Required**:
```python
self.claude_options = ClaudeAgentOptions(
    allowed_tools=allowed_tools or self._get_default_tools(),
    permission_mode=self.permission_mode,
    system_prompt=self._get_system_prompt(),
    mcp_servers=self.mcp_servers,
    add_dirs=agent_dirs,
    setting_sources=["project"],  # ← ADD THIS LINE
    can_use_tool=self._handle_tool_permission if should_use_permission_callback else None,
)
```

---

### 2. Tool Execution Visibility During Agent Runs 🔍

**Problem**: Tool calls made by agents are not visible in the UI

**Symptom**:
```
🤖 Agent Execution
┃ Agent: general-purpose
┃ Task: Research Google (Alphabet Inc) stock comprehensively
┃ Description: General-Purpose Agent - Multi-step task execution

[No tool execution shown during agent run]

🔔 Claude Response
Claude: [Long response with data that must have come from tool calls]
```

**Expected**:
```
🤖 Agent Execution
┃ Agent: quill-equity-analyst

🔧 Tool Execution
┃ Tool: mcp__company-research__get_company_profile
┃ Input: {"symbol": "GOOG"}

🔧 Tool Execution
┃ Tool: mcp__company-research__get_company_financials
┃ Input: {"symbol": "GOOG", "period": "annual"}

🔔 Claude Response
Claude: Based on the financial analysis...
```

**Root Cause**: Unknown - requires investigation

**Possible Causes**:
1. Agent tool calls are nested/internal to Task tool execution
2. SDK doesn't emit tool use events for subagent tool calls
3. `chat.py` message processing doesn't handle nested tool blocks
4. Tool result blocks for agent tasks don't contain child tool executions

**Investigation Needed**:
- Check if `ToolResultBlock` for Task tool contains child blocks
- Review SDK message streaming for agent executions
- Examine the message processing logic in `src/navam/chat.py:950-1100`
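
To test the first investigation item, a small recursive walker can report every tool-use block reachable from a message's content, including any nested inside a tool result. This is a minimal sketch: `FakeToolUse`, `FakeToolResult`, and `iter_tool_uses` are hypothetical stand-ins, not SDK types, and whether the SDK actually nests child blocks this way is exactly what remains to be confirmed.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class FakeToolUse:
    """Stand-in for a tool-use block (hypothetical, not the SDK class)."""
    name: str
    input: dict


@dataclass
class FakeToolResult:
    """Stand-in for a tool-result block that may carry child blocks."""
    content: list = field(default_factory=list)


def iter_tool_uses(blocks: list) -> Iterator[FakeToolUse]:
    """Yield every tool-use block, descending into tool-result content."""
    for block in blocks:
        if isinstance(block, FakeToolUse):
            yield block
        elif isinstance(block, FakeToolResult):
            # Recurse: a Task result might contain the subagent's tool calls.
            yield from iter_tool_uses(block.content)


blocks = [
    FakeToolUse("Task", {"subagent_type": "quill-equity-analyst"}),
    FakeToolResult(content=[
        FakeToolUse("mcp__company-research__get_company_profile", {"symbol": "GOOG"}),
    ]),
]
found = [use.name for use in iter_tool_uses(blocks)]
```

If the same walk over real SDK messages finds only the `Task` block, possible causes 1 or 2 become more likely than a gap in `chat.py` itself.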

**Impact**:
- ❌ No visibility into what agents are actually doing
- ❌ Can't debug which tools agents are calling
- ❌ Can't track cache hits for agent tool calls
- ❌ Poor UX - users don't see work being done

---

### 3. Response Accumulation vs Incremental Display 📝

**Problem**: Claude responses show full accumulated text instead of just new content

**Symptom**:
```
🔔 Claude Response
Claude: Based on my comprehensive analysis of Google (Alphabet Inc)...

🔔 Claude Response
Claude: Based on my comprehensive analysis of Google (Alphabet Inc)...
[Same text repeated, then continues with new content]
```

**Expected**:
```
🔔 Claude Response
Claude: Based on my comprehensive analysis of Google (Alphabet Inc)...

🔔 Claude Response
Claude: [Only new content, no repetition]
...here are the key findings:
```

**Root Cause**: Likely the message accumulation logic in `chat.py`, which re-renders the entire accumulated buffer each time a new text block arrives

**Current Code** (`src/navam/chat.py:860-920`):
```python
# Accumulate text from streaming response
if isinstance(block, TextBlock):
    accumulated_text += block.text
    # Show response with ALL accumulated text
    self.notifications.show_response(
        f"Claude: {accumulated_text}",  # ← Shows full accumulated text
        timestamp=True
    )
```

**Fix Approach**:
- Track last displayed text length
- Only show delta (new text since last display)
- Or: Use Rich Live display for incremental updates
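
The first two bullets can be sketched as a small helper that remembers how many characters were already displayed. `DeltaTracker` and `next_delta` are hypothetical names for illustration; the real fix would need to fit the existing `show_response` call in `chat.py`.

```python
class DeltaTracker:
    """Remembers how much accumulated text has already been displayed."""

    def __init__(self) -> None:
        self._shown = 0  # number of characters already displayed

    def next_delta(self, accumulated_text: str) -> str:
        """Return only the text added since the previous call."""
        if len(accumulated_text) < self._shown:
            # The buffer shrank, so a new response started; begin again.
            self._shown = 0
        delta = accumulated_text[self._shown:]
        self._shown = len(accumulated_text)
        return delta


tracker = DeltaTracker()
first = tracker.next_delta("Based on my analysis")
second = tracker.next_delta("Based on my analysis, here are the key findings:")
# first  == "Based on my analysis"
# second == ", here are the key findings:"
```

Passing `second` rather than the full buffer to the display call would remove the repetition shown in the symptom above.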

**Impact**:
- ❌ Confusing UX - looks like Claude is repeating itself
- ❌ Harder to read long responses
- ❌ Wastes terminal space

---

### 4. Cache Showing 0 Hits Despite Infrastructure 💾

**Problem**: Cache metrics show 0 hits even though hook infrastructure exists

**Symptom**:
```
Cache Metrics:
┃ Unique tool calls: 0
┃ Potential duplicates (cacheable): 0
┃ Cache hit rate: 0.00% (0/0)
┃ Cache savings estimate: $0.0000
```

**Expected**:
```
Cache Metrics:
┃ Unique tool calls: 8
┃ Potential duplicates (cacheable): 3
┃ Cache hit rate: 27.27% (3/11)
┃ Cache savings estimate: $0.0421
```

**Investigation from Code**:

**Cache Hook Infrastructure** (`src/navam/chat.py:264-282`):
```python
# Initialize session cache for performance optimization
self.session_cache = SessionCache(ttl_seconds=300, max_entries=100)
self.cache_enabled = True  # Can be toggled for debugging

# Track tool call patterns for cache effectiveness analysis
self.tool_call_tracker = {}  # tool_name+args_hash -> {'count': N, ...}

# Track performance metrics
self.performance_metrics = {
    'workflow_start': None,
    'last_activity': None,
    'tool_calls_made': 0,
    'operations': [],
    'permission_checks': 0,
    'permission_check_time': 0.0,
    'potential_cache_hits': 0,
    'unique_tool_calls': 0,
    'cache_hits_actual': 0,
    'cache_misses_actual': 0
}
```

**Tool Tracking Logic** (`src/navam/chat.py:978-994`):
```python
# Track tool calls for cache effectiveness analysis
# (Skip agent Task tools as they're not cacheable)
if tool_name != "Task" and tool_name.startswith("mcp__"):
    cache_key = self.session_cache._make_key(tool_name, tool_input)

    if cache_key in self.tool_call_tracker:
        # This is a duplicate call - could have been cached!
        self.tool_call_tracker[cache_key]['count'] += 1
        self.performance_metrics['potential_cache_hits'] += 1
    else:
        # First time seeing this tool call
        self.tool_call_tracker[cache_key] = {
            'tool_name': tool_name,
            'count': 1,
            'first_seen': time.time(),
            'args': tool_input
        }
        self.performance_metrics['unique_tool_calls'] += 1
```
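
For reference, the expected metrics above follow directly from the tracker contents. The sketch below assumes the tracker shape shown in the tool tracking logic (one entry per unique call, each with a `count`); `summarize_tracker` is a hypothetical helper, not existing code.

```python
def summarize_tracker(tool_call_tracker: dict) -> dict:
    """Derive cache metrics from per-call counts."""
    unique = len(tool_call_tracker)
    total = sum(entry["count"] for entry in tool_call_tracker.values())
    duplicates = total - unique  # every repeat of a seen call is cacheable
    hit_rate = (duplicates / total * 100) if total else 0.0
    return {
        "unique_tool_calls": unique,
        "potential_cache_hits": duplicates,
        "total_calls": total,
        "hit_rate_percent": round(hit_rate, 2),
    }


# Eight unique calls, three of which were issued twice: 11 calls in total.
example_tracker = {f"key{i}": {"count": 2 if i < 3 else 1} for i in range(8)}
summary = summarize_tracker(example_tracker)
# summary["hit_rate_percent"] == 27.27, i.e. 3 / 11
```

An empty tracker yields `0/0`, which matches the symptom: the tracking branch is never reached, rather than the cache missing.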

**Possible Root Causes**:

1. **Agent tool calls not tracked**: Tool calls made by agents (inside the Task tool) may never reach the main message loop, so the tracking branch is not executed for them
2. **Hook implementation missing**: A TODO comment in the code states that hooks are not supported in claude-agent-sdk v0.1.0, so no cache lookup runs before tool execution
3. **Metrics not updated**: The tool tracking logic may not execute at all during agent runs

**Evidence of Hooks Not Implemented** (`src/navam/chat.py:254-256`):
```python
can_use_tool=self._handle_tool_permission if should_use_permission_callback else None,
# Note: No model or env specified - use Pro/Max plan defaults
# TODO: Hooks not yet supported in claude-agent-sdk v0.1.0
# Will need to implement caching at a different layer
```

**Impact**:
- ❌ No cache hit tracking
- ❌ No cost savings measurement
- ❌ Can't verify 70% API reduction claim from v1.6.0
- ❌ Missing performance data

**Next Steps**:
1. Check the installed SDK version to see whether hook support is now available
2. If it is, implement pre_tool_use and post_tool_use hooks per the API reference
3. Update cache tracking to work with agent tool calls
4. Test cache effectiveness with duplicate queries
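
If hooks turn out to be unavailable, the TODO's "caching at a different layer" could look like a wrapper around whatever coroutine the application uses to invoke a tool. This is a sketch under assumptions: `CachedToolCaller` and `fake_tool` are hypothetical, and the wrapper only helps where Navam itself issues the call, not for calls the SDK makes internally.

```python
import asyncio
import hashlib
import json
import time
from typing import Any, Awaitable, Callable


class CachedToolCaller:
    """Hypothetical wrapper: caches async tool results by name + arguments."""

    def __init__(self, call_tool: Callable[[str, dict], Awaitable[Any]],
                 ttl_seconds: float = 300.0) -> None:
        self._call_tool = call_tool
        self._ttl = ttl_seconds
        self._entries: dict = {}  # key -> (stored_at, result)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _make_key(tool_name: str, tool_input: dict) -> str:
        # sort_keys makes the key independent of argument ordering.
        payload = json.dumps(tool_input, sort_keys=True, default=str)
        return hashlib.sha256(f"{tool_name}:{payload}".encode()).hexdigest()

    async def call(self, tool_name: str, tool_input: dict) -> Any:
        key = self._make_key(tool_name, tool_input)
        entry = self._entries.get(key)
        if entry is not None and time.monotonic() - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = await self._call_tool(tool_name, tool_input)
        self._entries[key] = (time.monotonic(), result)
        return result


async def _demo() -> tuple:
    executed = []

    async def fake_tool(name: str, args: dict) -> dict:
        executed.append(name)  # record real executions only
        return {"symbol": args["symbol"]}

    caller = CachedToolCaller(fake_tool)
    tool = "mcp__company-research__get_company_profile"
    await caller.call(tool, {"symbol": "AAPL"})
    await caller.call(tool, {"symbol": "AAPL"})  # served from cache
    return caller.hits, caller.misses, len(executed)


demo_result = asyncio.run(_demo())
```

The 300-second TTL mirrors the `SessionCache(ttl_seconds=300, ...)` setting shown earlier; the hit and miss counters correspond to `cache_hits_actual` and `cache_misses_actual`.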

---

### 5. Performance Metrics Showing No Data 📊

**Problem**: Performance command shows zeros across all metrics

**Symptom**:
```
Performance Metrics:
┃ Total tool calls: 0
┃ Total operations: 0
┃ Avg operation time: 0.00s
```

**Root Cause**: Likely the same as the cache issue (Issue 4): tool calls made inside agent runs never reach the tracking logic, so no counters or timings are recorded

**Impact**:
- ❌ No workflow performance data
- ❌ Can't measure response times
- ❌ Can't identify bottlenecks
- ❌ No data for optimization
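
Once tool calls are visible to the main loop, timing them is straightforward. The sketch below reuses the `tool_calls_made` and `operations` keys from `performance_metrics`; `track_operation` and `average_operation_time` are hypothetical helpers, not existing code.

```python
import time
from contextlib import contextmanager

performance_metrics = {"tool_calls_made": 0, "operations": []}


@contextmanager
def track_operation(metrics: dict, name: str):
    """Record the wall-clock duration of one tool call or operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the wrapped call raises, so failures are timed too.
        metrics["operations"].append(
            {"name": name, "duration": time.perf_counter() - start}
        )
        metrics["tool_calls_made"] += 1


def average_operation_time(metrics: dict) -> float:
    """Mean duration in seconds, or 0.0 when nothing was recorded."""
    ops = metrics["operations"]
    return sum(op["duration"] for op in ops) / len(ops) if ops else 0.0


with track_operation(performance_metrics, "mcp__company-research__get_company_profile"):
    time.sleep(0.01)  # stands in for the real tool call

average = average_operation_time(performance_metrics)
```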

---

## 🎯 Priority Fix Order

### Priority 1: Agent Discovery ⚡ **IMMEDIATE**
**File**: `src/navam/chat.py:246-256`
**Change**: Add `setting_sources=["project"]` to ClaudeAgentOptions
**Impact**: Enables specialized agents, unlocks core functionality
**Effort**: 5 minutes
**Test**: Run `/invest:research-stock AAPL` and verify quill-equity-analyst is called

### Priority 2: Cache Hooks Implementation 🔧
**File**: `src/navam/chat.py:246-256`
**Change**: Add hooks parameter with pre_tool_use and post_tool_use
**Impact**: Enables caching, which is needed to verify the claimed 70% API call reduction and to measure cost savings
**Effort**: 2 hours
**Test**: Run query twice, verify cache hits on second run

### Priority 3: Tool Execution Visibility 🔍
**Investigation Required**: Why agent tool calls aren't visible
**Files**: `src/navam/chat.py:950-1100` (message processing)
**Impact**: Better UX, debugging capability
**Effort**: 4-6 hours

### Priority 4: Response Display Fix 📝
**File**: `src/navam/chat.py:860-920`
**Change**: Track last displayed position, show delta only
**Impact**: Better UX, cleaner output
**Effort**: 1-2 hours

### Priority 5: Performance Metrics 📊
**Dependency**: Fix tool visibility first
**Impact**: Enables performance optimization
**Effort**: 2-3 hours

---

## 🧪 Test Plan

### Test 1: Agent Discovery
```bash
navam chat
> /invest:research-stock AAPL

Expected output:
🤖 Agent Execution
┃ Agent: quill-equity-analyst  # ← Should show this, not "general-purpose"
```

### Test 2: Cache Effectiveness
```bash
navam chat
> /invest:research-stock AAPL
> /perf  # Note cache metrics

> /invest:research-stock AAPL  # Same query
> /perf  # Verify cache hits > 0
```

### Test 3: Tool Visibility
```bash
navam chat
> /invest:research-stock MSFT

Expected output should include:
🔧 Tool Execution
┃ Tool: mcp__company-research__get_company_profile
┃ Input: {"symbol": "MSFT"}
```

---

## 📝 Technical Details

### Agent Name Resolution

**Agent File**: `.claude/agents/quill-equity-analyst.md`
```yaml
---
name: quill-equity-analyst  # ← This is the agent identifier
description: Use this agent when you need comprehensive fundamental equity analysis
model: sonnet
color: blue
---
```

**Command Reference**: `.claude/commands/invest/research-stock.md`
```markdown
2. **Fundamental Analysis (Quill Equity Analyst)**
   - Launch quill-equity-analyst agent WITH pre-gathered data as context
```

**SDK Task Tool Call** (expected):
```python
{
    "tool_name": "Task",
    "tool_input": {
        "subagent_type": "quill-equity-analyst",  # ← Should match agent name
        "description": "Analyze AAPL fundamentals",
        "prompt": "..."
    }
}
```

**Current Behavior**: SDK can't find "quill-equity-analyst" → falls back to "general-purpose"

**Why**: Missing `setting_sources=["project"]` prevents SDK from loading project agents
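
Before and after applying the fix, it can help to confirm which agent names actually exist on disk, since the Task tool's `subagent_type` must match a frontmatter `name`. The sketch below is a hypothetical standalone check (`list_agent_names` is not part of Navam or the SDK); it only reads files and does not prove that the SDK loads them.

```python
import tempfile
from pathlib import Path


def list_agent_names(project_dir: str) -> list:
    """Return the frontmatter `name` of each agent file under .claude/agents."""
    agents_dir = Path(project_dir) / ".claude" / "agents"
    names = []
    for path in sorted(agents_dir.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines or lines[0].strip() != "---":
            continue  # no YAML frontmatter, so no agent name
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter
            key, sep, value = line.partition(":")
            if sep and key.strip() == "name":
                # Drop any trailing "# comment" after the value.
                names.append(value.split("#", 1)[0].strip())
                break
    return names


# Demo against a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    agents = Path(tmp) / ".claude" / "agents"
    agents.mkdir(parents=True)
    (agents / "quill-equity-analyst.md").write_text(
        "---\nname: quill-equity-analyst\nmodel: sonnet\n---\nPrompt body\n",
        encoding="utf-8",
    )
    discovered = list_agent_names(tmp)
```

Running the same function against the directory that contains the bundled `.claude/agents/` folder shows whether the expected names are present at the location the SDK is pointed to.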

---

### Cache Hook Implementation (Reference)

**From**: `artifacts/refer/claude-agent-sdk/PYTHON-SDK-API-REFERENCE.md:589-642`

```python
async def pre_tool_use_hook(tool_name: str, tool_input: dict) -> dict:
    """Check cache before tool execution"""
    if not tool_name.startswith("mcp__"):
        return {"behavior": "allow"}

    cached = session_cache.get(tool_name, tool_input)
    if cached:
        return {
            "behavior": "deny",
            "result": cached  # Return cached result, skip execution
        }

    return {"behavior": "allow"}

async def post_tool_use_hook(tool_name: str, tool_input: dict, result: dict):
    """Cache result after execution"""
    if tool_name.startswith("mcp__"):
        session_cache.set(tool_name, tool_input, result)

options = ClaudeAgentOptions(
    hooks={
        'pre_tool_use': pre_tool_use_hook,
        'post_tool_use': post_tool_use_hook
    }
)
```

---

## 🔗 Related Documentation

- **API Reference**: `artifacts/refer/claude-agent-sdk/PYTHON-SDK-API-REFERENCE.md`
  - **Setting Sources**: Lines 389-412
  - **Hooks**: Lines 589-642
- **Migration Guide**: `artifacts/refer/claude-agent-sdk/MIGRATION-GUIDE.md`
- **Programmatic Agents**: `artifacts/refer/claude-agent-sdk/PROGRAMMATIC-AGENTS-NOT-WORKING.md`

---

## ✅ Success Criteria

- [ ] Agent discovery working - specialized agents called by name
- [ ] Cache hooks implemented and showing hit rate > 0%
- [ ] Tool executions visible during agent runs
- [ ] Response display shows incremental text only
- [ ] Performance metrics tracking tool calls and timing
- [ ] `/invest:research-stock AAPL` completes with:
  - quill-equity-analyst called (not general-purpose)
  - Multiple tool calls visible
  - Cache hits on second run
  - Performance metrics populated

---

**Status**: 🔴 CRITICAL ISSUES IDENTIFIED
**Next Step**: Implement Priority 1 fix (add setting_sources parameter)
**ETA**: 5 minutes for P1, 9-13 hours for the remaining fixes (sum of the P2-P5 estimates)

---

*Last Updated: 2025-10-04*
