# Sensei MCP Orchestrator - Gap Analysis & Implementation Plan

**Date:** 2025-01-23
**Version:** v0.5.0
**Status:** Critical Gap Identified - Orchestrator Returns Placeholder Responses

---

## 🚨 EXECUTIVE SUMMARY

**The Problem:** The `get_engineering_guidance()` MCP tool returns placeholder text instead of actual multi-persona analysis, making it essentially non-functional for end users.

**Root Cause:** The orchestrator architecture is complete, but the **actual LLM-based analysis is not implemented**. Both `BasePersona.analyze()` and `SkillOrchestrator._standard_synthesis()` return hardcoded placeholder strings.

**Impact:** Users calling the MCP tool get template responses like "[This would be generated by the orchestrator using the full skill content]" instead of real engineering guidance.

**Priority:** CRITICAL - This is a core feature that appears complete but doesn't work.

---

## 🔍 GAP ANALYSIS

### What's Working ✅

1. **Persona Loading** (`src/sensei_mcp/personas/loader.py`)
   - Loads all 23 persona skill files from `src/sensei_mcp/personas/skills/*.md`
   - Parses metadata (name, description, expertise, principles)
   - Registry pattern works correctly

2. **Context Detection** (`src/sensei_mcp/context_detector.py`)
   - Detects query context (CRISIS, SECURITY, ARCHITECTURAL, etc.)
   - Keyword matching and pattern recognition

3. **Persona Selection** (`src/sensei_mcp/orchestrator.py`)
   - Intelligently selects 3-5 relevant personas based on query
   - Relevance scoring via keyword matching
   - Mode selection (auto, crisis, quick, full)

4. **Session Management** (`src/sensei_mcp/session.py`)
   - Tracks consultations, decisions, constraints
   - Session context passed to personas

5. **Export & Analytics** (`src/sensei_mcp/exporter.py`, `src/sensei_mcp/analytics.py`)
   - Consultation export (markdown, JSON)
   - Session insights and statistics
   - Works correctly, but the content it exports is currently placeholder text

### What's Broken ❌

#### 1. **BasePersona.analyze() - Returns Placeholder** (CRITICAL)

**Location:** `src/sensei_mcp/personas/base.py:90-121`

**Current Implementation:**
```python
def analyze(self, query: str, context: Optional[Dict] = None) -> str:
    """Provide this persona's perspective on the query."""
    return f"""
**{self.name.replace('_', ' ').title()}** perspective:

Query: {query}

Based on my core principles:
- {principle1}
- {principle2}
- {principle3}

My analysis: [This would be generated by the orchestrator using the full skill content]
"""
```

**The Problem:**
- Returns hardcoded template string
- Does NOT use `self.full_content` (the complete persona skill markdown)
- Does NOT invoke LLM to generate actual analysis
- Placeholder text: "This would be generated by the orchestrator using the full skill content"

**What's Missing:**
- LLM invocation to analyze query from persona's perspective
- Prompt construction using persona's full skill content
- Integration with Claude API (or other LLM)

---

#### 2. **SkillOrchestrator._standard_synthesis() - Returns Placeholder** (CRITICAL)

**Location:** `src/sensei_mcp/orchestrator.py:195-236`

**Current Implementation:**
```python
def _standard_synthesis(self, query: str, perspectives: Dict[str, str], context: QueryContext) -> str:
    """Standard synthesis format."""
    lines = []
    # ... shows each persona's placeholder perspective ...

    # Synthesis placeholder (would be LLM-generated in practice)
    lines.append("✅ SYNTHESIS & RECOMMENDATION:")
    lines.append("")
    lines.append("[The orchestrator would synthesize these perspectives here,")
    lines.append(" identifying consensus, tensions, and providing a clear recommendation]")
    lines.append("")
    lines.append("**Recommended Path:**")
    lines.append("[Specific, actionable recommendation based on all perspectives]")
```

**The Problem:**
- Displays all persona perspectives (which are already placeholders)
- Then adds MORE placeholder text for synthesis
- Does NOT synthesize perspectives into coherent recommendation
- Placeholder text: "[The orchestrator would synthesize these perspectives here...]"

**What's Missing:**
- LLM invocation to synthesize multiple perspectives
- Conflict resolution logic (identify agreements/disagreements)
- Actionable recommendation generation
- Integration with Claude API (or other LLM)

---

## 🏗️ ARCHITECTURE GAP

### Current Flow (What Happens Now)

```
User Query
    ↓
get_engineering_guidance() [MCP Tool]
    ↓
SkillOrchestrator.orchestrate()
    ↓
select_personas() [✅ WORKS]
    ↓
gather_perspectives()
    ↓
BasePersona.analyze() [❌ RETURNS PLACEHOLDER]
    ↓
synthesize()
    ↓
_standard_synthesis() [❌ RETURNS PLACEHOLDER]
    ↓
Return to user: "Placeholder text"
```

### Required Flow (What Should Happen)

```
User Query
    ↓
get_engineering_guidance() [MCP Tool]
    ↓
SkillOrchestrator.orchestrate()
    ↓
select_personas() [✅ WORKS]
    ↓
gather_perspectives()
    ↓
FOR EACH persona:
    ↓
    BasePersona.analyze()
        ↓
        LLM.invoke(
            system_prompt = persona.full_content,
            user_prompt = f"Analyze this query: {query}\nContext: {session_context}",
            model = "claude-3-5-sonnet-20241022"
        )
        ↓
    Return actual analysis [✅ REAL RESPONSE]
    ↓
synthesize()
    ↓
    _standard_synthesis()
        ↓
        LLM.invoke(
            system_prompt = "You are the Skill Orchestrator...",
            user_prompt = f"Query: {query}\n\nPerspectives:\n{all_perspectives}\n\nSynthesize these into a coherent recommendation.",
            model = "claude-3-5-sonnet-20241022"
        )
        ↓
    Return synthesized recommendation [✅ REAL RESPONSE]
    ↓
Return to user: Actual multi-persona analysis
```
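
The required flow can be exercised offline before any API work begins. The sketch below is only an illustration under stated assumptions: `orchestrate_sketch`, the `LLMCall` signature, and `stub_llm` are hypothetical names that do not exist in the codebase, and the prompts are shortened versions of the ones in the diagram above. It shows the shape of the flow: one call per persona, then a single synthesis call that receives every perspective.

```python
from typing import Callable, Dict

# Hypothetical signature: (system_prompt, user_prompt) -> response text.
LLMCall = Callable[[str, str], str]


def orchestrate_sketch(query: str, personas: Dict[str, str], llm: LLMCall) -> str:
    """Fan out one LLM call per persona, then make one synthesis call.

    `personas` maps persona name -> full skill content (used as system prompt).
    """
    # Step 1: each persona analyzes the query with its skill file as system prompt.
    perspectives = {
        name: llm(skill_content, f"Analyze this query: {query}")
        for name, skill_content in personas.items()
    }

    # Step 2: a single synthesis call sees every perspective.
    joined = "\n\n".join(f"### {name}\n{text}" for name, text in perspectives.items())
    return llm(
        "You are the Skill Orchestrator...",
        f"Query: {query}\n\nPerspectives:\n{joined}\n\n"
        "Synthesize these into a coherent recommendation.",
    )


def stub_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real model so the flow runs without an API key."""
    return f"(stub reply to a {len(user_prompt)}-character prompt)"


print(orchestrate_sketch(
    "Should we shard the database?",
    {"sre": "SRE skill content", "dba": "DBA skill content"},
    stub_llm,
))
```

Passing the LLM in as a callable keeps the flow testable: the same function works with a stub today and with the real client later.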

---

## 🛠️ IMPLEMENTATION REQUIREMENTS

### 1. LLM Client Integration

**What's Needed:**
- Anthropic Claude API client (or generic LLM abstraction)
- API key management (environment variable or config file)
- Retry logic and error handling
- Token usage tracking (for cost monitoring)

**Options:**
- Direct `anthropic` Python SDK
- `litellm` for multi-provider support (Claude, OpenAI, local models)
- LangChain (overkill for this use case)

**Recommendation:** Use `anthropic` SDK directly for simplicity and control.

---

### 2. BasePersona.analyze() Implementation

**File:** `src/sensei_mcp/personas/base.py`

**New Method:**
```python
def analyze(self, query: str, context: Optional[Dict] = None) -> str:
    """
    Provide this persona's perspective on the query using LLM.

    Args:
        query: The question or scenario to analyze
        context: Optional session context (constraints, decisions, patterns)

    Returns:
        The persona's perspective as a string (LLM-generated)
    """
    # 1. Construct system prompt from full skill content
    system_prompt = self.full_content

    # 2. Construct user prompt with query + context
    user_prompt = self._build_user_prompt(query, context)

    # 3. Invoke LLM via the shared client
    #    (get_client is imported from sensei_mcp.llm_client)
    response = get_client().invoke(
        system_prompt=system_prompt,
        user_prompt=user_prompt,
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000
    )

    # 4. Return response
    return response
```

**Helper Method:**
```python
def _build_user_prompt(self, query: str, context: Optional[Dict] = None) -> str:
    """Build user prompt with query and session context."""
    prompt_parts = [f"Query: {query}"]

    if context:
        if context.get('active_constraints'):
            prompt_parts.append(f"\nActive Constraints: {', '.join(context['active_constraints'])}")
        if context.get('patterns_agreed'):
            prompt_parts.append(f"\nAgreed Patterns: {', '.join(context['patterns_agreed'])}")
        if context.get('recent_decisions'):
            prompt_parts.append("\nRecent Decisions:")
            for d in context['recent_decisions'][-3:]:
                prompt_parts.append(f"  - {d['description']} ({d['rationale']})")

    prompt_parts.append("\nProvide your perspective on this query based on your expertise and principles.")

    return "\n".join(prompt_parts)
```

---

### 3. SkillOrchestrator._standard_synthesis() Implementation

**File:** `src/sensei_mcp/orchestrator.py`

**New Method:**
```python
def _standard_synthesis(self, query: str, perspectives: Dict[str, str], context: QueryContext) -> str:
    """
    Standard synthesis format with LLM-generated synthesis.

    Args:
        query: Original query
        perspectives: Dict of persona name → their analysis
        context: Detected query context

    Returns:
        Formatted synthesis with real recommendations
    """
    lines = []

    lines.append("🎭 ORCHESTRATED ANALYSIS")
    lines.append(f"Context: {context.value.upper()}")
    lines.append(f"Personas Consulted: {len(perspectives)}")
    lines.append("")
    lines.append("━" * 60)
    lines.append("")

    # Show each perspective
    for name, perspective in perspectives.items():
        display_name = name.replace('-', ' ').title()
        lines.append(f"💭 {display_name}:")
        lines.append(perspective.strip())
        lines.append("")

    lines.append("━" * 60)
    lines.append("")

    # **NEW: LLM-generated synthesis**
    # (get_client is imported from sensei_mcp.llm_client)
    synthesis_prompt = self._build_synthesis_prompt(query, perspectives, context)
    synthesis = get_client().invoke(
        system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
        user_prompt=synthesis_prompt,
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500
    )

    lines.append("✅ SYNTHESIS & RECOMMENDATION:")
    lines.append("")
    lines.append(synthesis.strip())

    return "\n".join(lines)
```

**Helper Method:**
```python
def _build_synthesis_prompt(self, query: str, perspectives: Dict[str, str], context: QueryContext) -> str:
    """Build synthesis prompt for LLM."""
    prompt_parts = [
        f"Original Query: {query}",
        f"Context: {context.value}",
        "",
        "Perspectives from experts:"
    ]

    for name, perspective in perspectives.items():
        prompt_parts.append(f"\n### {name.replace('-', ' ').title()}")
        prompt_parts.append(perspective.strip())

    prompt_parts.append("\n---\n")
    prompt_parts.append("Synthesize these perspectives into a coherent recommendation:")
    prompt_parts.append("1. Identify consensus points (where experts agree)")
    prompt_parts.append("2. Identify tensions (where experts disagree)")
    prompt_parts.append("3. Provide a clear, actionable recommendation")
    prompt_parts.append("4. Explain trade-offs and when to revisit the decision")

    return "\n".join(prompt_parts)

# Orchestrator system prompt
ORCHESTRATOR_SYSTEM_PROMPT = """
You are the Skill Orchestrator - the Chief of Staff who synthesizes input from multiple specialized engineering personas.

Your job:
1. Read all perspectives carefully
2. Identify where experts agree (consensus)
3. Identify where experts disagree (tensions)
4. Resolve conflicts via mediation ("disagree and commit")
5. Provide a clear, actionable recommendation

Format your response as:

**Recommended Path:**
[Specific, actionable recommendation based on all perspectives]

**Consensus:**
[Points where all/most experts agree]

**Tensions:**
[Points of disagreement and why]

**Disagree and Commit:**
[How to proceed despite disagreements, if any]

**Revisit When:**
[Conditions that would change this recommendation]
"""
```

---

### 4. LLM Client Abstraction

**File:** `src/sensei_mcp/llm_client.py` (NEW)

```python
"""
LLM client abstraction for persona analysis and synthesis.

Wraps the Anthropic Claude API. Local-model fallback is not implemented here.
"""

import logging
import os
from typing import Optional
from anthropic import Anthropic

# Use logging (stderr by default) rather than print(): if the MCP server runs
# over stdio, stdout carries protocol messages and stray output would corrupt them.
logger = logging.getLogger(__name__)

class LLMClient:
    """Wrapper for LLM API calls."""

    def __init__(self, api_key: Optional[str] = None):
        """
        Initialize LLM client.

        Args:
            api_key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
        """
        self.api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
        if not self.api_key:
            raise ValueError("ANTHROPIC_API_KEY not found in environment")

        self.client = Anthropic(api_key=self.api_key)

    def invoke(
        self,
        system_prompt: str,
        user_prompt: str,
        model: str = "claude-3-5-sonnet-20241022",
        max_tokens: int = 2000,
        temperature: float = 0.7
    ) -> str:
        """
        Invoke LLM with system and user prompts.

        Args:
            system_prompt: System instructions (persona skill content)
            user_prompt: User query and context
            model: Claude model ID
            max_tokens: Maximum response tokens
            temperature: Response randomness (0-1)

        Returns:
            LLM response text
        """
        try:
            response = self.client.messages.create(
                model=model,
                max_tokens=max_tokens,
                temperature=temperature,
                system=system_prompt,
                messages=[
                    {"role": "user", "content": user_prompt}
                ]
            )

            # Extract text from the first content block of the response
            return response.content[0].text

        except Exception as e:
            # Log the full traceback and return a fallback string
            logger.exception("LLM invocation error")
            return f"[Error: Could not generate response - {e}]"

# Global client instance (initialized on first use)
_client: Optional[LLMClient] = None

def get_client() -> LLMClient:
    """Get or create global LLM client."""
    global _client
    if _client is None:
        _client = LLMClient()
    return _client
```

---

### 5. Integration Points

**Changes Required:**

1. **`src/sensei_mcp/personas/base.py`:**
   - Import `from sensei_mcp.llm_client import get_client`
   - Replace `analyze()` method with LLM-based implementation

2. **`src/sensei_mcp/orchestrator.py`:**
   - Import `from sensei_mcp.llm_client import get_client`
   - Replace `_standard_synthesis()` with LLM-based implementation
   - Replace `_brief_synthesis()` with LLM-based implementation
   - Replace `_executive_synthesis()` with LLM-based implementation

3. **`pyproject.toml`:**
   - Add dependency: `anthropic = "^0.40.0"`

4. **Documentation:**
   - Update README.md with API key setup instructions
   - Add troubleshooting section for API key errors

---

## 📋 IMPLEMENTATION PLAN

### Phase 1: LLM Client Infrastructure (4-6 hours)

1. **Create `llm_client.py`** (2 hours)
   - Implement `LLMClient` class
   - Add `anthropic` dependency
   - Add error handling and retries
   - Add token usage logging

2. **Environment Setup** (1 hour)
   - Document API key configuration
   - Add `.env.example` file
   - Update README with setup instructions

3. **Testing** (1-2 hours)
   - Unit tests for `LLMClient`
   - Mock API responses for tests
   - Test error handling

---

### Phase 2: Persona Analysis Implementation (6-8 hours)

1. **Update `BasePersona.analyze()`** (3-4 hours)
   - Implement LLM-based analysis
   - Add `_build_user_prompt()` helper
   - Handle session context properly
   - Add error handling

2. **Testing** (2-3 hours)
   - Test each persona with real queries
   - Validate response quality
   - Test with session context
   - Test error cases (API failure, timeout)

3. **Optimization** (1 hour)
   - Add caching for identical queries
   - Optimize prompt token usage
   - Add timeout configuration
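
One possible shape for the identical-query cache, as a minimal sketch. `SessionCache` and its methods are hypothetical names, and the one-hour default simply mirrors the recommendation under Open Questions. The clock is injectable so expiry can be tested without sleeping.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class SessionCache:
    """In-memory cache keyed by (persona, query), with a time-to-live."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(persona: str, query: str) -> str:
        # Hash so long queries do not become long dict keys.
        return hashlib.sha256(f"{persona}\x00{query}".encode("utf-8")).hexdigest()

    def get(self, persona: str, query: str) -> Optional[str]:
        """Return the cached response, or None if absent or expired."""
        key = self._key(persona, query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired
            return None
        return value

    def put(self, persona: str, query: str, response: str) -> None:
        self._entries[self._key(persona, query)] = (self._clock(), response)


cache = SessionCache()
cache.put("sre", "Should we shard?", "cached analysis")
print(cache.get("sre", "Should we shard?"))
```

Because session context (constraints, decisions) also shapes the prompt, a real key would probably need to include it; this sketch keys on persona and query only.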

---

### Phase 3: Synthesis Implementation (6-8 hours)

1. **Update `SkillOrchestrator._standard_synthesis()`** (3-4 hours)
   - Implement LLM-based synthesis
   - Add `_build_synthesis_prompt()` helper
   - Add orchestrator system prompt

2. **Update Other Synthesis Methods** (2 hours)
   - Implement `_brief_synthesis()` with LLM
   - Implement `_executive_synthesis()` with LLM

3. **Testing** (1-2 hours)
   - Test synthesis with multiple perspectives
   - Validate consensus/tension detection
   - Test different output formats

---

### Phase 4: End-to-End Testing & Polish (4-6 hours)

1. **Integration Testing** (2-3 hours)
   - Test full orchestration flow
   - Test different query types (CRISIS, SECURITY, ARCHITECTURAL)
   - Test different modes (auto, crisis, quick, full)

2. **Performance Testing** (1 hour)
   - Measure latency (sequential persona calls are estimated at 10-25 seconds per consultation; parallel calls should be lower)
   - Test concurrent requests
   - Validate token usage

3. **Documentation** (1-2 hours)
   - Update CHANGELOG.md
   - Update README.md
   - Add TROUBLESHOOTING.md section
   - Document cost implications (API usage)

---

## 💰 COST ANALYSIS

### Token Usage Estimates

**Per Consultation:**
- Persona analysis (3-5 personas × up to 2,000 tokens each): 6,000-10,000 tokens
- Synthesis (up to 1,500 tokens): 1,500 tokens
- **Total output tokens:** ~7,500-11,500 tokens per consultation (an upper bound, since these are the `max_tokens` limits)

**Input tokens:**
- Persona skill content (system prompt): ~2,000-3,000 tokens each
- User query + context: ~500-1,000 tokens
- **Total input tokens per persona:** ~2,500-4,000 tokens
- **Total input tokens per consultation:** ~12,500-20,000 tokens (for 5 personas), not counting the synthesis call, whose input also contains every persona's output

### Cost per Consultation (Claude 3.5 Sonnet)

**Pricing:**
- Input: $3 per 1M tokens
- Output: $15 per 1M tokens

**Per Consultation:**
- Input: 15,000 tokens × $3 / 1M = $0.045
- Output: 10,000 tokens × $15 / 1M = $0.15
- **Total: ~$0.20 per consultation**

**Monthly costs (estimated usage):**
- 10 consultations/day × 30 days = 300 consultations/month
- **Total: ~$60/month** (light usage)

- 100 consultations/day × 30 days = 3,000 consultations/month
- **Total: ~$600/month** (heavy usage)

**Note:** These estimates assume development or personal use. Enterprise usage would need higher rate limits and a correspondingly larger budget.
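
The arithmetic above can be reproduced with a few lines of Python. The function names are illustrative only; the prices and token counts are the ones quoted in this section.

```python
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (figure from this document)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (figure from this document)


def consultation_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one consultation at the prices above."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def monthly_cost(per_consultation: float, per_day: int, days: int = 30) -> float:
    """Cost in USD of `per_day` consultations over `days` days."""
    return per_consultation * per_day * days


per_consult = consultation_cost(15_000, 10_000)
print(f"per consultation: ${per_consult:.3f}")                 # $0.195, rounded to ~$0.20 above
print(f"light usage:  ${monthly_cost(per_consult, 10):.2f}/month")
print(f"heavy usage:  ${monthly_cost(per_consult, 100):.2f}/month")
```

The unrounded per-consultation figure is $0.195, so the monthly totals come out slightly under the rounded $60 and $600 quoted above.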

---

## 🚧 RISKS & MITIGATIONS

### Risk 1: High Latency

**Issue:** Each LLM call adds 2-5 seconds, so five personas analyzed sequentially take 10-25 seconds per consultation, before the synthesis call.

**Mitigation:**
- Add loading indicators in CLI (`sensei-mcp` output)
- Consider parallel persona analysis (5 personas in parallel take about as long as the slowest call, roughly 5 seconds instead of 25)
- Add caching for identical queries (session-level cache)
- Offer "quick" mode with single persona for faster responses
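
Parallel persona analysis could be sketched with a thread pool, since each call is I/O-bound. `gather_parallel` is a hypothetical helper, not an existing function, and it assumes each persona exposes a callable that takes the query and returns text. It also assumes the underlying client can be shared across threads, which should be verified before adopting this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def gather_parallel(
    query: str,
    analyzers: Dict[str, Callable[[str], str]],
    max_workers: int = 5,
) -> Dict[str, str]:
    """Run every persona's analyze callable concurrently; preserve input order."""
    if not analyzers:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(analyzers))) as pool:
        # Submit everything first so the calls overlap in time.
        futures = {name: pool.submit(fn, query) for name, fn in analyzers.items()}
        results: Dict[str, str] = {}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing persona should not sink the rest
                results[name] = f"[Error: {exc}]"
        return results


print(gather_parallel(
    "Should we shard?",
    {"sre": lambda q: f"SRE view on: {q}", "dba": lambda q: f"DBA view on: {q}"},
))
```

Results come back in the same order the personas were passed in, so the existing synthesis formatting would not need to change.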

### Risk 2: API Costs

**Issue:** $0.20 per consultation can add up for heavy users.

**Mitigation:**
- Document cost implications clearly in README
- Add token usage logging/tracking
- Offer cost controls (max tokens, max personas)
- Consider local model fallback (Ollama, LM Studio) for cost-sensitive users
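
Token tracking can start as a small accumulator. The Anthropic Messages API reports `usage.input_tokens` and `usage.output_tokens` on each response, and those values could be fed into something like the hypothetical `UsageTracker` below. The default prices are the ones from the cost analysis; the budget check is an illustrative assumption, not a planned feature.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts across LLM calls in one session."""
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.calls += 1

    def estimated_cost_usd(self, input_per_mtok: float = 3.0,
                           output_per_mtok: float = 15.0) -> float:
        return (self.input_tokens * input_per_mtok
                + self.output_tokens * output_per_mtok) / 1_000_000

    def over_budget(self, limit_usd: float) -> bool:
        """True once the estimated spend exceeds the given limit."""
        return self.estimated_cost_usd() > limit_usd


tracker = UsageTracker()
tracker.record(input_tokens=3_000, output_tokens=2_000)  # e.g. one persona call
tracker.record(input_tokens=3_000, output_tokens=2_000)
print(tracker.calls, f"${tracker.estimated_cost_usd():.3f}")
```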

### Risk 3: API Key Management

**Issue:** Users need Anthropic API key, which adds friction.

**Mitigation:**
- Clear setup instructions in README
- Helpful error messages when API key missing
- Support `.env` file for easy configuration
- Consider offering hosted version (SaaS) in future

### Risk 4: Response Quality

**Issue:** LLM responses might be inconsistent or off-topic.

**Mitigation:**
- Careful prompt engineering (system prompts from skill files are comprehensive)
- Add response validation (check if response addresses query)
- Add feedback mechanism (let users report bad responses)
- Iterative prompt tuning based on user feedback
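
A first pass at response validation does not need another LLM call. The sketch below checks a synthesis for the section headers requested in `ORCHESTRATOR_SYSTEM_PROMPT` and flags leftover placeholder or error text. Both function names are hypothetical, and "Disagree and Commit" is treated as optional because the prompt marks it "if any".

```python
from typing import List, Sequence

# Headers the orchestrator system prompt asks the model to produce.
REQUIRED_SECTIONS = (
    "**Recommended Path:**",
    "**Consensus:**",
    "**Tensions:**",
    "**Revisit When:**",
)

# Fragments that indicate a placeholder or fallback rather than a real answer.
PLACEHOLDER_MARKERS = (
    "[This would be generated",
    "[The orchestrator would synthesize",
    "[Error:",
)


def missing_sections(synthesis: str,
                     required: Sequence[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return the required headers that do not appear in the synthesis."""
    return [header for header in required if header not in synthesis]


def looks_like_placeholder(text: str) -> bool:
    """True if the text still contains placeholder or error markers."""
    return any(marker in text for marker in PLACEHOLDER_MARKERS)


sample = "**Recommended Path:**\nShard by tenant.\n**Consensus:**\nAll agree."
print(missing_sections(sample))
print(looks_like_placeholder(sample))
```

A structural check like this catches malformed output only; judging whether the response actually addresses the query would still need human review or a further model call.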

### Risk 5: Backward Compatibility

**Issue:** Existing tests and examples expect placeholder text, not real responses.

**Mitigation:**
- Update tests to mock LLM client
- Add integration tests with real API (optional, via env flag)
- Update all examples in documentation
- Consider deprecation path for non-LLM mode (if needed)

---

## 🎯 SUCCESS METRICS

### Phase 1: Infrastructure

- ✅ `LLMClient` implemented and tested
- ✅ API key configuration documented
- ✅ Error handling works (graceful degradation)

### Phase 2: Persona Analysis

- ✅ All 23 personas return real analysis (not placeholders)
- ✅ Response quality validated (human review of 10+ queries)
- ✅ Average latency < 5 seconds per persona

### Phase 3: Synthesis

- ✅ Synthesis identifies consensus and tensions correctly
- ✅ Recommendations are actionable and clear
- ✅ All 3 output formats work (brief, standard, executive)

### Phase 4: End-to-End

- ✅ Full orchestration works for all query types
- ✅ Session context properly influences responses
- ✅ All existing tests updated and passing
- ✅ Documentation complete

---

## 📝 NEXT STEPS

### Immediate (This Week)

1. **Confirm Implementation Approach** with stakeholders
   - LLM provider choice (Anthropic Claude vs others)
   - Budget approval for API costs
   - Timeline expectations

2. **Set Up Development Environment**
   - Get Anthropic API key
   - Install `anthropic` SDK
   - Test basic API calls

3. **Create Feature Branch**
   - `git checkout -b feature/orchestrator-llm-integration`

### Week 1: Infrastructure + Persona Analysis

- Implement `llm_client.py`
- Update `BasePersona.analyze()`
- Write tests
- Document setup

### Week 2: Synthesis + Testing

- Update `SkillOrchestrator` synthesis methods
- End-to-end testing
- Performance optimization
- Documentation

### Week 3: Polish + Release

- Final testing
- Update CHANGELOG
- Create v0.6.0 release (or v0.5.1 patch)
- Announce to users

---

## 🤔 OPEN QUESTIONS

1. **LLM Provider:**
   - Stick with Anthropic Claude only?
   - Or support multi-provider (OpenAI, local models via Ollama)?
   - **Recommendation:** Start with Claude, add others in v0.7.0 if needed

2. **Caching Strategy:**
   - Cache responses per session?
   - Cache globally (across all users)?
   - TTL for cache (1 hour? 1 day?)?
   - **Recommendation:** Session-level cache with 1-hour TTL

3. **Concurrency:**
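   - Analyze personas in parallel (faster) or sequentially (simpler, and gentler on API rate limits)? Token cost is the same either way.
   - **Recommendation:** Parallel by default, with a sequential option for rate-limited accounts; a "budget" mode would need fewer personas or a lower `max_tokens` to actually reduce cost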

4. **Backward Compatibility:**
   - Keep placeholder mode as fallback (if API key missing)?
   - Or require API key always?
   - **Recommendation:** Require API key, clear error if missing

5. **Model Selection:**
   - Allow users to choose model (Sonnet vs Opus)?
   - **Recommendation:** Default to Sonnet, allow override via config
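
If the recommendations for questions 4 and 5 are adopted, settings resolution might look like the sketch below. Everything here is an assumption for illustration: `resolve_settings`, `MissingAPIKeyError`, and the `SENSEI_MODEL` variable are hypothetical names. Only `ANTHROPIC_API_KEY` and the default model ID come from this document.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_MODEL = "claude-3-5-sonnet-20241022"


class MissingAPIKeyError(RuntimeError):
    """Raised when no API key is configured."""


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the API key (required) and model override (optional) from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise MissingAPIKeyError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell or add it "
            "to your .env file before calling get_engineering_guidance()."
        )
    # SENSEI_MODEL is a hypothetical override variable; fall back to the default.
    return {"api_key": api_key, "model": env.get("SENSEI_MODEL", DEFAULT_MODEL)}


print(resolve_settings({"ANTHROPIC_API_KEY": "sk-example"})["model"])
```

Accepting the environment as a parameter keeps the function testable without mutating `os.environ`.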

---

**Made with 🎭 by the Sensei Engineering Team**
*Analysis Date: 2025-01-23*
