# Phase 2 Startup Guide: Smart Memory Implementation

**Version**: v0.15.0 Unified Memory Model
**Phase**: Phase 2 - Smart Memory (Week 3-4, Day 13-24)
**Duration**: 11.5 days (10 days implementation + 1.5 days QA)
**Last Updated**: 2025-11-03

---

## 📋 Quick Start for New Claude Code Session

### Prerequisites Checklist

Before starting Phase 2, verify:

```bash
# 1. Confirm you're on the correct branch
git branch --show-current
# Expected: feature/v0.15.0-unified-memory

# 2. Verify Phase 1 is complete
git log --oneline | head -3
# Expected to see:
#   docs: add Phase 2-4 execution plan...
#   feat: implement v0.15.0 Unified Memory Model (Phase 1 complete)

# 3. Check all Phase 1 tests pass
./.venv/bin/pytest tests/core/test_memory.py tests/core/test_compatibility.py \
  tests/utils/test_migrate_to_memory.py tests/cli/test_memory_commands.py \
  tests/mcp/test_server_memory.py -v
# Expected: 183 passed

# 4. Verify working directory is clean
git status
# Expected: "nothing to commit, working tree clean"

# 5. Check Python environment
./.venv/bin/python --version
# Expected: Python 3.11+
```
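
Checks 1 and 4 above can also be scripted. The following is a minimal sketch, not part of Clauxton: the names `git_output` and `check_prerequisites` are hypothetical, and it assumes `git` is on the `PATH` and that `git status --porcelain` printing nothing means a clean tree.

```python
import subprocess
from typing import Callable, List

EXPECTED_BRANCH = "feature/v0.15.0-unified-memory"


def git_output(args: List[str]) -> str:
    """Run a git command in the current directory and return its stdout, stripped."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def check_prerequisites(run: Callable[[List[str]], str] = git_output) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []

    # Check 1: correct branch
    branch = run(["branch", "--show-current"])
    if branch != EXPECTED_BRANCH:
        problems.append(f"on branch {branch!r}, expected {EXPECTED_BRANCH!r}")

    # Check 4: --porcelain prints one line per changed path, nothing when clean
    if run(["status", "--porcelain"]):
        problems.append("working tree is not clean")

    return problems
```

Calling `check_prerequisites()` from the repository root returns an empty list when both checks pass. The `run` parameter exists only so the logic can be exercised without a real repository.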

✅ **All checks passed?** → Proceed to Phase 2 execution

---

## 🎯 Phase 2 Overview

### Objectives

1. **Auto-extract memories from Git commits** (Agent 6)
   - Extract architectural decisions from commit messages
   - Detect code patterns from diffs
   - Confidence scoring for auto-extracted memories

2. **Detect relationships between memories** (Agent 7)
   - Semantic similarity linking
   - Tag-based relationships
   - Merge candidate detection

3. **CLI commands for extraction** (Agent 8)
   - `clauxton memory extract`
   - `clauxton memory link`
   - `clauxton memory suggest-merge`

### Success Criteria

- **Implementation**: 3 files (~850 lines)
- **Tests**: 40 tests, all passing
- **Coverage**: >90% on new modules
- **Quality Score**: >90/100 (Grade A or higher)
- **Performance**: Extract <100ms per commit, Link <200ms for 1K entries
- **Duration**: 10 days (+ 1.5 days QA)
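
The performance targets can be checked with a small timing helper. This is a sketch under assumptions: `median_ms` and `within_budget` are hypothetical names, and using the median of a few runs is one reasonable choice, not a requirement of this plan.

```python
import time
from statistics import median
from typing import Callable, List


def median_ms(func: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of func() in milliseconds over several runs."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def within_budget(
    func: Callable[[], object], budget_ms: float, repeats: int = 5
) -> bool:
    """True if the median runtime of func() is below budget_ms."""
    return median_ms(func, repeats) < budget_ms
```

For the extraction target, a benchmark test could assert something like `within_budget(lambda: extractor.extract_from_commit(sha), 100.0)`, where `extractor` and `sha` come from the test's own setup.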

---

## 🚀 Execution Steps

### Step 1: Launch Agents 6 & 7 in Parallel (Day 13-18, 6 days)

**Important**: Launch **both agents in a single message** for true parallel execution.

Copy and paste this to Claude Code:

```
I want to launch Phase 2 of v0.15.0 Unified Memory Model implementation.

Please launch Agent 6 (Memory Extraction) and Agent 7 (Relationship Detection)
IN PARALLEL using the Task tool.

Use the detailed prompts from the sections below:
- Agent 6 Prompt: [See "Agent 6 Detailed Prompt" section]
- Agent 7 Prompt: [See "Agent 7 Detailed Prompt" section]

Launch both agents in a SINGLE MESSAGE with TWO Task tool calls.
```

---

## 📝 Agent 6 Detailed Prompt

**Agent**: Memory Extraction
**Duration**: 6 days
**Depends on**: Phase 1 complete ✅

### Full Prompt for Task Tool

```markdown
# Agent 6: Memory Extraction from Git Commits

## Context

You are implementing **memory auto-extraction** for Clauxton v0.15.0, which extracts architectural decisions and code patterns from Git commit history.

**Project**: Clauxton - Claude Code plugin
**Version**: v0.15.0 Unified Memory Model
**Phase**: Phase 2 (Smart Memory), Day 13-18
**Branch**: feature/v0.15.0-unified-memory
**Dependencies**: Phase 1 complete ✅

## Documentation to Read First

1. `docs/v0.15.0_IMPLEMENTATION_PLAN.md` (Week 3-4 section, lines 264-330)
2. `docs/v0.15.0_PHASE2-4_EXECUTION_PLAN.md` (Phase 2 section)
3. `CLAUDE.md` (Code style guidelines)
4. Review existing code:
   - `clauxton/core/memory.py` (Memory system)
   - `clauxton/analysis/git_analyzer.py` (Git analysis patterns)

## Tasks

### Task 1: Implement MemoryExtractor (Day 13-15)

**File**: `clauxton/semantic/memory_extractor.py`

Create a class that extracts memories from Git commits:

```python
from pathlib import Path
from typing import List, Optional, Dict
from datetime import datetime, timedelta
from git import Repo
from clauxton.core.memory import Memory, MemoryEntry
import re

class MemoryExtractor:
    """Extract memories from Git commits and code changes."""

    def __init__(self, project_root: Path):
        """
        Initialize extractor.

        Args:
            project_root: Project root directory
        """
        self.project_root = project_root
        self.repo = Repo(project_root)
        self.memory = Memory(project_root)

    def extract_from_commit(self, commit_sha: str) -> List[MemoryEntry]:
        """
        Extract memories from a single commit.

        Args:
            commit_sha: Commit SHA to analyze

        Returns:
            List of extracted MemoryEntry objects

        Examples:
            - "feat: Add user authentication" → Decision memory
            - "refactor: Switch to PostgreSQL" → Decision memory
            - Large API changes → Pattern memory
        """
        commit = self.repo.commit(commit_sha)
        memories = []

        # Extract decisions from commit message
        decision = self._extract_decision(commit.message, commit)
        if decision:
            memories.append(decision)

        # Extract patterns from code changes
        if commit.parents:  # Skip initial commit
            diff = commit.parents[0].diff(commit)  # parent -> commit direction
            patterns = self._detect_patterns(diff, commit)
            memories.extend(patterns)

        return memories

    def extract_from_recent_commits(
        self,
        since_days: int = 7,
        auto_add: bool = False
    ) -> List[MemoryEntry]:
        """
        Extract memories from recent commits.

        Args:
            since_days: Number of days to look back
            auto_add: If True, automatically add to memory system

        Returns:
            List of extracted MemoryEntry objects
        """
        since_date = datetime.now() - timedelta(days=since_days)
        commits = list(self.repo.iter_commits(
            since=since_date.strftime("%Y-%m-%d")
        ))

        all_memories = []
        for commit in commits:
            memories = self.extract_from_commit(commit.hexsha)
            all_memories.extend(memories)

            if auto_add:
                for memory in memories:
                    self.memory.add(memory)

        return all_memories

    def _extract_decision(
        self,
        commit_message: str,
        commit
    ) -> Optional[MemoryEntry]:
        """
        Identify architectural decisions from commit message.

        Patterns to detect:
        - Conventional commits: "feat:", "refactor:", "perf:"
        - Migration: "Switch to X", "Migrate to Y", "Replace X with Y"
        - Implementation: "Add authentication", "Implement caching"

        Args:
            commit_message: Full commit message
            commit: Git commit object

        Returns:
            MemoryEntry if decision detected, None otherwise
        """
        # Decision patterns
        decision_patterns = [
            (r'^feat:\s*(.+)', 'feature', 0.8),
            (r'^refactor:\s*(.+)', 'refactor', 0.9),
            (r'^perf:\s*(.+)', 'performance', 0.9),
            (r'switch\s+to\s+(\w+)', 'technology-switch', 0.95),
            (r'migrate\s+to\s+(\w+)', 'migration', 0.95),
            (r'replace\s+(\w+)\s+with\s+(\w+)', 'replacement', 0.95),
            (r'add\s+(\w+\s+)?authentication', 'authentication', 0.9),
            (r'implement\s+(\w+\s+)?caching', 'caching', 0.9),
        ]

        for pattern, category, confidence in decision_patterns:
            match = re.search(pattern, commit_message, re.IGNORECASE)
            if match:
                # Extract title from first line
                first_line = commit_message.split('\n')[0]
                title = first_line.strip()

                # Remove conventional commit prefix (case-insensitive, like the match above)
                title = re.sub(
                    r'^(feat|fix|refactor|perf|docs|test|chore):\s*',
                    '',
                    title,
                    flags=re.IGNORECASE,
                )

                # Extract body (if exists)
                lines = commit_message.split('\n')
                body = '\n'.join(lines[1:]).strip() if len(lines) > 1 else ""

                content = body if body else title

                return MemoryEntry(
                    id=self._generate_memory_id(),
                    type="decision",
                    title=title[:200],  # Limit to 200 chars
                    content=content,
                    category=category,
                    tags=self._extract_tags(commit_message),
                    created_at=datetime.fromtimestamp(commit.committed_date),
                    updated_at=datetime.fromtimestamp(commit.committed_date),
                    source="git-commit",
                    confidence=confidence,
                    source_ref=commit.hexsha,
                )

        return None

    def _detect_patterns(self, diff, commit) -> List[MemoryEntry]:
        """
        Detect code patterns from diff.

        Examples:
        - New API endpoint added
        - Database migration
        - New React component pattern
        - Testing pattern

        Args:
            diff: Git diff object
            commit: Git commit object

        Returns:
            List of pattern memories
        """
        patterns = []

        # Count changes by file type
        api_changes = 0
        ui_changes = 0
        db_changes = 0
        test_changes = 0

        for change in diff:
            path = change.a_path or change.b_path

            if not path:
                continue

            # Detect file types
            if 'api' in path.lower() or 'route' in path.lower():
                api_changes += 1
            elif path.lower().endswith(('.tsx', '.jsx', '.vue', '.html')):
                ui_changes += 1
            elif 'migration' in path.lower() or 'schema' in path.lower():
                db_changes += 1
            elif 'test' in path.lower():
                test_changes += 1

        # Generate pattern memories for significant changes
        if api_changes >= 2:
            patterns.append(self._create_pattern_memory(
                title="API endpoint changes",
                content=f"Modified {api_changes} API-related files",
                category="api",
                commit=commit,
                confidence=0.7
            ))

        if ui_changes >= 3:
            patterns.append(self._create_pattern_memory(
                title="UI component changes",
                content=f"Modified {ui_changes} UI components",
                category="frontend",
                commit=commit,
                confidence=0.7
            ))

        if db_changes >= 1:
            patterns.append(self._create_pattern_memory(
                title="Database schema changes",
                content=f"Modified {db_changes} database-related files",
                category="database",
                commit=commit,
                confidence=0.8
            ))

        return patterns

    def _create_pattern_memory(
        self,
        title: str,
        content: str,
        category: str,
        commit,
        confidence: float
    ) -> MemoryEntry:
        """Create a pattern memory entry."""
        return MemoryEntry(
            id=self._generate_memory_id(),
            type="pattern",
            title=title,
            content=content,
            category=category,
            tags=[],
            created_at=datetime.fromtimestamp(commit.committed_date),
            updated_at=datetime.fromtimestamp(commit.committed_date),
            source="git-commit",
            confidence=confidence,
            source_ref=commit.hexsha,
        )

    def _extract_tags(self, text: str) -> List[str]:
        """Extract potential tags from text."""
        # Common tech keywords
        tech_keywords = [
            'api', 'rest', 'graphql', 'authentication', 'auth',
            'database', 'postgresql', 'mysql', 'redis', 'cache',
            'frontend', 'backend', 'react', 'vue', 'angular',
            'test', 'testing', 'ci', 'cd', 'docker',
        ]

        # Match whole words only, so "ci" does not fire on "specific"
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        tags = [keyword for keyword in tech_keywords if keyword in words]

        return tags[:5]  # Limit to 5 tags

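    def _generate_memory_id(self) -> str:
        """Generate a unique memory ID of the form MEM-YYYYMMDD-NNN."""
        date_str = datetime.now().strftime("%Y%m%d")
        prefix = f"MEM-{date_str}-"

        # Sequence numbers already stored for today
        stored = [
            int(m.id.split("-")[-1])
            for m in self.memory.list_all()
            if m.id.startswith(prefix)
        ]

        # IDs issued earlier in this run may not be stored yet (auto_add=False,
        # or several memories from one commit), so track the last one issued
        # to avoid handing out duplicates.
        last_issued: Dict[str, int] = self.__dict__.setdefault("_last_issued", {})
        seq = max(max(stored, default=0), last_issued.get(date_str, 0)) + 1
        last_issued[date_str] = seq

        return f"MEM-{date_str}-{seq:03d}"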
```

### Task 2: Write Comprehensive Tests (Day 16-18)

**File**: `tests/semantic/test_memory_extractor.py`

Write 15+ tests:

```python
import pytest
from pathlib import Path
from datetime import datetime, timedelta
from clauxton.semantic.memory_extractor import MemoryExtractor
from clauxton.core.memory import Memory, MemoryEntry
from git import Repo

def test_extract_decision_from_feat_commit(tmp_path):
    """Test extracting decision from 'feat:' commit."""
    # Setup git repo with commit
    repo = Repo.init(tmp_path)
    (tmp_path / "test.txt").write_text("test")
    repo.index.add(["test.txt"])
    commit = repo.index.commit("feat: Add user authentication\n\nImplement JWT-based auth")

    # Extract
    extractor = MemoryExtractor(tmp_path)
    memories = extractor.extract_from_commit(commit.hexsha)

    # Assert
    assert len(memories) == 1
    assert memories[0].type == "decision"
    assert "authentication" in memories[0].title.lower()
    assert memories[0].confidence >= 0.8
    assert memories[0].source == "git-commit"

def test_extract_decision_from_migration_commit(tmp_path):
    """Test extracting decision from migration commit."""
    repo = Repo.init(tmp_path)
    (tmp_path / "test.txt").write_text("test")
    repo.index.add(["test.txt"])
    commit = repo.index.commit("Migrate to PostgreSQL for better performance")

    extractor = MemoryExtractor(tmp_path)
    memories = extractor.extract_from_commit(commit.hexsha)

    assert len(memories) >= 1
    decision = next((m for m in memories if m.type == "decision"), None)
    assert decision is not None
    assert "postgresql" in decision.title.lower()
    assert decision.confidence >= 0.9

def test_detect_api_pattern_from_diff(tmp_path):
    """Test detecting API pattern from file changes."""
    repo = Repo.init(tmp_path)

    # First commit
    (tmp_path / "api_v1.py").write_text("def get_users(): pass")
    repo.index.add(["api_v1.py"])
    repo.index.commit("Initial API")

    # Second commit with API changes
    (tmp_path / "api_v1.py").write_text("def get_users(): return []\ndef create_user(): pass")
    (tmp_path / "api_routes.py").write_text("routes = []")
    repo.index.add(["api_v1.py", "api_routes.py"])
    commit = repo.index.commit("Add more API endpoints")

    extractor = MemoryExtractor(tmp_path)
    memories = extractor.extract_from_commit(commit.hexsha)

    patterns = [m for m in memories if m.type == "pattern"]
    assert len(patterns) >= 1
    assert any("api" in p.category.lower() for p in patterns)

def test_extract_from_recent_commits(tmp_path):
    """Test extracting from recent commits."""
    repo = Repo.init(tmp_path)

    # Create 3 commits
    for i in range(3):
        (tmp_path / f"file{i}.txt").write_text(f"content {i}")
        repo.index.add([f"file{i}.txt"])
        repo.index.commit(f"feat: Add feature {i}")

    extractor = MemoryExtractor(tmp_path)
    memories = extractor.extract_from_recent_commits(since_days=7)

    assert len(memories) >= 3  # At least one per commit

def test_auto_add_extracted_memories(tmp_path):
    """Test auto-adding extracted memories to memory system."""
    repo = Repo.init(tmp_path)
    (tmp_path / "test.txt").write_text("test")
    repo.index.add(["test.txt"])
    repo.index.commit("feat: Add authentication")

    extractor = MemoryExtractor(tmp_path)
    memories = extractor.extract_from_recent_commits(since_days=7, auto_add=True)

    # Check memories were added
    memory_system = Memory(tmp_path)
    all_memories = memory_system.list_all()
    assert len(all_memories) >= len(memories)

def test_confidence_scoring(tmp_path):
    """Test confidence scoring for different commit types."""
    repo = Repo.init(tmp_path)
    (tmp_path / "test.txt").write_text("test")
    repo.index.add(["test.txt"])

    # High confidence: migration
    commit1 = repo.index.commit("Migrate to PostgreSQL")
    extractor = MemoryExtractor(tmp_path)
    memories1 = extractor.extract_from_commit(commit1.hexsha)
    assert memories1, "migration commit should yield a decision memory"
    assert memories1[0].confidence >= 0.9

    # Medium confidence: feat
    (tmp_path / "test2.txt").write_text("test")
    repo.index.add(["test2.txt"])
    commit2 = repo.index.commit("feat: Add new feature")
    memories2 = extractor.extract_from_commit(commit2.hexsha)
    assert memories2, "feat commit should yield a decision memory"
    assert memories2[0].confidence >= 0.7

# Add 8+ more tests for:
# - Empty commits
# - Merge commits
# - Tag extraction
# - Edge cases (very long messages, special characters)
# - Performance (large diffs)
```

## Quality Requirements

### 1. Code Review
- [ ] Follow code style: CLAUDE.md
- [ ] Type hints: 100%
- [ ] Docstrings: Google style
- [ ] No code smells

### 2. Performance
- [ ] Extract <100ms per commit
- [ ] Handle large diffs efficiently
- [ ] Add performance benchmarks

### 3. Testing
- [ ] 15+ comprehensive tests
- [ ] Coverage >90%
- [ ] Test all patterns (feat, refactor, migration, etc.)
- [ ] Test edge cases

### 4. Lint & Type Check
- [ ] mypy --strict: Pass
- [ ] ruff check: Pass

## Deliverables

1. `clauxton/semantic/memory_extractor.py` (~350 lines)
2. `tests/semantic/test_memory_extractor.py` (15+ tests)
3. Coverage report (>90%)
4. Performance benchmarks
5. Completion report

## Expected Duration

5-6 days

## Success Criteria

- [ ] All tests pass
- [ ] Coverage >90%
- [ ] Type check passes
- [ ] Lint passes
- [ ] Performance targets met
- [ ] Can extract decisions and patterns from real commits
```
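
The decision patterns in the prompt above are tried in order, and the first match wins. The sketch below isolates that behaviour using a subset of the same `(regex, category, confidence)` table. The function name `classify` is hypothetical and is not part of the planned `MemoryExtractor` API.

```python
import re
from typing import List, Optional, Tuple

# Subset of the (regex, category, confidence) table from the Agent 6 prompt
DECISION_PATTERNS: List[Tuple[str, str, float]] = [
    (r"^feat:\s*(.+)", "feature", 0.8),
    (r"^refactor:\s*(.+)", "refactor", 0.9),
    (r"switch\s+to\s+(\w+)", "technology-switch", 0.95),
    (r"migrate\s+to\s+(\w+)", "migration", 0.95),
]


def classify(message: str) -> Optional[Tuple[str, float]]:
    """Return (category, confidence) for the first matching pattern, else None."""
    for pattern, category, confidence in DECISION_PATTERNS:
        if re.search(pattern, message, re.IGNORECASE):
            return category, confidence
    return None


print(classify("Migrate to PostgreSQL for better performance"))  # ('migration', 0.95)
print(classify("refactor: Switch to PostgreSQL"))                # ('refactor', 0.9)
print(classify("docs: fix typo"))                                # None
```

The second message matches both the `refactor:` prefix and the "switch to" pattern. Because the prefix pattern is listed first, it is categorised as `refactor` with confidence 0.9. Listing the more specific patterns first would change that outcome, which may be worth deciding explicitly during implementation.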

---

## 📝 Agent 7 Detailed Prompt

**Agent**: Relationship Detection
**Duration**: 6 days
**Depends on**: Phase 1 complete ✅
**Runs in parallel with**: Agent 6
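
The relationship score in the prompt below is a weighted sum of four signals (content 0.4, tags 0.3, category 0.2, temporal 0.1). The sketch here reproduces that arithmetic without scikit-learn. The function names are hypothetical, and the content similarity of 0.1 is an assumed placeholder value, not a computed TF-IDF result.

```python
from datetime import datetime
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two tag sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def temporal(t1: datetime, t2: datetime, window_days: int = 7) -> float:
    """Linear decay from 1.0 (same day) down to 0.0 at window_days apart."""
    days = abs((t1 - t2).days)
    return 1.0 - days / window_days if days <= window_days else 0.0


def relationship_score(
    content_sim: float, tag_sim: float, same_category: bool, temporal_sim: float
) -> float:
    """Weighted sum using the weights stated in the Agent 7 prompt."""
    category_term = 0.2 if same_category else 0.0
    return 0.4 * content_sim + 0.3 * tag_sim + category_term + 0.1 * temporal_sim


now = datetime(2025, 11, 3)
tag_sim = jaccard(
    {"api", "authentication", "jwt"}, {"authentication", "jwt", "login"}
)
score = relationship_score(
    content_sim=0.1,  # assumed placeholder, not a real TF-IDF value
    tag_sim=tag_sim,  # 2 shared tags out of 4 distinct -> 0.5
    same_category=False,
    temporal_sim=temporal(now, now),
)
print(round(score, 3), score >= 0.3)  # 0.29 False
```

With these inputs, two memories that share half their tags but differ in category land just under the default threshold of 0.3. Tag overlap alone is therefore not always enough to create a link.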

### Full Prompt for Task Tool

```markdown
# Agent 7: Memory Relationship Detection

## Context

You are implementing **relationship detection** for Clauxton v0.15.0, which automatically finds connections between memories.

**Project**: Clauxton - Claude Code plugin
**Version**: v0.15.0 Unified Memory Model
**Phase**: Phase 2 (Smart Memory), Day 13-18
**Branch**: feature/v0.15.0-unified-memory
**Dependencies**: Phase 1 complete ✅

## Documentation to Read First

1. `docs/v0.15.0_IMPLEMENTATION_PLAN.md` (Week 3-4 section, lines 336-383)
2. `docs/v0.15.0_PHASE2-4_EXECUTION_PLAN.md` (Phase 2 section)
3. `CLAUDE.md` (Code style guidelines)
4. Review existing code:
   - `clauxton/core/memory.py` (Memory system)
   - `clauxton/core/search.py` (TF-IDF search)

## Tasks

### Task 1: Implement MemoryLinker (Day 13-15)

**File**: `clauxton/semantic/memory_linker.py`

Create a class that detects relationships between memories:

```python
from pathlib import Path
from typing import List, Tuple, Optional
from clauxton.core.memory import Memory, MemoryEntry
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from difflib import SequenceMatcher

class MemoryLinker:
    """Auto-detect relationships between memories."""

    def __init__(self, project_root: Path):
        """
        Initialize linker.

        Args:
            project_root: Project root directory
        """
        self.project_root = project_root
        self.memory = Memory(project_root)

    def find_relationships(
        self,
        entry: MemoryEntry,
        existing_memories: Optional[List[MemoryEntry]] = None,
        threshold: float = 0.3
    ) -> List[str]:
        """
        Find related memories.

        Combines weighted signals into a single score:
        1. Semantic similarity (TF-IDF cosine similarity, weight 0.4)
        2. Shared tags (Jaccard index, weight 0.3)
        3. Same category (weight 0.2)
        4. Temporal proximity (weight 0.1)
        5. File/code overlap (for code type; not scored in the code below)

        Args:
            entry: Memory entry to find relationships for
            existing_memories: List of existing memories (or fetch all)
            threshold: Minimum similarity score (0.0-1.0)

        Returns:
            List of related memory IDs sorted by relevance
        """
        if existing_memories is None:
            existing_memories = self.memory.list_all()

        # Don't relate to self
        existing_memories = [m for m in existing_memories if m.id != entry.id]

        if not existing_memories:
            return []

        # Calculate scores for each signal
        similarities = []

        for other in existing_memories:
            score = 0.0

            # 1. Semantic similarity (weight: 0.4)
            content_sim = self._content_similarity(entry, other)
            score += content_sim * 0.4

            # 2. Shared tags (weight: 0.3)
            tag_sim = self._tag_similarity(entry, other)
            score += tag_sim * 0.3

            # 3. Same category (weight: 0.2)
            if entry.category == other.category:
                score += 0.2

            # 4. Temporal proximity (weight: 0.1)
            temporal_sim = self._temporal_similarity(entry, other)
            score += temporal_sim * 0.1

            similarities.append((other.id, score))

        # Filter by threshold and sort by score, highest first
        related = [
            (mem_id, score) for mem_id, score in similarities if score >= threshold
        ]
        related.sort(key=lambda pair: pair[1], reverse=True)

        # Return at most the top 5 (avoid shadowing the builtin `id`)
        return [mem_id for mem_id, _ in related[:5]]

    def auto_link_all(self, threshold: float = 0.3) -> int:
        """
        Auto-link all memories.

        Args:
            threshold: Minimum similarity score

        Returns:
            Number of relationships created
        """
        memories = self.memory.list_all()
        relationships_created = 0

        for entry in memories:
            # Find relationships
            related_ids = self.find_relationships(entry, memories, threshold)

            # Update if new relationships found
            if related_ids and related_ids != entry.related_to:
                self.memory.update(entry.id, related_to=related_ids)
                relationships_created += len(related_ids)

        return relationships_created

    def suggest_merge_candidates(
        self,
        threshold: float = 0.8
    ) -> List[Tuple[str, str, float]]:
        """
        Find duplicate/similar memories that should be merged.

        Args:
            threshold: Minimum similarity score (0.0-1.0)

        Returns:
            List of (memory_id1, memory_id2, similarity_score) tuples
        """
        memories = self.memory.list_all()
        candidates = []

        for i, mem1 in enumerate(memories):
            for mem2 in memories[i+1:]:
                # Same type is required for merge
                if mem1.type != mem2.type:
                    continue

                # Calculate overall similarity
                similarity = self._merge_similarity(mem1, mem2)

                if similarity >= threshold:
                    candidates.append((mem1.id, mem2.id, similarity))

        # Sort by similarity (highest first)
        candidates.sort(key=lambda x: x[2], reverse=True)

        return candidates

    def _content_similarity(
        self,
        mem1: MemoryEntry,
        mem2: MemoryEntry
    ) -> float:
        """
        Calculate content similarity using TF-IDF.

        Args:
            mem1: First memory
            mem2: Second memory

        Returns:
            Similarity score (0.0-1.0)
        """
        try:
            # Combine title and content
            text1 = f"{mem1.title} {mem1.content}"
            text2 = f"{mem2.title} {mem2.content}"

            # TF-IDF vectorization
            vectorizer = TfidfVectorizer()
            vectors = vectorizer.fit_transform([text1, text2])

            # Cosine similarity
            similarity = cosine_similarity(vectors[0], vectors[1])[0][0]

            return float(similarity)
        except ValueError:
            # TfidfVectorizer raises ValueError on an empty vocabulary
            # (e.g. both texts are empty); fall back to simple word overlap
            words1 = set(mem1.content.lower().split())
            words2 = set(mem2.content.lower().split())

            if not words1 or not words2:
                return 0.0

            overlap = len(words1 & words2)
            total = len(words1 | words2)

            return overlap / total if total > 0 else 0.0

    def _tag_similarity(
        self,
        mem1: MemoryEntry,
        mem2: MemoryEntry
    ) -> float:
        """
        Calculate tag similarity (Jaccard index).

        Args:
            mem1: First memory
            mem2: Second memory

        Returns:
            Similarity score (0.0-1.0)
        """
        if not mem1.tags or not mem2.tags:
            return 0.0

        tags1 = set(mem1.tags)
        tags2 = set(mem2.tags)

        intersection = len(tags1 & tags2)
        union = len(tags1 | tags2)

        return intersection / union if union > 0 else 0.0

    def _temporal_similarity(
        self,
        mem1: MemoryEntry,
        mem2: MemoryEntry
    ) -> float:
        """
        Calculate temporal proximity.

        Memories created close in time are more likely related.

        Args:
            mem1: First memory
            mem2: Second memory

        Returns:
            Similarity score (0.0-1.0)
        """
        # Time difference in days
        time_diff = abs((mem1.created_at - mem2.created_at).days)

        # Decay function: similar if within 7 days
        if time_diff <= 7:
            return 1.0 - (time_diff / 7.0)
        else:
            return 0.0

    def _merge_similarity(
        self,
        mem1: MemoryEntry,
        mem2: MemoryEntry
    ) -> float:
        """
        Calculate overall similarity for merge detection.

        Args:
            mem1: First memory
            mem2: Second memory

        Returns:
            Similarity score (0.0-1.0)
        """
        # Title similarity (difflib.SequenceMatcher ratio, not Levenshtein distance)
        title_sim = SequenceMatcher(None, mem1.title, mem2.title).ratio()

        # Content similarity (TF-IDF)
        content_sim = self._content_similarity(mem1, mem2)

        # Category match
        category_match = 1.0 if mem1.category == mem2.category else 0.0

        # Weighted average
        similarity = (
            title_sim * 0.3 +
            content_sim * 0.5 +
            category_match * 0.2
        )

        return similarity
```

### Task 2: Write Comprehensive Tests (Day 16-18)

**File**: `tests/semantic/test_memory_linker.py`

Write 15+ tests:

```python
import pytest
from pathlib import Path
from datetime import datetime, timedelta
from clauxton.semantic.memory_linker import MemoryLinker
from clauxton.core.memory import Memory, MemoryEntry

def test_find_relationships_by_tags(tmp_path):
    """Test finding relationships based on shared tags."""
    memory = Memory(tmp_path)

    # Add memories with shared tags
    now = datetime.now()
    mem1 = MemoryEntry(
        id="MEM-20251103-001",
        type="knowledge",
        title="API Authentication",
        content="Use JWT tokens",
        category="api",
        tags=["api", "authentication", "jwt"],
        created_at=now,
        updated_at=now,
        source="manual",
    )
    mem2 = MemoryEntry(
        id="MEM-20251103-002",
        type="knowledge",
        title="User Login",
        content="Login with JWT",
        category="auth",
        tags=["authentication", "jwt", "login"],
        created_at=now,
        updated_at=now,
        source="manual",
    )
    mem3 = MemoryEntry(
        id="MEM-20251103-003",
        type="knowledge",
        title="Database Schema",
        content="PostgreSQL schema",
        category="database",
        tags=["database", "postgresql"],
        created_at=now,
        updated_at=now,
        source="manual",
    )

    memory.add(mem1)
    memory.add(mem2)
    memory.add(mem3)

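    # Find relationships for mem1. The categories differ and the texts share
    # only "jwt", so the score is about 0.3 * 0.5 (tags) + 0.1 (same timestamp)
    # plus a small content term, which is just under the default 0.3 threshold.
    linker = MemoryLinker(tmp_path)
    related = linker.find_relationships(mem1, threshold=0.25)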

    # mem2 should be related (shared tags: authentication, jwt)
    # mem3 should not be related (no shared tags)
    assert "MEM-20251103-002" in related
    assert "MEM-20251103-003" not in related

def test_find_relationships_by_content(tmp_path):
    """Test finding relationships based on content similarity."""
    memory = Memory(tmp_path)

    now = datetime.now()
    mem1 = MemoryEntry(
        id="MEM-20251103-001",
        type="knowledge",
        title="REST API Design",
        content="Use RESTful principles for API design with proper HTTP methods",
        category="api",
        tags=[],
        created_at=now,
        updated_at=now,
        source="manual",
    )
    mem2 = MemoryEntry(
        id="MEM-20251103-002",
        type="knowledge",
        title="API Best Practices",
        content="RESTful API should follow HTTP standards and use proper methods",
        category="api",
        tags=[],
        created_at=now,
        updated_at=now,
        source="manual",
    )

    memory.add(mem1)
    memory.add(mem2)

    linker = MemoryLinker(tmp_path)
    related = linker.find_relationships(mem1)

    # High content similarity should create relationship
    assert "MEM-20251103-002" in related

def test_auto_link_all(tmp_path):
    """Test auto-linking all memories."""
    memory = Memory(tmp_path)

    now = datetime.now()
    for i in range(5):
        mem = MemoryEntry(
            id=f"MEM-20251103-{i+1:03d}",
            type="knowledge",
            title=f"Memory {i}",
            content=f"Content about topic {i % 2}",  # Some overlap
            category="test",
            tags=[f"tag{i % 2}"],
            created_at=now,
            updated_at=now,
            source="manual",
        )
        memory.add(mem)

    linker = MemoryLinker(tmp_path)
    count = linker.auto_link_all(threshold=0.2)

    # Should create some relationships
    assert count > 0

def test_suggest_merge_candidates(tmp_path):
    """Test finding duplicate memories."""
    memory = Memory(tmp_path)

    now = datetime.now()
    # Very similar memories (merge candidates)
    mem1 = MemoryEntry(
        id="MEM-20251103-001",
        type="knowledge",
        title="API Authentication",
        content="Use JWT tokens for authentication",
        category="api",
        tags=["api", "auth"],
        created_at=now,
        updated_at=now,
        source="manual",
    )
    mem2 = MemoryEntry(
        id="MEM-20251103-002",
        type="knowledge",
        title="API Authentication",
        content="Use JWT tokens for auth",
        category="api",
        tags=["api", "auth"],
        created_at=now,
        updated_at=now,
        source="manual",
    )
    # Different memory (not a merge candidate)
    mem3 = MemoryEntry(
        id="MEM-20251103-003",
        type="knowledge",
        title="Database Schema",
        content="PostgreSQL schema design",
        category="database",
        tags=["database"],
        created_at=now,
        updated_at=now,
        source="manual",
    )

    memory.add(mem1)
    memory.add(mem2)
    memory.add(mem3)

    linker = MemoryLinker(tmp_path)
    candidates = linker.suggest_merge_candidates(threshold=0.8)

    # mem1 and mem2 should be merge candidates
    assert len(candidates) >= 1
    assert any(
        (mem1.id in (id1, id2) and mem2.id in (id1, id2))
        for id1, id2, score in candidates
    )

def test_temporal_similarity(tmp_path):
    """Test temporal proximity detection."""
    memory = Memory(tmp_path)

    now = datetime.now()
    old = now - timedelta(days=30)

    # Recent memory
    mem1 = MemoryEntry(
        id="MEM-20251103-001",
        type="knowledge",
        title="Recent Memory",
        content="Recent content",
        category="test",
        tags=["test"],
        created_at=now,
        updated_at=now,
        source="manual",
    )
    # Recent memory (within 7 days)
    mem2 = MemoryEntry(
        id="MEM-20251103-002",
        type="knowledge",
        title="Also Recent",
        content="Recent content",
        category="test",
        tags=["test"],
        created_at=now - timedelta(days=3),
        updated_at=now - timedelta(days=3),
        source="manual",
    )
    # Old memory (30 days ago)
    mem3 = MemoryEntry(
        id="MEM-20251103-003",
        type="knowledge",
        title="Old Memory",
        content="Old content",
        category="test",
        tags=["test"],
        created_at=old,
        updated_at=old,
        source="manual",
    )

    memory.add(mem1)
    memory.add(mem2)
    memory.add(mem3)

    linker = MemoryLinker(tmp_path)
    related = linker.find_relationships(mem1)

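    # mem2 matches mem1 on tags, category, and content and is only 3 days older,
    # so it must be linked. This checks membership only; it does not compare
    # mem2's score against mem3's (30 days older, different content).
    assert "MEM-20251103-002" in related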

# Add 10+ more tests for:
# - Category-based relationships
# - Self-linking prevention
# - Empty memory lists
# - Edge cases (no tags, no content)
# - Performance (1000 memories)
```

## Quality Requirements

### 1. Code Review
- [ ] Follow code style: CLAUDE.md
- [ ] Type hints: 100%
- [ ] Docstrings: Google style
- [ ] No code smells

### 2. Performance
- [ ] Link <200ms for 1,000 entries
- [ ] Handle large memory sets efficiently
- [ ] Add performance benchmarks

### 3. Testing
- [ ] 15+ comprehensive tests
- [ ] Coverage >90%
- [ ] Test all similarity signals
- [ ] Test edge cases

### 4. Lint & Type Check
- [ ] mypy --strict: Pass
- [ ] ruff check: Pass

## Deliverables

1. `clauxton/semantic/memory_linker.py` (~300 lines)
2. `tests/semantic/test_memory_linker.py` (15+ tests)
3. Coverage report (>90%)
4. Performance benchmarks
5. Completion report

## Expected Duration

5-6 days

## Success Criteria

- [ ] All tests pass
- [ ] Coverage >90%
- [ ] Type check passes
- [ ] Lint passes
- [ ] Performance targets met
- [ ] Can find relationships between real memories
```
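
The "Link <200ms for 1,000 entries" target in the prompt above needs some way to be measured. The following is a minimal sketch of a timing helper, not part of the planned deliverables: the names `best_elapsed_ms` and `within_budget` are hypothetical, and it relies only on the standard library.

```python
import time
from typing import Callable


def best_elapsed_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time of `fn` over `repeats` runs, in ms.

    Taking the minimum reduces noise from other processes on the machine.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)


def within_budget(fn: Callable[[], object], budget_ms: float, repeats: int = 5) -> bool:
    """Return True if the best observed run of `fn` is under `budget_ms`."""
    return best_elapsed_ms(fn, repeats) < budget_ms


# Hypothetical usage in tests/semantic/test_memory_linker.py, after adding
# 1,000 entries to the store (repeats=1 because auto_link_all mutates state):
#     linker = MemoryLinker(tmp_path)
#     assert within_budget(lambda: linker.auto_link_all(threshold=0.2),
#                          budget_ms=200, repeats=1)
```

For calls that change stored data, a single run (`repeats=1`) is probably the fairer measurement, since later runs would operate on already-linked memories.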

---

## 🎯 Step 2: Wait for Agents 6 & 7 to Complete (6 days)

Monitor progress:
- Check completion reports from both agents
- Verify all tests pass
- Review code quality

Expected completion: Day 18

---

## 🎯 Step 3: Launch Agent 8 (Day 19-22, 4 days)

After Agents 6 & 7 complete, launch Agent 8 for CLI commands.

**Prompt for Agent 8**: See `docs/v0.15.0_PHASE2-4_EXECUTION_PLAN.md` (Agent 8 section)

---

## 🎯 Step 4: Quality Review (Day 23-24, 1.5 days)

After all 3 agents complete:

```
Launch Review Agent with improved prompts (same as Phase 1).

Use the prompt from: docs/QUALITY_REVIEW_PROMPTS.md (Prompt 1)

Context:
- Phase: Phase 2 - Smart Memory
- Deliverables: 3 files, 40 tests
- Review all Phase 2 files

Generate 4 reports:
1. Executive Summary
2. Quality Review
3. Detailed Findings
4. Improvement Tasks
```

---

## 🎯 Step 5: Execute Improvements (If needed, 1-2 days)

If Review Agent identifies issues:
- Launch improvement SubAgents **in parallel**
- Each SubAgent fixes one issue
- Re-run tests after fixes

---

## ✅ Phase 2 Completion Criteria

- [ ] All 3 agents completed
- [ ] 40+ tests passing
- [ ] Coverage >90%
- [ ] Quality Score >90/100
- [ ] All improvements completed
- [ ] Documentation updated
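
The measurable items in this checklist can be expressed as a small gate function. This is only a sketch under assumptions: `Phase2Results` and `unmet_criteria` are hypothetical names, and the numbers would have to be filled in by hand from the agents' completion reports and the review output. "All improvements completed" and "Documentation updated" remain manual checks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Phase2Results:
    """Hand-collected numbers from the Phase 2 reports (hypothetical structure)."""

    agents_completed: int
    tests_passed: int
    tests_failed: int
    coverage_percent: float
    quality_score: float


def unmet_criteria(results: Phase2Results) -> list[str]:
    """Return a description of every measurable criterion that is not yet met."""
    unmet = []
    if results.agents_completed < 3:
        unmet.append(f"only {results.agents_completed} of 3 agents completed")
    if results.tests_failed > 0 or results.tests_passed < 40:
        unmet.append(
            f"{results.tests_passed} tests passed, {results.tests_failed} failed "
            "(need 40+ passing, 0 failing)"
        )
    if results.coverage_percent <= 90.0:
        unmet.append(f"coverage {results.coverage_percent:.1f}% is not above 90%")
    if results.quality_score <= 90.0:
        unmet.append(f"quality score {results.quality_score:.0f}/100 is not above 90")
    return unmet


# Example with made-up numbers: an empty list means the measurable gate passes.
example = Phase2Results(
    agents_completed=3,
    tests_passed=42,
    tests_failed=0,
    coverage_percent=93.0,
    quality_score=94.0,
)
print(unmet_criteria(example) or "All measurable criteria met")
```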

---

## 🚨 Troubleshooting

### Issue: Agent dependencies not met

**Solution**: Verify Phase 1 tests pass:
```bash
./.venv/bin/pytest tests/core/test_memory.py -v
```

### Issue: Git repository problems

**Solution**: Confirm you are inside a Git repository that has commit history:
```bash
git status
git log --oneline | head -5
```

### Issue: Tests failing

**Solution**: Confirm that pytest and the `git` Python module are installed in the virtual environment:
```bash
./.venv/bin/pytest --version
./.venv/bin/python -c "import git; print(git.__version__)"
```
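
The same checks can be collected into one script that reports every problem at once instead of stopping at the first failed import. This is a rough sketch: `environment_problems` is a hypothetical helper, the 3.11 minimum comes from the prerequisites checklist, and the `git` module is assumed to be the one the command above imports (typically provided by the GitPython package).

```python
from __future__ import annotations

import importlib.util
import sys


def environment_problems(
    min_version: tuple[int, int] = (3, 11),
    modules: tuple[str, ...] = ("git", "pytest"),
) -> list[str]:
    """Return human-readable problems with the current interpreter, if any.

    Only top-level module names are supported; nothing is actually imported.
    """
    problems = []
    if sys.version_info[:2] < tuple(min_version):
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {found} is older than the required {wanted}")
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' is not importable")
    return problems


# Run with the project interpreter, e.g. ./.venv/bin/python check_env.py
# (the script name is illustrative).
for problem in environment_problems():
    print(f"PROBLEM: {problem}")
```

An empty result only shows that the modules can be located; it does not prove that the installed versions are compatible with the project.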

---

## 📚 Reference Documents

- **Phase 2-4 Full Plan**: `docs/v0.15.0_PHASE2-4_EXECUTION_PLAN.md`
- **Implementation Plan**: `docs/v0.15.0_IMPLEMENTATION_PLAN.md`
- **SubAgent Plan**: `docs/v0.15.0_SUBAGENT_PLAN.md`
- **Quality Prompts**: `docs/QUALITY_REVIEW_PROMPTS.md`
- **Code Style**: `CLAUDE.md`

---

## 🎉 Success!

After Phase 2 completion:
1. Commit all changes
2. Create completion summary
3. Proceed to Phase 3 (Memory Intelligence)

---

**Last Updated**: 2025-11-03
**Status**: ✅ Ready to Execute
**Next Action**: Copy prompts to new Claude Code session and launch Agents 6 & 7 in parallel
