# Intelligent Search

Greb uses 2-turn agentic search with AST-based code navigation, inspired by Fast Context from Cognition.ai.

## Overview

Intelligent search provides:

- **2 strategic search turns** for maximum speed (matches Windsurf):
  - **Turn 1**: 32 parallel keyword searches
  - **Turn 2**: 64 parallel combined searches (AST + expansion + gap filling)
- **AST-based code navigation** - follows imports, function calls, and class references
- **Intelligent gap filling** - discovers related files, tests, configs, and documentation
- **Blazing fast** - only 1 LLM call for reranking, competitive with Windsurf

## How It Works

### Turn 1: Intelligent Keyword Search (32 parallel)
- Selects the best keywords from your provided list using heuristics
- Avoids common words and prioritizes structured identifiers
- Executes 32 parallel grep searches
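The exact heuristic is not specified here. The sketch below is one plausible version, assuming a hypothetical stop-word list and treating camelCase, PascalCase, snake_case, and dotted names as "structured identifiers". `select_keywords` and `score_keyword` are illustrative names, not part of Greb's API.

```python
import re
from typing import List

# Hypothetical stop-word list (assumption); Greb's real list is not documented.
COMMON_WORDS = {"the", "and", "for", "with", "find", "where", "how", "is",
                "function", "code", "file", "data", "get", "set"}

# Structured identifiers: camelCase, PascalCase, snake_case, dotted paths
STRUCTURED = re.compile(r"[a-z]+[A-Z]\w*|[A-Z][a-z]+[A-Z]\w*|\w+_\w+|\w+\.\w+")


def score_keyword(word: str) -> float:
    """Higher score = more likely to produce precise grep hits."""
    if word.lower() in COMMON_WORDS or len(word) < 3:
        return 0.0
    score = 1.0
    if STRUCTURED.fullmatch(word):
        score += 2.0  # identifiers are rarer than plain English words
    score += min(len(word), 20) / 20  # mild preference for longer terms
    return score


def select_keywords(candidates: List[str], limit: int = 32) -> List[str]:
    """Drop common words, rank the rest, and keep at most `limit` unique terms."""
    seen = set()
    unique = []
    for word in candidates:
        key = word.lower()
        if key not in seen:  # case-insensitive de-duplication
            seen.add(key)
            unique.append(word)
    ranked = sorted((w for w in unique if score_keyword(w) > 0),
                    key=score_keyword, reverse=True)
    return ranked[:limit]
```

With this scoring, `select_keywords(["the", "authentication", "authMiddleware", "verify_token"])` drops `"the"` and ranks the two structured identifiers ahead of the plain word.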

### Turn 2: Combined AST + Expansion + Gap Filling (64 parallel)
All follow-up searches are executed in a single parallel turn:
- **AST references**: Analyzes the top 5 files and extracts imports, functions, and classes (20 searches)
- **Context expansion**: Reads promising files and extracts identifiers (15 searches)
- **Gap filling**: Searches for tests, configs, docs, and sibling directories (10 searches)
- **Total**: Up to 64 parallel searches; 64 is the turn's overall cap, while the per-category budgets above add up to 45
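A minimal sketch of how the three categories could be merged into one capped batch and run concurrently. It assumes each search is a callable that takes a query string and returns matching paths, and that queries already run in Turn 1 are skipped; `plan_turn2` and `run_turn` are hypothetical names, not Greb's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

# Per-category budgets from the breakdown above; the whole turn is capped at 64.
CATEGORY_CAPS = {"ast": 20, "expansion": 15, "gap_fill": 10}
TURN2_LIMIT = 64


def plan_turn2(queries: Dict[str, List[str]], already_searched: Set[str]) -> List[str]:
    """Merge per-category queries, skip Turn 1 repeats (assumption), respect all caps."""
    plan: List[str] = []
    seen = set(already_searched)
    for category, cap in CATEGORY_CAPS.items():
        taken = 0
        for query in queries.get(category, []):
            if taken == cap or len(plan) == TURN2_LIMIT:
                break
            if query not in seen:
                seen.add(query)
                plan.append(query)
                taken += 1
    return plan


def run_turn(plan: List[str], search: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Execute every planned search in one parallel batch (a single 'turn')."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        results = list(pool.map(search, plan))  # map preserves input order
    return dict(zip(plan, results))
```

Because all queries are planned before anything runs, the turn costs one round of wall-clock latency regardless of how many searches it contains.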

## Configuration

### Tuning Options (Optional)

```bash
# Number of search turns (default: 2, maximum speed)
export GREB_MAX_TURNS=2

# Turn 1 parallel searches (default: 32)
export GREB_PARALLEL_TURN1=32

# Turn 2 parallel searches (default: 64)
export GREB_PARALLEL_TURN2=64

# Enable/disable AST parsing (default: true)
export GREB_ENABLE_AST_PARSING=true
```
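For illustration, here is one way a client or server could read these variables with the documented defaults. The `SearchConfig` class, `load_search_config` function, and the set of strings accepted as "true" are assumptions for this sketch, not Greb's actual parsing logic.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchConfig:
    # Defaults mirror the values documented above.
    max_turns: int = 2
    parallel_turn1: int = 32
    parallel_turn2: int = 64
    enable_ast_parsing: bool = True


def _env_bool(value: str) -> bool:
    # Accepted truthy spellings are an assumption.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_search_config(env: Optional[Mapping[str, str]] = None) -> SearchConfig:
    """Read the GREB_* tuning variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = SearchConfig()
    return SearchConfig(
        max_turns=int(env.get("GREB_MAX_TURNS", defaults.max_turns)),
        parallel_turn1=int(env.get("GREB_PARALLEL_TURN1", defaults.parallel_turn1)),
        parallel_turn2=int(env.get("GREB_PARALLEL_TURN2", defaults.parallel_turn2)),
        enable_ast_parsing=_env_bool(env.get("GREB_ENABLE_AST_PARSING", "true")),
    )
```

Passing an explicit mapping (instead of reading `os.environ`) keeps the loader easy to test.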

## Usage Examples

### With Python Client

```python
from greb import GrebClient

# Intelligent search is enabled by default - just use the client!
client = GrebClient(
    api_key='your_api_key',
    base_url='https://search.grebmcp.com'
)

# Intelligent search will automatically follow code references
results = client.search(
    query='find authentication middleware implementation',
    keywords={
        "primary_terms": ["authentication", "middleware"],
        "file_patterns": ["*.js", "*.py"],
        "intent": "find authentication middleware functions"
    },
    directory='./src'
)

for result in results.results:
    print(f"{result.path}: {result.summary}")
    print(f"Score: {result.score:.3f}")
```

### With MCP Server

Configure your MCP server:

```json
{
  "mcpServers": {
    "greb-mcp": {
      "disabled": false,
      "timeout": 60,
      "type": "stdio",
      "command": "greb-mcp",
      "args": [],
      "env": {
        "GREB_API_KEY": "your_api_key",
        "GREB_API_URL": "https://search.grebmcp.com"
      }
    }
  }
}
```

## Performance

| Metric | Specification |
|--------|---------------|
| Search turns | 2 (maximum speed) |
| Turn 1 parallel | 32 (keyword search) |
| Turn 2 parallel | 64 (AST + expansion + gap fill) |
| Total searches | Up to 96 (32 + 64, across two sequential turns) |
| Code navigation | AST + imports + references |
| LLM calls | 1 (rerank only) |
| Typical latency | **~7 seconds** (matches Windsurf) |

## Best For
- **Tracing execution flows** - "where is login handled?"
- **Following references** - "find where this function is called"
- **Complex queries** - "how does authentication work?"
- **Large codebases** - intelligently navigates instead of keyword spam
- **Finding related files** - discovers tests, configs, documentation automatically

## Architecture Details

### AST Parsing Performance
- Python files: ~30ms per file using the `ast` module
- JavaScript/TypeScript: ~20ms per file using regex patterns
- Java/Go/Rust: ~25ms per file using regex patterns
- Total AST overhead: <200ms for a typical search
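The Python path relies on the standard-library `ast` module. A minimal sketch of extracting imports, functions, and classes is shown below; the `Ref` type and `extract_python_refs` function are hypothetical stand-ins, not Greb's `CodeReference` or `FastCodeAnalyzer` code.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ref:
    kind: str  # "import", "function", or "class"
    name: str


def extract_python_refs(source: str) -> List[Ref]:
    """Collect imported modules and def/class names (nested ones included)."""
    refs: List[Ref] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            refs.extend(Ref("import", alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from . import x` has module=None and is skipped here
            refs.append(Ref("import", node.module))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            refs.append(Ref("function", node.name))
        elif isinstance(node, ast.ClassDef):
            refs.append(Ref("class", node.name))
    return refs
```

Each extracted name can then become a follow-up grep in Turn 2 (for example, searching for `verify_token` to find its definition and callers).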

### No Additional LLM Calls
The intelligent mode uses **zero additional LLM calls** compared to classic mode:
- Turn strategy is deterministic (heuristic-based)
- AST parsing is local (no API calls)
- Only 1 LLM call for final reranking (same as classic)

## Supported Languages

AST parsing supports:
- **Python** (.py) - Full AST analysis
- **JavaScript/TypeScript** (.js, .ts, .jsx, .tsx) - Regex-based
- **Java** (.java) - Regex-based
- **Go** (.go) - Regex-based
- **Rust** (.rs) - Regex-based
- **Generic** - Identifier extraction for other languages
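Dispatch by file extension, with a generic fallback, might look like the sketch below. Only a rough JS/TS import regex is shown; the patterns, the 4-character identifier threshold, and the names `extract_refs` and `PARSERS` are assumptions for illustration, not Greb's actual parsers.

```python
import os
import re
from typing import Callable, Dict, List

# Rough JS/TS pattern (assumption): `import ... from 'x'`, `import 'x'`, `require('x')`
JS_IMPORT = re.compile(r"""(?:\bimport\s[^'"]*?\bfrom\s*|\bimport\s*|\brequire\(\s*)['"]([^'"]+)['"]""")
# Generic fallback: identifier-like tokens of 4+ characters
IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]{3,}\b")


def parse_js(content: str) -> List[str]:
    return JS_IMPORT.findall(content)


def parse_generic(content: str) -> List[str]:
    return sorted(set(IDENTIFIER.findall(content)))


PARSERS: Dict[str, Callable[[str], List[str]]] = {
    ".js": parse_js, ".jsx": parse_js, ".ts": parse_js, ".tsx": parse_js,
}


def extract_refs(path: str, content: str) -> List[str]:
    """Pick a parser by file extension; unknown extensions use the generic one."""
    ext = os.path.splitext(path)[1].lower()
    return PARSERS.get(ext, parse_generic)(content)
```

For example, `extract_refs("App.tsx", "import { login } from './auth';")` routes to the JS parser and returns `["./auth"]`.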

## Troubleshooting

### Intelligent mode not working

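Intelligent search is enabled by default. If it seems to be inactive, check whether `GREB_USE_INTELLIGENT_SEARCH` is set in your environment: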
```bash
echo $GREB_USE_INTELLIGENT_SEARCH
```

### Slower than expected

Try reducing the number of parallel searches per turn (defaults: 32 and 64):
```bash
# Reduce if your system is slow
export GREB_PARALLEL_TURN1=16
export GREB_PARALLEL_TURN2=32
```

### AST parsing errors

Disable AST parsing to fall back to keyword-only search:
```bash
export GREB_ENABLE_AST_PARSING=false
```
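A parser can also degrade per file instead of failing the whole search. The sketch below, assuming the orchestrator treats an empty reference list as "no AST follow-ups for this file", catches parse errors and returns nothing; `safe_python_names` is a hypothetical helper.

```python
import ast
from typing import List


def safe_python_names(source: str) -> List[str]:
    """Return def/class names, or [] when the file cannot be parsed.

    An empty list means no AST-derived searches for this file, so (by
    assumption) the pipeline continues with keyword results only.
    """
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
        return []
    return [node.name for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
```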

## Development

### Running Tests

```bash
# Run unit tests for code analyzer
pytest tests/test_code_analyzer.py

# Run integration tests
pytest tests/test_intelligent_orchestrator.py

# Compare classic vs intelligent mode
python examples/compare_modes.py
```

### Adding New Language Support

To add AST parsing for a new language:

1. Add a parser method to `FastCodeAnalyzer` in `src/pipeline/code_analyzer.py`
2. Add the file extension to the dispatch logic
3. Extract imports, function definitions, and class definitions
4. Return a list of `CodeReference` objects

Example:
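```python
import re
from typing import List

# Method on FastCodeAnalyzer (src/pipeline/code_analyzer.py);
# CodeReference is assumed to be imported in that module.
def _parse_ruby_fast(self, file_path: str) -> List[CodeReference]:
    refs = []
    # errors='ignore' keeps one badly encoded file from aborting the search
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
        content = f.read()

    # Extract `require 'x'` and `require_relative 'x'` (with or without parentheses)
    requires = re.findall(
        r"\brequire(?:_relative)?\s*\(?\s*['\"]([^'\"]+)['\"]", content
    )
    for req in requires:
        refs.append(CodeReference(
            type='import',
            name=req,
            priority=0.8
        ))

    # Class and method definitions (step 3) would be extracted the same way
    return self._deduplicate_references(refs)
```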
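Before wiring a pattern into a parser method, it can help to check it in isolation against a small sample. The pattern below is an assumption for illustration; it does not handle Ruby's `load` or `autoload`.

```python
import re

# Matches `require 'x'`, `require_relative "x"`, and `require("x")`
RUBY_REQUIRE = re.compile(r"""\brequire(?:_relative)?\s*\(?\s*['"]([^'"]+)['"]""")

sample = '''
require 'json'
require_relative "lib/auth"
require("net/http")
# required_by 'ignored'
'''

print(RUBY_REQUIRE.findall(sample))  # ['json', 'lib/auth', 'net/http']
```

The last sample line confirms the pattern does not fire on words that merely start with `require`.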

## Feedback

Found a bug or have a feature request? Open an issue on GitHub or contact us at cheetahai69@gmail.com.
