# DeepRails Python SDK

A lightweight, intuitive Python SDK for interacting with the DeepRails API. DeepRails helps you evaluate and improve AI-generated outputs through a comprehensive set of guardrail metrics.

## Installation

```bash
pip install deeprails
```

## Quick Start

```python
from deeprails import DeepRails

# Initialize with your API token
client = DeepRails(token="YOUR_API_KEY")

# Create an evaluation
evaluation = client.create_evaluation(
    model_input={"user_prompt": "Prompt used to generate completion"},
    model_output="Generated output",
    model_used="gpt-4o-mini",  # LLM that generated the completion
    guardrail_metrics=["correctness", "completeness"]
)

# Print evaluation ID
print(f"Evaluation created with ID: {evaluation.eval_id}")
```

## Features

- **Simple API**: Just a few lines of code to integrate evaluation into your workflow
- **Comprehensive Metrics**: Evaluate outputs on correctness, completeness, and more
- **Progress Tracking**: Check each evaluation's status and completion percentage while it runs
- **Detailed Results**: Get detailed scores and rationales for each metric

## Authentication

All API requests require authentication using your DeepRails API key. Your API key is a sensitive credential that should be kept secure.

```python
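# Best practice: load the key from an environment variable instead of hard-coding it
import os

from deeprails import DeepRails

token = os.environ["DEEPRAILS_API_KEY"]  # raises KeyError if the variable is unset
client = DeepRails(token=token)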
```
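
Whichever way you read the variable, a missing key tends to surface as a bare `KeyError` or as a `None` token that only fails later at request time. The sketch below fails fast with a clearer message. `load_api_key` is a hypothetical helper, not part of the SDK, and it only assumes the key is stored in `DEEPRAILS_API_KEY`.

```python
import os


def load_api_key(var_name: str = "DEEPRAILS_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing.

    Hypothetical helper for illustration; not part of the deeprails SDK.
    """
    # Strip surrounding whitespace so a stray newline in a .env file is harmless.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set or is empty; "
            "export your DeepRails API key before creating a client."
        )
    return key
```

You would then create the client with `client = DeepRails(token=load_api_key())`.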

## Creating Evaluations

```python
try:
    evaluation = client.create_evaluation(
        model_input={"user_prompt": "Prompt used to generate completion"},
        model_output="Generated output",
        model_used="gpt-4o-mini",  # LLM that generated the completion
        guardrail_metrics=["correctness", "completeness"]
    )
    print(f"ID: {evaluation.eval_id}")
    print(f"Status: {evaluation.evaluation_status}")
    print(f"Progress: {evaluation.progress}%")
except Exception as e:
    print(f"Error: {e}")
```

### Parameters

- `model_input`: Dictionary containing the prompt and any context (must include `user_prompt`)
- `model_output`: The generated output to evaluate
- `model_used`: (Optional) The model that generated the output
- `run_mode`: (Optional) Evaluation run mode; defaults to `"smart"`
- `guardrail_metrics`: (Optional) List of metrics to evaluate
- `nametag`: (Optional) Custom identifier for this evaluation
- `webhook`: (Optional) URL to receive completion notifications
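
If you assemble these arguments programmatically, it can help to check the documented `user_prompt` requirement before making a network call and to forward only the optional arguments you actually set, so the SDK's own defaults apply to the rest. This is a minimal sketch: `build_evaluation_kwargs` is a hypothetical helper, and it assumes every parameter listed above is accepted by `client.create_evaluation()` as a keyword argument under that exact name (the examples confirm this only for `model_input`, `model_output`, `model_used`, and `guardrail_metrics`).

```python
from typing import Any, Dict, List, Optional


def build_evaluation_kwargs(
    model_input: Dict[str, Any],
    model_output: str,
    *,
    model_used: Optional[str] = None,
    run_mode: Optional[str] = None,
    guardrail_metrics: Optional[List[str]] = None,
    nametag: Optional[str] = None,
    webhook: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for create_evaluation() (hypothetical helper)."""
    # The one requirement this README documents for model_input.
    if "user_prompt" not in model_input:
        raise ValueError("model_input must include a 'user_prompt' key")
    kwargs: Dict[str, Any] = {"model_input": model_input, "model_output": model_output}
    optional = {
        "model_used": model_used,
        "run_mode": run_mode,
        "guardrail_metrics": guardrail_metrics,
        "nametag": nametag,
        "webhook": webhook,
    }
    # Forward only the optional arguments that were set; unset ones fall back
    # to whatever defaults the SDK applies.
    kwargs.update({name: value for name, value in optional.items() if value is not None})
    return kwargs
```

Usage would look like `client.create_evaluation(**build_evaluation_kwargs({"user_prompt": "..."}, "Generated output", guardrail_metrics=["correctness"]))`.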

## Retrieving Evaluations

```python
try:
    eval_id = "eval-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    evaluation = client.get_evaluation(eval_id)
    
    print(f"Status: {evaluation.evaluation_status}")
    
    if evaluation.evaluation_result:
        print("\nResults:")
        for metric, result in evaluation.evaluation_result.items():
            score = result.get('score', 'N/A')
            print(f"  {metric}: {score}")
except Exception as e:
    print(f"Error: {e}")
```
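
Evaluations may still be running when you first retrieve them. One way to wait for results is to poll until `progress` reaches 100, with a timeout so a stuck evaluation doesn't block forever. This is a sketch under assumptions: `wait_for_evaluation` is a hypothetical helper, and it assumes the returned object exposes a numeric `progress` (0 to 100) and an `evaluation_status`, as in the examples above, and that a finished evaluation reports 100. It takes the fetch function as an argument, so you would pass `client.get_evaluation`.

```python
import time
from typing import Any, Callable


def wait_for_evaluation(
    get_evaluation: Callable[[str], Any],
    eval_id: str,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Poll get_evaluation(eval_id) until progress reaches 100 or the timeout expires.

    Hypothetical helper; assumes `progress` is a percentage that hits 100 on completion.
    """
    deadline = time.monotonic() + timeout
    while True:
        evaluation = get_evaluation(eval_id)
        if evaluation.progress is not None and evaluation.progress >= 100:
            return evaluation
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Evaluation {eval_id} still at {evaluation.progress}% after {timeout}s "
                f"(status: {evaluation.evaluation_status})"
            )
        time.sleep(poll_interval)
```

For example, `evaluation = wait_for_evaluation(client.get_evaluation, eval_id)`. If you passed a `webhook` when creating the evaluation, waiting for that notification avoids polling altogether.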

## Available Metrics

- `correctness`: Measures factual accuracy by evaluating whether each claim in the output is true and verifiable.
- `completeness`: Assesses whether the response addresses all necessary parts of the prompt with sufficient detail and relevance.
- `instruction_adherence`: Checks whether the AI followed the explicit instructions in the prompt and system directives.
- `context_adherence`: Determines whether each factual claim is directly supported by the provided context.
- `ground_truth_adherence`: Measures how closely the output matches a known correct answer (gold standard).
- `comprehensive_safety`: Detects and categorizes safety violations across areas like PII, CBRN, hate speech, self-harm, and more.
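
A misspelled metric name is easy to catch locally before sending a request. The sketch below checks `guardrail_metrics` against the six names listed above and drops duplicates. `check_metrics` and `KNOWN_METRICS` are hypothetical names, and the set reflects only this README; the API may accept metrics not listed here, in which case this check would be too strict.

```python
from typing import Iterable, List

# Metric names as listed in this README; the API may support others.
KNOWN_METRICS = frozenset({
    "correctness",
    "completeness",
    "instruction_adherence",
    "context_adherence",
    "ground_truth_adherence",
    "comprehensive_safety",
})


def check_metrics(metrics: Iterable[str]) -> List[str]:
    """Return the metrics in their original order without duplicates.

    Raises ValueError for any name not in KNOWN_METRICS (hypothetical helper).
    """
    result: List[str] = []
    for name in metrics:
        if name not in KNOWN_METRICS:
            raise ValueError(f"Unknown guardrail metric: {name!r}")
        if name not in result:
            result.append(name)
    return result
```

You could then pass `guardrail_metrics=check_metrics(["correctness", "completeness"])` to `client.create_evaluation()`.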

## Error Handling

The SDK raises `DeepRailsAPIError` for API-related errors. The exception carries the status code (`status_code`) and a detailed error message (`error_detail`).

```python
from deeprails import DeepRailsAPIError

try:
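    # Any SDK call can raise DeepRailsAPIError; get_evaluation is one example
    evaluation = client.get_evaluation(eval_id)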
except DeepRailsAPIError as e:
    print(f"API Error: {e.status_code} - {e.error_detail}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
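
Because `DeepRailsAPIError` exposes `status_code`, you can retry transient failures and re-raise everything else. The sketch below retries with exponential backoff. Which codes count as transient is an assumption (rate limiting `429` and common `5xx` server errors, following general HTTP convention), and `call_with_retries` is a hypothetical helper. It takes the exception class as a parameter so you would pass `DeepRailsAPIError` yourself.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")

# Assumption: rate limiting (429) and these server errors are worth retrying;
# any other status code (e.g. 400, 401, 404) is re-raised immediately.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retries(
    fn: Callable[[], T],
    api_error: Type[Exception],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn(), retrying with exponential backoff when api_error has a retryable status."""
    for attempt in range(attempts):
        try:
            return fn()
        except api_error as e:
            status = getattr(e, "status_code", None)
            # Give up on non-retryable errors or when this was the last attempt.
            if status not in RETRYABLE_STATUS or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

For example, `call_with_retries(lambda: client.get_evaluation(eval_id), DeepRailsAPIError)`. Retrying read calls like `get_evaluation` is safe; retrying `create_evaluation` could create duplicate evaluations if the first request reached the server before failing.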

## Support

For questions or support, please contact support@deeprails.ai.