Metadata-Version: 2.4
Name: weni-agenteval
Version: 1.0.7
Summary: A generative AI-powered framework for testing virtual agents.
Author-email: Weni <john.cordeiro@vtex.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/weni-ai/agent-evaluation
Project-URL: Documentation, https://github.com/weni-ai/agent-evaluation
Project-URL: Repository, https://github.com/weni-ai/agent-evaluation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Utilities
Classifier: Topic :: Software Development :: Testing
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: pyyaml~=6.0
Requires-Dist: boto3<2.0,>=1.34.20
Requires-Dist: click~=8.0
Requires-Dist: pydantic<3.0,>=2.1.0
Requires-Dist: rich<14.0,>=13.7.0
Requires-Dist: jinja2<4.0,>=3.1.3
Requires-Dist: jsonpath-ng<2.0,>=1.6.1
Requires-Dist: websocket-client<2.0,>=1.6.0
Requires-Dist: requests==2.32.5
Provides-Extra: dev
Requires-Dist: flake8; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: mkdocs; extra == "dev"
Requires-Dist: mkdocs-material; extra == "dev"
Requires-Dist: mkdocstrings[python]; extra == "dev"
Requires-Dist: mkdocs-click; extra == "dev"
Requires-Dist: bandit; extra == "dev"
Requires-Dist: pip-audit; extra == "dev"
Dynamic: license-file

![GitHub License](https://img.shields.io/github/license/awslabs/agent-evaluation)
[![PyPI version](https://badge.fury.io/py/weni-agenteval.svg)](https://pypi.org/project/weni-agenteval/)
[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Built with Material for MkDocs](https://img.shields.io/badge/Material_for_MkDocs-526CFE?style=for-the-badge&logo=MaterialForMkDocs&logoColor=white)](https://squidfunk.github.io/mkdocs-material/)

# Agent Evaluation - Weni Fork

> **Note**: This is a fork of the original [Agent Evaluation framework by AWS Labs](https://github.com/awslabs/agent-evaluation). This fork adds support for testing [Weni](https://weni.ai) conversational AI agents while maintaining all the original functionality for AWS services.

Agent Evaluation is a generative AI-powered framework for testing virtual agents.

Internally, Agent Evaluation implements an LLM agent (evaluator) that will orchestrate conversations with your own agent (target) and evaluate the responses during the conversation.

## ✨ Key features

- **🆕 Weni Agent Support**: Built-in support for testing [Weni](https://weni.ai) conversational AI agents through their API and WebSocket interface.
- Built-in support for popular AWS services including [Amazon Bedrock](https://aws.amazon.com/bedrock/), [Amazon Q Business](https://aws.amazon.com/q/business/), and [Amazon SageMaker](https://aws.amazon.com/sagemaker/). You can also [bring your own agent](https://awslabs.github.io/agent-evaluation/targets/custom_targets/) to test using Agent Evaluation.
- Orchestrate concurrent, multi-turn conversations with your agent while evaluating its responses.
- Define [hooks](https://awslabs.github.io/agent-evaluation/hooks/) to perform additional tasks such as integration testing.
- Can be incorporated into CI/CD pipelines to expedite the time to delivery while maintaining the stability of agents in production environments.

## 🚀 Quick Start with Weni

### Installation

Install the package from PyPI:

```bash
pip install weni-agenteval
```

**Alternative: Install from source**

If you want to install from source for development:

```bash
git clone https://github.com/weni-ai/agent-evaluation.git
cd agent-evaluation
pip install -e .
```

### Prerequisites for Weni Target

> **⚠️ Important**: You need both AWS and Weni credentials to run evaluations!

To test Weni agents, you'll need:

1. **AWS Credentials**: Required for the evaluator (Claude model via Bedrock)
   - AWS Access Key ID
   - AWS Secret Access Key
   - AWS Session Token

2. **A Weni Project**: An active project in the [Weni platform](https://weni.ai)

3. **Weni Authentication**: Choose one of the following methods:

   **🚀 Option 1: Weni CLI (Recommended)**
   
   Install and authenticate with the Weni CLI:
   ```bash
   # Install Weni CLI
   pip install weni-cli
   
   # Authenticate with Weni
   weni login
   
   # Select your project
   weni project use [your-project-uuid]
   ```
   
   Get the Weni CLI from: https://github.com/weni-ai/weni-cli
   
   **📋 Option 2: Environment Variables**
   
   Set these environment variables manually:
   - `WENI_PROJECT_UUID`: Your project's unique identifier
   - `WENI_BEARER_TOKEN`: Your authentication bearer token
   
   **⚙️ Option 3: Configuration File**
   
   Provide credentials directly in your test configuration file.

### Setting up Environment Variables

> **💡 Note**: If you're using Weni CLI (Option 1 above), you only need to set AWS credentials. The Weni credentials will be handled automatically by the CLI.

**macOS/Linux:**
```bash
# AWS Credentials (required for evaluator)
export AWS_ACCESS_KEY_ID="your-aws-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
export AWS_SESSION_TOKEN="your-aws-session-token"

# Weni Credentials (only needed if NOT using Weni CLI)
export WENI_PROJECT_UUID="your-project-uuid-here"
export WENI_BEARER_TOKEN="your-bearer-token-here"
```

**Windows (Command Prompt):**
```cmd
REM AWS Credentials (required for evaluator)
set AWS_ACCESS_KEY_ID=your-aws-access-key-id
set AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
set AWS_SESSION_TOKEN=your-aws-session-token

REM Weni Credentials (only needed if NOT using Weni CLI)
set WENI_PROJECT_UUID=your-project-uuid-here
set WENI_BEARER_TOKEN=your-bearer-token-here
```

**Windows (PowerShell):**
```powershell
# AWS Credentials (required for evaluator)
$env:AWS_ACCESS_KEY_ID="your-aws-access-key-id"
$env:AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
$env:AWS_SESSION_TOKEN="your-aws-session-token"

# Weni Credentials (only needed if NOT using Weni CLI)
$env:WENI_PROJECT_UUID="your-project-uuid-here"
$env:WENI_BEARER_TOKEN="your-bearer-token-here"
```

### Basic Usage

Create a test configuration file `agenteval.yml`:

```yaml
evaluator:
  model: claude-haiku-3_5-us
  aws_region: us-east-1

target:
  type: weni

tests:
  greeting:
    steps:
      - Send a greeting "Olá, bom dia!"
      - Ask what "com oq vc pode me ajudar?"
    expected_results:
      - Agent responds with a friendly greeting
      - Agent shows up a menu with options to help the user

  purchase_outside_postal_code:
    steps:
      - Ask information "quero comprar arroz"
      - Give the postal code "04538-132"
    expected_results:
      - Agent responds asking for postal code
      - Agent says it doesn't deliver to this postal code
```

Run the evaluation:

```bash
weni-agenteval run
```

> **Note**: The tool automatically looks for `agenteval.yml` in the current directory. You can also specify a different directory with `--plan-dir` if needed.

**Additional CLI options:**
```bash
# Run with verbose output
weni-agenteval run --verbose

# Run specific tests only
weni-agenteval run --filter "greeting,purchase_outside_postal_code"

# Run from a different directory
weni-agenteval run --plan-dir /path/to/test/directory

# Run in watch mode for real-time conversation monitoring
weni-agenteval run --watch

# Combine watch mode with other options
weni-agenteval run --watch --filter "greeting" --verbose

# Initialize a new test plan template
weni-agenteval init
```

### 🔍 Watch Mode

For real-time monitoring of your tests, use the `--watch` flag to see conversations as they happen:

```bash
weni-agenteval run --watch
```

Watch mode provides:
- **Real-time conversation display**: See user messages and agent responses as they occur
- **Immediate feedback**: User prompts appear instantly when sent to the agent
- **Visual test results**: Clear ✅ PASS / ❌ FAIL indicators for each test
- **Sequential execution**: Tests run one at a time for readable output
- **Progress tracking**: Shows current test progress and overall completion

Perfect for development, debugging, and demonstrations!

### Configuration Options for Weni Target

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `type` | string | Yes | - | Must be `"weni"` |
| `timeout` | integer | No | `30` | Max seconds to wait for response |
| `weni_project_uuid` | string | No | - | Project UUID (use Weni CLI or env var instead) |
| `weni_bearer_token` | string | No | - | Bearer token (use Weni CLI or env var instead) |

### Advanced Example

Here's a more comprehensive test plan:

```yaml
evaluator:
  model: claude-haiku-3_5-us
  aws_region: us-east-1

target:
  type: weni
  timeout: 45

tests:
  # Basic greeting test
  greeting:
    steps:
      - Send a greeting "Olá, como você está?"
    expected_results:
      - Agent responds politely
      - Agent asks how it can help

  # Multi-turn conversation
  product_inquiry:
    steps:
      - Ask "Quais produtos vocês têm?"
      - Follow up with "Qual é o preço do arroz?"
      - Ask "Vocês entregam em São Paulo?"
    expected_results:
      - Agent provides product information
      - Agent gives pricing details
      - Agent confirms delivery area

  # Error handling
  unclear_input:
    steps:
      - Send unclear text "xyz123 !!!"
    expected_results:
      - Agent handles gracefully
      - Agent asks for clarification
      - No error messages shown to user

  # Context maintenance
  context_test:
    steps:
      - Say "Quero comprar feijão"
      - Ask "Qual o prazo de entrega?"
      - Ask "E o frete?"
    expected_results:
      - Agent remembers the product context
      - Agent provides delivery timeframe
      - Agent gives shipping cost information
```

## 📚 Documentation

**📖 Full Documentation**: Visit our comprehensive documentation at [https://weni-ai.github.io/weni-agent-evaluation/](https://weni-ai.github.io/weni-agent-evaluation/)

The documentation includes:
- Complete installation guide with authentication setup
- Step-by-step user guide with Weni examples
- Detailed Weni target configuration
- CLI reference and troubleshooting guides
- CI/CD integration examples

For the original AWS Labs framework features, you can also visit [https://awslabs.github.io/agent-evaluation/](https://awslabs.github.io/agent-evaluation/).

To contribute, please refer to [CONTRIBUTING.md](./CONTRIBUTING.md)

## 🔧 Troubleshooting Weni Target

### Common Issues

**AWS Authentication Errors**
- Verify your AWS environment variables are set correctly (ACCESS_KEY_ID, SECRET_ACCESS_KEY, SESSION_TOKEN)
- Ensure you have access to Amazon Bedrock in your specified region
- Check that your AWS credentials have the necessary Bedrock permissions
- Verify the `aws_region` in your configuration matches your AWS account's region access

**Weni Authentication Errors**
- **Using Weni CLI (Recommended)**: Run `weni login` to re-authenticate, then `weni project use [project-uuid]` to select your project
- **Using Environment Variables**: Verify your `WENI_BEARER_TOKEN` is valid and not expired
- Check that the `WENI_PROJECT_UUID` matches your actual project
- Ensure the bearer token has the necessary permissions for the project
- Get Weni CLI at: https://github.com/weni-ai/weni-cli

**Connection Issues**
- Verify the Weni API endpoints are accessible from your network
- Check for any firewall or proxy settings blocking HTTPS/WSS connections
- Ensure your internet connection is stable

**Timeout Errors**
- Increase the `timeout` parameter if your agent requires more processing time
- Check if the agent is properly configured and active in the Weni platform
- Verify the agent is not stuck in a processing loop

**WebSocket Connection Failures**
- Ensure the `websocket-client` package is properly installed
- Check for any proxy configurations that might interfere with WebSocket connections
- Verify the WebSocket endpoint URL is correct for your project

## 🆚 Differences from Original

This fork maintains full compatibility with the original AWS Labs Agent Evaluation framework while adding:

- **Weni Target Support**: Native integration with Weni conversational AI platform
- **WebSocket Communication**: Real-time bidirectional communication with Weni agents
- **Session Isolation**: Each test case uses unique contact identifiers for proper conversation isolation

All original AWS targets (Bedrock, Q Business, SageMaker, etc.) continue to work exactly as in the original repository.

## 🤝 Contributing

We welcome contributions! This fork follows the same contribution guidelines as the original project. Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for details.

For Weni-specific contributions:
1. Test your changes with actual Weni agents
2. Update the Weni target documentation if needed
3. Ensure backward compatibility with existing configurations

## 👏 Contributors

Shout out to these awesome contributors:

<a href="https://github.com/awslabs/agent-evaluation/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=awslabs/agent-evaluation" />
</a>
