Metadata-Version: 2.4
Name: eval-protocol
Version: 0.2.10
Summary: The official Python SDK for Eval Protocol (EP). EP is an open protocol that standardizes how developers author evals for large language model (LLM) applications.
Author-email: Fireworks AI <info@fireworks.ai>
License-Expression: MIT
Project-URL: Homepage, https://github.com/fireworks-ai/eval-protocol
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: dataclasses-json>=0.5.7
Requires-Dist: uvicorn>=0.15.0
Requires-Dist: python-dotenv>=0.19.0
Requires-Dist: openai==1.78.1
Requires-Dist: aiosqlite
Requires-Dist: aiohttp
Requires-Dist: mcp>=1.9.2
Requires-Dist: PyYAML>=5.0
Requires-Dist: datasets
Requires-Dist: fsspec
Requires-Dist: hydra-core>=1.3.2
Requires-Dist: omegaconf>=2.3.0
Requires-Dist: gymnasium>=0.29.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: anthropic>=0.59.0
Requires-Dist: ipykernel>=6.30.0
Requires-Dist: jupyter>=1.1.1
Requires-Dist: toml>=0.10.0
Requires-Dist: loguru>=0.6.0
Requires-Dist: docstring-parser>=0.15
Requires-Dist: rich>=12.0.0
Requires-Dist: psutil>=5.8.0
Requires-Dist: litellm>=1.0.0
Requires-Dist: addict>=2.4.0
Requires-Dist: deepdiff>=6.0.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: websockets>=15.0.1
Requires-Dist: fastapi>=0.116.1
Requires-Dist: pytest>=6.0.0
Requires-Dist: peewee>=3.18.2
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: pytest-httpserver; extra == "dev"
Requires-Dist: werkzeug>=2.0.0; extra == "dev"
Requires-Dist: black>=21.5b2; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: mypy>=0.812; extra == "dev"
Requires-Dist: flake8>=3.9.2; extra == "dev"
Requires-Dist: autopep8>=1.5.0; extra == "dev"
Requires-Dist: transformers>=4.0.0; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Requires-Dist: types-docker; extra == "dev"
Requires-Dist: versioneer>=0.20; extra == "dev"
Requires-Dist: openai==1.78.1; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: e2b; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Requires-Dist: docker==7.1.0; extra == "dev"
Requires-Dist: ipykernel>=6.30.0; extra == "dev"
Requires-Dist: jupyter>=1.1.1; extra == "dev"
Requires-Dist: pip>=25.1.1; extra == "dev"
Requires-Dist: haikus==0.3.8; extra == "dev"
Provides-Extra: trl
Requires-Dist: torch>=1.9; extra == "trl"
Requires-Dist: trl>=0.7.0; extra == "trl"
Requires-Dist: peft>=0.7.0; extra == "trl"
Requires-Dist: transformers>=4.0.0; extra == "trl"
Requires-Dist: accelerate>=0.28.0; extra == "trl"
Provides-Extra: openevals
Requires-Dist: openevals>=0.1.0; extra == "openevals"
Provides-Extra: fireworks
Requires-Dist: fireworks-ai>=0.19.12; extra == "fireworks"
Provides-Extra: box2d
Requires-Dist: swig; extra == "box2d"
Requires-Dist: gymnasium[box2d]>=0.29.0; extra == "box2d"
Requires-Dist: Pillow; extra == "box2d"
Provides-Extra: langfuse
Requires-Dist: langfuse>=2.0.0; extra == "langfuse"
Provides-Extra: huggingface
Requires-Dist: datasets>=2.0.0; extra == "huggingface"
Requires-Dist: transformers>=4.0.0; extra == "huggingface"
Provides-Extra: adapters
Requires-Dist: langfuse>=2.0.0; extra == "adapters"
Requires-Dist: datasets>=2.0.0; extra == "adapters"
Requires-Dist: transformers>=4.0.0; extra == "adapters"
Dynamic: license-file

# Eval Protocol (EP)

[![PyPI - Version](https://img.shields.io/pypi/v/eval-protocol)](https://pypi.org/project/eval-protocol/)

EP is an open specification, Python SDK, pytest wrapper, and suite of tools that
provide a standardized way to write evaluations for large language model (LLM)
applications. Start with simple single-turn evals for model selection and prompt
engineering, then scale up to complex multi-turn reinforcement learning (RL) for
agents using Model Context Protocol (MCP). EP ensures consistent patterns for
writing evals, storing traces, and saving results—enabling you to build
sophisticated agent evaluations that work across real-world scenarios, from
markdown generation tasks to customer service agents with tool calling
capabilities.

<p align="center">
	<img src="https://raw.githubusercontent.com/eval-protocol/python-sdk/refs/heads/main/assets/ui.png" alt="UI" />
	<br>
	<sub><b>Log Viewer: Monitor your evaluation rollouts in real time.</b></sub>
</p>

## Quick Example

Here's a simple test function that checks if a model's response contains **bold** text formatting:

```python test_bold_format.py
from eval_protocol.models import EvaluateResult, EvaluationRow, Message
from eval_protocol.pytest import default_single_turn_rollout_processor, evaluation_test

@evaluation_test(
    input_messages=[
        [
            Message(role="system", content="You are a helpful assistant. Use bold text to highlight important information."),
            Message(role="user", content="Explain why **evaluations** matter for building AI agents. Make it dramatic!"),
        ],
    ],
    model=["accounts/fireworks/models/llama-v3p1-8b-instruct"],
    rollout_processor=default_single_turn_rollout_processor,
    mode="pointwise",
)
def test_bold_format(row: EvaluationRow) -> EvaluationRow:
    """
    Simple evaluation that checks if the model's response contains bold text.
    """

    assistant_response = row.messages[-1].content

    # Check if response contains **bold** text
    has_bold = "**" in assistant_response

    if has_bold:
        result = EvaluateResult(score=1.0, reason="✅ Response contains bold text")
    else:
        result = EvaluateResult(score=0.0, reason="❌ No bold text found")

    row.evaluation_result = result
    return row
```
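Note that the `"**" in assistant_response` check above scores any response containing a double asterisk, even an unpaired one. If you want a stricter heuristic, a standalone check (plain Python, not part of the SDK; `contains_bold` and `score` are illustrative names) might require matched `**...**` pairs:

```python
import re

# Require a matched pair of ** markers with at least one
# non-asterisk character between them.
BOLD_PATTERN = re.compile(r"\*\*[^*]+\*\*")

def contains_bold(text: str) -> bool:
    """Return True if the text has a properly paired **bold** span."""
    return BOLD_PATTERN.search(text) is not None

def score(text: str) -> float:
    """Mirror the 1.0 / 0.0 scoring used in the eval above."""
    return 1.0 if contains_bold(text) else 0.0
```

This rejects responses like `a ** b` where the markers never close, which the simple substring check would accept.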

## Documentation

See our [documentation](https://evalprotocol.io) for more details.

## Installation

**This library requires Python >= 3.10.**

Install with pip:

```shell
pip install eval-protocol
```

## License

[MIT](LICENSE)
