Metadata-Version: 2.4
Name: sweap-cli
Version: 0.1.0
Summary: CLI for authoring and running SWEAP benchmark tasks
Author: SWEAP Team
Keywords: sweap,cli,modal,benchmark,automation
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27
Requires-Dist: modal>=0.62
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.9
Provides-Extra: backend
Requires-Dist: fastapi>=0.111; extra == "backend"
Requires-Dist: pydantic>=2.6; extra == "backend"
Requires-Dist: pydantic-settings>=2.3; extra == "backend"
Requires-Dist: PyJWT>=2.8; extra == "backend"
Requires-Dist: uvicorn[standard]>=0.23; extra == "backend"
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: pytest>=8.1; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"

# SWEAP CLI

Command-line tooling for authoring, validating, and evaluating SWEAP benchmark
tasks. Each task is a self-contained bundle of repository metadata, guardrail
tests, and a golden patch; bundles can be reproduced locally or inside Modal
sandboxes.

- Documentation index: [docs/README.md](docs/README.md)
- Latest workflow guides:
  - [Task authoring](docs/task-authoring.md)
  - [Reviewer workflow](docs/reviewing.md)
  - [CLI reference](docs/reference/cli.md)
  - [FAQ & troubleshooting](docs/faq.md)

## Quick Start

```bash
# optional: create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

pip install --upgrade pip
pip install sweap-cli

# scaffold a new task bundle
task init --repo https://github.com/example/project.git --commit deadbeef

# iterate locally until guardrails behave as expected
task validate

# run the Modal evaluation pipeline (baseline + model + patched verification)
task run --model codex
```

### Required Credentials

- `SWEAP_API_URL` and `SWEAP_API_TOKEN` for remote submissions and runs (request an API token from the SWEAP team).
- `OPENAI_API_KEY` for Codex access (optional for local runs; mandatory for remote runs processed by our hosted worker).
- Modal CLI credentials (run `modal setup`) if you plan to run Modal evaluations locally.
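A missing variable is easier to diagnose before a remote run starts than after. The snippet below is a minimal pre-flight sketch in bash; `require_env` is a hypothetical helper, not part of sweap-cli, and it only checks that the variables listed above are set and non-empty, not that their values are valid.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of sweap-cli).
# Succeeds when every variable named as an argument is set and non-empty;
# otherwise prints each missing name to stderr and returns 1.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: remote submissions and runs need the backend URL and token.
# require_env SWEAP_API_URL SWEAP_API_TOKEN || exit 1
```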

Pass `--runner node` or `--runner maven` to `task init` to scaffold non-Python
bundles. Use `task validate --modal` to reproduce validation inside Modal, and
`task build` to cache Modal environments for pytest bundles.
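The runner choice and the Modal-related commands can be combined into a single helper. This is a sketch only: `init_and_validate` is hypothetical, it uses only the flags shown in this README, and it assumes, as in the Quick Start, that `task validate` operates on the bundle `task init` just scaffolded in the current directory and that each command exits non-zero on failure.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of sweap-cli).
# Usage: init_and_validate <repo-url> <commit> [runner]
# With a runner (e.g. node, maven) the bundle is scaffolded for that runner.
# Without one, task init uses its default Python/pytest runner, and the Modal
# environment is prebuilt with `task build`, which targets pytest bundles.
init_and_validate() {
  local repo="$1" commit="$2" runner="${3:-}"
  if [ -n "$runner" ]; then
    task init --repo "$repo" --commit "$commit" --runner "$runner" || return
  else
    task init --repo "$repo" --commit "$commit" || return
    task build || return
  fi
  # Reproduce baseline vs. patched validation inside Modal.
  task validate --modal
}
```

For example, `init_and_validate https://github.com/example/project.git deadbeef node` scaffolds a Node bundle and validates it in Modal.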

## Core Commands

- `task init` – scaffold manifests, guardrail directories, and dependency stubs.
- `task validate` – run baseline vs. patched guardrails locally or in Modal.
- `task run` – execute the full evaluation loop (baseline, model attempt,
  patched verification, optional full suite) locally or via the backend.
- `task submit` – register or update tasks with the backend and upload bundle
  archives.
- `task build` – prebuild Modal environments for pytest bundles.
- `task info` / `task fetch-bundle` / `task runs-get` – inspect remote metadata,
  download bundles, and retrieve run artifacts.

See the [CLI reference](docs/reference/cli.md) for detailed options.
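`task validate` and `task submit` pair naturally: one way to avoid uploading a bundle whose guardrails misbehave is to gate submission on a passing validation. The helper below is a hypothetical sketch, not part of sweap-cli; it runs both commands with their default options and assumes `task validate` exits non-zero when validation fails.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submit gate (not part of sweap-cli).
# Runs local validation first and only submits when it succeeds.
validate_then_submit() {
  if ! task validate; then
    echo "validation failed; not submitting" >&2
    return 1
  fi
  task submit
}
```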

## Need Help?

- Troubleshooting and common questions: [docs/faq.md](docs/faq.md)
- Manifest schema and runner expectations:
  [docs/reference/manifest.md](docs/reference/manifest.md)
