# DeepBrief — Turning Research Into Narrative Intelligence

![DeepBrief Workflows](images/workflow.png)

DeepBrief is an LLM-based workflow that transforms research papers and security documents into narrative briefings, podcast-style episodes, and spoken summaries. It powers the [AI Security Voice podcast](https://open.spotify.com/show/7wq09WmVmtj3sq6lkg0eS8?si=OnuJqxfOQmSOcxM1R-JTIQ), which uses [Cyb3rWard0g's](https://x.com/Cyb3rWard0g) cloned voice from [ElevenLabs](https://elevenlabs.io/), and it generalizes to any intelligence or research team that wants conversational, hands-free debriefs.

## What

DeepBrief is an end-to-end research-to-audio pipeline. It pulls new papers, classifies relevance, chunks PDFs into structured context, prompts LLMs for dialogue-style summaries, and renders the result as narrated episodes. Teams can use it for:

- Long-form paper → conversational “deep dive” recaps
- Automated PDF → podcast production with custom host/participant voices
- Hands-free weekly intelligence updates
- Personalized offline “intelligence podcasts” generated from any curated corpus

In every case the outcome is the same: complex research you can absorb while walking, commuting, or threat hunting, with no dashboard staring required.

## ✨ Key Capabilities

- 🔍 **Research Paper Retrieval** – Uses the [arXiv API](https://info.arxiv.org/help/api/index.html) to run curated security/AI queries daily and extract canonical metadata for each paper.
- 🧠 **LLM-Based Relevance Classification** – Applies LLMs to each paper’s summary/abstract to filter out non-relevant work before downloading or indexing.
- 📝 **Narrative Transcript Generation** – Converts PDFs into structured, podcast-ready dialogue tailored for listening.
- 🎙️ **Audio Generation** – Currently leverages [ElevenLabs](https://elevenlabs.io/) voices to render transcripts into polished audio episodes.
- 🕸️ **Code-First Durable Workflows** – Built on [Dapr Workflows](https://docs.dapr.io/developing-applications/building-blocks/workflow/workflow-overview/) and [Dapr Agents](https://github.com/dapr/dapr-agents) to fan out/fan in transcript, audio, and episode stages with deterministic orchestration plus LLM-powered decisions.
- 🔬 **Extensible Components** – Uses Dapr’s component model so shared storage (e.g., local [MinIO](https://www.min.io/) or AWS S3) and other building blocks can be swapped without rewriting workflow code.
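
As an illustration of the retrieval step, here is a minimal sketch of querying the arXiv API and extracting per-paper metadata from its Atom response, using only the Python standard library. It is not DeepBrief's actual retrieval code: the function names, the example search query, and the `SAMPLE_FEED` entry (with a placeholder paper ID) are all assumptions made for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Placeholder feed used to exercise the parser offline (not a real paper).
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2025-01-01T00:00:00Z</published>
    <title>Example Paper
      Title</title>
    <summary>  A short abstract.  </summary>
    <link href="http://arxiv.org/abs/0000.00000v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
  </entry>
</feed>
"""


def build_query_url(search_query: str, max_results: int = 2) -> str:
    """Build an arXiv API URL that lists the newest submissions first."""
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def fetch_feed(url: str, timeout: float = 30.0) -> str:
    """Download the Atom feed as text (performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_feed(atom_xml: str) -> list[dict]:
    """Extract id, title, summary, published date, and PDF link per entry."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        pdf_url = None
        for link in entry.findall("atom:link", ATOM):
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        papers.append(
            {
                "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
                "published": entry.findtext("atom:published", default="", namespaces=ATOM).strip(),
                "title": " ".join(title.split()),  # collapse wrapped whitespace
                "summary": " ".join(summary.split()),
                "pdf_url": pdf_url,
            }
        )
    return papers
```

A live run would look like `parse_feed(fetch_feed(build_query_url('cat:cs.CR AND all:"prompt injection"')))`; the abstracts returned this way are what a relevance classifier would then inspect before any PDF is downloaded.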

## Installation

DeepBrief uses [uv](https://docs.astral.sh/uv/) as the preferred package manager.

1. **Install uv**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2. **Create (or reuse) a virtual environment**

```bash
uv venv
source .venv/bin/activate
```

3. **Install DeepBrief in editable mode**

```bash
uv pip install -e .
```

## 🔧 Environment Setup

DeepBrief automatically loads `.env` at startup. Create one at the project root:

```
OPENAI_API_KEY="your-key"
OPENAI_API_MODEL="gpt-5-mini"
OPENAI_API_BASE_URL="https://api.openai.com/v1"
ELEVENLABS_API_KEY="your-key"

STORAGE_BINDING_NAME="bucketstore"
MINIO_ENDPOINT=http://localhost:9000
MINIO_ACCESS_KEY=miniokey
MINIO_SECRET_KEY=miniosecret

DOCLING_PICTURE_API_KEY="your-key"
TAVILY_API_KEY="your-key"
```
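
Before launching, it can help to confirm that the keys above are actually set. The following is a minimal sketch of such a check; it is not part of DeepBrief, and both the `missing_settings` helper and the choice of which keys count as required are assumptions for illustration.

```python
import os
from collections.abc import Mapping

# Assumption: these keys are treated as required for this example only.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY", "STORAGE_BINDING_NAME")
PLACEHOLDER = "your-key"  # the dummy value used in the sample .env


def missing_settings(
    env: Mapping[str, str], required: tuple[str, ...] = REQUIRED_KEYS
) -> list[str]:
    """Return required keys that are absent, blank, or still the placeholder."""
    missing = []
    for key in required:
        value = env.get(key, "").strip().strip('"')
        if not value or value == PLACEHOLDER:
            missing.append(key)
    return missing
```

Calling `missing_settings(os.environ)` after the `.env` file has been loaded returns an empty list when every required key has a real value.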

## 🔌 System Requirements

- **FFmpeg** for audio tooling

```bash
brew install ffmpeg
```

- **Dapr CLI** (DeepBrief relies on the Dapr Workflow runtime)

```bash
brew install dapr/tap/dapr-cli
dapr init
```

Verify the setup:

```bash
dapr -v
docker ps
```

## 🧩 Starting the DeepBrief Workflow Server (Dapr + uv)

DeepBrief runs as a packaged module. Always launch the local FastAPI service under Dapr so the workflow runtime can coordinate activities. For local runs, use the provided [dapr.yaml](deploy/local/dapr.yaml):

```yaml
version: 1
common:
  resourcesPath: ./components
  logLevel: info

apps:
  - appId: deepbrief
    appPort: 8080
    appDirPath: .
    command: ["uv", "run", "-m", "deepbrief"]
    maxBodySize: 256Mi
```

Start it with:

```bash
dapr run -f deploy/local/dapr.yaml
```

Or launch it directly with the Dapr CLI:

```bash
dapr run \
  --app-id deepbrief \
  --app-port 8080 \
  --resources-path deploy/local/components \
  --max-body-size 256Mi \
  -- uv run -m deepbrief
```

## 🚀 API Endpoints

DeepBrief exposes a FastAPI service for workflow control:

| Method | Endpoint                               | Description                                             |
| ------ | -------------------------------------- | ------------------------------------------------------- |
| POST   | `/workflows/research-podcast`          | Start a research-paper → podcast workflow               |
| GET    | `/workflows/{instance_id}`             | Fetch workflow status                                   |
| GET    | `/workflows/{instance_id}/wait`        | Block until a workflow completes                        |
| POST   | `/workflows/{instance_id}/terminate`   | Terminate a workflow instance (optional recursive stop) |

## ⚡️ Triggering Workflow via Client

```bash
uv run python3 deploy/local/test.py start \
  --podcast-name "AI Security Voice" \
  --host-name "Roberto Rodriguez" \
  --host-voice "Cyb3rWard0g" \
  --max-rounds 3 \
  --output-directory output \
  --persist-locally \
  --papers-storage-prefix papers \
  --indexes-storage-prefix indexes \
  --transcripts-storage-prefix transcripts \
  --markdowns-storage-prefix markdowns \
  --download-timeout-seconds 120 \
  --search-max-results 2
```

## 📦 Release Process

To publish a new release to PyPI:

1. **Install dev dependencies**

```bash
uv pip install -e ".[dev]"
```

2. **Ensure tests pass**

```bash
uv run pytest tests/
```

3. **Push mainline first, then tag**

```bash
git checkout main
git pull --ff-only
git merge <feature-branch>
git push origin main
```

4. **Tag and push the release**

```bash
git tag -a v0.1.0 -m "Release 0.1.0"
git push origin v0.1.0
git checkout v0.1.0
```

5. **Clean old artifacts**

```bash
rm -rf dist build src/*.egg-info
```

6. **Upgrade build tooling**

```bash
uv pip install --upgrade build twine packaging setuptools wheel setuptools_scm
```

7. **Build and verify**

```bash
uv run python -m build
uv run twine check dist/*
```

8. **Publish to PyPI**

```bash
uv run twine upload dist/*
```

**Notes**

- Twine ≥ 6 and packaging ≥ 24.2 are required for modern metadata support.
- Always build from the release tag (`git checkout vX.Y.Z`) so `setuptools_scm` resolves the exact version. Detached HEAD mode is expected; return to your branch later with `git switch -`.
- CI pipelines should fetch tags (`git fetch --tags --force --prune` and `git fetch --unshallow || true`).
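
To guard against building from an untagged commit, a pre-build check can confirm that `HEAD` sits exactly on a `vX.Y.Z` tag. This is a minimal sketch and not part of the DeepBrief repository: the helper names are hypothetical, and the strict `vMAJOR.MINOR.PATCH` pattern is an assumption based on the `v0.1.0` tag used above.

```python
import re
import subprocess
from typing import Optional

# Assumption: release tags follow the vMAJOR.MINOR.PATCH form shown above.
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def version_from_tag(tag: str) -> Optional[str]:
    """Return 'X.Y.Z' for a tag like 'vX.Y.Z', or None if it does not match."""
    match = TAG_PATTERN.match(tag.strip())
    return ".".join(match.groups()) if match else None


def current_exact_tag() -> Optional[str]:
    """Return the tag pointing at HEAD, or None if HEAD is not tagged."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--exact-match"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def release_version() -> str:
    """Raise unless HEAD is on a release tag; otherwise return its version."""
    tag = current_exact_tag()
    version = version_from_tag(tag) if tag else None
    if version is None:
        raise SystemExit("HEAD is not on a vX.Y.Z release tag; check out the tag first.")
    return version
```

Running `release_version()` right before `uv run python -m build` fails fast when the checkout is on a branch commit rather than the release tag.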
