Metadata-Version: 2.4
Name: deepbrief
Version: 0.2.1
Summary: A workflow-driven system for converting PDF documents into structured narrative and audio briefings.
Author-email: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Cyb3rWard0g/deepbrief
Project-URL: Issues, https://github.com/Cyb3rWard0g/deepbrief/issues
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dapr<1.17,>=1.16
Requires-Dist: dapr-ext-workflow<1.17,>=1.16
Requires-Dist: dapr-agents<0.11,>=0.10
Requires-Dist: fastapi<0.122,>=0.121
Requires-Dist: pypdf<7.0,>=6.2
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: arxiv<3.0,>=2.3
Requires-Dist: elevenlabs<3.0,>=2.22
Requires-Dist: pydub<0.26,>=0.25
Requires-Dist: python-dotenv<2.0,>=1.0
Requires-Dist: audioop-lts<0.3,>=0.2
Requires-Dist: uvicorn[standard]<0.39,>=0.38
Requires-Dist: docling<3.0,>=2.61
Requires-Dist: tavily-python<0.8,>=0.7.13
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=8.4; extra == "dev"
Requires-Dist: pytest-asyncio<0.24,>=0.23; extra == "dev"
Requires-Dist: ruff<0.7,>=0.6; extra == "dev"
Requires-Dist: black<25.0,>=24.10; extra == "dev"
Requires-Dist: mypy<1.12,>=1.11; extra == "dev"
Requires-Dist: pytest-cov<6.0,>=5.0; extra == "dev"
Provides-Extra: release
Requires-Dist: build<2.0,>=1.2; extra == "release"
Requires-Dist: twine<6.0,>=5.0; extra == "release"
Requires-Dist: packaging<25.0,>=24.1; extra == "release"
Dynamic: license-file

# DeepBrief — Turning Research Into Narrative Intelligence

![Deepbrief Workflows](images/workflow.png)

DeepBrief is an LLM-based workflow that transforms research papers and security documents into narrative briefings, podcast-style episodes, and spoken summaries. It powers the [AI Security Voice podcast](https://open.spotify.com/show/7wq09WmVmtj3sq6lkg0eS8?si=OnuJqxfOQmSOcxM1R-JTIQ), which leverages [Cyb3rWard0g's](https://x.com/Cyb3rWard0g) cloned voice from [Eleven Labs](https://elevenlabs.io/), and generalizes to any intelligence or research team that wants conversational, hands-free debriefs.

## What

DeepBrief is an end-to-end research-to-audio pipeline. It pulls new papers, classifies relevance, chunks PDFs into structured context, prompts LLMs for dialogue-style summaries, and renders the result as narrated episodes. Teams can use it for:

- Long-form paper → conversational “deep dive” recaps
- Automated PDF → podcast production with custom host/participant voices
- Hands-free weekly intelligence updates
- Personalized offline “intelligence podcasts” generated from any curated corpus

The outcome is the same: complex research you can absorb while walking, commuting, or threat hunting—no dashboard staring required.
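
The stages above can be sketched as a simple pipeline. The stage names and signatures below are hypothetical, shown only to illustrate the shape of the flow, not DeepBrief's actual API:

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str
    relevant: bool = False
    transcript: str = ""


def classify(paper: Paper) -> Paper:
    # Stand-in for the LLM relevance check on the abstract.
    paper.relevant = "security" in paper.abstract.lower()
    return paper


def draft_transcript(paper: Paper) -> Paper:
    # Stand-in for the LLM dialogue-summary prompt.
    paper.transcript = f"Host: Today we discuss '{paper.title}'."
    return paper


def run_pipeline(papers: list[Paper]) -> list[Paper]:
    # Classify everything, then only draft transcripts for relevant papers.
    classified = [classify(p) for p in papers]
    return [draft_transcript(p) for p in classified if p.relevant]
```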

## ✨ Key Capabilities

- 🔍 **Research Paper Retrieval** – Uses the [arXiv API](https://info.arxiv.org/help/api/index.html) to run curated security/AI queries daily and extract canonical metadata for each paper.
- 🧠 **LLM-Based Relevance Classification** – Applies LLMs to each paper’s summary/abstract to filter out non-relevant work before downloading or indexing.
- 📝 **Narrative Transcript Generation** – Converts PDFs into structured, podcast-ready dialogue tailored for listening.
- 🎙️ **Audio Generation** – Currently leverages [ElevenLabs](https://elevenlabs.io/) voices to render transcripts into polished audio episodes.
- 🕸️ **Code-First Durable Workflows** – Built on [Dapr Workflows](https://docs.dapr.io/developing-applications/building-blocks/workflow/workflow-overview/) and [Dapr Agents](https://github.com/dapr/dapr-agents) to fan out/fan in transcript, audio, and episode stages with deterministic orchestration plus LLM-powered decisions.
- 🔬 **Extensible Components** – Uses Dapr’s component model so shared storage (e.g., local [MinIO](https://www.min.io/) or AWS S3) and other building blocks can be swapped without rewriting workflow code.
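
The fan-out/fan-in pattern mentioned above can be illustrated with plain `asyncio`; in the real project this is done with Dapr Workflow activities, so the function names here are hypothetical stand-ins:

```python
import asyncio


async def make_transcript(paper_id: str) -> str:
    # Placeholder for an LLM transcript activity.
    await asyncio.sleep(0)
    return f"transcript-for-{paper_id}"


async def produce_episode(paper_ids: list[str]) -> list[str]:
    tasks = [make_transcript(p) for p in paper_ids]  # fan out: one task per paper
    return await asyncio.gather(*tasks)              # fan in: collect all results in order
```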

## Installation

DeepBrief uses [uv](https://docs.astral.sh/uv/) as the preferred package manager.

1. **Install uv**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2. **Create (or reuse) a virtual environment**

```bash
uv venv
source .venv/bin/activate
```

3. **Install DeepBrief in editable mode**

```bash
uv pip install -e .
```

## 🔧 Environment Setup

DeepBrief automatically loads `.env` at startup. Create one at the project root:

```
OPENAI_API_KEY="your-key"
OPENAI_API_MODEL="gpt-5-mini"
OPENAI_API_BASE_URL="https://api.openai.com/v1"
ELEVENLABS_API_KEY="your-key"

STORAGE_BINDING_NAME="bucketstore"
MINIO_ENDPOINT=http://localhost:9000
MINIO_ACCESS_KEY=miniokey
MINIO_SECRET_KEY=miniosecret

DOCLING_PICTURE_API_KEY="your-key"
TAVILY_API_KEY="your-key"
```
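
Once loaded, these become ordinary environment variables. A minimal sketch of reading them with safe defaults, using only the standard library (the fallback values mirror the example file above and are assumptions, not required values):

```python
import os


def load_storage_settings() -> dict[str, str]:
    # Read storage settings populated from .env, falling back to the
    # local defaults shown in the example configuration.
    return {
        "binding": os.getenv("STORAGE_BINDING_NAME", "bucketstore"),
        "endpoint": os.getenv("MINIO_ENDPOINT", "http://localhost:9000"),
    }
```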

## 🔌 System Requirements

- **FFmpeg** for audio tooling

```bash
brew install ffmpeg
```

- **Dapr CLI** (DeepBrief relies on the Dapr workflow runtime)

```bash
brew install dapr/tap/dapr-cli
dapr init
```

Verify the setup:

```bash
dapr -v
docker ps
```

## 🧩 Starting the DeepBrief Workflow Server (Dapr + uv)

DeepBrief runs as a packaged module. Always launch the local FastAPI service under Dapr so the workflow runtime can coordinate activities. Locally, you can use the existing [dapr.yaml](deploy/local/dapr.yaml):

```yaml
version: 1
common:
  resourcesPath: ./components
  logLevel: info

apps:
  - appId: deepbrief
    appPort: 8080
    appDirPath: .
    command: ["uv", "run", "-m", "deepbrief"]
    maxBodySize: 256Mi
```

Run it with:

```bash
dapr run -f deploy/local/dapr.yaml
```

or launch with:

```bash
dapr run \
  --app-id deepbrief \
  --app-port 8080 \
  --resources-path deploy/local/components \
  --max-body-size 256Mi \
  -- uv run -m deepbrief
```

## 🚀 API Endpoints

DeepBrief exposes a FastAPI service for workflow control:

| Method | Endpoint                               | Description                                             |
| ------ | -------------------------------------- | ------------------------------------------------------- |
| POST   | `/workflows/research-podcast`          | Start a research-paper → podcast workflow               |
| GET    | `/workflows/{instance_id}`             | Fetch workflow status                                   |
| GET    | `/workflows/{instance_id}/wait`        | Block until a workflow completes                        |
| POST   | `/workflows/{instance_id}/terminate`   | Terminate a workflow instance (optional recursive stop) |
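
A minimal client sketch against these endpoints using only the standard library. The base URL is assumed from the `appPort` in `dapr.yaml`, and the JSON payload fields are assumptions (the real payload is whatever `deploy/local/test.py` sends):

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed local app port


def start_request(payload: dict) -> urllib.request.Request:
    # POST /workflows/research-podcast starts a new workflow instance.
    return urllib.request.Request(
        f"{BASE}/workflows/research-podcast",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(instance_id: str) -> str:
    # GET /workflows/{instance_id} returns the current workflow status.
    return f"{BASE}/workflows/{instance_id}"
```

Sending the request with `urllib.request.urlopen(start_request(...))` requires the service to be running under Dapr as described above.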

## ⚡️ Triggering a Workflow via the Client

```bash
uv run python3 deploy/local/test.py start \
  --podcast-name "AI Security Voice" \
  --host-name "Roberto Rodriguez" \
  --host-voice "Cyb3rWard0g" \
  --max-rounds 3 \
  --output-directory output \
  --persist-locally \
  --papers-storage-prefix papers \
  --indexes-storage-prefix indexes \
  --transcripts-storage-prefix transcripts \
  --markdowns-storage-prefix markdowns \
  --download-timeout-seconds 120 \
  --search-max-results 2
```

## 📦 Release Process

To publish a new release to PyPI:

1. **Install dev dependencies**

```bash
uv pip install -e ".[dev]"
```

2. **Ensure tests pass**

```bash
uv run pytest tests/
```

3. **Push mainline first, then tag**

```bash
git checkout main
git pull --ff-only
git merge <feature-branch>
git push origin main
```

4. **Tag and push the release**

```bash
git tag -a v0.1.0 -m "Release 0.1.0"
git push origin v0.1.0
git checkout v0.1.0
```

5. **Clean old artifacts**

```bash
rm -rf dist build src/*.egg-info
```

6. **Upgrade build tooling**

```bash
uv pip install --upgrade build twine packaging setuptools wheel setuptools_scm
```

7. **Build and verify**

```bash
uv run python -m build
uv run twine check dist/*
```

8. **Publish to PyPI**

```bash
uv run twine upload dist/*
```

**Notes**

- Twine ≥ 6 and packaging ≥ 24.2 are required for modern metadata support.
- Always build from the release tag (`git checkout vX.Y.Z`) so `setuptools_scm` resolves the exact version. Detached HEAD mode is expected; return to your branch later with `git switch -`.
- CI pipelines should fetch tags (`git fetch --tags --force --prune` and `git fetch --unshallow || true`).
