

![dbt-colibri header](https://github.com/b-ned/dbt-colibri/blob/d31ece39bacf862e485233aad3e84df9a7618946/static/colibri_header.png)

[![PyPI version](https://badge.fury.io/py/dbt-colibri.svg)](https://badge.fury.io/py/dbt-colibri)
[![Python Support](https://img.shields.io/pypi/pyversions/dbt-colibri.svg)](https://pypi.org/project/dbt-colibri/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A lightweight, developer-friendly CLI tool and self-hostable dashboard for extracting and visualizing **column-level lineage** from your dbt projects.

Built for data teams who want transparent, flexible lineage tracking without vendor lock-in or complex enterprise tooling.

## Why dbt-colibri?

- **🔍 Complete visibility**: Easy UI, track how every column flows through your dbt transformations
- **⚡ Fast & lightweight**: Generate reports in seconds from your existing dbt artifacts
- **🏠 Self-hosted**: No cloud dependencies or external services required

Live demo of dashboard: [https://demo.colibri-data.com/](https://demo.colibri-data.com/)

Documentation site: [https://www.colibri-data.com/docs](https://www.colibri-data.com/docs)

<!-- Add a screenshot here when available -->
![dbt-colibri dashboard](https://github.com/b-ned/dbt-colibri/blob/d31ece39bacf862e485233aad3e84df9a7618946/static/ui_colibri_readme.png)


## Quick Start

### Installation

```bash
# Using uv (recommended)
uv add dbt-colibri

# Using pip
pip install dbt-colibri
```

### Basic Usage

1. **Run dbt** to generate the required artifacts:
   ```bash
   dbt compile
   dbt docs generate
   ```

2. **Generate lineage report**:
   ```bash
   colibri generate
   ```

3. **View results**: Open `dist/index.html` in your browser

That's it! Your column lineage dashboard is ready. Note you can also use dbt run/build, to generate the `manifest.json`.

## Documentation

### CLI Commands

#### `colibri generate`

Generates column lineage reports from your dbt project.

```bash
colibri generate [OPTIONS]
```

**Options:**
- `--manifest-path`: Path to dbt manifest.json (default: `target/manifest.json`)
- `--catalog-path`: Path to dbt catalog.json (default: `target/catalog.json`)
- `--output-dir`: Output directory (default: `dist/`)
- `--help`: Show help message
- `--light`: For very big dbt projects, excludes attributes like compiled SQL and returns smaller HTML file.

### Output Files

- **`colibri-manifest.json`**: Lineage data
- **`index.html`**: Interactive (standalone) visualization dashboard


### Project Structure

``` 
your-dbt-project/
├── target/
│   ├── manifest.json    # Generated by dbt
│   └── catalog.json     # Generated by dbt docs generate
└── dist/                # Generated by colibri
    ├── index.html       # Interactive dashboard
    └── colibri-manifest.json
```

## Advanced Usage

### CI/CD Integration

The easiest way to deploy your static html is through github/gitlab pages (if you are on enterprise license you can do this privately)

You can find the full example workflow at [`docs/github_pages_example.yml`](docs/github_pages_example.yml).

#### General idea
1. After every change to the production dbt code (push the `main` branch), GitHub Actions will:
   - Set up Python and install dependencies with `uv`.
   - Compile and generate docs needed for colibri.
   - Run `colibri generate` to build the static HTML report in the `dist/` folder.
2. The `dist/` folder is uploaded as an artifact and deployed natively to GitHub Pages using the official `actions/deploy-pages` action.
3. The result is available at your repository’s Pages URL.

Gitlab has similar functionality. Other options are writing the file to a bucket and mount it into a web server container (nginx).

## Technical Details

### Requirements

- **Python**: tested on versions 3.9, 3.11, 3.13

- **Supported dbt Adapters**: 
   - Snowflake, 
   - BigQuery, 
   - Redshift, 
   - duckDB, 
   - Postgres
   - Databricks (**limited to SQL models**)
   - Athena

### dbt Compatibility

| dbt-core Version | Status |
|------------------|--------|
| 1.8.x           | ✅ Tested |
| 1.9.x           | ✅ Tested |
| 1.10.x          | ✅ Tested |

### Architecture

dbt-colibri leverages:
- **SQLGlot** for SQL parsing and column lineage extraction
- **dbt artifacts** (manifest.json, catalog.json) for metadata
- **Static HTML/JS** for zero-dependency dashboard deployment

## Contributing

We welcome contributions! Raise an issue or request a feature, if you are open to contribute you can let us now in the issue.

- **Issues**: [GitHub Issues](https://github.com/b-ned/dbt-colibri/issues)
- **Discussions**: [GitHub Discussions](https://github.com/b-ned/dbt-colibri/discussions)


### Development Setup

```bash
# Clone the repository
git clone https://github.com/your-org/dbt-colibri.git
cd dbt-colibri

# Install development dependencies
uv sync --dev

# Run tests
pytest

# Format code
ruff format
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

This project builds upon excellent open source work:

- **[dbt-column-lineage-extractor](https://github.com/canva-public/dbt-column-lineage-extractor)** - Original column lineage extraction logic
- **[SQLGlot](https://github.com/tobymao/sqlglot)** - SQL parsing and transformation
- **[elementary-data](https://github.com/elementary-data/elementary)** - Inspiration for static HTML report structure

---
