Metadata-Version: 2.4
Name: dbt-autodoc
Version: 1.0.12
Summary: Automated documentation generator for dbt projects using Google Gemini AI
Author-email: JustDataPlease <hey@justdataplease.com>
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: duckdb>=0.9.0
Requires-Dist: google-generativeai>=0.3.0
Requires-Dist: ruamel.yaml>=0.17.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: dbt-osmosis
Requires-Dist: dbt-duckdb
Requires-Dist: psycopg2-binary
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# DBT Autodoc Documentation

`dbt-autodoc` provides **Automated Documentation** and **Logging** for your dbt projects. It pairs Google Gemini AI with a **Database Logging** system, so every generated or hand-written description is stored, versioned, and auditable, and your documentation stays in step with your models.

## 🌟 Why dbt-autodoc?

-   **🤖 Automatic AI Documentation:** Generate comprehensive descriptions for your tables and columns automatically.
-   **💾 Database Logging & History:** Every description is stored in a database (`duckdb` or `postgres`). This acts as a "Source of Truth" and provides a full history of changes.
-   **🔄 Full Synchronization:** Seamlessly integrates with `dbt-osmosis` to keep your YAML files in sync with your SQL models.
-   **🔒 Protect Manual Work:** Respects human-written documentation. If you write it, we lock it.
-   **👥 Team Ready:** Use Postgres to share the documentation cache across your entire team.

## 🛠️ Setup

1.  **Install:**
    ```bash
    pip install dbt-autodoc
    ```

2.  **Configuration:**
    Run `dbt-autodoc` to generate `dbt-autodoc.yml`.
    **Important:** Edit `company_context` in this file to give the AI knowledge about your business logic.

3.  **Environment Variables:**
    ```env
    GEMINI_API_KEY=your_api_key_here
    # Optional: only needed when using Postgres
    POSTGRES_URL=postgresql://user:pass@host:port/db
    ```
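
Before a run, it can help to confirm that these variables are actually visible to the process. The following is a minimal sketch using only the standard library. The helper name `check_env` is hypothetical and not part of `dbt-autodoc`; it treats `GEMINI_API_KEY` as required for AI generation and `POSTGRES_URL` as optional, as marked above.

```python
import os


def check_env(env=None):
    """Return (ok, messages) describing the dbt-autodoc environment variables."""
    env = os.environ if env is None else env
    ok = True
    messages = []
    if not env.get("GEMINI_API_KEY"):
        ok = False
        messages.append("GEMINI_API_KEY is missing; AI generation needs it.")
    if env.get("POSTGRES_URL"):
        messages.append("POSTGRES_URL is set; a shared Postgres store can be used.")
    else:
        # Assumption: nothing else is required when POSTGRES_URL is absent.
        messages.append("POSTGRES_URL is not set (it is optional).")
    return ok, messages


# Report the state of the current environment.
ok, messages = check_env()
print("\n".join(messages))
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real environment.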

## 📋 Recommended Workflow

For the best results, follow this step-by-step workflow to ensure accuracy and control:

1.  **Preparation:**
    Update your dbt project and context.
    ```bash
    dbt run
    # Edit dbt-autodoc.yml with company_context
    ```

2.  **Sync Structure (No AI):**
    Regenerate YAML files to match the SQL models. This ensures all new columns are present.
    ```bash
    dbt-autodoc --regenerate-yml
    ```

3.  **Generate Table Descriptions (SQL):**
    Generate AI descriptions for your models (tables/views).
    ```bash
    dbt-autodoc --generate-docs-config-ai --model-path models/staging
    ```

4.  **Manual Review (Important):**
    Open your YAML files. Review the structure and any existing descriptions. If you manually update a description here, it will be protected from AI overwrites in the next step.

5.  **Generate Column Descriptions (YAML):**
    Use AI to fill in the missing column descriptions.
    ```bash
    dbt-autodoc --generate-docs-yml-ai --model-path models/staging
    ```

6.  **Propagate & Save:**
    Run Osmosis again to apply inheritance rules across the whole dbt project, then run the tool once more to save the final state (including inherited descriptions) to the database.
    ```bash
    dbt-autodoc --regenerate-yml
    dbt-autodoc --generate-docs-yml-ai --model-path models/staging
    ```

7.  **Next Layer:**
    Repeat steps 2-6 for `models/intermediate`, `models/marts`, etc.
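
Steps 2-6 can be scripted per layer. The sketch below only assembles the commands already shown above and, by default, prints them instead of running them. The helper names `layer_commands` and `run`, and the `dry_run` switch, are hypothetical conveniences and not part of `dbt-autodoc`. The commands are split into two groups so that the manual review in step 4 still happens between them.

```python
import subprocess


def layer_commands(model_path):
    """Return the commands for steps 2-3 and steps 5-6 of the workflow."""
    scope = ["--model-path", model_path]
    before_review = [
        ["dbt-autodoc", "--regenerate-yml"],                   # step 2
        ["dbt-autodoc", "--generate-docs-config-ai", *scope],  # step 3
    ]
    after_review = [
        ["dbt-autodoc", "--generate-docs-yml-ai", *scope],     # step 5
        ["dbt-autodoc", "--regenerate-yml"],                   # step 6
        ["dbt-autodoc", "--generate-docs-yml-ai", *scope],     # step 6
    ]
    return before_review, after_review


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failure


for layer in ["models/staging", "models/intermediate", "models/marts"]:
    before_review, after_review = layer_commands(layer)
    run(before_review)
    # Step 4: review the YAML by hand here before continuing.
    run(after_review)
```

Call `run(..., dry_run=False)` once the printed commands look right for your project.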

## 🚀 Quick Start (Automated)

If you trust the process and just want to run everything at once:

```bash
dbt-autodoc --generate-docs-ai
```

## 🧠 How the AI Works

When generating a description for a column or table, the AI considers multiple inputs to produce the most accurate result:

1.  **Company Context:** The high-level business logic defined in your config.
2.  **Model SQL:** The actual code of the model being documented.
3.  **Existing Descriptions:** Any existing documentation or comments in the file.
4.  **Upstream Logic:** Context from upstream models, picked up implicitly through Osmosis inheritance.

It synthesizes all these inputs to write a concise, technical description.
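
To make the list above concrete, here is a purely illustrative sketch of how such inputs could be combined into a single prompt. The function `build_prompt`, its parameters, the sample inputs, and the prompt wording are all hypothetical; this README does not show the tool's actual prompt or internals.

```python
def build_prompt(company_context, model_sql, existing_description="", upstream_notes=""):
    """Combine the documented inputs into one prompt string (illustrative only)."""
    sections = [
        ("Company context", company_context),
        ("Model SQL", model_sql),
        ("Existing description", existing_description),
        ("Upstream context", upstream_notes),
    ]
    # Skip inputs that are empty so the prompt stays short.
    parts = [f"## {title}\n{body.strip()}" for title, body in sections if body.strip()]
    parts.append("Write a concise, technical description.")
    return "\n\n".join(parts)


# Hypothetical inputs, made up for the example.
prompt = build_prompt(
    company_context="We sell subscriptions; MRR is reported monthly.",
    model_sql="select customer_id, sum(amount) as mrr from payments group by 1",
)
print(prompt)
```

The practical takeaway is that a more specific `company_context` gives the model more to work with, which is why the setup section asks you to edit it.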

## 📖 Arguments Reference

| Argument | Description |
| :--- | :--- |
| `--regenerate-yml` | **Structure Only.** Only runs `dbt-osmosis` to regenerate YAML files from dbt models. Does not sync to DB or call AI. |
| `--generate-docs-ai` | **🔥 Full Auto.** Runs the complete workflow: SQL generation, Osmosis sync, and YAML generation using AI. |
| `--generate-docs` | **🔄 Full Sync.** Runs the complete workflow using only the database cache (no AI). |
| `--model-path` | Restrict processing to a specific directory (e.g. `models/staging`). |
| `--generate-docs-config-ai` | Generate table descriptions in `.sql` files using AI. |
| `--generate-docs-yml-ai` | Generate column descriptions in `.yml` files using AI. |
| `--generate-docs-config` | Sync `.sql` files from cache (no AI). |
| `--generate-docs-yml` | Sync `.yml` files from cache (no AI). |
| `--cleanup-db` | **Reset Database.** Wipes the description cache and history. |
| `--concurrency` | Max threads for AI/DB requests (default: 10). |

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 🙏 Attribution

Brought to you by [JustDataPlease](https://justdataplease.com/agency/).
