Metadata-Version: 2.4
Name: flordb
Version: 3.4.12
Summary: A hindsight logging database for MLOps
Home-page: https://github.com/ucbrise/flor
Author: Rolando Garcia
Author-email: rolando.garcia@asu.edu
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: GitPython
Requires-Dist: cloudpickle
Requires-Dist: astunparse
Requires-Dist: pandas
Requires-Dist: bidict==0.21.3
Requires-Dist: apted
Requires-Dist: matplotlib
Requires-Dist: scikit-learn
Requires-Dist: numpy
Requires-Dist: tqdm
Requires-Dist: sh
Requires-Dist: ipython
Requires-Dist: ipykernel
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# FlorDB: Log-First Context Management for ML Practitioners

[![PyPI](https://img.shields.io/pypi/v/flordb.svg?nocache=1)](https://pypi.org/project/flordb/)


FlorDB brings experiment tracking, provenance, and reproducibility to your ML workflow—using the one thing every engineer already writes: **logs**.

Unlike heavyweight MLOps platforms, FlorDB doesn’t ask you to adopt a new UI, schema, or service. Just import it, log as you normally would, and gain full history, lineage, and replay capabilities across your training runs.

## 🚀 Why FlorDB?

- **Log-Driven Experiment Tracking**  
  No dashboards to configure or schemas to design. FlorDB turns your existing `print()` or `log()` calls into structured, queryable metadata.

- **Hindsight Logging & Replay**  
  Missed a metric? Add a log *after the fact* and replay past runs to capture it—no rerunning from scratch.

- **Reproducibility Without Friction**  
  Every run is versioned via Git, every hyperparameter is recorded, and every model checkpoint is linked and queryable—automatically.

- **Works With Your Stack**  
  Makefiles, Airflow, Slurm, HuggingFace, PyTorch—you don’t change your workflow. FlorDB fits in.

## 📦 Installation

```bash
pip install flordb
```

For contributors or bleeding-edge features:

```bash
git clone https://github.com/ucbrise/flor.git
cd flor
pip install -e .
```

---

## 📝 First Log in 30 Seconds

> *Requires a Git repository for automatic versioning.*

```bash
mkdir flor_sandbox
cd flor_sandbox
git init
ipython
```

```python
import flordb as flor
flor.log("message", "Hello ML World!")
```
```
message: Hello ML World!
Changes committed successfully
```

Retrieve logs anytime:

```python
flor.dataframe("message")
```
```
         projid              tstamp filename          message
0  flor_sandbox 2025-10-13 18:13:48  ipython  Hello ML World!
```

## 🧪 Track Experiments with Zero Overhead

Drop FlorDB into your existing training script:

```python
import flordb as flor

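# Hyperparameters: defaults here, overridable from the command line
lr = flor.arg("lr", 1e-3)
batch_size = flor.arg("batch_size", 32)

# net, optimizer, epochs, and trainloader are defined by your existing training code
with flor.checkpointing(model=net, optimizer=optimizer):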
    for epoch in flor.loop("epoch", range(epochs)):
        for x, y in flor.loop("step", trainloader):
            ...
            flor.log("loss", loss.item())
```

**Change hyperparameters from the CLI:**

```bash
python train.py --kwargs lr=5e-4 batch_size=64
```
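To make the default-plus-override idea concrete, here is a minimal, self-contained sketch of the pattern. It is not FlorDB's implementation: `parse_kwargs` and `arg` are hypothetical helpers written only for illustration, and casting an override to the type of its default is an assumption of this sketch.

```python
def parse_kwargs(argv):
    """Collect key=value tokens that follow a --kwargs flag (hypothetical helper)."""
    if "--kwargs" not in argv:
        return {}
    start = argv.index("--kwargs") + 1
    return dict(tok.split("=", 1) for tok in argv[start:] if "=" in tok)


def arg(name, default, overrides):
    """Return the override cast to the default's type, or the default itself."""
    if name in overrides:
        return type(default)(overrides[name])
    return default


# Simulates: python train.py --kwargs lr=5e-4 batch_size=64
overrides = parse_kwargs(["train.py", "--kwargs", "lr=5e-4", "batch_size=64"])

lr = arg("lr", 1e-3, overrides)               # overridden on the command line
batch_size = arg("batch_size", 32, overrides)  # overridden on the command line
epochs = arg("epochs", 5, overrides)           # not overridden, keeps its default

print(lr, batch_size, epochs)
```

Under these assumptions, a name that is absent from the command line keeps the default written in the script, so the script still runs with no arguments at all.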

View metrics across runs:

```python
flor.dataframe("lr", "batch_size", "loss")
```

```
        projid              tstamp  filename  epoch  step      lr batch_size                 loss
0  ml_tutorial 2025-10-13 18:18:14  train.py      1   500  0.0005         64  0.20570574700832367
1  ml_tutorial 2025-10-13 18:18:14  train.py      2   500  0.0005         64   0.1964433193206787
2  ml_tutorial 2025-10-13 18:18:14  train.py      3   500  0.0005         64  0.11040152609348297
3  ml_tutorial 2025-10-13 18:18:14  train.py      4   500  0.0005         64    0.155434250831604
4  ml_tutorial 2025-10-13 18:18:14  train.py      5   500  0.0005         64   0.0741351768374443
```
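The printed result looks like a pandas DataFrame, so ordinary pandas operations should apply to it. The sketch below assumes that is the case. It rebuilds part of the table above by hand, so it runs without FlorDB or any logged runs, and finds the epoch with the lowest loss. The `pd.to_numeric` call is a precaution in case logged values come back as strings, which is an assumption about the returned types.

```python
import pandas as pd

# Hand-built stand-in for flor.dataframe("lr", "batch_size", "loss"),
# using the epoch and loss values from the table above.
df = pd.DataFrame(
    {
        "epoch": [1, 2, 3, 4, 5],
        "loss": [
            "0.20570574700832367",
            "0.1964433193206787",
            "0.11040152609348297",
            "0.155434250831604",
            "0.0741351768374443",
        ],
    }
)

# Coerce to floats in case the values are stored as strings (an assumption).
df["loss"] = pd.to_numeric(df["loss"])

# Select the row with the smallest loss.
best = df.loc[df["loss"].idxmin()]
best_epoch = int(best["epoch"])
best_loss = float(best["loss"])

print(f"best epoch: {best_epoch}, loss: {best_loss:.4f}")
```

The same selection should work on a real result, provided the column names match the ones you logged.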


## 🔍 Hindsight Logging: Fix It After You See It

Forgot to log gradient norms?

```python
flor.log("grad_norm", ...)
```

Just add the logging statement to the script and run:

```bash
python -m flordb replay grad_norm
```

FlorDB replays only what’s needed: it injects the new logging statement into copies of the historical versions of your script, runs them, and commits the results.
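The value passed to `flor.log("grad_norm", ...)` is left open above. As one hypothetical choice, the sketch below computes a global L2 norm over per-layer gradient values using only the standard library. `global_grad_norm` is an illustrative helper, not part of FlorDB, and the nested lists stand in for whatever gradient tensors your framework exposes.

```python
import math


def global_grad_norm(grads):
    """L2 norm over every gradient value, given an iterable of flat per-layer lists."""
    return math.sqrt(sum(g * g for layer in grads for g in layer))


# Toy gradients for two "layers" (made-up values for illustration).
grads = [[3.0, 0.0], [0.0, 4.0]]
norm = global_grad_norm(grads)
print(norm)

# Inside the training loop, the hindsight log statement would then read:
# flor.log("grad_norm", norm)
```

In a real training script the norm would be computed from the model's gradients after the backward pass, at the point in the loop where the new log statement is added.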

## 🏗 Real ML Systems Built on FlorDB

FlorDB powers full AI/ML lifecycle tooling:

- **Feature Stores & Model Registries**
- **Document Parsing & Feedback Loops**
- **Continuous Training Pipelines**

See our [Scan Studio](https://github.com/bwerick/scan_studio) and [Document Parser](https://github.com/rlnsanz/document_parser) examples for real-world integration.


## 📚 Publications

FlorDB is based on research from UC Berkeley’s RISE Lab and Arizona State University.

- *Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle* ([CIDR 2025](https://vldb.org/cidrdb/papers/2025/p33-garcia.pdf))  
- *The Management of Context in the ML Lifecycle* ([UCB Tech Report 2024](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-142.html))  
- *Hindsight Logging for Model Training* ([PVLDB 2021](http://www.vldb.org/pvldb/vol14/p682-garcia.pdf))  


## 🛠 License

[Apache v2 License](https://www.apache.org/licenses/LICENSE-2.0) — free to use, modify, and distribute.

---

## 💡 Get Involved

FlorDB is actively developed. Contributions, issues, and real-world use cases are welcome!

**GitHub:** https://github.com/ucbrise/flor  
**Tutorial Video:** https://youtu.be/mKENSkk3S4Y
