Metadata-Version: 2.4
Name: kailash-dataflow
Version: 0.3.3
Summary: Workflow-native database framework for Kailash SDK
Home-page: https://github.com/kailash-sdk/kailash-dataflow
Author: Kailash SDK Team
Author-email: team@kailash.dev
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: kailash>=0.9.4
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: alembic>=1.12.0
Requires-Dist: psycopg2-binary>=2.9.0
Requires-Dist: redis>=4.5.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: click>=8.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: python-jose[cryptography]>=3.3.0; extra == "api"
Requires-Dist: passlib[bcrypt]>=1.7.4; extra == "api"
Provides-Extra: enterprise
Requires-Dist: cryptography>=3.4.0; extra == "enterprise"
Requires-Dist: flask>=2.0.0; extra == "enterprise"
Requires-Dist: flask-jwt-extended>=4.0.0; extra == "enterprise"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Kailash DataFlow

**Zero-Config Database Framework** - Django simplicity meets enterprise-grade production quality.

## 🚀 Quick Start (60 seconds)

```python
from kailash_dataflow import DataFlow

# That's it! No configuration needed
db = DataFlow()

# Define your model
@db.model
class User:
    id: int
    name: str
    email: str
    
# DataFlow automatically creates:
# ✅ Database schema (PostgreSQL, MySQL, SQLite)
# ✅ 9 workflow nodes per model (CRUD + bulk ops)
# ✅ Real SQL operations with security
# ✅ Connection pooling and transaction management
# ✅ MongoDB-style query builder (implemented!)
# ⚠️ Redis query cache (planned)
# ⚠️ Multi-database runtime (PostgreSQL only)
```

You now have a production-ready database layer!
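
To make the schema-generation step more concrete, here is a minimal sketch of how the type annotations on a model class could be turned into a `CREATE TABLE` statement. The `TYPE_MAP` table, the `create_table_sql` helper, and the naive pluralized table name are illustrative assumptions, not DataFlow's actual implementation.

```python
from typing import get_type_hints

# Hypothetical mapping from Python types to PostgreSQL column types.
TYPE_MAP = {int: "INTEGER", str: "TEXT", float: "DOUBLE PRECISION", bool: "BOOLEAN"}


def create_table_sql(model: type) -> str:
    """Build a CREATE TABLE statement from a class's annotations (illustrative only)."""
    table = model.__name__.lower() + "s"  # naive pluralization: User -> users
    columns = []
    for name, py_type in get_type_hints(model).items():
        column = f'"{name}" {TYPE_MAP[py_type]}'
        if name == "id":
            column += " PRIMARY KEY"  # assumption: a field named "id" is the key
        columns.append(column)
    return f'CREATE TABLE "{table}" ({", ".join(columns)})'


class User:
    id: int
    name: str
    email: str


print(create_table_sql(User))
```

A real generator would also have to handle nullability, defaults, and dialect differences, which this sketch leaves out.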

## 🎯 What Makes DataFlow Different?

### Zero Configuration That Actually Works
```python
# Development? Defaults to SQLite (note: workflow execution currently requires PostgreSQL)
db = DataFlow()  # Just works!

# Production? Reads from environment
# DATABASE_URL=postgresql://...
db = DataFlow()  # Still just works!

# Need control? Progressive enhancement
db = DataFlow(
    pool_size=50,
    read_replicas=['replica1', 'replica2'],
    monitoring=True
)
```
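
The zero-config behaviour described here amounts to a simple precedence rule: an explicit argument wins, then the `DATABASE_URL` environment variable, then a development default. The sketch below illustrates that rule; `resolve_database_url` and `DEFAULT_DEV_URL` are hypothetical names, and the fallback DataFlow really uses may differ.

```python
import os

# Hypothetical development default; the real fallback used by DataFlow may differ.
DEFAULT_DEV_URL = "sqlite:///:memory:"


def resolve_database_url(explicit=None, env=None):
    """Pick a database URL: explicit argument, then DATABASE_URL, then a dev default."""
    env = os.environ if env is None else env
    return explicit or env.get("DATABASE_URL") or DEFAULT_DEV_URL


# No configuration at all falls back to the development default.
print(resolve_database_url(env={}))

# A DATABASE_URL in the environment takes precedence over the default.
print(resolve_database_url(env={"DATABASE_URL": "postgresql://user:pass@localhost/db"}))
```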

### Real Database Operations (Currently Available)
```python
# Traditional ORMs: imperative code
#   Django:     User.objects.create(name="Alice")
#   SQLAlchemy: user = User(name="Alice"); session.add(user)

# DataFlow: workflow-native database operations
from kailash.workflow.builder import WorkflowBuilder

workflow = WorkflowBuilder()
workflow.add_node("UserCreateNode", "create_user", {
    "name": "Alice",
    "email": "alice@example.com"
})
workflow.add_node("UserListNode", "find_users", {
    "limit": 10,
    "offset": 0
})

# Real SQL is executed: INSERT INTO users (name, email) VALUES ($1, $2)
```

### MongoDB-Style Query Builder (NEW!)
```python
# Get QueryBuilder from any model
builder = User.query_builder()

# MongoDB-style operators
builder.where("age", "$gte", 18)
builder.where("status", "$in", ["active", "premium"])
builder.where("email", "$regex", r"^[a-z]+@company\.com$")  # raw string keeps the \. escape intact
builder.order_by("created_at", "DESC")
builder.limit(10)

# Generates optimized SQL for your database
sql, params = builder.build_select()
# PostgreSQL: SELECT * FROM "users" WHERE "age" >= $1 AND "status" IN ($2, $3) AND "email" ~ $4 ORDER BY "created_at" DESC LIMIT 10

# Works seamlessly with ListNode
workflow.add_node("UserListNode", "search", {
    "filter": {
        "age": {"$gte": 18},
        "status": {"$in": ["active", "premium"]},
        "email": {"$regex": "^admin"}
    }
})
```
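
As a rough illustration of how a MongoDB-style filter can map onto parameterized PostgreSQL, the sketch below translates a filter dictionary into a `WHERE` clause with `$n` placeholders. The `OPERATORS` table and `build_where` function are hypothetical and are not DataFlow's QueryBuilder; they only mirror the operators shown above.

```python
# Hypothetical operator table; DataFlow's real QueryBuilder may support more operators.
OPERATORS = {
    "$eq": "=", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<=",
    "$regex": "~", "$like": "LIKE",
}


def build_where(filter_spec):
    """Translate a MongoDB-style filter dict into a WHERE clause and parameter list."""
    clauses, params = [], []
    for field, conditions in filter_spec.items():
        # Field names are interpolated into the SQL text, so a real implementation
        # must validate them against the model's known columns first.
        for op, value in conditions.items():
            if op == "$in":
                placeholders = []
                for item in value:
                    params.append(item)
                    placeholders.append(f"${len(params)}")
                clauses.append(f'"{field}" IN ({", ".join(placeholders)})')
            else:
                params.append(value)
                clauses.append(f'"{field}" {OPERATORS[op]} ${len(params)}')
    return " AND ".join(clauses), params


where, params = build_where({
    "age": {"$gte": 18},
    "status": {"$in": ["active", "premium"]},
    "email": {"$regex": "^admin"},
})
print(where)
print(params)
```

Values never appear in the SQL text itself; they travel in `params`, which is what keeps the generated query safe to execute.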

### Database Requirements
```python
# Current limitation: PostgreSQL only for execution
db = DataFlow(database_url="postgresql://user:pass@localhost/db")

# Schema generation works for all databases
schema_sql = db.generate_complete_schema_sql("sqlite")  # ✅ Works
schema_sql = db.generate_complete_schema_sql("mysql")   # ✅ Works
schema_sql = db.generate_complete_schema_sql("postgresql")  # ✅ Works

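# But execution currently requires PostgreSQL
from kailash.runtime.local import LocalRuntime

runtime = LocalRuntime()
# `workflow` is a WorkflowBuilder populated as in the examples above
results, run_id = runtime.execute(workflow.build())  # ✅ PostgreSQL only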
```

### Database Operations as Workflow Nodes
```python
# Traditional ORMs: imperative code
#   Django:     user = User.objects.create(name="Alice")
#   SQLAlchemy: user = User(name="Alice"); session.add(user)

# DataFlow: Workflow-native (9 nodes per model!)
workflow = WorkflowBuilder()
workflow.add_node("UserCreateNode", "create_user", {
    "name": "Alice",
    "email": "alice@example.com"
})
workflow.add_node("UserListNode", "find_users", {
    "filter": {"name": {"$like": "A%"}}
})
```

### Enterprise Configuration
```python
# Multi-tenancy configuration (query modification planned)
db = DataFlow(multi_tenant=True)

# Real SQL generation with security
db = DataFlow(
    database_url="postgresql://user:pass@localhost/db",
    pool_size=20,
    pool_max_overflow=30,
    monitoring=True,
    echo=False  # No SQL logging in production
)

# All generated nodes use parameterized queries for security
# INSERT INTO users (name, email) VALUES ($1, $2)  -- Safe from SQL injection
```
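
The security claim rests on a general property of parameterized queries: values are sent separately from the SQL text, so they are never parsed as SQL. The sketch below demonstrates this with Python's standard-library `sqlite3` module, which uses `?` placeholders instead of PostgreSQL's `$1` style. It illustrates the principle only and does not use DataFlow's own nodes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# A hostile value that would break out of a string-formatted query.
hostile_name = "Alice'); DROP TABLE users; --"

# The value is bound as a parameter, so it is stored as plain data.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    (hostile_name, "alice@example.com"),
)

# The table still exists and contains the hostile string verbatim.
rows = conn.execute("SELECT name FROM users").fetchall()
print(rows)
```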

## 🚦 Implementation Status

### ✅ Currently Available (Production-Ready)
- **Database Schema Generation**: Complete CREATE TABLE for PostgreSQL, MySQL, SQLite
- **Real Database Operations**: All 9 CRUD + bulk nodes execute actual SQL
- **SQL Security**: Parameterized queries prevent SQL injection
- **Connection Management**: Connection pooling, DDL execution, error handling
- **Workflow Integration**: Full compatibility with WorkflowBuilder/LocalRuntime
- **Configuration System**: Zero-config to enterprise patterns
- **MongoDB-Style Query Builder**: Complete with all operators ($eq, $gt, $in, $regex, etc.)

### ⚠️ Limitations
- **Database Runtime**: PostgreSQL execution only (schema generation works for all)
- **AsyncSQLDatabaseNode**: Currently requires a PostgreSQL connection string

### 🔄 Planned Features (Roadmap)
- **Redis Query Caching**: `User.cached_query()` with automatic invalidation
- **Multi-Database Runtime**: SQLite/MySQL execution support
- **Advanced Multi-Tenancy**: Automatic query modification for tenant isolation

## 📚 Documentation

### Getting Started
- **[5-Minute Tutorial](docs/getting-started/quickstart.md)** - Build your first app
- **[Core Concepts](docs/getting-started/concepts.md)** - Understand DataFlow
- **[Examples](examples/)** - Complete applications

### Development
- **[Models](docs/development/models.md)** - Define your schema
- **[CRUD Operations](docs/development/crud.md)** - Basic operations
- **[Relationships](docs/development/relationships.md)** - Model associations

### Production
- **[Deployment](docs/production/deployment.md)** - Go to production
- **[Performance](docs/production/performance.md)** - Optimization guide
- **[Monitoring](docs/advanced/monitoring.md)** - Observability

## 💡 Real-World Examples

### E-Commerce Platform
```python
# Define your models
@db.model
class Product:
    id: int
    name: str
    price: float
    stock: int

@db.model
class Order:
    id: int
    user_id: int
    total: float
    status: str

# Use in workflows
workflow = WorkflowBuilder()

# Check inventory
workflow.add_node("ProductGetNode", "check_stock", {
    "id": "{product_id}"
})

# Create order with transaction
workflow.add_node("TransactionContextNode", "tx_start")
workflow.add_node("OrderCreateNode", "create_order", {
    "user_id": "{user_id}",
    "total": "{total}"
})
workflow.add_node("ProductUpdateNode", "update_stock", {
    "id": "{product_id}",
    "stock": "{new_stock}"
})
```

### Multi-Tenant SaaS (Current Implementation)
```python
# Enable multi-tenancy configuration
db = DataFlow(
    database_url="postgresql://user:pass@localhost/db",
    multi_tenant=True
)

# Multi-tenant models get a tenant_id field automatically
@db.model
class User:
    name: str
    email: str
    # tenant_id: str automatically added

# Use in workflows with real database operations
workflow.add_node("UserCreateNode", "create_user", {
    "name": "Alice",
    "email": "alice@acme-corp.com"
})
workflow.add_node("UserListNode", "list_users", {
    "limit": 10,
    "filter": {}
})
```
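
One way to picture the automatic `tenant_id` field is a class decorator that extends the model's annotations before the schema is generated. The `multi_tenant_model` decorator below is a hypothetical sketch of that idea, not the mechanism DataFlow uses internally.

```python
def multi_tenant_model(cls):
    """Hypothetical decorator that adds a tenant_id annotation when it is missing."""
    annotations = dict(getattr(cls, "__annotations__", {}))
    annotations.setdefault("tenant_id", str)  # keep an explicit tenant_id if present
    cls.__annotations__ = annotations
    return cls


@multi_tenant_model
class User:
    name: str
    email: str


# The declared fields are preserved and tenant_id is appended.
print(list(User.__annotations__))
```

This only covers the extra field. Scoping queries to a tenant is a separate concern, listed above as a planned feature.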

### High-Performance ETL (Current Implementation)
```python
# Bulk operations with real database execution
workflow.add_node("UserBulkCreateNode", "import_users", {
    "data": users_data,  # List of user records
    "batch_size": 1000,
    "conflict_resolution": "skip"
})

# Real bulk INSERT operations executed
# Uses parameterized queries for security
# Processes data in configurable batches

# List operations with filters
workflow.add_node("UserListNode", "active_users", {
    "limit": 1000,
    "offset": 0,
    "order_by": ["created_at"],
    "filter": {"active": True}
})
```
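
The batching and `"conflict_resolution": "skip"` behaviour can be sketched with the standard-library `sqlite3` module: records are inserted in fixed-size chunks, and rows that violate a unique constraint are ignored. The `chunked` and `bulk_insert_users` helpers are hypothetical and stand in for what a bulk-create node might do. SQLite's `INSERT OR IGNORE` is used here, where PostgreSQL would use `ON CONFLICT DO NOTHING`.

```python
import sqlite3


def chunked(records, batch_size):
    """Yield successive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def bulk_insert_users(conn, records, batch_size=1000):
    """Insert records batch by batch, skipping rows with a duplicate email."""
    for batch in chunked(records, batch_size):
        conn.executemany(
            "INSERT OR IGNORE INTO users (name, email) VALUES (?, ?)",
            [(record["name"], record["email"]) for record in batch],
        )
        conn.commit()  # one commit per batch keeps transactions small


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT UNIQUE)")

users_data = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
    {"name": "Alice again", "email": "alice@example.com"},  # duplicate email, skipped
]

bulk_insert_users(conn, users_data, batch_size=2)
total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(total)
```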

## 🏗️ Architecture

DataFlow seamlessly integrates with Kailash's workflow architecture:

```
┌───────────────────────────────────────────────────────┐
│                   Your Application                    │
├───────────────────────────────────────────────────────┤
│                       DataFlow                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
│  │  Models  │  │  Nodes   │  │Migrations│             │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘             │
│       └─────────────┴─────────────┘                   │
│                     Core Features                     │
│ QueryBuilder │ QueryCache │ Monitoring │ Multi-tenant │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
│  │MongoDB-  │  │Redis     │  │Pattern   │             │
│  │style     │  │Caching   │  │Invalidate│             │
│  └──────────┘  └──────────┘  └──────────┘             │
├───────────────────────────────────────────────────────┤
│                      Kailash SDK                      │
│              Workflows │ Nodes │ Runtime              │
└───────────────────────────────────────────────────────┘
```

## 🧪 Testing

DataFlow includes comprehensive testing support:

```python
# Test with in-memory database
def test_user_creation():
    db = DataFlow(testing=True)

    @db.model
    class User:
        id: int
        name: str

    # Automatic test isolation
    user = db.test_create(User, name="Test User")
    assert user.name == "Test User"
```

## 🤝 Contributing

We welcome contributions! DataFlow follows Kailash SDK patterns:

1. Use SDK components and patterns
2. Maintain zero-config philosophy
3. Write comprehensive tests
4. Update documentation

See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

## 📊 Performance

DataFlow executes every operation as real SQL against PostgreSQL:

- **Real SQL execution** with parameterized queries
- **Connection pooling** with configurable pool sizes
- **Bulk operations** with batching for large datasets
- **Production-ready** database operations

Performance testing requires a running PostgreSQL database.
Advanced caching and query optimization features are planned.

## ⚡ Why DataFlow?

- **Real Database Operations**: Actual SQL execution, not mocks
- **Workflow-Native**: Database ops as first-class nodes
- **Production-Ready**: PostgreSQL support with connection pooling
- **Progressive**: Simple to start, enterprise features available
- **100% Kailash**: Built on proven SDK components

---

**Built with Kailash SDK** | [Parent Project](../../README.md) | [SDK Docs](../../sdk-users/)
