# QuerySUTRA

**SUTRA: Structured-Unstructured-Text-Retrieval-Architecture**

Transform any data into structured, queryable databases with AI-powered entity extraction.

## 🎯 Key Features

✅ **Multi-Table Creation** - Automatically extracts entities and creates multiple related tables  
✅ **Smart Entity Extraction** - Identifies people, contacts, events, and organizations in unstructured data  
✅ **Natural Language Queries** - Ask questions in plain English  
✅ **Multiple Data Formats** - CSV, Excel, JSON, PDF, DOCX, TXT, SQL, DataFrames  
✅ **Direct SQL Access** - Query without API costs  
✅ **Auto Visualization** - Built-in charts and graphs  
✅ **Cloud Export** - Save to MySQL, PostgreSQL, or local SQLite  

## 📦 Installation

```bash
pip install QuerySUTRA

# With MySQL support (quotes keep shells such as zsh from expanding the brackets)
pip install "QuerySUTRA[mysql]"

# With PostgreSQL support
pip install "QuerySUTRA[postgres]"

# With all database support
pip install "QuerySUTRA[all]"
```

## 🚀 Quick Start

```python
from sutra import SUTRA

# Initialize
sutra = SUTRA(api_key="your-openai-key")

# Upload any data - AI creates multiple structured tables!
sutra.upload("employee_story.pdf")

# View all created tables
sutra.tables()
# Output:
# 📋 TABLES IN DATABASE
# 1. employee_story_people (20 rows, 6 columns)
#    Columns: id, name, address, city, email, phone
# 2. employee_story_contacts (20 rows, 4 columns)
#    Columns: id, person_id, email, phone
# 3. employee_story_events (15 rows, 4 columns)
#    Columns: id, host_id, description, city

# View detailed schema
sutra.schema()

# Query with natural language
result = sutra.ask("Show all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show events by city", viz=True)

# Direct SQL (no API cost!)
result = sutra.sql("SELECT * FROM employee_story_people WHERE city='Dallas'")
print(result.data)
```

## 📊 How It Works

### From Unstructured PDF to Structured Tables

**Input:** PDF with employee information

**AI Automatically Creates:**
```
📋 Created 3 structured tables:
  📊 employee_story_people: 20 rows, 6 columns
     - id, name, address, city, email, phone
  📊 employee_story_contacts: 20 rows, 4 columns
     - id, person_id, email, phone  
  📊 employee_story_events: 15 rows, 4 columns
     - id, host_id, description, city
```

## 💡 Usage Examples

### 1. Upload Different Formats

```python
# CSV file
sutra.upload("sales_data.csv")

# Excel file
sutra.upload("quarterly_report.xlsx")

# PDF document (AI extracts entities!)
sutra.upload("company_directory.pdf")

# Word document
sutra.upload("meeting_notes.docx")

# Text file
sutra.upload("log_data.txt")

# DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [95, 87]})
sutra.upload(df, name="test_scores")
```

### 2. View Your Data

```python
# List all tables with details
sutra.tables()

# Show schema with data types
sutra.schema()

# Show schema for specific table
sutra.schema("employee_story_people")

# Preview data
sutra.peek("employee_story_people", n=10)
```

### 3. Query Your Data

```python
# Natural language (uses OpenAI)
result = sutra.ask("What are the top 5 sales by region?")
print(result.data)

# With visualization
result = sutra.ask("Show sales trends by month", viz=True)

# Interactive mode (asks if you want viz)
result = sutra.interactive("Compare revenue across quarters")

# Direct SQL (free, no API!)
result = sutra.sql("SELECT city, COUNT(*) as count FROM employee_story_people GROUP BY city")
print(result.data)
```

### 4. Export Your Database

```python
# Export to MySQL (local or cloud)
sutra.save_to_mysql(
    host="localhost",
    user="root",
    password="password",
    database="my_database"
)

# Export to PostgreSQL
sutra.save_to_postgres(
    host="mydb.amazonaws.com",
    user="admin",
    password="password",
    database="production_db"
)

# Export to SQLite file
sutra.export_db("backup.db", format="sqlite")

# Export to SQL dump
sutra.export_db("schema.sql", format="sql")

# Export to JSON
sutra.export_db("data.json", format="json")

# Export to Excel (all tables as sheets)
sutra.export_db("data.xlsx", format="excel")

# Complete backup
sutra.backup("./backups")
```
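
After exporting to a SQLite file, it can be useful to confirm that every table made it across. The helper below is a minimal sketch that uses only Python's standard `sqlite3` module; `summarize_sqlite` is a hypothetical name and is not part of QuerySUTRA.

```python
import sqlite3


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Note: sqlite3.connect() creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Names come straight from sqlite_master, so double-quoting them is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `summarize_sqlite("backup.db")` after the export above should list the same tables and row counts that `sutra.tables()` reports.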

## 🔥 Advanced Features

### Entity Extraction

QuerySUTRA automatically identifies and extracts:

- 👥 **People** - Names, addresses, contact info
- 📧 **Contacts** - Emails, phone numbers  
- 📅 **Events** - Meetings, activities, locations
- 🏢 **Organizations** - Companies, departments
- 📍 **Locations** - Cities, addresses, coordinates

### Multiple Table Relationships

```python
# AI creates relational structure
sutra.upload("company_data.pdf")

# Result:
# people table with primary key id
# contacts table with person_id referencing people.id
# events table with host_id referencing people.id
```
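
To make the relational structure concrete, the sketch below recreates the three Quick Start tables in an in-memory SQLite database and runs a join across them. The column types, the `REFERENCES` clauses, and the sample rows are assumptions for illustration; the DDL that QuerySUTRA actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled
conn.execute("PRAGMA foreign_keys = ON")

# Assumed schema: column names come from the Quick Start output, types are guesses
conn.executescript("""
CREATE TABLE employee_story_people (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    address TEXT,
    city    TEXT,
    email   TEXT,
    phone   TEXT
);
CREATE TABLE employee_story_contacts (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES employee_story_people(id),
    email     TEXT,
    phone     TEXT
);
CREATE TABLE employee_story_events (
    id          INTEGER PRIMARY KEY,
    host_id     INTEGER REFERENCES employee_story_people(id),
    description TEXT,
    city        TEXT
);
""")

# Made-up sample rows, only to exercise the join
conn.execute(
    "INSERT INTO employee_story_people (id, name, city) VALUES (1, 'Alice', 'Dallas')"
)
conn.execute(
    "INSERT INTO employee_story_events (id, host_id, description, city) "
    "VALUES (1, 1, 'Team offsite', 'Austin')"
)

# Events hosted by people who live in Dallas
rows = conn.execute("""
    SELECT e.description, p.name, p.city
    FROM employee_story_events e
    JOIN employee_story_people p ON e.host_id = p.id
    WHERE p.city = 'Dallas'
""").fetchall()
print(rows)  # [('Team offsite', 'Alice', 'Dallas')]
```

With foreign keys enabled, inserting a contact whose `person_id` has no matching row in the people table raises `sqlite3.IntegrityError`.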

### Query Across Tables

```python
# Natural language handles joins automatically
result = sutra.ask("Show all events hosted by people from Dallas")

# Or write SQL joins manually
result = sutra.sql("""
    SELECT e.description, p.name, p.city
    FROM employee_story_events e
    JOIN employee_story_people p ON e.host_id = p.id
    WHERE p.city = 'Dallas'
""")
```

## 📈 Visualization

```python
# Auto-detect best chart type
result = sutra.ask("Show revenue by product", viz=True)

# Interactive charts with Plotly
# - Bar charts for categorical data
# - Line charts for time series  
# - Tables for detailed data
# - Pie charts for distributions
```

## 🌐 Cloud Database Integration

### AWS RDS MySQL
```python
sutra.save_to_mysql(
    host="mydb.xxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="password",
    database="production",
    port=3306
)
```

### Google Cloud SQL
```python
sutra.save_to_postgres(
    host="35.xxx.xxx.xxx",  # replace with your instance's public IP address
    user="postgres",
    password="password",
    database="analytics"
)
```

### Heroku Postgres
```python
sutra.save_to_postgres(
    host="ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com",
    user="username",
    password="password",
    database="dbname",
    port=5432
)
```
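
The examples above hardcode credentials for brevity. For real deployments it may be safer to read them from the environment. The helper below is a sketch under assumptions: the function name and the `<PREFIX>_HOST` style variable names are hypothetical conventions, not something QuerySUTRA defines.

```python
import os


def db_settings_from_env(prefix, default_port):
    """Build connection keyword arguments from <PREFIX>_HOST, _USER, _PASSWORD, _DATABASE and _PORT."""
    required = ("HOST", "USER", "PASSWORD", "DATABASE")
    # Fail early, naming every variable that is missing or empty
    missing = [
        f"{prefix}_{key}" for key in required if not os.environ.get(f"{prefix}_{key}")
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    settings = {key.lower(): os.environ[f"{prefix}_{key}"] for key in required}
    # The port is optional and falls back to the database's usual default
    settings["port"] = int(os.environ.get(f"{prefix}_PORT", default_port))
    return settings
```

The returned keys match the keyword arguments used in the examples above, so a call such as `sutra.save_to_postgres(**db_settings_from_env("PG", 5432))` keeps passwords out of source control.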

## ⚡ Performance Tips

```python
# Use direct SQL for complex queries (faster, no API cost)
result = sutra.sql("SELECT * FROM data WHERE status='active'")

# Cache is automatic for repeated questions
result1 = sutra.ask("Show total sales")  # Calls API
result2 = sutra.ask("Show total sales")  # From cache ⚡

# Export results for reuse
result.data.to_csv("results.csv")
```
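
For intuition about why a repeated question costs nothing, here is a minimal sketch of a question cache. It is not QuerySUTRA's implementation: the `QuestionCache` class and the choice to normalize case and whitespace are assumptions made for illustration.

```python
class QuestionCache:
    """Memoize answers by normalized question text."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn  # the expensive call, e.g. an LLM-backed query
        self._store = {}
        self.misses = 0  # number of times answer_fn was actually invoked

    @staticmethod
    def _normalize(question):
        # Questions that differ only in case or spacing share one cache key
        return " ".join(question.lower().split())

    def ask(self, question):
        key = self._normalize(question)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._answer_fn(question)
        return self._store[key]
```

Wrapping an expensive function this way means the second identical question is served from the dictionary, at the cost of returning stale answers if the underlying tables change in between.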

## 🔒 API Key Security

```python
# Option 1: Pass directly (not recommended for production)
sutra = SUTRA(api_key="sk-...")

# Option 2: Environment variable (recommended)
# Set it in your shell before running Python: export OPENAI_API_KEY="sk-..."
sutra = SUTRA()

# Option 3: .env file
# Create .env file with: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv
load_dotenv()
sutra = SUTRA()
```

## 🎓 Complete Example

```python
from sutra import SUTRA
import pandas as pd

# Initialize
sutra = SUTRA(api_key="your-openai-key")

# Upload PDF - creates multiple tables
sutra.upload("employee_directory.pdf")

# View what was created
tables_info = sutra.tables()
print(f"Created {len(tables_info)} tables")

# View detailed schema
sutra.schema()

# Query specific table
result = sutra.ask("How many people are in each city?", 
                   table="employee_directory_people")
print(result.data)

# Visualize
result = sutra.ask("Show distribution of people by city", viz=True)

# Export to MySQL
sutra.save_to_mysql("localhost", "root", "password", "company_db")

# Backup everything
sutra.backup("./backups")

# Close connection
sutra.close()
```

## 📚 Method Reference

### Core Methods

| Method | Description |
|--------|-------------|
| `upload(data, name)` | Upload any data format, creates multiple tables |
| `tables()` | List all tables with row/column counts |
| `schema(table)` | Show detailed schema with data types |
| `peek(table, n)` | Preview first n rows |
| `ask(question, viz, table)` | Natural language query, optionally limited to one table |
| `sql(query, viz)` | Direct SQL query |
| `interactive(question)` | Query with viz prompt |

### Export Methods

| Method | Description |
|--------|-------------|
| `export_db(path, format)` | Export database (sqlite/sql/json/excel) |
| `save_to_mysql(...)` | Save to MySQL database |
| `save_to_postgres(...)` | Save to PostgreSQL database |
| `backup(path)` | Complete backup with timestamp |

## 🐛 Troubleshooting

**Q: Only one table created instead of multiple?**  
A: Make sure your OpenAI API key is set. Without it, QuerySUTRA falls back to simple parsing.

**Q: "No API key" error?**  
A: Set your OpenAI key: `sutra = SUTRA(api_key="sk-...")`

**Q: PDF extraction failed?**  
A: Install PyPDF2: `pip install PyPDF2`

**Q: MySQL export error?**  
A: Install the MySQL extra: `pip install "QuerySUTRA[mysql]"`
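
Several of the answers above come down to a missing optional dependency. A quick way to check before uploading is sketched below using only the standard library; `missing_modules` is a hypothetical helper, not part of QuerySUTRA.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["PyPDF2"])` returns `["PyPDF2"]` when PDF support is not installed and an empty list otherwise.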

## 📄 License

MIT License - see LICENSE file

## 🤝 Contributing

Contributions welcome! Open an issue or submit a PR.

## 📞 Support

- Issues: [GitHub Issues](https://github.com/yourusername/querysutra/issues)
- Email: your@email.com

---

**Made with ❤️ by Aditya Batta**
