Metadata-Version: 2.4
Name: embeddingframework
Version: 1.0.1
Summary: A high-performance, asynchronous, and extensible Python package for processing files, generating embeddings, and storing them in various vector databases with optional cloud storage integration.
Home-page: https://github.com/isathish/embeddingframework
Author: SathishKumar Nagarajan
Author-email: mail@sathishkumarnagarajan.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiofiles
Requires-Dist: asyncio
Requires-Dist: argparse
Requires-Dist: pytest
Requires-Dist: pytest-cov
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# EmbeddingFramework

A **modular, extensible, and production-ready** Python framework for working with embeddings, vector databases, and cloud storage providers.  
Designed for **AI, NLP, and semantic search** applications, EmbeddingFramework provides a unified API to process, store, and query embeddings across multiple backends.

---

## 🚀 Features

### 🔹 **Multi-Vector Database Support**
- **ChromaDB** – Local and persistent vector storage.
- **Milvus** – High-performance distributed vector database.
- **Pinecone** – Fully managed vector database service.
- **Weaviate** – Open-source vector search engine.
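
To make the role of a vector database adapter concrete, here is a minimal in-memory sketch of the store-then-query pattern that the backends above provide at scale. `InMemoryVectorStore`, `query`, and `cosine_similarity` are hypothetical names used for illustration only and are not part of EmbeddingFramework; `add_texts` simply mirrors the call shown in the Quick Start.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB adapter; keeps everything in Python lists."""

    def __init__(self) -> None:
        self._texts: List[str] = []
        self._vectors: List[List[float]] = []

    def add_texts(self, texts: List[str], embeddings: List[List[float]]) -> None:
        # Each text must come with exactly one embedding.
        if len(texts) != len(embeddings):
            raise ValueError("texts and embeddings must have the same length")
        self._texts.extend(texts)
        self._vectors.extend(embeddings)

    def query(self, embedding: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector against the query, highest similarity first.
        scored = [
            (text, cosine_similarity(embedding, vector))
            for text, vector in zip(self._texts, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

A brute-force scan like this is only practical for small collections; dedicated vector databases rely on indexing to keep queries fast as the collection grows.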

### 🔹 **Cloud Storage Integrations**
- **AWS S3** – Store and retrieve embeddings or documents.
- **Google Cloud Storage (GCS)** – Scalable object storage.
- **Azure Blob Storage** – Enterprise-grade cloud storage.

### 🔹 **Embedding Providers**
- **OpenAI Embeddings** – State-of-the-art embedding generation.
- Easily extendable to other providers.
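
As a rough sketch of what another provider could look like, the class below exposes the same `embed_texts` method used in the Quick Start. `HashEmbeddingProvider` is a hypothetical name, and it assumes a provider only needs that one method; the framework's actual base class may require more. The vectors come from a SHA-256 digest, so they are deterministic but carry no semantic meaning, which makes the class useful for offline tests rather than real search.

```python
import hashlib
from typing import List


class HashEmbeddingProvider:
    """Hypothetical toy provider: maps each text to a fixed-size vector of floats."""

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, which caps the vector size here.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale each byte into the range [0.0, 1.0].
            vectors.append([byte / 255.0 for byte in digest[: self.dimensions]])
        return vectors
```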

### 🔹 **File Processing & Preprocessing**
- Automatic file type detection.
- Text extraction from multiple formats.
- Preprocessing utilities for cleaning and normalizing text.
- Intelligent text splitting for optimal embedding performance.
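
Text splitting is commonly done with fixed-size chunks that overlap slightly, so that a sentence cut at a boundary still appears whole in one of the chunks. The sketch below shows that general technique; `split_text` and its parameters are hypothetical and do not describe the framework's own splitter, which may work on sentences or tokens instead of characters.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into character chunks of at most chunk_size, sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`, where each chunk starts with the last character of the previous one.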

### 🔹 **Utilities**
- Retry logic for robust API calls.
- File utilities for safe and efficient I/O.
- Modular architecture for easy extension.
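
Retry logic for API calls is typically built on exponential backoff: wait a little after the first failure, then double the wait after each subsequent one. The decorator below is a generic sketch of that pattern, not the framework's own retry utility; `retry` and all of its parameters are hypothetical names chosen for this example.

```python
import time
from functools import wraps
from typing import Any, Callable, Tuple, Type


def retry(
    max_attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Any] = time.sleep,
) -> Callable:
    """Retry the wrapped function, doubling the delay after each failed attempt."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Out of attempts: let the last error propagate to the caller.
                    if attempt == max_attempts:
                        raise
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

The `sleep` parameter exists so that tests can pass a stub and avoid real waiting. In production code it is usually worth narrowing `exceptions` to transient errors such as timeouts, so that programming errors fail immediately.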

---

## 📦 Installation

```bash
# Basic installation
pip install embeddingframework

# With development dependencies
pip install "embeddingframework[dev]"
```

---

## ⚡ Quick Start

```python
import os

from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import ChromaDBAdapter

# Initialize the embedding provider; the key is read from the environment
# rather than hardcoded in the script
embedding_provider = OpenAIEmbeddingAdapter(api_key=os.environ["OPENAI_API_KEY"])

# Initialize the vector database
vector_db = ChromaDBAdapter(persist_directory="./chroma_store")

texts = ["Hello world", "EmbeddingFramework is awesome!"]

# Generate one embedding per input text
embeddings = embedding_provider.embed_texts(texts)

# Store the texts together with their embeddings
vector_db.add_texts(texts, embeddings)
```

---

## 🛠 Project Structure

```
embeddingframework/
│
├── adapters/                # Vector DB & storage adapters
│   ├── base.py
│   ├── chromadb_adapter.py
│   ├── milvus_adapter.py
│   ├── pinecone_adapter.py
│   ├── weaviate_adapter.py
│   └── storage/             # Cloud storage adapters
│
├── processors/              # File processing logic
├── utils/                   # Helper utilities
└── tests/                   # Test suite
```

---

## 🧪 Running Tests

```bash
pytest --maxfail=1 --disable-warnings -q
```

With coverage:

```bash
pytest --cov=embeddingframework --cov-report=term-missing
```

---

## 🔄 CI/CD Workflow

This project includes a **GitHub Actions** workflow (`.github/workflows/python-package.yml`) for:
- Automated testing with coverage.
- Version bumping & changelog generation.
- PyPI publishing.
- GitHub release creation.

---

## 📜 License

This project is licensed under the **MIT License** – see the [LICENSE](LICENSE) file for details.

---

## 🤝 Contributing

1. Fork the repository.
2. Create a new branch (`feature/my-feature`).
3. Commit your changes.
4. Push to your branch.
5. Open a Pull Request.

---

## 🌟 Why EmbeddingFramework?

- **Unified API** – Work with multiple vector DBs and storage providers seamlessly.
- **Extensible** – Add new adapters with minimal effort.
- **Production-Ready** – Built with scalability and reliability in mind.
- **Developer-Friendly** – Clean, modular, and well-documented codebase.
