Metadata-Version: 2.3
Name: chromadbx
Version: 0.0.8
Summary: A collection of experimental Chroma extensions.
Author: Trayan Azarov
Author-email: trayan.azarov@amikos.tech
Requires-Python: >=3.9,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: core
Provides-Extra: embeddings
Provides-Extra: ids
Requires-Dist: chromadb (>=0.4.0,<=0.6.0) ; extra == "core"
Requires-Dist: llama-embedder (>=0.0.7,<0.0.8) ; extra == "embeddings"
Requires-Dist: nanoid (>=2.0.0,<3.0.0) ; extra == "ids"
Requires-Dist: pydantic (>=2.7.2,<3.0.0)
Requires-Dist: ulid-py (>=1.1.0,<2.0.0) ; extra == "ids"
Project-URL: Bug Tracker, https://github.com/amikos-tech/chromadbx/issues
Project-URL: Homepage, https://cookbook.chromadb.dev/
Project-URL: Source, https://github.com/amikos-tech/chromadbx/
Description-Content-Type: text/markdown

# ChromaX: An experimental utilities package for the Chroma AI application database

## Installation

```bash
pip install chromadbx
```

The package metadata declares three optional extras: `core` (installs `chromadb`), `ids` (installs `nanoid` and `ulid-py`), and `embeddings` (installs `llama-embedder`). Install the ones you need, for example `pip install "chromadbx[core,ids]"`.

## Features

- [Query Builder](https://github.com/amikos-tech/chromadbx#queries) - build queries using a builder pattern
- [ID generation](https://github.com/amikos-tech/chromadbx#id-generation) - generate IDs for documents
- [Embeddings](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md) - generate embeddings for your documents:
    - [OnnxRuntime](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#onnx-runtime) embeddings
    - [Llama.cpp](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#llamacpp) embeddings
    - [Google Vertex AI](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#google-vertex-ai) embeddings
    - [Mistral AI](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#mistral-ai) embeddings
    - [Cloudflare Workers AI](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#cloudflare-workers-ai) embeddings
    - [SpaCy](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#spacy) embeddings
    - [Together](https://github.com/amikos-tech/chromadbx/blob/main/docs/embeddings.md#together) embeddings

## Usage

### Queries

Supported filters:

- `$eq` - equal to (string, int, float)
- `$ne` - not equal to (string, int, float)
- `$gt` - greater than (int, float)
- `$gte` - greater than or equal to (int, float)
- `$lt` - less than (int, float)
- `$lte` - less than or equal to (int, float)
- `$in` - in (list of strings, ints, floats, bools)
- `$nin` - not in (list of strings, ints, floats, bools)
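
To make the operator semantics concrete, here is a minimal pure-Python sketch that evaluates a single `{field: {operator: value}}` clause against one metadata record. The `matches` helper and the `_OPS` table are hypothetical names used only for illustration; they are not part of chromadbx or Chroma, and the rule that a missing field never matches is a simplifying assumption.

```python
from typing import Any, Callable, Dict

# Hypothetical lookup table for illustration only; not part of chromadbx or Chroma.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda actual, expected: actual == expected,
    "$ne": lambda actual, expected: actual != expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$in": lambda actual, expected: actual in expected,
    "$nin": lambda actual, expected: actual not in expected,
}


def matches(metadata: Dict[str, Any], clause: Dict[str, Any]) -> bool:
    """Check one {field: {operator: value}} clause against a metadata record."""
    # Each clause is assumed to hold exactly one field and one operator.
    (field, condition), = clause.items()
    (op, expected), = condition.items()
    if field not in metadata:
        # Assumption of this sketch: a missing field never matches.
        return False
    return _OPS[op](metadata[field], expected)


record = {"author": "jane", "year": 2021}
print(matches(record, {"year": {"$gte": 2020}}))               # True
print(matches(record, {"author": {"$in": ["john", "mary"]}}))  # False
```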

**Where:**


```python
import chromadb

from chromadbx.core.queries import eq, where, ne, and_

client = chromadb.PersistentClient(path="path/to/db")
collection = client.get_collection("collection_name")
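# query() needs query_texts or query_embeddings; the text below is a placeholder.
collection.query(query_texts=["example query"], where=where(and_(eq("a", 1), ne("b", "2"))))
# where filter in Chroma's syntax: {'$and': [{'a': {'$eq': 1}}, {'b': {'$ne': '2'}}]}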
```

**Where Document:**

```python
import chromadb

from chromadbx.core.queries import where_document, contains, not_contains, LogicalOperator

client = chromadb.PersistentClient(path="path/to/db")
collection = client.get_collection("collection_name")
# query() needs query_texts or query_embeddings; the text below is a placeholder.
collection.query(
    query_texts=["example query"],
    where_document=where_document(contains("this is a document", "this is another document")),
)
# {'$and': [{'$contains': 'this is a document'}, {'$contains': 'this is another document'}]}
collection.query(
    query_texts=["example query"],
    where_document=where_document(
        contains("this is a document", "this is another document", op=LogicalOperator.OR)
    ),
)
# {'$or': [{'$contains': 'this is a document'}, {'$contains': 'this is another document'}]}
```
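
The nested `$and` / `$or` structure shown in the comments above can be read as a small expression tree. The sketch below evaluates such a tree against a single document string. `doc_matches` is a hypothetical helper, not a chromadbx or Chroma function, and plain case-sensitive substring matching is an assumption made for illustration.

```python
from typing import Any, Dict


def doc_matches(document: str, clause: Dict[str, Any]) -> bool:
    """Evaluate a where_document-style clause against one document (illustrative only)."""
    # Each clause is assumed to hold exactly one operator.
    (op, operand), = clause.items()
    if op == "$contains":
        return operand in document
    if op == "$not_contains":
        return operand not in document
    if op == "$and":
        return all(doc_matches(document, sub) for sub in operand)
    if op == "$or":
        return any(doc_matches(document, sub) for sub in operand)
    raise ValueError(f"unsupported operator: {op}")


clause_and = {"$and": [{"$contains": "vector"}, {"$contains": "database"}]}
clause_or = {"$or": [{"$contains": "vector"}, {"$contains": "graph"}]}
print(doc_matches("a vector store", clause_and))  # False
print(doc_matches("a vector store", clause_or))   # True
```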

### ID Generation

Pass a generator function to `IDGenerator` to produce custom IDs, one per document:

```python
import chromadb
from chromadbx import IDGenerator
from functools import partial
from typing import Generator

def sequential_generator(start: int = 0) -> Generator[str, None, None]:
    """Yield consecutive integers as string IDs, beginning at `start`."""
    _next = start
    while True:
        yield f"{_next}"
        _next += 1


client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
idgen = IDGenerator(len(my_docs), generator=partial(sequential_generator, start=10))
col.add(ids=idgen, documents=my_docs)
```
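
One way to picture a length-bounded ID generator is as a slice taken from an endless generator. The sketch below is not chromadbx's implementation. `take_ids` and `prefixed` are hypothetical helpers that only illustrate the idea of drawing a fixed number of IDs from a generator factory.

```python
from itertools import islice
from typing import Callable, Iterator, List


def take_ids(count: int, factory: Callable[[], Iterator[str]]) -> List[str]:
    """Draw exactly `count` IDs from a fresh generator (hypothetical helper)."""
    return list(islice(factory(), count))


def prefixed(prefix: str, start: int = 0) -> Iterator[str]:
    """Endless stream of IDs such as 'doc-10', 'doc-11', ..."""
    n = start
    while True:
        yield f"{prefix}-{n}"
        n += 1


ids = take_ids(3, lambda: prefixed("doc", start=10))
print(ids)  # ['doc-10', 'doc-11', 'doc-12']
```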

#### UUIDs (default)

```python
import chromadb
from chromadbx import UUIDGenerator

client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=UUIDGenerator(len(my_docs)), documents=my_docs)
```

#### ULIDs

```python
import chromadb
from chromadbx import ULIDGenerator

client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=ULIDGenerator(len(my_docs)), documents=my_docs)
```

#### Hashes

**Random SHA256:**

```python
import chromadb
from chromadbx import RandomSHA256Generator

client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=RandomSHA256Generator(len(my_docs)), documents=my_docs)
```

**Document-based SHA256:**

```python
import chromadb
from chromadbx import DocumentSHA256Generator

client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=DocumentSHA256Generator(documents=my_docs), documents=my_docs)
```
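
Content-derived IDs are deterministic: the same document text always yields the same ID, which can help spot duplicates. The sketch below shows the general technique with the standard library's `hashlib`. It assumes the ID is the hex SHA-256 digest of the UTF-8 encoded text, which may differ from what `DocumentSHA256Generator` does internally. `document_ids` is a hypothetical helper.

```python
import hashlib
from typing import List


def document_ids(documents: List[str]) -> List[str]:
    """Derive one hex SHA-256 ID per document from its UTF-8 text (illustrative only)."""
    return [hashlib.sha256(doc.encode("utf-8")).hexdigest() for doc in documents]


docs = ["Document 0", "Document 1", "Document 0"]
ids = document_ids(docs)
print(ids[0] == ids[2])  # True: identical text gives an identical ID
print(ids[0] == ids[1])  # False: different text gives a different ID
print(len(ids[0]))       # 64 hex characters
```

Because identical documents map to the same ID under this scheme, a batch containing duplicates produces duplicate IDs, so deduplicate before adding if that matters for your collection.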

#### NanoID

```python
import chromadb
from chromadbx import NanoIDGenerator

client = chromadb.Client()
col = client.get_or_create_collection("test")
my_docs = [f"Document {_}" for _ in range(10)]
col.add(ids=NanoIDGenerator(len(my_docs)), documents=my_docs)
```

