Metadata-Version: 2.4
Name: bee-taxonomy
Version: 0.0.14
Summary: A library for taxonomy management.
Project-URL: Homepage, https://github.com/Davidoff2103/taxonomy-lib
Author-email: David Blanco Bautista <davidoff.d777@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Requires-Dist: googlesearch-python==1.3.0
Requires-Dist: langchain-community==0.3.27
Requires-Dist: langchain-huggingface==0.3.1
Requires-Dist: openai==1.101.0
Requires-Dist: python-dotenv==1.1.1
Description-Content-Type: text/markdown

# Taxonomy Library README

This module provides functions for taxonomy creation, text classification, and header translation using AI models and vector similarity search.

## Usage

Import the module with:
```python
from bee_taxonomy import taxonomy
```

## Installation

Install the library using pip:
```bash
pip install bee-taxonomy
```

## Main Functions

### 1. `taxonomy.propose_taxonomy(field: str, description: str, discrete_fields: list[str] = None)`
**Purpose**: Generate taxonomy suggestions using OpenAI

**Parameters**:
- `field`: Name of the field to categorize
- `description`: Description of the field's purpose
- `discrete_fields`: Optional specific values to consider

**Example**:
```python
taxonomy.propose_taxonomy(
    field="Color",
    description="Vehicle paint color classification",
    discrete_fields=["Red", "Blue", "Green", "Custom"]
)
# Returns: ["Red", "Blue", "Green", "Other"]
```

### 2. `taxonomy.apply_taxonomy_similarity(discrete_fields: list[str], taxonomy: list[str], category_type: str = None)`
**Purpose**: Classify values using semantic similarity with vector database

**Parameters**:
- `discrete_fields`: Values to classify
- `taxonomy`: List of allowed classification terms
- `category_type`: Special processing for categories like 'streets'

**Example**:
```python
taxonomy.apply_taxonomy_similarity(
    discrete_fields=["Rd", "Street", "Ave"],
    taxonomy=["Road", "Street", "Avenue"],
    category_type="streets"
)
# Returns: {'Rd': {'match': 'Road', 'score': 0.92}, ...}
```

### 3. `taxonomy.apply_taxonomy_reasoning(discrete_fields: list[str], taxonomy: list[str], classification_description: str, hash_file: str = None)`
**Purpose**: Use AI reasoning to classify values into taxonomy

**Parameters**:
- `discrete_fields`: List of values to classify
- `taxonomy`: List of allowed categories
- `classification_description`: Context for classification
- `hash_file`: Optional file hash for progress tracking

**Example**:
```python
taxonomy.apply_taxonomy_reasoning(
    discrete_fields=["Quick Brown Fox", "Lazy Dog"],
    taxonomy=["Animal", "Object", "Action"],
    classification_description="Classify animal-related phrases"
)
# Returns: {'Quick Brown Fox': 'Animal', 'Lazy Dog': 'Animal'
```

### 4. `taxonomy.translate_headers_reasoning(src_lang, dest_lang, headers)`
**Purpose**: Translate headers between languages using AI reasoning

**Parameters**:
- `src_lang`: Source language code
- `dest_lang`: Target language code
- `headers`: List of headers to translate

**Example**:
```python
taxonomy.translate_headers_reasoning(
    src_lang="en",
    dest_lang="es",
    headers=["Street Name", "Zip Code"]
)
# Returns: {'Street Name': 'Nombre de la Calle', 'Zip Code': 'Código Postal'
```

### 5. `taxonomy.analyze_text_field(field_name: str, field_value: str, task: Literal["label", "summarize"] = "label")`
**Purpose**: Analyze text fields for classification or summarization

**Parameters**:
- `field_name`: Name of the text field
- `field_value`: Text to analyze
- `task`: "label" for classification or "summarize" for text summary

**Example**:
```python
taxonomy.analyze_text_field(
    field_name="Product Description",
    field_value="This ergonomic chair provides lumbar support and adjustable height",
    task="label"
)
# Returns: "Office Furniture"
```

## Environment Variables
Users must rename `.env.example` to `.env` and fill in all the required fields with their specific values:
- `MODEL_NAME`: Hugging Face model identifier
- `SERVER_URL`: Base URL for OpenAI-compatible API
- `API_KEY`: Authentication token for the API
- `EMBEDDER_MODEL`: Embedding model for semantic similarity

## Features
- Validation workflow with Pydantic models
- Progress checkpointing for large datasets
- Google search integration for ambiguous classifications
