Metadata-Version: 2.4
Name: canonmap
Version: 0.4.17
Summary: A data matching and canonicalization library with multiple database connector support
Project-URL: Homepage, https://github.com/vinceberry/canonmap
Project-URL: Documentation, https://github.com/vinceberry/canonmap#readme
Project-URL: Repository, https://github.com/vinceberry/canonmap
Project-URL: Issues, https://github.com/vinceberry/canonmap/issues
Project-URL: Changelog, https://github.com/vinceberry/canonmap/blob/main/CHANGELOG.md
Author-email: Vince Berry <vincent.berry11@gmail.com>
Maintainer-email: Vince Berry <vincent.berry11@gmail.com>
License: MIT
License-File: LICENSE
License-File: NOTICE.txt
Keywords: canonicalization,data-matching,database,entity-resolution,etl,mysql,natural language to sql,ner,nl2sql,nlp,sql
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: chardet>=4.0.0
Requires-Dist: cohere>=4.0.0
Requires-Dist: jellyfish>=0.9.0
Requires-Dist: metaphone>=0.6
Requires-Dist: mysql-connector-python>=8.0.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: python-levenshtein>=0.20.0
Requires-Dist: python-multipart>=0.0.5
Requires-Dist: rich>=12.0.0
Provides-Extra: dev
Requires-Dist: black>=22.0.0; extra == 'dev'
Requires-Dist: build>=0.10.0; extra == 'dev'
Requires-Dist: flake8>=5.0.0; extra == 'dev'
Requires-Dist: isort>=5.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pre-commit>=2.20.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Provides-Extra: fastapi
Requires-Dist: fastapi>=0.100.0; extra == 'fastapi'
Requires-Dist: uvicorn>=0.20.0; extra == 'fastapi'
Description-Content-Type: text/markdown

# CanonMap

A data matching and canonicalization library with MySQL connector support.

## Features

- **Data Matching**: Advanced algorithms for fuzzy string matching and record linkage
- **MySQL Integration**: Seamless connection and management of MySQL databases
- **Canonicalization**: Standardize and normalize data across different formats
- **Rich Logging**: Beautiful console output with structured logging
- **FastAPI Support**: Optional FastAPI integration for web services

## Installation

```bash
pip install canonmap
```

For development dependencies:
```bash
pip install "canonmap[dev]"
```

For FastAPI support:
```bash
pip install "canonmap[fastapi]"
```

## Quick Start

### Command Line Interface

CanonMap provides a CLI tool for quick project setup:

```bash
# Create a new API project (default name: app)
cm create-api

# Create a new API project with custom name
cm create-api --name my-api

# Create a new API project with spaces (will be normalized)
cm create-api --name "My API"
```

The CLI will automatically:
- Normalize directory names to follow Python conventions
- Auto-increment names if the directory already exists (app, app-2, app-3, etc.)
- Copy and customize the example API template
- Replace all references from "app" to your chosen name
- Install required dependencies (fastapi, uvicorn, python-dotenv)
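
The name handling described above can be sketched roughly as follows. This is only an illustration, not the CLI's actual code: the helper names `normalize_name` and `next_available_name` are hypothetical, and the choice to lowercase the name and replace other characters with underscores is an assumption. Only the `-2`, `-3` suffix pattern comes from the list above.

```python
import re


def normalize_name(raw: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into underscores.

    Assumption: the real CLI may apply different normalization rules.
    """
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return cleaned or "app"  # fall back to the documented default name


def next_available_name(base: str, existing: set) -> str:
    """Return base, or base-2, base-3, ... if the name is already taken."""
    if base not in existing:
        return base
    n = 2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"


if __name__ == "__main__":
    name = normalize_name("My API")
    print(next_available_name(name, existing={"app"}))
```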

### Basic Usage

```python
from canonmap import make_console_handler
from canonmap.connectors.mysql_connector import MySQLConnector

# Set up logging
make_console_handler(set_root=True)

# Create a MySQL connector
connector = MySQLConnector(
    host="localhost",
    port=3306,
    user="your_user",
    password="your_password",
    database="your_database"
)

# Use the connector for data operations
# ... your data matching and canonicalization code
```
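
Hard-coded credentials are fine for a first test, but you will usually want to read them from the environment. The sketch below shows one way to do that with the standard library. The helper `load_mysql_settings` and the `MYSQL_*` variable names are assumptions made for illustration and are not part of CanonMap.

```python
import os
from typing import Mapping, Optional


def load_mysql_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build connector keyword arguments from environment variables.

    The MYSQL_* names are hypothetical; use whatever names your project defines.
    A missing user, password, or database raises KeyError.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        "database": env["MYSQL_DATABASE"],
    }
```

The returned keys mirror the keyword arguments shown in the example above, so the result could be passed as `MySQLConnector(**load_mysql_settings())`. If you keep the values in a `.env` file, calling python-dotenv's `load_dotenv()` first will populate `os.environ`.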

### Data Matching Example

```python
from canonmap.connectors.mysql_connector.matching import Matcher

# Initialize matcher
matcher = Matcher()

# Perform fuzzy matching
# (source_records and target_records are placeholders for your own data)
matches = matcher.find_matches(
    source_data=source_records,
    target_data=target_records,
    fields_to_match=["name", "address"],
    threshold=0.8
)
```
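
To give a feel for what a similarity threshold such as `0.8` means, here is a standalone sketch that uses only the standard library's `difflib`. It is not CanonMap's matching algorithm: the function names, the record shape (a list of dicts), and the averaging of per-field scores are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_matches_sketch(source, target, fields, threshold=0.8):
    """Return (source_index, target_index, score) for pairs scoring at or above threshold.

    The score is the mean similarity over the given fields (an illustrative choice).
    """
    matches = []
    for i, s in enumerate(source):
        for j, t in enumerate(target):
            score = sum(
                field_similarity(str(s[f]), str(t[f])) for f in fields
            ) / len(fields)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    source = [{"name": "Jon Smith", "address": "12 Main St"}]
    target = [
        {"name": "John Smith", "address": "12 Main Street"},
        {"name": "Alice Wong", "address": "99 Oak Ave"},
    ]
    for i, j, score in find_matches_sketch(source, target, ["name", "address"]):
        print(i, j, round(score, 2))
```

Comparing every source record against every target record is quadratic, so this approach only suits small inputs. Raising the threshold trades recall for precision.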

## Documentation

For detailed documentation, visit [the project homepage](https://github.com/vinceberry/canonmap).

## Development

### Setup

1. Clone the repository:
```bash
git clone https://github.com/vinceberry/canonmap.git
cd canonmap
```

2. Install development dependencies:
```bash
pip install -e ".[dev]"
```

3. Run tests:
```bash
pytest
```

### Code Quality

This project uses several tools to maintain code quality:

- **Black**: Code formatting
- **isort**: Import sorting
- **flake8**: Linting
- **mypy**: Type checking
- **pytest**: Testing

Run all quality checks:
```bash
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/
pytest
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a list of changes and version history.

## Support

- **Issues**: [GitHub Issues](https://github.com/vinceberry/canonmap/issues)
- **Documentation**: [Project README](https://github.com/vinceberry/canonmap#readme)