# ChalkML

**Production-grade ML data processing framework built for deterministic, reproducible results**

[![PyPI version](https://badge.fury.io/py/chalkml.svg)](https://badge.fury.io/py/chalkml)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)


## Overview

ChalkML is a data preprocessing framework for machine learning. It gives researchers and engineers advanced data processing operations that are missing from pandas. Alongside its positional column notation, ChalkML ships a set of data processing design patterns that address critical gaps in production ML systems: deterministic reproducibility, feature selection, privacy preservation, synthetic data generation, and regulatory compliance.



## Installation

```bash
pip install chalkml
```

## Position Notation

ChalkML addresses columns by position rather than by name. A number before `N` counts from the left; a number after `N` counts from the right:

| Notation | Meaning |
|----------|---------|
| `01N` | 1st from left |
| `02N` | 2nd from left |
| `N01` | Last (1st from right) |
| `N02` | 2nd last |

```bash
# Examples
chalkml -rm col 01N data.csv              # Remove the 1st column
chalkml -mv col 02N N01 data.csv          # Move the 2nd column to the last position
chalkml -fillsmart col 03N mean data.csv  # Fill the 3rd column's missing values with the mean
```
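
One way to read the notation, as a minimal sketch: the helper below turns a position token into a zero-based column index. `resolve_position` is a hypothetical name used for illustration and is not part of ChalkML's API; rejecting out-of-range tokens is an assumption about how such input should be handled.

```python
import re


def resolve_position(token: str, n_columns: int) -> int:
    """Convert a token such as '01N' or 'N02' to a zero-based column index.

    Hypothetical helper: illustrates the notation, not ChalkML's own parser.
    """
    from_left = re.fullmatch(r"(\d+)N", token)
    from_right = re.fullmatch(r"N(\d+)", token)
    if from_left:
        index = int(from_left.group(1)) - 1          # '01N' -> 0
    elif from_right:
        index = n_columns - int(from_right.group(1))  # 'N01' -> last column
    else:
        raise ValueError(f"not a position token: {token!r}")
    # Assumption: tokens pointing outside the table are an error.
    if not 0 <= index < n_columns:
        raise IndexError(f"{token!r} is out of range for {n_columns} columns")
    return index


header = ["id", "age", "weight", "height"]
print(header[resolve_position("01N", len(header))])  # id
print(header[resolve_position("N01", len(header))])  # height
```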

## Core Operations

```bash
# Data manipulation
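chalkml -rm col 01N data.csv                    # Remove the 1st column
chalkml -mv col 01N 03N data.csv                # Move the 1st column to the 3rd position
chalkml -rn col 02N "Age" data.csv              # Rename the 2nd column to "Age"

# Imputation
chalkml -fillsmart col 03N mean data.csv        # Mean imputation on the 3rd column
chalkml -fillsmart col 04N knn data.csv         # KNN imputation on the 4th column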

# Feature engineering
chalkml -derive "BMI" "col:weight/col:height**2" data.csv
chalkml -onehot col 05N data.csv
chalkml -scale col 03N data.csv --method standard
```
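
The `mean` strategy can be pictured with the stdlib sketch below. It is illustrative only and not ChalkML's implementation: `fill_with_mean` is a hypothetical helper, and missing values are assumed to be represented as `None`.

```python
from statistics import fmean


def fill_with_mean(values):
    """Replace None entries with the mean of the observed entries.

    Hypothetical helper; assumes missing values are encoded as None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to average, so the column is returned unchanged.
        return list(values)
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


print(fill_with_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```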

## 5 Design Patterns

### MAP - Transform each element
```bash
chalkml -map col 05N "x*2" data.csv
chalkml -map col 06N "x**2" data.csv
```
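
Conceptually, MAP applies one function to every value in a column. The sketch below shows the idea in plain Python; `map_column` is a hypothetical name, and a lambda stands in for the expression string that the CLI accepts.

```python
def map_column(values, fn):
    """Apply fn to every element and return a new list (input is not mutated)."""
    return [fn(x) for x in values]


column = [1, 2, 3]
print(map_column(column, lambda x: x * 2))   # [2, 4, 6]
print(map_column(column, lambda x: x ** 2))  # [1, 4, 9]
```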

### REDUCE - Aggregate columns
```bash
chalkml -reduce col 01N,02N,03N sum Total data.csv
```
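
A rough picture of REDUCE, assuming the command combines the listed columns row by row into a new column (here `Total`). `reduce_columns` is a hypothetical helper, not a ChalkML function.

```python
def reduce_columns(rows, columns, fn, new_name):
    """Aggregate the given columns of each row into a new column.

    Assumption: the reduction runs across columns, once per row.
    """
    return [{**row, new_name: fn([row[c] for c in columns])} for row in rows]


rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
totals = reduce_columns(rows, ["a", "b", "c"], sum, "Total")
print([row["Total"] for row in totals])  # [6, 15]
```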

### STENCIL - Sliding windows
```bash
chalkml -stencil col 03N 5 rolling_mean data.csv
```
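
A sliding-window mean over a window of 5 could be computed as in the sketch below. It assumes a trailing window and emits `None` until the window is full; ChalkML's actual edge handling is not documented here, so treat both choices as assumptions.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing rolling mean; positions with fewer than `window` values give None."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # the oldest value is about to leave the window
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


print(rolling_mean([1, 2, 3, 4, 5, 6], 5))  # [None, None, None, None, 3.0, 4.0]
```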

### SCAN - Cumulative operations
```bash
chalkml -scan col 03N cumsum data.csv
```
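
A cumulative sum is a prefix scan: each output is the aggregate of everything up to that row. The stdlib sketch below shows the idea with `itertools.accumulate`; it illustrates the concept and is not ChalkML's code.

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

running_total = list(accumulate(values))     # cumulative sum
running_max = list(accumulate(values, max))  # same scan with a different operator

print(running_total)  # [3, 4, 8, 9, 14]
print(running_max)    # [3, 3, 4, 4, 5]
```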

### FARM - Parallel operations
```bash
chalkml -farm col range 01N:10N "x*2" data.csv
```
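
FARM can be thought of as MAP applied to many columns at once, with one task per column. The sketch below uses a thread pool to show the task structure; `farm` is a hypothetical helper, and how ChalkML actually schedules the work is an assumption not covered by this README. For CPU-bound work in Python, a process pool would be the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def farm(columns, fn, max_workers=4):
    """Apply fn element-wise to every column, submitting one task per column."""
    def work(col):
        return [fn(x) for x in col]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = pool.map(work, columns.values())
        return dict(zip(columns.keys(), results))


table = {"a": [1, 2], "b": [3, 4], "c": [5]}
print(farm(table, lambda x: x * 2))  # {'a': [2, 4], 'b': [6, 8], 'c': [10]}
```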

## 5 Advanced Patterns

### QUANTUM - Schema-based compression (70-90% reduction)
```bash
chalkml -quantum compress --schema medical patients.csv
chalkml -quantum decompress --schema medical compressed.csv
```

### RELEVANCE - Feature selection (95-98% reduction)
```bash
chalkml -relevance select --target 01N --threshold 0.1 data.csv
chalkml -relevance rank --target outcome --top 10 data.csv
```
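
This README does not state which relevance score the `--threshold` applies to. As one plausible reading, the sketch below keeps features whose absolute Pearson correlation with the target meets the threshold. `pearson` and `select_features` are hypothetical helpers, and the choice of Pearson correlation is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(features, target, threshold):
    """Keep feature names with |correlation to target| >= threshold (assumed score)."""
    return [
        name for name, col in features.items()
        if abs(pearson(col, target)) >= threshold
    ]


target = [1, 2, 3, 4]
features = {
    "linear": [2, 4, 6, 8],      # perfectly correlated with the target
    "constant": [5, 5, 5, 5],    # carries no signal
    "symmetric": [1, -1, -1, 1], # uncorrelated with the target
}
print(select_features(features, target, threshold=0.1))  # ['linear']
```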

### REDACT - Privacy preservation (HIPAA, GDPR)
```bash
chalkml -redact hipaa --identifiers all patients.csv
chalkml -redact differential --epsilon 1.0 sensitive.csv
```
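
The `--epsilon` flag suggests a differential privacy budget, but this README does not say which mechanism ChalkML applies. For orientation, the sketch below shows the standard Laplace mechanism on a counting query: noise with scale `sensitivity / epsilon` is added to the true answer. `laplace_noise` and `private_count` are hypothetical helpers.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism for a count; smaller epsilon means more noise.

    Assumption: sensitivity is 1 (one person changes the count by at most 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # seeded so repeated runs give the same noisy value
print(private_count(100, epsilon=1.0, rng=rng))
```

Seeding the generator keeps the example repeatable; a real privacy release would need a cryptographically sound randomness source instead.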

### SCAFFOLD - Synthetic data generation
```bash
chalkml -scaffold sequence --type fibonacci --count 100
chalkml -scaffold distribution --dist normal --count 10000
```
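
The two generator kinds above can be sketched in plain Python as follows. The helper names, the choice to start the Fibonacci sequence at 0, and the explicit `seed` parameter are all assumptions made for illustration; a fixed seed is what makes the sampled data reproducible.

```python
import random


def fibonacci(count):
    """First `count` Fibonacci numbers, starting from 0 (assumed convention)."""
    seq, a, b = [], 0, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


def normal_samples(count, seed, mu=0.0, sigma=1.0):
    """Draw `count` normal samples from a private, seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(count)]


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(normal_samples(3, seed=1) == normal_samples(3, seed=1))  # True
```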

### BOW - Compliance standardization
```bash
chalkml -bow standard --standard GAAP financial.csv
chalkml -bow format --column date --type date --pattern "YYYY-MM-DD" data.csv
```
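
Standardizing a date column to `YYYY-MM-DD` amounts to parsing each value and re-emitting it in one format. The sketch below is a minimal stdlib version; the list of accepted input formats is an assumption (day-first for slash-separated dates), and `to_iso_date` is a hypothetical helper rather than ChalkML's API.

```python
from datetime import datetime

# Assumed input formats, tried in order. Slash dates are read as day/month/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(text):
    """Parse a date in any known format and return it as YYYY-MM-DD."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognized date: {text!r}")


print(to_iso_date("05/03/2024"))  # 2024-03-05
print(to_iso_date("20240305"))    # 2024-03-05
```

Failing loudly on unrecognized values, rather than guessing, keeps ambiguous dates from being silently rewritten.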

## Python API

```python
from chalkml import get_chalkml_engine
from chalkml.quantum_engine import get_quantum_engine

# Core operations
engine = get_chalkml_engine()
engine.remove_column("data.csv", "01N")
engine.fill_smart("data.csv", "03N", "mean")

# Advanced patterns
quantum = get_quantum_engine()
quantum.compress_file("data.csv", "schema", "out.csv")
```


## License

MIT License

## Authors

Hope Mogale & MY Pitsane  
Mankind Research Labs (mankindlabs@protonmail.com)

**Version 1.0.0** | Production-ready | 15+ validated use cases
