# HuDa — Humanitarian Data Library

HuDa is a practical Python library for humanitarian data workflows. It provides simple, consistent functions to open, clean, transform, validate, analyze, map, visualize, automate, and share humanitarian datasets.

- Focused on survey, 5W, monitoring, and geo-enabled data
- Consistent API patterns across modules
- Returns lightweight specs for rendering/exports where appropriate

## Features
- **Opening**: CSV/Excel/JSON/SQL/API connectors
- **Cleaning**: normalize numbers/dates/text, translate categories, deduplicate, geocode
- **Transformation**: reshape, aggregate, indexes, ratios, growth, standardization
- **Validation & Quality**: ranges, missing/mandatory, country codes, dates, profiling
- **Geospatial**: folium maps, choropleths, overlays, heatmaps, clusters
- **Analysis**: correlation, time series, regression, PCA, coverage gaps (selected utilities)
- **Visualization**: chart specs for bar/line/pie/hist/box/heatmap, dashboards
- **Automation**: reports, snapshots, change detection (specs)
- **Interoperability**: export specs (CSV/Excel/JSON/Parquet/SQL/Stata/SPSS/GIS/HDX/HTML/API)

## Installation
HuDa is published on PyPI as `huda`.

```bash
pip install huda
```

Minimum Python version: 3.8

Some modules rely on optional libraries (e.g., folium, geopandas, scikit-learn). See Requirements below if you plan to use those features.

## Quickstart
```python
import polars as pl
from huda.cleaning import translate_categories
from huda.transformation import percentage_calculation
from huda.Interoperability import export_csv

# Example data
df = pl.DataFrame({
    "province": ["Kabul", "Herat"],
    "cluster": ["wash", "wash"],
    "reached": [1200, 900],
    "target": [2000, 1100],
})

# Cleaning
df2 = translate_categories(df, columns={"cluster": {"wash": "WASH"}})

# Transformation
df3 = percentage_calculation(df2, numerator_col="reached", denominator_col="target", output_col="coverage_pct")

# Interoperability (returns an intent spec; does not write files)
spec = export_csv(df3, path="/tmp/coverage.csv")
print(spec)
```
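
To make the `coverage_pct` column concrete, the arithmetic can be reproduced in plain Python. This is a sketch that assumes `percentage_calculation` computes numerator / denominator × 100. HuDa's rounding and zero-denominator behavior are not documented here, and the `coverage_pct` helper below is illustrative, not part of HuDa.

```python
# Illustrative only: plain-Python arithmetic, not HuDa's implementation.
rows = [
    {"province": "Kabul", "reached": 1200, "target": 2000},
    {"province": "Herat", "reached": 900, "target": 1100},
]


def coverage_pct(reached, target):
    """Return reached as a percentage of target, or None if target is zero or missing."""
    if not target:
        return None  # assumption: undefined coverage is reported as None
    return reached / target * 100


for row in rows:
    row["coverage_pct"] = coverage_pct(row["reached"], row["target"])
    print(row["province"], row["coverage_pct"])
```

Returning `None` for a zero target is one possible choice; check how HuDa treats that case before relying on it.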

## Module Highlights

### Opening
```python
from huda.opening import open_csv, open_excel, open_json
df = open_csv("/path/data.csv")
```

### Cleaning
```python
from huda.cleaning import numbers_standardization, dates_standardization, duplicate
df = numbers_standardization(df, columns=["reached"])  # normalize numeric fields
df = dates_standardization(df, column="report_date", style="iso")
df = duplicate(df, columns=["id"], keep="first")
```

### Transformation
```python
from huda.transformation import pivot_unpivot, severity_index_calculation
df_wide = pivot_unpivot(df, mode="pivot", index=["province"], columns="cluster", values="reached")
df_idx = severity_index_calculation(df, components=["fcs","rcsi"], weights={"fcs":0.6,"rcsi":0.4})
```
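
For intuition about the `weights` argument, a composite index is often computed as a weighted sum of its components. The sketch below assumes that form. Whether `severity_index_calculation` normalizes components or rescales weights is not stated here, and `weighted_index` and the sample record are hypothetical. The components are assumed to be already scaled to 0–1 and oriented the same way (FCS rises as food consumption improves while rCSI rises as coping worsens, so real inputs need aligning first).

```python
def weighted_index(record, weights):
    """Weighted sum of the named components, with weights rescaled to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(record[name] * w / total for name, w in weights.items())


# Hypothetical record; values are assumed pre-scaled to the 0-1 range.
household = {"fcs": 0.5, "rcsi": 0.25}
score = weighted_index(household, {"fcs": 0.6, "rcsi": 0.4})
print(round(score, 3))
```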

### Validation & Quality
```python
from huda.validation_and_quality import country_code_validation, automatic_data_profiling_report
report = automatic_data_profiling_report(df)
valid = country_code_validation(df, data_col="country")
```

### Geospatial
```python
from huda.geospatial import choropleth_maps_by_region
html_map = choropleth_maps_by_region(df, region_col="province", value_col="reached", geojson_path="/path/afg_provinces.geojson")
with open("map.html", "w", encoding="utf-8") as f:
    f.write(html_map)
```

### Visualization (specs)
```python
from huda.visualize import bar_chart, interactive_dashboard
chart = bar_chart(df, category_col="province", value_col="reached")
dashboard = interactive_dashboard(charts=[chart])
```

### Interoperability (specs)
These functions return intent specs you can pass to renderers/uploaders.

```python
from huda.Interoperability import (
    export_csv, export_excel, export_json, export_parquet,
    export_sql_database, export_stata, export_spss,
    export_shapefile, export_geojson, export_hdx_dataset,
    share_dashboard_html, api_integration_output,
)

spec_csv = export_csv(df, path="/tmp/data.csv")
spec_sql = export_sql_database(df, connection_uri="postgresql://user:pass@host:5432/db", table_name="huda_export")
spec_geo = export_geojson(df, path="/tmp/data.geojson", geometry_col="geom")
spec_dash = share_dashboard_html(dashboard, path="/tmp/dashboard.html", embed_assets=True)
```
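
The intent-spec pattern separates describing an export from performing it. The sketch below illustrates that idea with the standard library only. The spec layout (`type`, `format`, `path`, `rows`) and the `make_csv_spec` and `execute` helpers are hypothetical and do not reflect HuDa's actual spec structure, which is not documented in this README.

```python
import csv
import os
import tempfile


def make_csv_spec(rows, path):
    """Describe a CSV export without touching the filesystem."""
    return {"type": "export", "format": "csv", "path": path, "rows": rows}


def execute(spec):
    """Carry out an export spec; only the CSV format is handled in this sketch."""
    if spec.get("format") != "csv":
        raise ValueError(f"unsupported format: {spec.get('format')!r}")
    rows = spec["rows"]
    fieldnames = list(rows[0]) if rows else []
    with open(spec["path"], "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return spec["path"]


rows = [{"province": "Kabul", "reached": 1200}, {"province": "Herat", "reached": 900}]
out_path = os.path.join(tempfile.mkdtemp(), "coverage.csv")
spec = make_csv_spec(rows, out_path)  # nothing is written yet
written = execute(spec)               # the file is created here
```

Keeping the spec as plain data makes it easy to log, review, or hand to a different executor (for example an uploader) before any I/O happens.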

## Requirements
Core requirements and optional dependencies are specified in `requirements.txt`.

If you plan to use geospatial and mapping utilities, you’ll need packages like `folium` and `geopandas` (which may require system libraries on some platforms). For ML utilities (e.g., outlier isolation), you’ll need `scikit-learn`.
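
To see which of these optional packages are importable in the current environment, a small standard-library check can help. This is a sketch, not a HuDa utility: `OPTIONAL_PACKAGES` and `missing_optional_packages` are hypothetical names, and the list covers only the packages mentioned above.

```python
from importlib.util import find_spec

# pip name -> import name for the optional packages mentioned above
# (scikit-learn is imported as "sklearn").
OPTIONAL_PACKAGES = {
    "folium": "folium",
    "geopandas": "geopandas",
    "scikit-learn": "sklearn",
}


def missing_optional_packages(packages=OPTIONAL_PACKAGES):
    """Return the pip names of optional packages that are not importable."""
    return [pip_name for pip_name, module in packages.items() if find_spec(module) is None]


missing = missing_optional_packages()
if missing:
    print("Install to enable all features: pip install " + " ".join(missing))
else:
    print("All optional packages are available.")
```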

## Development
```bash
python -m venv .venv
. .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```

Run a quick sanity check:
```bash
python -c "import huda, polars as pl; print('HuDa OK')"
```

## Building & Publishing (maintainers)
HuDa uses PEP 517/518 builds via Hatchling (configured in `pyproject.toml`).

```bash
python -m pip install --upgrade build twine
python -m build
# TestPyPI upload
twine upload --repository testpypi dist/*
# PyPI upload
twine upload dist/*
```

## Contributing
Contributions are welcome. Please open an issue to discuss improvements or new utilities aligned with humanitarian workflows.

## License
MIT License, as declared in `pyproject.toml`. The repository still needs a `LICENSE` file containing the full text.

## Links
- **Repository**: https://github.com/fiafghan/HuDa
- **Issues**: https://github.com/fiafghan/HuDa/issues
- **Training website**: source in `huda_website/` (React + Tailwind; run with Vite)

