Metadata-Version: 2.1
Name: padocc
Version: 1.3.0a0
Summary: Pipeline to Aggregate Data for Optimised Cloud Capabilities
License: {file='LICENSE'}
Author: Daniel Westwood
Author-email: daniel.westwood@stfc.ac.uk
Requires-Python: >=3.11,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: aiohttp (==3.10.10)
Requires-Dist: binpacking (>=1.5.2,<2.0.0)
Requires-Dist: cfapyx (==2024.11.27)
Requires-Dist: cfgrib (==0.9.14.1)
Requires-Dist: dask (==2024.7.0)
Requires-Dist: distributed (==2024.7.0)
Requires-Dist: fsspec (==2024.9.0)
Requires-Dist: h5py (==3.11.0)
Requires-Dist: kerchunk (==0.2.6)
Requires-Dist: matplotlib (==3.9.2)
Requires-Dist: myst-nb (>=1.1.2,<2.0.0)
Requires-Dist: rechunker (==0.5.2)
Requires-Dist: requests (==2.32.3)
Requires-Dist: s3fs (==2024.9.0)
Requires-Dist: scipy (==1.12.0)
Requires-Dist: sphinx (==7.1.2)
Requires-Dist: sphinx-rtd-theme (==2.0.0)
Requires-Dist: tifffile (>=2024.9.20,<2025.0.0)
Requires-Dist: types-pyyaml (>=6.0.12.20240917,<7.0.0.0)
Requires-Dist: xarray (==2024.6.0)
Description-Content-Type: text/markdown

# PADOCC Package

This repository is now hosted under the `cedadev` group.

Padocc (Pipeline to Aggregate Data for Optimised Cloud Capabilities) is a data aggregation pipeline that creates Kerchunk (or alternative) files to represent datasets stored in a variety of original formats.
The pipeline currently supports writing JSON or Parquet Kerchunk files for input NetCDF/HDF files. Future development is planned to add support for GeoTIFF, GRIB and possibly Met Office (.pp) files, and to use the Pangeo [Rechunker](https://rechunker.readthedocs.io/en/latest/) tool to create Zarr stores for datasets that are incompatible with Kerchunk.
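
For readers unfamiliar with the format, a Kerchunk file is essentially a set of references that map Zarr chunk keys to byte ranges in the original files. The sketch below builds a minimal version 1 reference set by hand using only the standard library. The file path, variable name, offsets and lengths are made up for illustration, and this is not how padocc itself generates references.

```python
import json


def make_reference_set(url, chunks):
    """Build a minimal Kerchunk-style (version 1) reference set.

    `chunks` maps a Zarr chunk key to an (offset, length) byte range
    inside the file at `url`.
    """
    refs = {
        # Zarr metadata is stored inline as a JSON string.
        ".zgroup": json.dumps({"zarr_format": 2}),
    }
    for key, (offset, length) in chunks.items():
        # Chunk data is referenced as [url, offset, length].
        refs[key] = [url, offset, length]
    return {"version": 1, "refs": refs}


# Hypothetical file and byte ranges, for illustration only.
reference_set = make_reference_set(
    "example/data/file_0001.nc",
    {"tas/0.0.0": (4096, 1024), "tas/1.0.0": (5120, 1024)},
)
kerchunk_json = json.dumps(reference_set, indent=2)
```

A real reference set also carries the `.zarray` and `.zattrs` metadata for each variable; those entries are omitted here to keep the sketch short.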

[Example Notebooks at this link](https://mybinder.org/v2/gh/cedadev/padocc.git/main?filepath=showcase/notebooks)

[Documentation hosted at this link](https://cedadev.github.io/kerchunk-builder/)

![Kerchunk Pipeline](docs/source/_images/pipeline.png)

## Installation

To install this package, clone the repository with `git clone`. If release v1.3 has not yet been published, switch to the `MigrationOO` branch with `git checkout MigrationOO`.

Then follow the steps below to install the package with the necessary dependencies.

```
python -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install
```
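
After the steps above, one way to confirm that the package landed in the active environment is to query its installed version. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `padocc` is taken from the package metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


padocc_version = installed_version("padocc")
if padocc_version is None:
    print("padocc is not installed in this environment")
else:
    print(f"padocc {padocc_version} is installed")
```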

## Usage

Please refer to the scripts in `tests/` for examples of how to use the `GroupOperation` and `ProjectOperation` classes.

