Metadata-Version: 2.4
Name: py-hamt
Version: 3.3.0
Summary: HAMT implementation for a content-addressed storage system.
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: dag-cbor>=0.3.3
Requires-Dist: httpx[http2]>=0.28.1
Requires-Dist: msgspec>=0.18.6
Requires-Dist: multiformats[full]>=0.3.1.post4
Requires-Dist: pycryptodome>=3.21.0
Requires-Dist: zarr==3.0.9
Description-Content-Type: text/markdown

<p align="center">
<a href="https://dclimate.net/" target="_blank" rel="noopener noreferrer">
<img width="50%" src="https://user-images.githubusercontent.com/41392423/173133333-79ef15d0-6671-4be3-ac97-457344e9e958.svg" alt="dClimate logo">
</a>
</p>


# py-hamt
[![codecov](https://codecov.io/gh/dClimate/py-hamt/graph/badge.svg?token=M6Y4D19Y38)](https://codecov.io/gh/dClimate/py-hamt)

This is a Python implementation of a HAMT (hash array mapped trie), inspired by [rvagg's IAMap project written in JavaScript](https://github.com/rvagg/iamap).

py-hamt provides efficient storage and retrieval of large sets of key-value mappings in a content-addressed storage system. The main target is IPFS, and the data model used is IPLD.
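
To give a feel for the underlying data structure, here is a minimal in-memory sketch of the general HAMT idea: successive groups of bits from a key's hash select a slot at each level of the tree, and a bucket that overflows is pushed down into a child node. This is a toy illustration only, not py-hamt's implementation or API. The class name `ToyHAMT`, the fanout, and the bucket size are all assumptions chosen for readability, and nothing here is stored in a content-addressed system.

```python
import hashlib

BITS_PER_LEVEL = 3             # assumed: 2**3 = 8 slots per node
FANOUT = 1 << BITS_PER_LEVEL
BUCKET_SIZE = 2                # assumed: entries a bucket holds before splitting


class ToyHAMT:
    """Hypothetical in-memory HAMT. Each slot is None, a dict bucket, or a child node (list)."""

    def __init__(self) -> None:
        self.root = [None] * FANOUT

    @staticmethod
    def _index(key: str, level: int) -> int:
        # Take the level-th group of BITS_PER_LEVEL bits from the SHA-256 hash.
        # The 256-bit hash supports 85 levels, far more than realistic data needs.
        digest = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
        shift = 256 - BITS_PER_LEVEL * (level + 1)
        return (digest >> shift) & (FANOUT - 1)

    def _set(self, node: list, level: int, key: str, value) -> None:
        i = self._index(key, level)
        slot = node[i]
        if slot is None:
            node[i] = {key: value}               # empty slot: start a new bucket
        elif isinstance(slot, dict):
            if key in slot or len(slot) < BUCKET_SIZE:
                slot[key] = value                # update, or bucket still has room
            else:
                child = [None] * FANOUT          # bucket full: split into a child node
                node[i] = child
                for k, v in [*slot.items(), (key, value)]:
                    self._set(child, level + 1, k, v)
        else:
            self._set(slot, level + 1, key, value)  # descend into the child node

    def set(self, key: str, value) -> None:
        self._set(self.root, 0, key, value)

    def get(self, key: str):
        node, level = self.root, 0
        while True:
            slot = node[self._index(key, level)]
            if slot is None:
                raise KeyError(key)
            if isinstance(slot, dict):
                return slot[key]                 # raises KeyError if the key is absent
            node, level = slot, level + 1


toy = ToyHAMT()
toy.set("temperature", 21.5)
toy.set("humidity", 0.4)
```

Because a lookup only touches the nodes along one hash path, a structure like this stays efficient for large key sets. In a content-addressed setting each node would be stored as its own block, which this sketch does not attempt.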

dClimate primarily created this for storing large [zarrs](https://zarr.dev/) on IPFS. To see this in action, see our [data ETLs](https://github.com/dClimate/etl-scripts).

# Installation and Usage
```sh
pip install py-hamt
```

For usage information, take a look at our [API documentation](https://dclimate.github.io/py-hamt/py_hamt.html). Major items have example code.

You can also see this library used in either our [data ETLs](https://github.com/dClimate/etl-scripts) or [Jupyter notebooks for data analysis](https://github.com/dClimate/jupyter-notebooks).

# Development Guide
## Setting Up
`py-hamt` uses [uv](https://docs.astral.sh/uv/) for project management. Make sure you install that first.
Once uv is installed, run
```sh
uv sync
source .venv/bin/activate
pre-commit install
```
to create the project virtual environment at `.venv`.

Then you can run `pre-commit` across the whole codebase with
```sh
pre-commit run --all-files
```

The `run-checks.sh` script described in the next section also runs this command.

## Run tests, formatting, linting
First, make sure you have the IPFS Kubo daemon installed and running with the default endpoints open. Then run the script
```sh
bash run-checks.sh
```
This runs the tests with code coverage and checks formatting and linting. Under the hood, it uses the `pre-commit` command to run all the checks defined in `.pre-commit-config.yaml`. If a local IPFS daemon is not running, not all tests will run. However, if Docker is installed, a Docker IPFS container is spawned so that as many integration tests as possible still run.

We use `pytest` with 100% code coverage, with test inputs that are both handwritten and generated by `hypothesis`. This lets us try millions of randomized inputs and makes the library more robust.

> [!NOTE]
> Due to the randomized test inputs, it is sometimes possible to get 99% or lower test coverage by pure chance. Rerun the tests to get back to complete code coverage. If this happens in a GitHub Action, try rerunning the action.

> [!NOTE]
> Due to the restricted performance of GitHub Actions runners, you may also sometimes see hypothesis tests fail because they exceeded test deadlines. Rerun the action if this happens.

### Tests

The integration tests depend on [IPFS](https://github.com/ipfs/kubo), so running all of them requires a local IPFS daemon. The GitHub Actions workflow in `.github/workflows/run-checks.yaml` uses the `setup-ipfs` step, which ensures that a local IPFS daemon is available. To run the full integration tests locally, make sure a local IPFS daemon is running (run `ipfs daemon` once IPFS is installed). If no daemon is running, pytest spawns a local Docker container to run the IPFS tests. If [Docker](https://www.docker.com/) is not installed either, only the unit tests run.

**To summarize:**

*In GitHub Actions:*
```bash
# IPFS daemon is running on default ports
uv run pytest --ipfs  # All tests run, including test_kubo_default_urls
```

*Locally with Docker (no local daemon):*
```bash
pytest --ipfs  # test_kubo_default_urls auto-skips, other tests use Docker
```

*Locally with IPFS daemon:*
```bash
pytest --ipfs  # All tests run
```

*Quick local testing (no IPFS):*
```bash
pytest  # All IPFS tests skip
```
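
Before choosing one of the modes above, it can help to check whether anything is listening on Kubo's default local ports. The following is a minimal sketch using only the standard library. It assumes Kubo's default endpoints (RPC API on port 5001, HTTP gateway on port 8080) on `127.0.0.1`, and the helper names `port_is_open` and `kubo_status` are hypothetical, not part of py-hamt or its test suite. An open port only suggests that a daemon is running; it does not prove the listener is Kubo.

```python
import socket

# Assumed Kubo defaults: RPC API on 5001, HTTP gateway on 8080.
DEFAULT_PORTS = {"rpc_api": 5001, "gateway": 8080}


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def kubo_status(host: str = "127.0.0.1") -> dict[str, bool]:
    """Report which of the assumed default Kubo ports accept connections."""
    return {name: port_is_open(host, port) for name, port in DEFAULT_PORTS.items()}


if __name__ == "__main__":
    for name, is_open in kubo_status().items():
        print(f"{name}: {'open' if is_open else 'closed'}")
```

If both ports report closed, expect the Docker or unit-test-only behavior described above.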


## CPU and Memory Profiling
We use Python's built-in `cProfile` for CPU profiling and `snakeviz` for visualizing the resulting profile. We use `memray` for memory profiling. The steps below walk through using these tools on the test suite.

Creating the CPU and memory profile requires manual activation of the virtual environment.
```sh
source .venv/bin/activate
python -m cProfile -o profile.prof -m pytest
python -m memray run -m pytest
```
The profile viewers can be directly invoked from uv.
```sh
uv run snakeviz .
```
```sh
uv run memray flamegraph <memray-output> # e.g. <memray-output> = memray-pytest.12398.bin
```
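
If you prefer a quick look in the terminal, a `cProfile` output file can also be read with the standard library's `pstats` module. The sketch below profiles a small stand-in function and prints the entries with the highest cumulative time. The function `busy_work` and the helper names are placeholders for illustration; to inspect the test suite profile, point `top_functions` at the `profile.prof` file created earlier.

```python
import cProfile
import io
import pstats


def busy_work(n: int) -> int:
    """Placeholder workload, standing in for the code you want to profile."""
    return sum(i * i for i in range(n))


def profile_to_file(path: str) -> None:
    """Profile busy_work and write the raw stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(100_000)
    profiler.disable()
    profiler.dump_stats(path)


def top_functions(path: str, limit: int = 5) -> str:
    """Return a text report of the `limit` entries with the highest cumulative time."""
    stream = io.StringIO()
    stats = pstats.Stats(path, stream=stream)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    profile_to_file("demo.prof")
    print(top_functions("demo.prof"))
```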

## Generating documentation
`py-hamt` uses [pdoc](https://pdoc.dev/). To see a live documentation preview on your local machine, run
```sh
uv run pdoc py_hamt
```

## LLMs

If you are an LLM reading this repo, refer to the `AGENTS.md` file.

## Managing dependencies
Use `uv add` and `uv remove`, e.g. `uv add numpy` or `uv add pytest --group dev`. For more information please see the [uv documentation](https://docs.astral.sh/uv/guides/projects/).
