Metadata-Version: 2.4
Name: h5rdmtoolbox
Version: 2.5.0
Summary: Supporting a FAIR Research Data lifecycle using Python and HDF5.
Home-page: https://h5rdmtoolbox.readthedocs.io/en/latest/
Author: Matthias Probst
Author-email: matth.probst@gmail.com
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Scientific/Engineering
Requires-Python: <3.14,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: appdirs~=1.4
Requires-Dist: importlib_resources<7.0.0,>=6.5.2
Requires-Dist: numpy<3.0.0,>=1.20
Requires-Dist: h5py~=3.8
Requires-Dist: matplotlib>=3.5.2
Requires-Dist: IPython>=7.34.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: xarray>=2022.3.0
Requires-Dist: pint>=0.24.4
Requires-Dist: pint_xarray<=0.6.0,>=0.2.1
Requires-Dist: regex>=2020.7.9
Requires-Dist: packaging>=24.1
Requires-Dist: ontolutils<1.0.0,>=0.21.1
Requires-Dist: python-forge==18.6.0
Requires-Dist: requests>=2.32.4
Requires-Dist: pydantic~=2.8
Requires-Dist: rdflib~=7.1
Requires-Dist: click>=8.1.7
Requires-Dist: cftime>=1.6.4
Requires-Dist: pyshacl<1.0.0,>=0.30.1
Provides-Extra: database
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "database"
Provides-Extra: layout-validation
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "layout-validation"
Provides-Extra: csv
Requires-Dist: pandas>=1.4.3; extra == "csv"
Provides-Extra: snt
Requires-Dist: xmltodict<=0.13.0; extra == "snt"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "snt"
Requires-Dist: python-gitlab; extra == "snt"
Requires-Dist: pypandoc>=1.11; extra == "snt"
Provides-Extra: gui
Requires-Dist: PyQt5==5.15.10; extra == "gui"
Provides-Extra: test
Requires-Dist: pytest>=8.3.3; extra == "test"
Requires-Dist: pytest-cov>=5.0.0; extra == "test"
Requires-Dist: pylint; extra == "test"
Requires-Dist: ssnolib>=1.5.1.4; extra == "test"
Requires-Dist: mongomock==4.1.2; extra == "test"
Requires-Dist: xmltodict<=0.13.0; extra == "test"
Requires-Dist: scipy>=1.10.1; extra == "test"
Requires-Dist: scikit-image>=0.21.0; extra == "test"
Requires-Dist: scikit-learn; extra == "test"
Requires-Dist: pandas>=1.4.3; extra == "test"
Requires-Dist: xmltodict<=0.13.0; extra == "test"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "test"
Requires-Dist: python-gitlab; extra == "test"
Requires-Dist: pypandoc>=1.11; extra == "test"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "test"
Provides-Extra: docs
Requires-Dist: pandas>=1.4.3; extra == "docs"
Requires-Dist: xmltodict<=0.13.0; extra == "docs"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "docs"
Requires-Dist: python-gitlab; extra == "docs"
Requires-Dist: pypandoc>=1.11; extra == "docs"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "docs"
Requires-Dist: pytest>=8.3.3; extra == "docs"
Requires-Dist: pytest-cov>=5.0.0; extra == "docs"
Requires-Dist: pylint; extra == "docs"
Requires-Dist: ssnolib>=1.5.1.4; extra == "docs"
Requires-Dist: mongomock==4.1.2; extra == "docs"
Requires-Dist: xmltodict<=0.13.0; extra == "docs"
Requires-Dist: scipy>=1.10.1; extra == "docs"
Requires-Dist: scikit-image>=0.21.0; extra == "docs"
Requires-Dist: scikit-learn; extra == "docs"
Requires-Dist: pandas>=1.4.3; extra == "docs"
Requires-Dist: xmltodict<=0.13.0; extra == "docs"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "docs"
Requires-Dist: python-gitlab; extra == "docs"
Requires-Dist: pypandoc>=1.11; extra == "docs"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "docs"
Requires-Dist: Sphinx==8.0.0; extra == "docs"
Requires-Dist: sphinx_book_theme==1.1.3; extra == "docs"
Requires-Dist: sphinx-copybutton==0.5.2; extra == "docs"
Requires-Dist: sphinx-design==0.6.1; extra == "docs"
Requires-Dist: myst-nb==1.2.0; extra == "docs"
Requires-Dist: sphinxcontrib-bibtex==2.6.3; extra == "docs"
Requires-Dist: scikit-image>=0.21.0; extra == "docs"
Requires-Dist: scikit-learn; extra == "docs"
Provides-Extra: complete
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "complete"
Requires-Dist: PyQt5==5.15.10; extra == "complete"
Requires-Dist: pytest>=8.3.3; extra == "complete"
Requires-Dist: pytest-cov>=5.0.0; extra == "complete"
Requires-Dist: pylint; extra == "complete"
Requires-Dist: ssnolib>=1.5.1.4; extra == "complete"
Requires-Dist: mongomock==4.1.2; extra == "complete"
Requires-Dist: xmltodict<=0.13.0; extra == "complete"
Requires-Dist: scipy>=1.10.1; extra == "complete"
Requires-Dist: scikit-image>=0.21.0; extra == "complete"
Requires-Dist: scikit-learn; extra == "complete"
Requires-Dist: pandas>=1.4.3; extra == "complete"
Requires-Dist: xmltodict<=0.13.0; extra == "complete"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "complete"
Requires-Dist: python-gitlab; extra == "complete"
Requires-Dist: pypandoc>=1.11; extra == "complete"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "complete"
Provides-Extra: complete-with-docs
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "complete-with-docs"
Requires-Dist: PyQt5==5.15.10; extra == "complete-with-docs"
Requires-Dist: pytest>=8.3.3; extra == "complete-with-docs"
Requires-Dist: pytest-cov>=5.0.0; extra == "complete-with-docs"
Requires-Dist: pylint; extra == "complete-with-docs"
Requires-Dist: ssnolib>=1.5.1.4; extra == "complete-with-docs"
Requires-Dist: mongomock==4.1.2; extra == "complete-with-docs"
Requires-Dist: xmltodict<=0.13.0; extra == "complete-with-docs"
Requires-Dist: scipy>=1.10.1; extra == "complete-with-docs"
Requires-Dist: scikit-image>=0.21.0; extra == "complete-with-docs"
Requires-Dist: scikit-learn; extra == "complete-with-docs"
Requires-Dist: pandas>=1.4.3; extra == "complete-with-docs"
Requires-Dist: xmltodict<=0.13.0; extra == "complete-with-docs"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "complete-with-docs"
Requires-Dist: python-gitlab; extra == "complete-with-docs"
Requires-Dist: pypandoc>=1.11; extra == "complete-with-docs"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "complete-with-docs"
Requires-Dist: pandas>=1.4.3; extra == "complete-with-docs"
Requires-Dist: xmltodict<=0.13.0; extra == "complete-with-docs"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "complete-with-docs"
Requires-Dist: python-gitlab; extra == "complete-with-docs"
Requires-Dist: pypandoc>=1.11; extra == "complete-with-docs"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "complete-with-docs"
Requires-Dist: pytest>=8.3.3; extra == "complete-with-docs"
Requires-Dist: pytest-cov>=5.0.0; extra == "complete-with-docs"
Requires-Dist: pylint; extra == "complete-with-docs"
Requires-Dist: ssnolib>=1.5.1.4; extra == "complete-with-docs"
Requires-Dist: mongomock==4.1.2; extra == "complete-with-docs"
Requires-Dist: xmltodict<=0.13.0; extra == "complete-with-docs"
Requires-Dist: scipy>=1.10.1; extra == "complete-with-docs"
Requires-Dist: scikit-image>=0.21.0; extra == "complete-with-docs"
Requires-Dist: scikit-learn; extra == "complete-with-docs"
Requires-Dist: pandas>=1.4.3; extra == "complete-with-docs"
Requires-Dist: xmltodict<=0.13.0; extra == "complete-with-docs"
Requires-Dist: tabulate<=0.9.0,>=0.8.10; extra == "complete-with-docs"
Requires-Dist: python-gitlab; extra == "complete-with-docs"
Requires-Dist: pypandoc>=1.11; extra == "complete-with-docs"
Requires-Dist: pymongo<=4.10.1,>=4.2.0; extra == "complete-with-docs"
Requires-Dist: Sphinx==8.0.0; extra == "complete-with-docs"
Requires-Dist: sphinx_book_theme==1.1.3; extra == "complete-with-docs"
Requires-Dist: sphinx-copybutton==0.5.2; extra == "complete-with-docs"
Requires-Dist: sphinx-design==0.6.1; extra == "complete-with-docs"
Requires-Dist: myst-nb==1.2.0; extra == "complete-with-docs"
Requires-Dist: sphinxcontrib-bibtex==2.6.3; extra == "complete-with-docs"
Requires-Dist: scikit-image>=0.21.0; extra == "complete-with-docs"
Requires-Dist: scikit-learn; extra == "complete-with-docs"
Dynamic: license-file

# HDF5 Research Data Management Toolbox

![Tests](https://github.com/matthiasprobst/h5RDMtoolbox/actions/workflows/tests.yml/badge.svg)
[![codecov](https://codecov.io/gh/matthiasprobst/h5RDMtoolbox/graph/badge.svg?token=IVG4AQEW47)](https://codecov.io/gh/matthiasprobst/h5RDMtoolbox)
[![Documentation Status](https://readthedocs.org/projects/h5rdmtoolbox/badge/?version=latest)](https://h5rdmtoolbox.readthedocs.io/en/latest/?badge=latest)
![pyvers](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)

*Note that the project is still under development!*

The "HDF5 Research Data Management Toolbox" (h5RDMtoolbox) is a Python package supporting everybody who is working with
HDF5 to achieve a sustainable data lifecycle which follows
the [FAIR (Findable, Accessible, Interoperable, Reusable)](https://www.nature.com/articles/sdata201618)
principles. It specifically supports the five main steps of *planning*, *collecting*, *analyzing*, *sharing* and
*reusing* data. Please visit the [documentation](https://h5rdmtoolbox.readthedocs.io/en/latest/) for detailed
information or try the [quickstart using Colab](#quickstart).

## Highlights

- Combining HDF5 and [xarray](https://docs.xarray.dev/en/stable/) to allow easy access to metadata and data during
  analysis and processing (
  see [here](https://h5rdmtoolbox.readthedocs.io/en/latest/gettingstarted/quickoverview.html#datasets-xarray-interface)).
- Assigning [metadata with "globally unique and persistent identifiers"]() as required
  by [F1 of the FAIR principles](https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/).
  This can be achieved by using [RDF triples](https://www.w3.org/RDF/), which removes "ambiguity in the meaning of your
  published data".
- Defining standard attributes through
  [conventions](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/convention/index.html) and requiring users to
  set certain attributes in their HDF5 files, such as units and a description.
- Upload HDF5 files directly
  to [repositories](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/repository/index.html)
  like [Zenodo](https://zenodo.org/)
  or [use them with noSQL databases](https://h5rdmtoolbox.readthedocs.io/en/latest/userguide/database/index.html) like
  [mongoDB](https://www.mongodb.com/).
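
The idea behind conventions (requiring certain attributes such as units and a description) can be illustrated with a
small, purely conceptual sketch. It does not use the h5RDMtoolbox API; the names `REQUIRED_ATTRS` and
`missing_attributes` are hypothetical, and the dataset attributes are represented as a plain dictionary:

```python
# Conceptual sketch only: this is NOT the h5RDMtoolbox API.
# REQUIRED_ATTRS and missing_attributes are hypothetical names.
REQUIRED_ATTRS = ("units", "description")


def missing_attributes(attrs, required=REQUIRED_ATTRS):
    """Return the required attribute names that are absent or empty."""
    return [name for name in required if not attrs.get(name)]


# Attributes of a dataset, represented here as a plain dict.
velocity_attrs = {"units": "m/s"}
print(missing_attributes(velocity_attrs))  # ['description']
```

In the toolbox itself, such requirements are defined in a convention; see the linked documentation for the actual
interface.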

## Who is the package for?

For everybody who is ...

- ... looking for a management approach for their data.
- ... part of a community that has not yet established a stable convention.
- ... working with small or big data that fits into HDF5 files.
- ... looking for an easy way to work with HDF5, especially through [Jupyter Notebooks](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html).
- ... trying to integrate HDF5 with repositories and databases.
- ... wishing to enrich data semantically with the RDF standard.
- ... looking for a way to do all of the above without needing to learn a new syntax.
- ... new to HDF5 and wanting to learn about it, especially with respect to the FAIR principles and data management.

## Who is it not for?

For everybody who ...

- ... is looking for a management approach that also allows high-performance and/or parallel work with HDF5.
- ... already has well-established conventions and management approaches in their community.

## Package Architecture/structure

The toolbox implements six modules, which are shown below. The numbers refer to their main usage in the stages of
the data lifecycle above. The wrapper module implements the main interface between the user and the HDF5 file. It
extends the features of the underlying `h5py` library. Some of the features are implemented in other modules, hence the
wrapper module depends on the convention, database and linked data (ld) modules.

<a href="https://h5rdmtoolbox.readthedocs.io/en/latest/"><img src="docs/_static/h5tbx_modules.svg" alt="H5TBX modules" style="width:600px;"></a>

Current implementation highlights in the modules:

- The **wrapper** module adds functionality on top of the `h5py` package. It allows the inclusion of so-called standard
  names, which are defined in conventions. It also implements interfaces, such as the one to the `xarray` package, which
  carries metadata from HDF5 to the user. Other high-level interfaces like `.rdf` allow assigning semantic information
  to the HDF5 file.
- For the **database** module, `hdfDB` and `mongoDB` are implemented. The `hdfDB` module allows using HDF5 files as a
  database. The `mongoDB` module allows using mongoDB as a database by mapping the metadata of HDF5 files to the
  database.
- For the **repository** module, a Zenodo interface is implemented. Zenodo is a repository that allows uploading and
  downloading data with a persistent identifier.
- For the **convention** module,
  the [standard attributes](https://h5rdmtoolbox.readthedocs.io/en/latest/conventions/standard_attributes_and_conventions.html)
  are implemented.
- The **layout** module allows defining expectations on the internal layout (object names, location, attributes,
  properties) of HDF5 files.

## Quickstart

A quickstart notebook can be tested by clicking on the following badge:

[![Open Quickstart Notebook](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/matthiasprobst/h5RDMtoolbox/blob/main/docs/colab/quickstart.ipynb)

## Documentation

Please find comprehensive documentation with many examples [here](https://h5rdmtoolbox.readthedocs.io/en/latest/) or
by clicking on the image, which shows the research data lifecycle in the center and the respective toolbox features on
the outside:

A paper is published in the journal [inggrid](https://preprints.inggrid.org/repository/view/23/).

## Installation

Use Python 3.9 or higher (automated tests cover versions up to 3.13). If you are a regular user, you can install the
package via pip:

    pip install h5RDMtoolbox
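
Before installing, it may help to check that the running interpreter falls within the supported range. The following
is a minimal sketch, assuming the bounds declared in the package metadata (`Requires-Python: <3.14,>=3.9`); the helper
name `is_supported` is hypothetical and not part of the package:

```python
import sys

# Bounds taken from the package metadata (Requires-Python: >=3.9,<3.14).
MIN_VERSION = (3, 9)
MAX_EXCLUSIVE = (3, 14)


def is_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if not is_supported():
    print("Unsupported Python version:", sys.version.split()[0])
```

pip performs an equivalent check itself, so this is only useful for an early, explicit message in setup scripts.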


### Install from source:

Developers may clone the repository and install the package from source. Clone the repository first:

    git clone --branch main https://github.com/matthiasprobst/h5RDMtoolbox.git

Then, run

    pip install h5RDMtoolbox/

Add `--user` if you do not have root access.

For development installation run

    pip install -e h5RDMtoolbox/

### Dependencies

The core functionality depends on the following packages. Some of them are for general management, others are
specific to the features of the package:

**General dependencies are ...**

- `numpy`: Scientific computing, handling of arrays
- `matplotlib`: Plotting
- `appdirs`: Managing user and application directories
- `packaging`: Version handling
- `IPython`: Pretty display of data in notebooks
- `regex`: Working with regular expressions

**Specific to the package are ...**

- `h5py`: HDF5 file interface
- `xarray`: Working with scientific arrays in combination with attributes. Allows carrying metadata from HDF5
  to user
- `pint`: Allows working with units
- `pint_xarray`: Working with units for usage with xarray
- `python-forge`: Used to update function signatures when using
  the [standard attributes](https://h5rdmtoolbox.readthedocs.io/en/latest/conventions/standard_attributes_and_conventions.html)
- `pydantic`: Used to
  validate [standard attributes](https://h5rdmtoolbox.readthedocs.io/en/latest/conventions/standard_attributes_and_conventions.html)
- `pyyaml`: Reading and writing of YAML files, e.g. metadata definitions (conventions). Note that lower versions
  conflict with Python 3.11
- `requests`: Used to download files from the internet or validate URLs, e.g. metadata definitions (conventions)
- `rdflib`: Used to enable working with RDF
- `ontolutils`: Required to work with RDF and derive semantic description of HDF5 file content

#### Optional dependencies

To run unit tests or to enable certain features, additional dependencies must be installed.

Install optional dependencies by specifying them in square brackets after the package name, e.g.:

    pip install h5RDMtoolbox[database]

[database]

- `pymongo`: Database solution for HDF5 files

[csv]

- `pandas`: Mainly used for reading csv and pretty printing

[snt]

- `xmltodict`: Reading of xml files
- `tabulate`: Pretty printing of tables
- `python-gitlab`: Access to gitlab repositories
- `pypandoc`: Conversion of markdown files to html
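
To see which optional dependencies are available in the current environment, a short check with the standard library
can be used. This is a sketch, not part of the package: the mapping `EXTRAS` and the helper `missing_modules` are
hypothetical names. The extra names follow the package metadata, where the extra providing `pymongo` is called
`database`, and `python-gitlab` is imported as `gitlab`:

```python
from importlib.util import find_spec

# Extra name -> importable module names, following the extras listed above.
# Note: the distribution "python-gitlab" is imported as "gitlab".
EXTRAS = {
    "database": ["pymongo"],
    "csv": ["pandas"],
    "snt": ["xmltodict", "tabulate", "gitlab", "pypandoc"],
}


def missing_modules(extra):
    """Return the modules of an extra that cannot be imported."""
    return [name for name in EXTRAS[extra] if find_spec(name) is None]


for extra in EXTRAS:
    print(extra, "missing:", missing_modules(extra))
```

An empty list for an extra means all of its modules can be found; it does not verify the version constraints.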

## Citing the package

If you intend to use the package in your work, you may cite the software itself as published in the
[Zenodo (latest version)](https://zenodo.org/records/13309253) repository. A related paper is published in the
journal [ing.grid](https://www.inggrid.org/article/id/4028/). Thank you!

Alternatively or additionally, you can consult the `CITATION.cff` file.

Here is the BibTeX entry:

```
@article{probst2024h5rdmtoolbox,
	author = {Matthias Probst and Balazs Pritz},
	title = {h5RDMtoolbox - A Python Toolbox for FAIR Data Management around HDF5},
	volume = {2},
	year = {2024},
	url = {https://www.inggrid.org/article/id/4028/},
	issue = {1},
	doi = {10.48694/inggrid.4028},
	month = {8},
	keywords = {Data management,HDF5,metadata,data lifecycle,Python,database},
	issn = {2941-1300},
	publisher={Universitäts- und Landesbibliothek Darmstadt},
	journal = {ing.grid}
}
```

## Contribution

Feel free to contribute. Make sure to write `docstrings` for your methods and classes and follow PEP
8 (https://peps.python.org/pep-0008/).

Please write tests for your code and put them into the `tests/` folder. Visit the [README file](./tests/README.md) in the
tests folder for more information.

Please also add a Jupyter notebook in the `docs/` folder in order to document your code. Please visit
the [README file](./docs/README.md) in the docs folder for more information on how to compile the documentation.

Please use the **numpy style for the docstrings**:
https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy


