Metadata-Version: 2.4
Name: HIRID-MEDS
Version: 0.0.2
Summary: An ETL pipeline to extract HIRID data into the MEDS format.
Author-email: Robin van de Water <robin.vandewater@hpi.de>
Project-URL: Homepage, https://github.com/mmcdermott/ETL_MEDS_Template
Project-URL: Issues, https://github.com/mmcdermott/ETL_MEDS_Template/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: meds-transforms~=0.2.4
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: hydra-core
Requires-Dist: loguru
Requires-Dist: polars<1.30.0
Provides-Extra: dev
Requires-Dist: pre-commit<4; extra == "dev"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Provides-Extra: local-parallelism
Requires-Dist: hydra-joblib-launcher; extra == "local-parallelism"
Provides-Extra: slurm-parallelism
Requires-Dist: hydra-submitit-launcher; extra == "slurm-parallelism"
Dynamic: license-file

# HIRID MEDS ETL

[![codecov](https://codecov.io/gh/mmcdermott/ETL_MEDS_Template/graph/badge.svg?token=RW6JXHNT0W)](https://codecov.io/gh/mmcdermott/ETL_MEDS_Template)
[![tests](https://github.com/mmcdermott/ETL_MEDS_Template/actions/workflows/tests.yaml/badge.svg)](https://github.com/mmcdermott/ETL_MEDS_Template/actions/workflows/tests.yaml)
[![code-quality](https://github.com/mmcdermott/ETL_MEDS_Template/actions/workflows/code-quality-main.yaml/badge.svg)](https://github.com/mmcdermott/ETL_MEDS_Template/actions/workflows/code-quality-main.yaml)
![python](https://img.shields.io/badge/-Python_3.11-blue?logo=python&logoColor=white)
[![license](https://img.shields.io/badge/License-MIT-green.svg?labelColor=gray)](https://github.com/mmcdermott/ETL_MEDS_Template#license)
[![PRs](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/mmcdermott/ETL_MEDS_Template/pulls)
[![contributors](https://img.shields.io/github/contributors/mmcdermott/ETL_MEDS_Template.svg)](https://github.com/mmcdermott/ETL_MEDS_Template/graphs/contributors)
![Static Badge](https://img.shields.io/badge/MEDS-0.3.3-blue)

Warning: This ETL currently needs a lot of resources to run.

This repository contains the ETL (Extract, Transform, Load) code that converts the HiRID dataset
into the [MEDS](https://medical-event-data-standard.github.io/) format.

```bash
pip install HIRID_MEDS # installs from PyPI; from a local clone of this repository, run "pip install ." instead
# Download your data or set download credentials
MEDS_extract-HIRID root_output_dir=$ROOT_OUTPUT_DIR do_download=true raw_input_dir=$RAW_INPUT_DIR
```
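The command above expects `ROOT_OUTPUT_DIR` and `RAW_INPUT_DIR` to be set. A minimal sketch of preparing them is shown below; `BASE_DIR` and the `hirid_data` directory layout are hypothetical names chosen for illustration, not something this package requires.

```bash
# Hypothetical base directory; point it at a location with enough free space.
BASE_DIR="${BASE_DIR:-$PWD/hirid_data}"

# Directories read by the MEDS_extract-HIRID command shown above.
export RAW_INPUT_DIR="$BASE_DIR/raw"
export ROOT_OUTPUT_DIR="$BASE_DIR/meds"

# Create both directories if they do not exist yet.
mkdir -p "$RAW_INPUT_DIR" "$ROOT_OUTPUT_DIR"
```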

## MEDS-transforms settings

To convert a large dataset faster, you can parallelize the MEDS-transforms stage,
which is the longest-running step of the pipeline.

For local parallelization, first install the `hydra-joblib-launcher` package (also available through this package's `local-parallelism` extra):

```bash
pip install hydra-joblib-launcher --upgrade
```

Then set the number of workers as an environment variable:

```bash
export N_WORKERS=8
```

You can also set the number of subjects per shard. Choose it based on how many subjects your dataset contains,
so that the work is spread across the workers without excessive per-shard overhead:

```bash
export N_SUBJECTS_PER_SHARD=100000
```
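As a rough illustration of how the two settings interact, the sketch below derives a shard size that gives each worker about one shard. This one-shard-per-worker heuristic is an assumption, not a recommendation from MEDS-transforms, and `TOTAL_SUBJECTS` is a hypothetical placeholder that you would replace with the subject count of your own extract.

```bash
# Worker count, as in the example above.
export N_WORKERS=8

# Placeholder value: replace with the number of subjects in your dataset.
TOTAL_SUBJECTS=30000

# Ceiling division: the smallest shard size that yields at most N_WORKERS shards.
export N_SUBJECTS_PER_SHARD=$(( (TOTAL_SUBJECTS + N_WORKERS - 1) / N_WORKERS ))

echo "N_WORKERS=$N_WORKERS N_SUBJECTS_PER_SHARD=$N_SUBJECTS_PER_SHARD"
```

With the placeholder values this prints `N_WORKERS=8 N_SUBJECTS_PER_SHARD=3750`. Smaller shards can balance load better when shards take uneven time, at the cost of more scheduling overhead.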

## Citation

If you use this dataset, please cite the original publications below and this ETL (see "Cite this repository" on the repository page):

```
Faltys, M., Zimmermann, M., Lyu, X., Hüser, M., Hyland, S., Rätsch, G., & Merz, T. (2021). HiRID, a high time-resolution ICU dataset (version 1.1.1). PhysioNet. https://doi.org/10.13026/nkwc-js72.

Hyland, S.L., Faltys, M., Hüser, M. et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat Med 26, 364–373 (2020). https://doi.org/10.1038/s41591-020-0789-4
```
