Metadata-Version: 2.4
Name: cumulus-etl
Version: 3.4.0
Summary: Turns FHIR data into de-identified & aggregated records
Author-email: "Andy McMurry, PhD" <andrew.mcmurry@childrens.harvard.edu>, Michael Terry <michael.terry@childrens.harvard.edu>
Requires-Python: >= 3.10
Description-Content-Type: text/markdown
License-Expression: Apache-2.0
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
Requires-Dist: aiobotocore < 2.22.0
Requires-Dist: ctakesclient >= 5.1
Requires-Dist: cumulus-fhir-support >= 1.5
Requires-Dist: delta-spark >= 4, < 5
Requires-Dist: fsspec[http, s3]
Requires-Dist: httpx
Requires-Dist: inscriptis
Requires-Dist: label-studio-sdk
Requires-Dist: nltk >= 3.9
Requires-Dist: openai
Requires-Dist: oracledb
Requires-Dist: philter-lite
Requires-Dist: pyarrow
Requires-Dist: pyathena
Requires-Dist: pydantic
Requires-Dist: rich
Requires-Dist: pre-commit ; extra == "dev"
Requires-Dist: ruff < 0.13 ; extra == "dev"
Requires-Dist: coverage ; extra == "tests"
Requires-Dist: ddt ; extra == "tests"
Requires-Dist: jwcrypto ; extra == "tests"
Requires-Dist: moto[server, s3] >= 5.0 ; extra == "tests"
Requires-Dist: pytest ; extra == "tests"
Requires-Dist: pytest-cov ; extra == "tests"
Requires-Dist: respx ; extra == "tests"
Requires-Dist: time-machine ; extra == "tests"
Project-URL: Homepage, https://github.com/smart-on-fhir/cumulus-etl
Provides-Extra: dev
Provides-Extra: tests

# Cumulus ETL

[Cumulus](https://docs.smarthealthit.org/cumulus/)
is an entire healthcare pipeline for population-scale clinical investigations.

Cumulus ETL is the first critical piece of that pipeline.
- It **extracts** bulk patient data from your EHR.
- It **transforms** that data by anonymizing it and running NLP on clinical notes.
- It **loads** that data onto the cloud to be queried with
  [Cumulus Library](https://github.com/smart-on-fhir/cumulus-library) SQL.

## Documentation

For guides on installing & using Cumulus ETL,
[read our documentation](https://docs.smarthealthit.org/cumulus/etl/).

## Example

A simple run of Cumulus ETL might look something like:
```shell
docker compose run \
  cumulus-etl \
  s3://my-input-bucket/bulk-export/ \
  s3://my-output-bucket/delta-lakes/ \
  s3://my-phi-bucket/build-and-phi-artifacts/
```

This command would read NDJSON files from the input bucket,
write the results as [Delta Lakes](https://delta.io/) into the output bucket,
and save some bookkeeping configuration to a build/PHI bucket.
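
To get a feel for the input format, here is a minimal Python sketch of reading NDJSON,
where each line is one JSON-encoded FHIR resource.
It is only an illustration and not how Cumulus ETL itself is implemented:
the `count_resource_types` helper and the sample records are hypothetical.

```python
import io
import json
from collections import Counter


def count_resource_types(lines):
    """Count FHIR resource types in an iterable of NDJSON lines.

    Hypothetical helper for illustration; not part of Cumulus ETL.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        resource = json.loads(line)  # one JSON object per line
        counts[resource.get("resourceType", "Unknown")] += 1
    return counts


# Made-up sample records standing in for a bulk export file
sample = io.StringIO(
    '{"resourceType": "Patient", "id": "p1"}\n'
    '{"resourceType": "Condition", "id": "c1"}\n'
    '{"resourceType": "Condition", "id": "c2"}\n'
)
counts = count_resource_types(sample)
print(dict(counts))  # {'Patient': 1, 'Condition': 2}
```

Because the helper accepts any iterable of lines, an open file handle for a local `.ndjson` file could be passed in the same way.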

## Contributing

We love 💖 contributions!

If you have a good suggestion 💡 or found a bug 🐛,
[read our brief contributors guide](CONTRIBUTING.md)
for pointers to filing issues and what to expect.

If you're a programmer ⌨ and are looking for a starting place to help, we keep a
[list of good bite-size issues](https://github.com/smart-on-fhir/cumulus-etl/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
for first-time contributions.

