# QCatch

[![PyPI version][badge-pypi]][pypi]
[![Tests][badge-tests]][tests]
[![Documentation][badge-docs]][documentation]

[badge-pypi]: https://img.shields.io/pypi/v/qcatch
[pypi]: https://pypi.org/project/qcatch/

[badge-tests]: https://img.shields.io/github/actions/workflow/status/COMBINE-lab/QCatch/test.yaml?branch=main
[tests]: https://github.com/COMBINE-lab/QCatch/actions/workflows/test.yaml

[badge-docs]: https://img.shields.io/badge/docs-online-blue
[documentation]: https://COMBINE-lab.github.io/QCatch

QCatch: Automated quality control of single-cell quantifications from `alevin-fry` and `simpleaf`.

View the complete [QCatch documentation](https://COMBINE-lab.github.io/QCatch) with interactive examples, FAQs, and detailed usage guides.

## Installation

You need to have Python 3.11, 3.12, or 3.13 installed on your system.

There are several alternative options to install QCatch:


#### 1. Bioconda
You can install using [Conda](http://anaconda.org/)
from [Bioconda](https://bioconda.github.io/).

```bash
conda install -c bioconda qcatch
```

#### 2. PyPI

You can also install from [PyPI](https://pypi.org/project/qcatch/) using `pip`:

```bash
pip install qcatch
```

> Tip: If you run into environment issues, you can instead create an environment from the provided Conda `.yml` file, which pins the exact versions of all dependencies to ensure consistency.

```bash
conda env create -f qcatch_conda_env.yml
```


## Basic Usage
Provide the path to the parent folder of your quantification results, or the direct path to a `.h5ad` file generated by `alevin-fry` or `simpleaf`. **QCatch** will automatically scan the input path, assess data quality, and generate an interactive HTML report that can be viewed directly in your browser.

```bash
# --output is only needed if you want the results written to a separate folder
qcatch \
    --input path/to/your/quantification/result \
    --output path/to/desired/QC/output/folder \
    --chemistry 10X_3p_v3 \
    --save_filtered_h5ad
```
## Tutorial: Run QCatch on Example Data
### Step 1 - Download the Dataset
```bash
#!/bin/bash
set -e  # Exit immediately if a command exits with a non-zero status

echo "📦 Downloading QCatch example dataset..."

# Define where to run the tutorial (you can change this path if desired)
CWD=$(pwd)  # Current working directory
TUTORIAL_DIR="${CWD}/qcatch_tutorial"

# Clean any existing tutorial directory to ensure a fresh download
rm -rf "$TUTORIAL_DIR" && mkdir -p "$TUTORIAL_DIR"
ZIP_FILE="data.zip"

# Download from Box
wget -O "$ZIP_FILE" "https://umd.box.com/shared/static/zd4sai70uw9fs24e1qx6r41ec50pf45g.zip?dl=1"

# Unzip and clean up
unzip "$ZIP_FILE" -d "$TUTORIAL_DIR"
rm "$ZIP_FILE"

echo "✅ Test data downloaded to $TUTORIAL_DIR"
```
### Step 2 - Run QCatch
🎉 All set! Now let’s run QCatch:
```bash
# Set up the output directory
OUT_DIR="${TUTORIAL_DIR}/output"
mkdir -p "$OUT_DIR"

# Run QCatch on the example .h5ad file
qcatch --input "${TUTORIAL_DIR}/test_data/simpleaf_with_map/quants.h5ad" \
       --output "${OUT_DIR}"
```
### Tips
**1- Input path:**

Provide either:

- the **path to the parent directory** containing quantification results, or
- the **direct path to a `.h5ad` file** generated by `alevin-fry` or `simpleaf`.

QCatch will automatically detect the input type:
- If a **.h5ad file** is provided, QCatch will process it directly.
- If a **directory** is provided, QCatch will first look for an existing .h5ad file inside. If not found, it will fall back to processing the mtx-based quantification results.

See the example directory structures at the end of the Tips section for reference.
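
The detection order described above can be sketched in a few lines of shell. This is only an illustration of the documented behavior, not QCatch's own code: the helper name `classify_input` and its return labels are hypothetical, and the recursive search for `.h5ad` files is an assumption about how deep QCatch looks.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the documented detection order; not QCatch's implementation.

# classify_input PATH  (hypothetical helper)
# Prints one of: h5ad_file, dir_with_h5ad, dir_mtx_fallback, invalid
classify_input() {
    local path="$1"
    if [ -f "$path" ]; then
        # A direct file path is only meaningful if it is an .h5ad file.
        case "$path" in
            *.h5ad) echo "h5ad_file" ;;
            *)      echo "invalid" ;;
        esac
    elif [ -d "$path" ]; then
        # Assumption: search the whole directory tree for an .h5ad file.
        if [ -n "$(find "$path" -type f -name '*.h5ad' | head -n 1)" ]; then
            echo "dir_with_h5ad"
        else
            # No .h5ad found, so the mtx-based results would be used.
            echo "dir_mtx_fallback"
        fi
    else
        echo "invalid"
    fi
}
```

Calling `classify_input path/to/your/quantification/result` before a long batch run is one way to confirm which kind of input each sample would be treated as.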

**2- Output path:**

If you do not want QCatch to modify your input folder or files, specify an output path. QCatch will then save all new results and the QC HTML report there.

**_By default_**, QCatch saves the QC report and all output files in your input directory, so specifying an output path is optional. Specifically:
- If QCatch finds a `.h5ad` file in the input path, it modifies the original `.h5ad` file in place by appending the cell filtering results to `anndata.obs`, and creates a separate HTML QC report in the input folder.
- For `mtx-based` results, QCatch generates text files for the cell-calling results, as well as the QC report, in the input folder.

**3- Chemistry:**

If you used a standard 10X chemistry (e.g., 10X 3' v2 or v3) and performed quantification with `simpleaf` (v0.19.5 or later), QCatch can usually infer the correct chemistry automatically from the metadata.

If this inference fails, QCatch will stop and prompt you to provide the chemistry version explicitly using the `--chemistry` (`-c`) argument before rerunning the command. The chemistry you provide must be one of QCatch’s supported chemistries: '10X_3p_v2', '10X_3p_v3', '10X_3p_v4', '10X_3p_LT', '10X_5p_v3', or '10X_HT'.

In rare cases where you need a custom or unsupported chemistry, you can instead specify the number of partitions manually via the `--n_partitions` (`-n`) flag.
This option overrides any chemistry-based setting, so that cell calling uses the partition count you supply.
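
When scripting many runs, it can help to check the chemistry name before invoking QCatch. The following is a minimal sketch under stated assumptions: the helper names `is_supported_chemistry` and `chemistry_args` are hypothetical, the list of names is copied from the paragraph above, and the partition count must come from you (no default is implied here).

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical wrapper helpers, not part of QCatch.

# Succeeds if the name is in the supported list quoted in this README.
is_supported_chemistry() {
    case "$1" in
        10X_3p_v2|10X_3p_v3|10X_3p_v4|10X_3p_LT|10X_5p_v3|10X_HT) return 0 ;;
        *) return 1 ;;
    esac
}

# chemistry_args CHEMISTRY [N_PARTITIONS]
# Prints the flag to hand to qcatch: --chemistry for a supported name,
# otherwise --n_partitions if a partition count was supplied.
chemistry_args() {
    local chem="$1" n_part="${2:-}"
    if is_supported_chemistry "$chem"; then
        echo "--chemistry $chem"
    elif [ -n "$n_part" ]; then
        echo "--n_partitions $n_part"
    else
        echo "unsupported chemistry '$chem': supply a partition count instead" >&2
        return 1
    fi
}
```

The printed flag can then be appended to a `qcatch` command line; failing early in the wrapper avoids starting a run that would stop at the chemistry check.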

**4- Gene ID-to-name mapping file:**

If you are using simpleaf v0.19.3 or later, the generated `.h5ad` file already includes gene names. In this case, you do not need to specify the `--gene_id2name_file` option.

To provide a gene ID-to-name mapping, supply a **TSV** file with two columns, `gene_id` (e.g., ENSG00000284733) and `gene_name` (e.g., OR4F29), **without** a header row. If no file is provided, QCatch will attempt to retrieve the mapping from a remote registry. If that lookup fails, the mitochondria plots will not be displayed, but the rest of the QC report is unaffected.
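
One way to build such a file is to extract it from the annotation used for quantification. The sketch below assumes a GTF in the common Ensembl/GENCODE style, where `gene` feature rows carry `gene_id "..."` and `gene_name "..."` attributes in the ninth column; the helper name `gtf_to_id2name` is hypothetical, and genes lacking a `gene_name` attribute are simply skipped.

```bash
#!/usr/bin/env bash
# Sketch only: assumes an Ensembl/GENCODE-style GTF; not part of QCatch.

# gtf_to_id2name GTF_FILE  (hypothetical helper)
# Prints "gene_id<TAB>gene_name" for every gene row, with no header row.
gtf_to_id2name() {
    awk -F'\t' '$3 == "gene" {
        id = ""; name = ""
        # Pull the quoted values out of the attribute column (field 9).
        if (match($9, /gene_id "[^"]+"/))   id   = substr($9, RSTART + 9,  RLENGTH - 10)
        if (match($9, /gene_name "[^"]+"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
        if (id != "" && name != "") print id "\t" name
    }' "$1" | sort -u
}
```

For example, `gtf_to_id2name genes.gtf > gene_id2name.tsv` (both file names are placeholders) produces a file that could then be passed via `--gene_id2name_file`.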

**5- Save filtered h5ad file:**

To save the filtered `.h5ad` file separately, specify `--save_filtered_h5ad`. This option only applies when QCatch detects a `.h5ad` file as the input.

**6- Specify your desired cell list:**

If you want to use a specified list of valid cell barcodes, provide the file path with `--valid_cell_list`. QCatch will then skip the default cell-calling step and use the supplied list instead. The updated `.h5ad` file will include only one additional column, 'is_retained_cells', containing boolean values based on the specified list.
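
Before passing a barcode list, a quick sanity check of its shape can catch formatting mistakes. This is a minimal sketch with a hypothetical helper, `check_barcode_list`; it verifies that each line holds exactly one non-empty column and, as an extra check of my own choosing that QCatch does not document, that no barcode is repeated. It cannot tell a header row from a barcode, so that still needs a manual look.

```bash
#!/usr/bin/env bash
# Sketch only: a hypothetical pre-flight check, not part of QCatch.

# check_barcode_list FILE
# Returns 0 if every line has exactly one non-empty tab-separated field
# and no barcode appears twice; problems are reported on stderr.
check_barcode_list() {
    awk -F'\t' '
        NF != 1 || $1 == "" { printf "line %d: expected exactly one column\n", NR; bad = 1 }
        seen[$1]++          { printf "line %d: duplicate barcode %s\n", NR, $1; bad = 1 }
        END                 { exit (bad ? 1 : 0) }
    ' "$1" >&2
}
```

A typical use would be `check_barcode_list my_cells.tsv && qcatch --input ... --valid_cell_list my_cells.tsv`, where `my_cells.tsv` is a placeholder name.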

**7- Skip clustering plots:**

To reduce runtime, you can enable the `--skip_umap_tsne` option to bypass the dimensionality reduction and visualization steps.

**8- Export the summary metrics:**

To export the summary metrics, enable the `--export_summary_table` flag. The summary table will be saved as a separate CSV file in the output directory.

**9- Debug-level messages:**

To get debug-level messages and more intermediate results from the cell-calling step, specify `--verbose`.

**10- Re-run QCatch on a modified h5ad file:**

If you re-run QCatch on a modified `.h5ad` file (i.e., an `.h5ad` file to which cell-calling result columns have already been added), the existing cell-calling columns will be removed and replaced with the new results. The new cell calling can be generated either by QCatch's internal method or from a user-specified list of valid cell barcodes.

**Example directory structures:**

```bash
# simpleaf
parent_quant_dir/
├── af_map/
├── af_quant/
│   ├── alevin/
│   │   ├── quants_mat_cols.txt
│   │   ├── quants_mat_rows.txt
│   │   ├── quants_mat.mtx
│   │   └── quants.h5ad (available if you use simpleaf after v0.19.3)
│   │   ...
│   ├── featureDump.txt
│   └── quant.json
└── simpleaf_quant_log.json

# alevin-fry
parent_quant_dir/
├── alevin/
│   ├── quants_mat_cols.txt
│   ├── quants_mat_rows.txt
│   └── quants_mat.mtx
├── featureDump.txt
└── quant.json

```
For more advanced options and usage details, see the sections below.

## Command-Line Arguments

| Flag | Short | Type | Description |
|------|-------|------|-------------|
| `--input`  | `-i` | `str` (Required) | Path to the input directory containing the quantification output files or to the HDF5 file itself. |
| `--output` | `-o` | `str` (Optional) | Path to the output directory. If omitted, results are saved in the input directory. |
| `--chemistry` | `-c` | `str` (Recommended) | Specifies the chemistry used in the experiment, which determines the range for the empty_drops step. **Supported options**: '10X_3p_v2', '10X_3p_v3', '10X_3p_v4', '10X_5p_v3', '10X_3p_LT', '10X_HT'. If you used a standard 10X chemistry (e.g., '10X_3p_v2', '10X_3p_v3') and performed quantification with `simpleaf` (v0.19.5 or later), QCatch can usually **infer** the correct chemistry automatically from the metadata. If inference fails, QCatch will stop and prompt you to provide the chemistry explicitly via this flag. |
| `--save_filtered_h5ad` | `-s` | `flag` (Optional) | If enabled, `qcatch` will save a separate `.h5ad` file containing only the retained cells. |
| `--gene_id2name_file` | `-g` | `str` (Optional) | File providing a mapping from gene IDs to gene names. The file must be a TSV containing two columns, `gene_id` (e.g., ENSG00000284733) and `gene_name` (e.g., OR4F29), without a header row. If not provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, mitochondria plots will not be displayed. |
| `--valid_cell_list` | `-l` | `str` (Optional) | File providing a user-specified list of valid cell barcodes. The file must be a TSV containing one column of cell barcodes, without a header row. If provided, `qcatch` will skip the internal cell-calling steps and use the supplied list instead. |
| `--n_partitions` | `-n` | `int` (Optional) | Number of partitions (max number of barcodes to consider for ambient estimation). Use `--n_partitions` only when working with a custom or unsupported chemistry. When provided, this value will override the chemistry-based configuration during the cell-calling step.|
| `--skip_umap_tsne` | `-u` | `flag` (Optional) | If provided, skips generation of UMAP and t-SNE plots. |
| `--export_summary_table` | `-x` | `flag` (Optional) | If enabled, QCatch will export the summary metrics as a separate CSV file. |
| `--verbose` | `-b` | `flag` (Optional) | Enable verbose logging with debug-level messages. |
| `--version` | `-v` | `flag` (Optional) | Display the installed version of qcatch. |

