Metadata-Version: 2.1
Name: projectionSVD
Version: 0.1.5
Summary: Projection into SVD space for genetic data
Home-page: https://github.com/Rosemeis/projectionSVD
Author: Jonas Meisner
Author-email: meisnerucph@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cython>3.0.0
Requires-Dist: numpy>2.0.0

# projectionSVD (v0.1.5)
[![DOI](https://zenodo.org/badge/866019962.svg)](https://doi.org/10.5281/zenodo.13881621)\
`projectionSVD` is a small command-line program written in Python/Cython that projects a new genotype dataset onto an existing principal component space. It takes binary PLINK files as genotype input and works with PCA output from programs like [`halkoSVD`](https://github.com/Rosemeis/halkoSVD), `PLINK`, and `PCAone`. To perform the projection, `projectionSVD` requires the estimated allele frequencies, eigenvalues, and SNP loadings from the original PCA.
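
The role of the three inputs can be illustrated with a minimal NumPy sketch of a standard PCA projection. This is not the program's actual implementation: the function name `project_samples`, the samples-by-SNPs orientation, the `2f(1-f)` variance scaling, and the use of singular values `s` are all assumptions made for illustration. How the eigenvalues reported by a given PCA tool map to singular values is tool-specific, and missing genotypes are not handled here.

```python
import numpy as np

def project_samples(G, f, V, s):
    """Project genotypes onto an existing PC space (illustrative sketch).

    G: (n_samples, n_snps) genotype dosages coded 0/1/2, no missing values
    f: (n_snps,) allele frequencies from the reference dataset, 0 < f < 1
    V: (n_snps, k) SNP loadings from the reference PCA
    s: (k,) singular values matching the columns of V (assumed available)
    """
    G = np.asarray(G, dtype=float)
    f = np.asarray(f, dtype=float)
    # Standardize each SNP with the reference frequencies, not the new data.
    Z = (G - 2.0 * f) / np.sqrt(2.0 * f * (1.0 - f))
    # Since Z = U diag(s) V^T in the reference, U = Z V diag(1/s).
    return (Z @ np.asarray(V, dtype=float)) / np.asarray(s, dtype=float)
```

Under these assumptions, projecting the reference samples themselves reproduces the reference eigenvectors, which makes a convenient sanity check.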

## Installation
```bash
# Option 1: Build and install via PyPI
pip install projectionSVD

# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/projectionSVD.git
cd projectionSVD
pip install .

# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/projectionSVD.git
conda env create -f projectionSVD/environment.yml
conda activate projectionSVD
```
You can now run the program with the `projectionSVD` command.

## Quick usage
```bash
# Check help message of the program
projectionSVD -h

# Perform projection using PCAone output
projectionSVD --bfile new --freqs old.afreq --eigvals old.eigvals --loadings old.loadings --threads 32 --out new

# Writes the projected eigenvectors of the new dataset to new.eigvecs
```

### Options
* `--freqs-col`, column of the frequency file to read the allele frequencies from (default: 6)
* `--batch`, number of SNPs to process per batch during projection (default: 8192)
* `--raw`, output only the eigenvectors, without FID/IID columns
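
To show why processing SNPs in batches does not change the result, here is a hedged NumPy sketch: the projection is a sum over SNPs, so it can be accumulated one block at a time while keeping only a single block of standardized genotypes in memory. The function name `project_in_batches`, the samples-by-SNPs layout, and the standardization are illustrative assumptions, not the program's internals.

```python
import numpy as np

def project_in_batches(G, f, V, s, batch=8192):
    """Accumulate a PCA projection over blocks of SNPs (illustrative sketch).

    G: (n_samples, n_snps) genotype dosages coded 0/1/2, no missing values
    f: (n_snps,) reference allele frequencies, 0 < f < 1
    V: (n_snps, k) SNP loadings; s: (k,) singular values (assumed available)
    """
    G = np.asarray(G, dtype=float)
    f = np.asarray(f, dtype=float)
    V = np.asarray(V, dtype=float)
    acc = np.zeros((G.shape[0], V.shape[1]))
    for start in range(0, G.shape[1], batch):
        stop = min(start + batch, G.shape[1])
        f_b = f[start:stop]
        # Standardize only the current block of SNPs.
        Z_b = (G[:, start:stop] - 2.0 * f_b) / np.sqrt(2.0 * f_b * (1.0 - f_b))
        # Each block contributes an additive term to Z @ V.
        acc += Z_b @ V[start:stop]
    return acc / np.asarray(s, dtype=float)
```

In this sketch the batch size trades memory for the number of loop iterations; the output is the same up to floating-point rounding.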
