# FOR ADAM:

pip install LRphase

This installs the current PyPI build (https://pypi.org/project/LRphase/) as a command-line tool and as a Python library that you can import and use to build phasing pipelines.

Most, if not all, of the library is available and functional in the current build. I will post examples of how to use the library soon.


# GREG UPLOAD NOTES:
```
rm -r LRphase_env/
python3 -m venv LRphase_env
source LRphase_env/bin/activate
rm -rf build
rm -rf src/*.egg-info
python3 -m pip install --upgrade pip setuptools wheel pysam pyliftover biopython twine build
python3 setup.py clean --all sdist bdist_wheel
twine upload --skip-existing dist/*tar.gz
pip install LRphase
```

# LRphase
A tool for phasing individual reads when haplotype information is available. 
* Main functions: (1) assigns a phase to individual reads and (2) provides an estimate of phasing quality (Phred-scaled phasing error rate).
* Recommended usage: (1) phasing low-coverage read data, (2) phasing noisy long-read data generated by nanopore sequencing, and (3) tagging or filtering reads according to phase, phasing quality, phase_set, and/or sample.
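
To make the "Phred-scaled phasing error rate" concrete, here is a minimal sketch of the standard Phred conversion, Q = -10 * log10(p). The function names are illustrative and are not part of the LRphase API. How LRphase estimates the error probability itself is not shown here.

```python
import math


def error_prob_to_phred(p_error: float) -> float:
    """Convert an error probability in (0, 1] to a Phred-scaled score."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled score back to an error probability."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # A higher score means a lower estimated chance that the phase call is wrong.
    for p in (0.1, 0.01, 0.001):
        print(f"error rate {p} -> Q{error_prob_to_phred(p):.0f}")
```

On this scale, a phasing quality of 20 corresponds to an estimated 1 in 100 chance that the read was assigned to the wrong haplotype, and 30 to 1 in 1000.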

## Requirements

Full Requirements:
* Python version 3.7 or higher with the numpy, matplotlib, scipy, pysam, and PyVCF packages. For full functionality, minimap2, samtools, bgzip, and tabix must also be installed and available on PATH.
```
conda create --name LRphase -c conda-forge python=3.7 pysam numpy matplotlib scipy pyvcf
conda activate LRphase
```
Minimum Requirements (phasing only):
* The phasing module requires Python version 3.7 or higher and the numpy, pysam, and PyVCF packages.
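
As a quick way to see which of the requirements above are already satisfied, the following sketch uses only the standard library. It is not part of LRphase, and the helper names are made up for illustration. The only assumption beyond the lists above is that PyVCF is imported as `vcf`.

```python
import importlib.util
import shutil
import sys

# Package and tool names come from the requirements listed above.
# PyVCF is imported as "vcf", so that is the name checked here.
REQUIRED_MODULES = ["numpy", "pysam", "vcf"]    # minimum (phasing only)
OPTIONAL_MODULES = ["matplotlib", "scipy"]      # needed for the full install
EXTERNAL_TOOLS = ["minimap2", "samtools", "bgzip", "tabix"]


def python_ok(min_version=(3, 7)):
    """True if the running interpreter is at least min_version."""
    return sys.version_info >= min_version


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_tools(names):
    """Return the executables that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("Missing required packages:", missing_modules(REQUIRED_MODULES))
    print("Missing optional packages:", missing_modules(OPTIONAL_MODULES))
    print("Missing external tools:", missing_tools(EXTERNAL_TOOLS))
```

An empty list for the required packages means the phasing module should be importable. The external tools only matter for the conversion and alignment features.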

## Installation

pysam installation (see https://pysam.readthedocs.io/en/latest/installation.html#installation):

* `conda config --add channels r`
* `conda config --add channels bioconda`
* `conda install pysam`

From GitHub
```
git clone https://github.com/castrocp/LRphase
cd LRphase/
python3.7 -m pip install -e .
```
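From PyPI (the command below pulls the `LRphase-gregfar` test build from TestPyPI; the release build is installed with `pip install LRphase`)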
```
python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps LRphase-gregfar
```
From conda
```
conda install LRphase
```
# LRphase phasing mode

## Input File Requirements

(1) Haplotype information:
* Phased VCF file: must be bgzipped (.vcf.gz) and have a tabix index in the same folder (.vcf.gz.tbi). If a .vcf file is provided that is not bgzipped or indexed, LRphase will convert it to the correct format. (tabix and bgzip must be installed and available on the user's PATH for the conversion.)
          
(2) Long reads that originated from the genomic DNA of the cells described by the VCF file, provided in one of two formats:
* Unaligned: FASTQ long-read file (.fastq or .fastq.gz) and a reference genome FASTA file. (minimap2 must be on PATH for this option.)
  OR
* Aligned: alignment file in sorted BAM format (.sorted.bam) with an index in the same folder (.sorted.bam.bai). If a SAM, unsorted BAM, or unindexed BAM file is provided as input, LRphase will convert it to the proper format.
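
For anyone who prefers to prepare the inputs by hand, the sketch below builds the equivalent bgzip, tabix, samtools, and minimap2 command lines without running them. It shows the manual steps, not LRphase's internal conversion code. The function names, the filename-based checks, and the `map-ont` preset are assumptions made for this example.

```python
from pathlib import Path


def vcf_prep_commands(vcf):
    """Commands to bgzip and tabix-index a VCF, based on file names only."""
    vcf = Path(vcf)
    cmds = []
    if vcf.suffix != ".gz":
        # bgzip writes <name>.gz and removes the uncompressed original.
        cmds.append(["bgzip", str(vcf)])
        vcf = Path(str(vcf) + ".gz")
    # Heuristic: a .gz suffix is assumed to mean bgzip (not plain gzip).
    if not Path(str(vcf) + ".tbi").exists():
        cmds.append(["tabix", "-p", "vcf", str(vcf)])
    return cmds


def alignment_prep_commands(aln):
    """Commands to sort and index a SAM/BAM file, based on file names only."""
    aln = Path(aln)
    cmds = []
    if aln.name.endswith(".sorted.bam"):
        sorted_bam = aln
    else:
        # aln.stem drops the final .sam or .bam extension.
        sorted_bam = aln.with_name(aln.stem + ".sorted.bam")
        cmds.append(["samtools", "sort", "-o", str(sorted_bam), str(aln)])
    if cmds or not Path(str(sorted_bam) + ".bai").exists():
        cmds.append(["samtools", "index", str(sorted_bam)])
    return cmds


def fastq_alignment_command(fastq, reference, out_sam, preset="map-ont"):
    """minimap2 command that aligns long reads and writes SAM to out_sam.

    The nanopore preset is an assumption; choose the preset for your data.
    """
    return ["minimap2", "-a", "-x", preset, "-o", str(out_sam),
            str(reference), str(fastq)]


if __name__ == "__main__":
    for cmd in vcf_prep_commands("sample.vcf") + alignment_prep_commands("reads.bam"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the tools are on PATH.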

## Output files
All LRphase options produce at least two files: a text file summarizing the run and a table of summary statistics for each read processed. If pysam is available, LRphase also distributes the reads from the input BAM file into four separate BAM files according to their phasing assignment: paternal_reads.bam, maternal_reads.bam, unphased_reads.bam, and nonphasable_reads.bam.
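
A downstream pipeline step may want to confirm which of the four per-phase BAM files were written before using them. The sketch below does this with the standard library only. It assumes the four files land together in a single output directory, which this README does not state, and `find_phased_bams` is a hypothetical helper rather than an LRphase function.

```python
from pathlib import Path

# File names as listed in the "Output files" section.
PHASED_BAM_NAMES = (
    "paternal_reads.bam",
    "maternal_reads.bam",
    "unphased_reads.bam",
    "nonphasable_reads.bam",
)


def find_phased_bams(output_dir):
    """Map each expected BAM name to its path, or to None if it is absent."""
    out = Path(output_dir)
    return {
        name: (out / name if (out / name).is_file() else None)
        for name in PHASED_BAM_NAMES
    }


if __name__ == "__main__":
    for name, path in find_phased_bams(".").items():
        print(f"{name}: {'found' if path else 'missing'}")
```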
