# 🧬 PyBWA_lite — A Lightweight Burrows–Wheeler Aligner in Python

[![Build Status](https://img.shields.io/github/actions/workflow/status/soumyapriyagoswami/pybwa/tests.yml?branch=main)](https://github.com/soumyapriyagoswami/pybwa/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/pybwa.svg)](https://pypi.org/project/pybwa/)
[![Python](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/)
[![GitHub stars](https://img.shields.io/github/stars/soumyapriyagoswami/pybwa?style=social)](https://github.com/soumyapriyagoswami/pybwa)

---

### 🧠 Overview

**PyBWA_lite** is a pure-Python implementation of the seed-and-extend approach used by the **Burrows–Wheeler Aligner (BWA)**, aimed at fast, memory-efficient sequence alignment.
It supports **FM-index construction**, **banded Smith–Waterman extension**, **multithreaded alignment**, and **FASTQ I/O with quality scores**, making it a **compact, research-friendly alternative** to native BWA for teaching, prototyping, and custom pipelines.
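
The FM-index behind BWA is built on the Burrows–Wheeler transform (BWT), which can be read directly off a suffix array: each BWT character is the one that precedes a sorted suffix. The minimal sketch below illustrates this. The function names are hypothetical, it is not necessarily how `fmidx.py` builds its index, and the naive sort is only suitable for toy inputs (real aligners use linear-time or sampled suffix array construction).

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive approach: sorting materialized suffix strings costs O(n^2 log n)
    in the worst case, which is fine for teaching but not for genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: List[int]) -> str:
    """BWT[k] is the character just before suffix sa[k], wrapping around.

    When sa[k] == 0, text[-1] is the terminating '$' sentinel.
    """
    return "".join(text[i - 1] for i in sa)


if __name__ == "__main__":
    ref = "banana$"  # '$' sorts before letters and marks the end of the text
    sa = suffix_array(ref)
    print(sa)                              # [6, 5, 3, 1, 0, 4, 2]
    print(bwt_from_suffix_array(ref, sa))  # annb$aa
```

The BWT groups characters that share the same following context (here the runs `nn` and `aa`), which is what makes it compressible and searchable.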

---

### 🚀 Features

* ✅ **FM-Index Construction & Querying**: efficient backward search and rank/select
* ✅ **Banded Smith–Waterman Extension**: local alignment restricted to a diagonal band, shrinking the dynamic-programming search space
* ✅ **Parallel Read Alignment**: multithreaded alignment of FASTA/FASTQ reads
* ✅ **Quality-Aware FASTQ Parsing**: reads per-base quality scores alongside sequences
* ✅ **SAM/BAM Output**: uses `pysam` for interoperability with standard tools
* ✅ **Command-Line Interface (CLI)**: build indices or align reads directly from the terminal
* ✅ **Lightweight & Extensible**: alignment logic is written in Python, with no project-specific C/C++ extension to compile (`pysam` is used for SAM/BAM I/O)
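
Backward search matches a pattern one character at a time from right to left, narrowing a suffix-array interval with two tables: `C[c]` (how many text characters sort before `c`) and `Occ(c, i)` (how many times `c` occurs in the first `i` BWT characters, i.e. a rank query). The following is a self-contained sketch assuming a `$`-terminated text; the names are illustrative and not the `pybwa` API.

```python
from typing import Dict, List, Tuple


def build_fm_tables(text: str) -> Tuple[str, Dict[str, int], Dict[str, List[int]]]:
    """Return (bwt, C, occ) for `text`, which must end with a unique '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)

    # C[c]: number of characters in the text that sort strictly before c.
    C: Dict[str, int] = {}
    total = 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)

    # occ[c][i]: occurrences of c in bwt[:i], i.e. rank(c, i).
    occ: Dict[str, List[int]] = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return bwt, C, occ


def backward_search(
    pattern: str, C: Dict[str, int], occ: Dict[str, List[int]], n: int
) -> Tuple[int, int]:
    """Half-open suffix-array interval [lo, hi) of suffixes starting with pattern.

    The number of occurrences is hi - lo; (0, 0) means no match.
    """
    lo, hi = 0, n
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


if __name__ == "__main__":
    bwt, C, occ = build_fm_tables("banana$")
    print(backward_search("ana", C, occ, len(bwt)))  # (2, 4): two occurrences
```

This sketch stores a full `Occ` row per character, costing O(n·σ) integers; practical FM-indexes keep sampled checkpoints and count the remainder on the fly. Reporting match positions additionally needs a (sampled) suffix array.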

---

### 🧩 Installation

#### From PyPI (once released)

```bash
pip install pybwa
```

#### From Source (Development)

```bash
git clone https://github.com/soumyapriyagoswami/pybwa.git
cd pybwa
pip install -e .
```

---

### ⚙️ Quick Start

#### 1️⃣ Build an Index

```python
from pybwa.index import build_index
fm = build_index("reference.fasta", out_path="ref_index.pkl")
```

#### 2️⃣ Align Reads (Single Read or FASTQ File)

```python
from pybwa.align import align_reads
results = align_reads("reads.fastq", "ref_index.pkl", threads=4)
for r in results:
    print(r)
```
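
When reads come from a FASTQ file, each record spans four lines (`@name`, sequence, `+`, qualities), and each quality character encodes a Phred score as its ASCII code minus an offset (33 in modern Sanger/Illumina 1.8+ files). Below is a minimal parser sketch assuming unwrapped four-line records; `FastqRecord` and `parse_fastq` are hypothetical names, not part of the documented `pybwa` API.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    quals: List[int]  # Phred quality scores, one per base


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (an open file works as `lines`)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        name = (header[1:].split() or [""])[0]  # drop the optional description
        yield FastqRecord(name, seq, [ord(ch) - offset for ch in qual])


if __name__ == "__main__":
    text = ["@read1 sample=A\n", "ACGT\n", "+\n", "II#5\n"]
    for rec in parse_fastq(text):
        print(rec.name, rec.seq, rec.quals)  # read1 ACGT [40, 40, 2, 20]
```

Wrapped (multi-line) FASTQ is not handled here; gzip-compressed input can be read by passing `gzip.open(path, "rt")` as `lines`.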

#### 3️⃣ Command-Line Usage

```bash
# Build index
pybwa index reference.fasta -o ref_index.pkl

# Align reads
pybwa align reads.fastq -x ref_index.pkl -t 4 -o output.sam
```
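
The two subcommands above map naturally onto `argparse` subparsers. This sketch only mirrors the flags shown in the examples (`-o`, `-x`, `-t`); the parser layout, the `dest` names, and the default of one thread are assumptions for illustration, not a copy of `cli.py`.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pybwa")
    sub = parser.add_subparsers(dest="command", required=True)

    # pybwa index reference.fasta -o ref_index.pkl
    p_index = sub.add_parser("index", help="build an FM-index from a FASTA reference")
    p_index.add_argument("reference", help="reference FASTA file")
    p_index.add_argument("-o", dest="output", required=True, help="index output path")

    # pybwa align reads.fastq -x ref_index.pkl -t 4 -o output.sam
    p_align = sub.add_parser("align", help="align reads against a prebuilt index")
    p_align.add_argument("reads", help="FASTA/FASTQ reads")
    p_align.add_argument("-x", dest="index", required=True, help="prebuilt index")
    p_align.add_argument("-t", dest="threads", type=int, default=1,
                         help="worker threads (default 1 is an assumption)")
    p_align.add_argument("-o", dest="output", required=True, help="SAM output path")
    return parser
```

Parsing the align example above with `make_parser().parse_args([...])` yields `args.command == "align"`, `args.index == "ref_index.pkl"`, and `args.threads == 4`; a real entry point would then dispatch on `args.command`.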

---

### 🧠 Architecture Overview

| Module        | Description                                                       |
| ------------- | ----------------------------------------------------------------- |
| `index.py`    | Builds and loads FM-index structures for references               |
| `fmidx.py`    | Implements FM-index, suffix array, and BWT                        |
| `align.py`    | Seeding, banded Smith–Waterman extension, multithreaded alignment |
| `samtools.py` | Utilities for SAM/BAM file handling using `pysam`                 |
| `cli.py`      | Command-line interface for index building and alignment           |
| `tests/`      | Pytest-based unit tests for FM-index and alignment verification   |
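
One common way to parallelize read alignment without losing reproducibility is `concurrent.futures`, whose `Executor.map` returns results in input order no matter which worker finishes first. The sketch below assumes a per-read callback; `align_parallel` and `align_one` are hypothetical names. Under CPython's global interpreter lock only one thread executes Python bytecode at a time, so pure-Python alignment gains little from threads; swapping in `ProcessPoolExecutor` (same `map` interface) avoids that at the cost of pickling the callback, reads, and results between processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

Read = TypeVar("Read")
Hit = TypeVar("Hit")


def align_parallel(
    reads: Iterable[Read],
    align_one: Callable[[Read], Hit],
    threads: int = 4,
) -> List[Hit]:
    """Apply `align_one` to every read concurrently, preserving input order."""
    if threads <= 1:
        return [align_one(r) for r in reads]  # serial path, handy for debugging
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of `reads`, so the output
        # is identical to the serial path regardless of thread scheduling.
        return list(pool.map(align_one, reads))


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    # Toy "aligner": exact-match offset via str.find (-1 means unaligned).
    print(align_parallel(["GTTT", "ACCA", "NNNN"], reference.find, threads=2))
    # [6, 11, -1]
```

Order preservation is what makes a consistency check between single-threaded and multithreaded runs meaningful.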

---

### 🧪 Running Tests

```bash
pytest -v
```

Expected output (abridged):

```
tests/test_alignment.py::test_single_read_alignment PASSED
tests/test_alignment.py::test_fastq_alignment_with_qualities PASSED
tests/test_alignment.py::test_multithreaded_alignment_consistency PASSED
tests/test_fmindex.py::test_suffix_array_sorted PASSED
tests/test_fmindex.py::test_backward_search_basic PASSED
```

---

### ⚡ Performance Highlights

| Feature         | Optimization                                                | Approx. speedup                                                            |
| --------------- | ----------------------------------------------------------- | -------------------------------------------------------------------------- |
| FM-index lookup | Prefix caching                                              | ~3×                                                                        |
| Multithreading  | Parallel read alignment                                     | Up to ~N× with N threads (ideal; CPython's GIL limits pure-Python gains)   |
| Banded SW       | DP restricted to a diagonal band (O(n·w) vs. O(n·m) cells)  | ~5× on long reads                                                          |
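
Banding pays off because, once a seed fixes roughly where a read lies on the reference, the optimal alignment rarely strays far from that diagonal. Restricting the Smith–Waterman recurrence to cells with `|i - j| <= w` cuts the work from O(n·m) to O(n·w) cells. Below is a minimal sketch with a linear gap penalty and illustrative scores (BWA itself uses affine gaps). For simplicity the band is centered on the main diagonal, whereas an aligner would center it on the seed's diagonal; the function name and parameters are hypothetical.

```python
from typing import Dict, Tuple


def banded_smith_waterman(
    query: str,
    target: str,
    band: int = 8,
    match: int = 2,
    mismatch: int = -3,
    gap: int = -5,
) -> Tuple[int, int, int]:
    """Best local alignment score over cells with |i - j| <= band.

    Returns (score, query_end, target_end) with 1-based end positions.
    Only in-band cells are stored and evaluated: O(len(query) * band) work.
    """
    best = (0, 0, 0)
    prev: Dict[int, int] = {}  # H values of row i-1, keyed by column j
    for i in range(1, len(query) + 1):
        curr: Dict[int, int] = {}
        for j in range(max(1, i - band), min(len(target), i + band) + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Missing keys are boundary or out-of-band cells and read as 0.
            # With a negative gap score, a gap move from an out-of-band cell
            # never beats restarting at 0, so no path leaves the band.
            h = max(
                0,
                prev.get(j - 1, 0) + s,    # align query[i-1] with target[j-1]
                prev.get(j, 0) + gap,      # query[i-1] against a gap
                curr.get(j - 1, 0) + gap,  # target[j-1] against a gap
            )
            curr[j] = h
            if h > best[0]:
                best = (h, i, j)
        prev = curr
    return best


if __name__ == "__main__":
    print(banded_smith_waterman("ACGT", "ACGT", band=2))  # (8, 4, 4)
    # The exact match lies 4 diagonals off, so a narrow band misses it:
    print(banded_smith_waterman("AAAA", "TTTTAAAA", band=1))  # (2, 4, 5)
    print(banded_smith_waterman("AAAA", "TTTTAAAA", band=4))  # (8, 4, 8)
```

The last two calls show the trade-off: a band that is too narrow for the true offset silently returns a worse alignment, which is why bands are anchored on seed hits.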

---

### 🧰 Dependencies

* `numpy`
* `pysam`
* `tqdm`
* `pytest` *(for testing only)*

---

### 📦 Project Structure

```
pybwa/
├── __init__.py
├── align.py
├── cli.py
├── fmidx.py
├── index.py
└── samtools.py
tests/
├── test_alignment.py
└── test_fmindex.py
```

---

### 🤝 Contributing

Contributions are welcome!
If you have ideas for improving FM-index efficiency, integrating GPU kernels, or extending to RNA-seq workflows:

1. Fork this repo
2. Create a feature branch
3. Submit a pull request 🚀

---

### 🧾 License

**MIT License**

© 2025 [Soumyapriya Goswami](https://github.com/soumyapriyagoswami)

---

### 🌐 Links

* 🔗 **GitHub Repository:** [soumyapriyagoswami/pybwa](https://github.com/soumyapriyagoswami/pybwa)
* 📘 **Documentation:** Coming soon
* 🐍 **PyPI Package:** (after release)
