Metadata-Version: 2.4
Name: docktdeep
Version: 0.1.1
Summary: A deep learning model for protein-ligand binding affinity prediction
Author-email: "Matheus M. P. da Silva" <matheusp@posgrad.lncc.br>
License: LGPL-2.1-or-later
Project-URL: Homepage, https://github.com/gmmsb-lncc/docktdeep
Project-URL: Repository, https://github.com/gmmsb-lncc/docktdeep
Project-URL: Issues, https://github.com/gmmsb-lncc/docktdeep/issues
Keywords: protein-ligand,binding-affinity,deep-learning,drug-discovery
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: COPYING.LESSER
Requires-Dist: torch>=2.0.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: pytorch-lightning>=2.0.0
Requires-Dist: lightning>=2.5.0
Requires-Dist: docktgrid>=0.0.3
Requires-Dist: biopandas>=0.4.0
Requires-Dist: pandas>=1.3.0
Provides-Extra: full
Requires-Dist: aim>=3.29.0; extra == "full"
Requires-Dist: lightning>=2.5.0; extra == "full"
Requires-Dist: torchmetrics>=1.8.0; extra == "full"
Requires-Dist: scipy>=1.16.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: mypy>=0.910; extra == "dev"
Dynamic: license-file

# DockTDeep

Preprint: **"Data-centric training enables meaningful interaction learning in protein–ligand binding affinity prediction."** [ChemRxiv](https://chemrxiv.org/engage/chemrxiv/article-details/68a52850728bf9025e40d9e4).

## 💾 Installation

> [!TIP]
> Always use a virtual environment to manage dependencies.
> 
> ```bash
> python -m venv .venv
> source .venv/bin/activate
> ```

### Using pip

For inference only, install the package directly from PyPI:

```bash
pip install docktdeep
```



## 🚀 Quick start

### Basic usage

Predict binding affinities for protein-ligand pairs _(predictions are given in kcal/mol)_.

```bash
# single protein-ligand pair
docktdeep predict --proteins protein.pdb --ligands ligand.pdb --output-csv results.csv

# multiple pairs
docktdeep predict \
    --proteins protein1.pdb protein2.pdb \
    --ligands ligand1.pdb ligand2.pdb \
    --output-csv results.csv \
    --max-batch-size 16

# options available in help
docktdeep predict --help
```
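
Because predictions are reported in kcal/mol, it can help to translate them into a dissociation constant. The sketch below assumes the predicted value can be read as a standard binding free energy, so that ΔG = RT ln(Kd) with a 1 M standard state. That reading is an assumption, not something stated in this README, and the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def dg_to_kd(delta_g: float, temperature: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (mol/L).

    Assumes delta_g = R * T * ln(Kd), i.e. more negative values mean tighter binding.
    """
    return math.exp(delta_g / (R_KCAL_PER_MOL_K * temperature))


def kd_to_dg(kd: float, temperature: float = 298.15) -> float:
    """Inverse conversion: dissociation constant (mol/L) to free energy (kcal/mol)."""
    return R_KCAL_PER_MOL_K * temperature * math.log(kd)
```

Under this assumption, a prediction near -10 kcal/mol at 298.15 K corresponds to a Kd in the tens of nanomolar.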

> [!TIP]
> Use shell glob patterns to pass many files at once. Keep both lists in the same order so that each protein lines up with its ligand.
> ```bash
> # using glob expansion
> docktdeep predict \
>    --proteins path/to/proteins/*_protein.pdb \
>    --ligands path/to/ligands/*_ligand.pdb
>
> # another example using find command for more complex patterns
> docktdeep predict \
>    --proteins $(find /data/complexes -name "*_protein_prep.pdb" | sort) \
>    --ligands $(find /data/complexes -name "*_ligand_rnum.pdb" | sort)
> ```
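
When the two directories may not contain exactly the same complexes, it can be safer to pair files explicitly before calling the CLI. The following is a minimal sketch, assuming files are named `<id>_protein.pdb` and `<id>_ligand.pdb` as in the glob example above. The helpers `pair_complex_files` and `build_predict_command` are hypothetical and not part of DockTDeep; only the `predict` flags shown earlier are used.

```python
from pathlib import Path


def pair_complex_files(protein_dir, ligand_dir,
                       protein_suffix="_protein.pdb",
                       ligand_suffix="_ligand.pdb"):
    """Return (proteins, ligands) as two lists aligned by complex ID.

    The complex ID is taken to be the file name with its suffix removed.
    """
    proteins = {p.name[: -len(protein_suffix)]: p
                for p in Path(protein_dir).glob(f"*{protein_suffix}")}
    ligands = {p.name[: -len(ligand_suffix)]: p
               for p in Path(ligand_dir).glob(f"*{ligand_suffix}")}

    # IDs present on only one side would silently shift the pairing.
    unmatched = sorted(set(proteins) ^ set(ligands))
    if unmatched:
        raise ValueError(f"complexes missing a protein or ligand file: {unmatched}")

    ids = sorted(proteins)
    return [proteins[i] for i in ids], [ligands[i] for i in ids]


def build_predict_command(proteins, ligands, output_csv="results.csv"):
    """Build the argument list for `docktdeep predict` without running it."""
    return ["docktdeep", "predict",
            "--proteins", *map(str, proteins),
            "--ligands", *map(str, ligands),
            "--output-csv", output_csv]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Raising on unmatched IDs is a deliberate choice here: a single missing file would otherwise misalign every pair that follows it.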


## ⚙️ Development setup

To develop DockTDeep or train custom models:

```bash
# clone the repository
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep

# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# install dependencies
python -m pip install -r requirements.txt

# run tests to verify installation
python -m pytest tests/
```

### Training models

Initialize a new Aim repository to track experiments:

```bash
aim init

# to start the aim server
aim server
```

To see all available training options:

```bash
python train.py --help
```


Train a model with optimized hyperparameters:

```bash
python train.py \
    --model Baseline \
    --experiment experiment-name \
    --depthwise-convs \
    --adaptive-pooling \
    --optim AdamW \
    --max-epochs 1500 \
    --batch-size 64 \
    --lr 0.00087469 \
    --beta1 0.25693012 \
    --eps 0.00032933 \
    --dropout 0.25348994 \
    --wdecay 0.0000169 \
    --molecular-dropout 0.06 \
    --molecular-dropout-unit complex \
    --random-rotation \
    --dataframe-path path/to/dataframe.csv \
    --root-dir path/to/data/PDBbind2020 \
    --ligand-path-pattern "{c}/{c}_ligand_rnum.pdb" \
    --protein-path-pattern "{c}/{c}_protein_prep.pdb" \
    --split-column random_split
```
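
Before starting a long run, it may be worth checking that the path patterns resolve to real files. The sketch below assumes that `{c}` in `--ligand-path-pattern` and `--protein-path-pattern` stands for the complex ID and that the resulting paths are relative to `--root-dir`. How `train.py` actually resolves them is not shown in this README, and `find_missing_files` is a hypothetical helper.

```python
from pathlib import Path


def find_missing_files(root_dir, complex_ids, patterns):
    """Return the expected paths that do not exist under root_dir.

    Each pattern is expanded with str.format(c=<complex ID>), which is an
    assumption about how the {c} placeholder is meant to be used.
    """
    root = Path(root_dir)
    missing = []
    for complex_id in complex_ids:
        for pattern in patterns:
            path = root / pattern.format(c=complex_id)
            if not path.is_file():
                missing.append(path)
    return missing
```

The complex IDs would come from the CSV passed as `--dataframe-path`. Its column names are not documented here, so the helper takes a plain list of IDs.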



## 📝 Citation

If you use DockTDeep in your research, please cite:

```bibtex
@article{dasilva2025docktdeep,
  title={Data-centric training enables meaningful interaction learning in protein--ligand binding affinity prediction},
  author={da Silva, Matheus M. P. and Vidal, Lincon and Guedes, Isabella and de Magalh{\~a}es, Camila and Cust{\'o}dio, F{\'a}bio and Dardenne, Laurent},
  year={2025}
}
```

### Related
- **DockTGrid: a Python package for generating deep learning-ready voxel grids of molecular complexes.** [GitHub](https://github.com/gmmsb-lncc/docktgrid)
