molecular_simulations.analysis.ipSAE module
- class molecular_simulations.analysis.ipSAE.ModelParser(structure)
Bases: object
Helper class to read in and process a structure file for downstream scoring tasks. Capable of reading both PDB and CIF formats.
- Parameters:
structure (PathLike) – Path to PDB or CIF file.
- classify_chains()
Reads through residue data to assign the identity of each chain as either protein (by default) or nucleic acid if an NA residue is detected.
- Return type:
None
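The default-to-protein rule described for classify_chains() can be sketched as follows. This is a minimal illustration, not the module's implementation: the residue-name set, the input format (pairs of chain ID and residue name) and the labels "protein" and "nucleic_acid" are all assumptions made for the example.

```python
# Hypothetical set of nucleic acid residue names; the module's list may differ.
NUCLEIC_ACID_RESIDUES = {"DA", "DC", "DG", "DT", "A", "C", "G", "U"}


def classify_chains(residues):
    """Map each chain ID to 'protein' or 'nucleic_acid'.

    `residues` is an iterable of (chain_id, residue_name) pairs.
    """
    chain_types = {}
    for chain_id, residue_name in residues:
        # Every chain starts out as protein by default.
        chain_types.setdefault(chain_id, "protein")
        # A single nucleic acid residue is enough to relabel the chain.
        if residue_name.strip() in NUCLEIC_ACID_RESIDUES:
            chain_types[chain_id] = "nucleic_acid"
    return chain_types


example = [("A", "ALA"), ("A", "GLY"), ("B", "DA"), ("B", "DT")]
chain_types = classify_chains(example)
print(chain_types)
```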
- static package_line(atom_num, atom_name, residue_name, chain_id, residue_id, x, y, z)
Packs various information from a single line of a structure file into a dictionary to maintain consistency.
- static parse_cif_line(line, fields)
Parses a single line of a CIF file, extracting atom and residue information. Processes this into a dictionary and returns the dict.
- static parse_pdb_line(line, *args)
Parses a single line of a PDB file, extracting atom and residue information. Processes this into a dictionary and returns the dict.
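As an illustration of what parsing a PDB line into a dictionary involves, the sketch below slices the fixed-width columns of an ATOM record. The function name and the dictionary keys are hypothetical (chosen to mirror the package_line arguments); only the column positions come from the PDB format itself.

```python
def parse_pdb_atom_line(line):
    """Slice the fixed-width columns of a PDB ATOM/HETATM record."""
    return {
        "atom_num": int(line[6:11]),         # columns 7-11: atom serial number
        "atom_name": line[12:16].strip(),    # columns 13-16: atom name
        "residue_name": line[17:20].strip(), # columns 18-20: residue name
        "chain_id": line[21],                # column 22: chain identifier
        "residue_id": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),             # columns 31-38: x coordinate
        "y": float(line[38:46]),             # columns 39-46: y coordinate
        "z": float(line[46:54]),             # columns 47-54: z coordinate
    }


# Build a correctly aligned ATOM record rather than typing the spacing by hand.
line = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614
)
atom = parse_pdb_atom_line(line)
print(atom)
```

CIF files are whitespace-delimited rather than fixed-width, which is presumably why parse_cif_line also takes the list of field names.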
- class molecular_simulations.analysis.ipSAE.ScoreCalculator(chains, chain_pair_type, n_residues, pdockq_cutoff=8.0, pae_cutoff=12.0, dist_cutoff=10.0)
Bases: object
Computes various model quality scores including: pDockQ, pDockQ2, LIS, ipTM and the ipSAE score.
- Parameters:
chains (np.ndarray) – Array of chainIDs.
chain_pair_type (dict[str, str]) – Dictionary mapping of chainID to chain type.
n_residues (int) – Number of residues total in structure.
pdockq_cutoff (float) – Defaults to 8.0 Å.
pae_cutoff (float) – Defaults to 12.0 Å.
dist_cutoff (float) – Defaults to 10.0 Å.
- compute_LIS(chain1, chain2)
Computes the Local Interaction Score (LIS), which is based on the subset of predicted aligned error (PAE) values that fall below a cutoff of 12. Values lie in the interval (0, 1] and indicate how accurate the fold is within the error cutoff: a mean error of 0 yields a LIS of 1, and a mean error approaching 12 yields a LIS approaching 0. Adapted from: https://doi.org/10.1101/2024.02.19.580970.
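The relationship between mean error and LIS can be sketched as below. This is a minimal illustration assuming the score is the mean of (cutoff - PAE) / cutoff over the inter-chain PAE entries under the cutoff; the function name, the input (a chain1-by-chain2 block of the PAE matrix) and the 0.0 fallback for an empty selection are assumptions, not the module's confirmed behavior.

```python
import numpy as np


def local_interaction_score(pae_block, cutoff=12.0):
    """Mean rescaled PAE over the entries that fall below the cutoff."""
    pae_block = np.asarray(pae_block, dtype=float)
    # Keep only residue pairs whose predicted error is under the cutoff.
    selected = pae_block[pae_block < cutoff]
    if selected.size == 0:
        # Assumed fallback when no pair passes the cutoff.
        return 0.0
    # Rescale so an error of 0 maps to 1 and an error near the cutoff maps to 0.
    return float(np.mean((cutoff - selected) / cutoff))


# PAE values between two residues of chain1 (rows) and two of chain2 (columns).
pae_block = np.array([[2.0, 6.0], [20.0, 10.0]])
lis = local_interaction_score(pae_block)
print(lis)
```

In this example the entry of 20.0 is excluded, and the remaining values 2, 6 and 10 rescale to 10/12, 6/12 and 2/12, whose mean is 0.5.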
- static compute_d0(L, pair_type)
Computes the d0 term per the following equation:
$d_0 = \max(1.0, 1.24 (L - 15)^{\frac{1}{3}} - 1.8)$
- Parameters:
L (int) – Length of sequence up to 27 residues.
pair_type (str) – Whether or not chain is a nucleic acid.
- Returns:
d0
- Return type:
(float)
- compute_ipTM_ipSAE(chain1, chain2)
Computes the ipTM and ipSAE scores for a given pair of chains. These operations are combined since they rely on very similar processing of the data.
- compute_pDockQ_scores(chain1, chain2)
Computes both the pDockQ and pDockQ2 scores for the interface between two chains. pDockQ is dependent solely on the pLDDT matrix while pDockQ2 is dependent on both pLDDT and the PAE matrix.
- compute_pTM = <numpy.vectorize object>
- compute_scores(distances, pLDDT, PAE)
Based on the input distance, pLDDT and PAE matrices, compute the pairwise pDockQ, pDockQ2, LIS, ipTM and ipSAE scores.
- get_max_values()
Because some scores, like ipSAE, are not symmetric (A->B != B->A), we take the maximal score over both directions as the undirected score. This method goes through the internal dataframe and keeps only the rows with the maximal values.
- Return type:
None
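The idea of collapsing directed scores into one undirected score per chain pair can be sketched in plain Python. The module works on an internal dataframe; the list-of-dicts layout and the column names "chain1", "chain2" and "ipSAE" used here are hypothetical stand-ins.

```python
def keep_max_per_pair(rows, score_key="ipSAE"):
    """Keep, for each unordered chain pair, the row with the highest score."""
    best = {}
    for row in rows:
        # frozenset makes (A, B) and (B, A) map to the same key.
        pair = frozenset((row["chain1"], row["chain2"]))
        if pair not in best or row[score_key] > best[pair][score_key]:
            best[pair] = row
    return list(best.values())


rows = [
    {"chain1": "A", "chain2": "B", "ipSAE": 0.61},
    {"chain1": "B", "chain2": "A", "ipSAE": 0.47},
    {"chain1": "A", "chain2": "C", "ipSAE": 0.12},
    {"chain1": "C", "chain2": "A", "ipSAE": 0.30},
]
undirected = keep_max_per_pair(rows)
print(undirected)
```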
- static pDockQ2_score(x)
Computes the pDockQ2 score per the following equation:
$pDockQ2 = \frac{1.31}{1 + e^{-0.075 (x - 84.733)}} + 0.005$
Details on the pDockQ2 score at: https://doi.org/10.1093/bioinformatics/btad424
- Parameters:
x (float) – Mean pLDDT score scaled by mean PAE score.
- Returns:
pDockQ2 score
- Return type:
(float)
- static pDockQ_score(x)
Computes the pDockQ score per the following equation:
$pDockQ = \frac{0.724}{1 + e^{-0.052 (x - 152.611)}} + 0.018$
Details on the pDockQ score at: https://doi.org/10.1038/s41467-022-28865-w
- Parameters:
x (float) – Mean pLDDT score scaled by the log10 number of residue pairs that meet pLDDT and distance cutoffs.
- Returns:
pDockQ score
- Return type:
(float)
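Both pDockQ and pDockQ2 are logistic curves that differ only in their fitted constants. The sketch below assumes the logistic form from the cited papers, L / (1 + exp(-k (x - x0))) + b, with the offset b added outside the fraction; the helper name logistic_score and the lowercase function names are hypothetical and do not correspond to the module's static methods.

```python
import math


def logistic_score(x, L, x0, k, b):
    """Generic logistic curve: L / (1 + exp(-k * (x - x0))) + b."""
    return L / (1.0 + math.exp(-k * (x - x0))) + b


def pdockq(x):
    # Constants as given in the pDockQ equation above.
    return logistic_score(x, L=0.724, x0=152.611, k=0.052, b=0.018)


def pdockq2(x):
    # Constants as given in the pDockQ2 equation above.
    return logistic_score(x, L=1.31, x0=84.733, k=0.075, b=0.005)


# At the midpoint x = x0 the exponential is 1, so the score is L / 2 + b.
print(pdockq(152.611), pdockq2(84.733))
```

Evaluating at the midpoint is a quick sanity check on the constants: pDockQ gives 0.724 / 2 + 0.018 = 0.38 and pDockQ2 gives 1.31 / 2 + 0.005 = 0.66.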
- class molecular_simulations.analysis.ipSAE.ipSAE(structure_file, plddt_file, pae_file, out_path=None)
Bases: object
Compute the interaction prediction Score from Aligned Errors (ipSAE) for a model. Adapted from https://doi.org/10.1101/2025.02.10.637595. Currently supports only model outputs that provide pLDDT and PAE data, which limits use to Boltz and AlphaFold.
- Parameters:
structure_file (PathLike) – Path to PDB/CIF model.
plddt_file (PathLike) – Path to plddt npy file.
pae_file (PathLike) – Path to pae npy file.
out_path (PathLike | None) – Defaults to None. Path for outputs, or if None, will use the parent path from the plddt file.
- load_PAE_file()
Loads PAE file and returns data.
- Returns:
Array of PAE values.
- Return type:
(np.ndarray)
- load_pLDDT_file()
Loads pLDDT file and scales data by 100.
- Returns:
Scaled pLDDT array.
- Return type:
(np.ndarray)
- parse_structure_file()
Runs parser to read in structure file and extract relevant details.
- Return type:
None
- run()
Main logic of class. Parses structure file, computes distogram, unpacks pLDDT and PAE, feeds data to scorer and saves out scores.
- Return type:
None
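The distogram step in run() amounts to an all-against-all distance matrix over the structure's coordinates. A minimal NumPy sketch follows, assuming one representative coordinate per residue; which atoms the module actually uses is not stated here, and the function name is hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (N, N) Euclidean distance matrix for (N, 3) coordinates."""
    coords = np.asarray(coords, dtype=float)
    # Broadcasting yields an (N, N, 3) array of coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


coords = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 10.0],
])
distances = pairwise_distances(coords)
print(distances)
```

The resulting matrix is symmetric with a zero diagonal, so it can be compared directly against a distance cutoff such as dist_cutoff to select interface residue pairs.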