molecular_simulations.analysis.ipSAE module

class molecular_simulations.analysis.ipSAE.ModelParser(structure)

Bases: object

Helper class to read in and process a structure file for downstream scoring tasks. Capable of reading both PDB and CIF formats.

Parameters:

structure (PathLike) – Path to PDB or CIF file.

classify_chains()

Reads through residue data to assign the identity of each chain as either protein (by default) or nucleic acid if an NA residue is detected.

Return type:

None

Returns:

None
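
The classification rule can be sketched as follows. This is an illustration under stated assumptions, not the class's implementation: the `NUCLEIC_ACID_RESNAMES` set, the standalone `classify_chains` helper and the `"protein"` / `"nucleic_acid"` labels are all hypothetical stand-ins.

```python
# Hypothetical residue-name set; the real list comes from ModelParser.nucleic_acids.
NUCLEIC_ACID_RESNAMES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def classify_chains(residues):
    """Map each chain ID to 'protein' unless a nucleic acid residue is seen.

    `residues` is an iterable of (chain_id, residue_name) pairs.
    """
    chain_types = {}
    for chain_id, residue_name in residues:
        # Every chain starts out as protein by default.
        chain_types.setdefault(chain_id, "protein")
        if residue_name in NUCLEIC_ACID_RESNAMES:
            chain_types[chain_id] = "nucleic_acid"
    return chain_types


example = [("A", "ALA"), ("A", "GLY"), ("B", "DA"), ("B", "DT")]
chain_types = classify_chains(example)
```

A single nucleic acid residue is enough to relabel the whole chain, which matches the "if an NA residue is detected" wording above.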

property nucleic_acids: list[str]

Stores the canonical resnames for RNA and DNA residues.

Returns:

List of nucleic acid resnames.

Return type:

(list[str])

static package_line(atom_num, atom_name, residue_name, chain_id, residue_id, x, y, z)

Packs various information from a single line of a structure file into a dictionary to maintain consistency.

Parameters:
  • atom_num (str) – Atom index.

  • atom_name (str) – Atom name.

  • residue_name (str) – Resname.

  • chain_id (str) – ChainID.

  • residue_id (str) – ResID.

  • x (str) – X coordinate.

  • y (str) – Y coordinate.

  • z (str) – Z coordinate.

Returns:

Dictionary representation of data.

Return type:

(dict[str, Any])

static parse_cif_line(line, fields)

Parses a single line of a CIF file, extracting atom and residue information. Processes this into a dictionary and returns the dict.

Parameters:
  • line (str) – Actual line from CIF file.

  • fields (dict[str, int]) – Definition of where each field is found.

Returns:

Dictionary representation of data.

Return type:

(dict[str, Any])
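
One way to read the `fields` argument is as a map from column name to token index in a whitespace-delimited `_atom_site` row. The sketch below assumes that reading. The column layout, the example row and the returned keys are hypothetical, and values that contain spaces inside quotes are not handled.

```python
def parse_cif_line(line, fields):
    """Split one whitespace-delimited atom_site row and pick columns by index."""
    tokens = line.split()
    return {name: tokens[index] for name, index in fields.items()}


# Hypothetical column layout; a real file declares it in its _atom_site loop header.
fields = {
    "id": 1,
    "label_atom_id": 3,
    "label_comp_id": 5,
    "label_asym_id": 6,
    "label_seq_id": 8,
    "Cartn_x": 10,
    "Cartn_y": 11,
    "Cartn_z": 12,
}
line = "ATOM 2 C CA . MET A 1 1 ? 10.123 -4.560 7.891 1.00 85.20 ? 1 MET A CA 1"
record = parse_cif_line(line, fields)
```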

static parse_pdb_line(line, *args)

Parses a single line of a PDB file, extracting atom and residue information. Processes this into a dictionary and returns the dict.

Parameters:
  • line (str) – Actual line from PDB file.

  • *args – Just here so we can use the same API for PDB and CIF.

Returns:

Dictionary representation of data.

Return type:

(dict[str, Any])
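
PDB ATOM records use fixed-width columns, so a parser of this kind can slice the line directly. The sketch below is an assumption about how that might look. The dictionary keys reuse the `package_line` parameter names, and the example record is built with a format string so the column positions are explicit.

```python
def parse_pdb_line(line, *args):
    """Slice the fixed-width columns of a PDB ATOM/HETATM record."""
    return {
        "atom_num": line[6:11].strip(),      # columns 7-11: atom serial number
        "atom_name": line[12:16].strip(),    # columns 13-16: atom name
        "residue_name": line[17:20].strip(), # columns 18-20: residue name
        "chain_id": line[21].strip(),        # column 22: chain identifier
        "residue_id": line[22:26].strip(),   # columns 23-26: residue number
        "x": line[30:38].strip(),            # columns 31-38: X coordinate
        "y": line[38:46].strip(),            # columns 39-46: Y coordinate
        "z": line[46:54].strip(),            # columns 47-54: Z coordinate
    }


# Build an example record column by column (altLoc and iCode left blank).
line = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:>8}{:>8}{:>8}".format(
    "ATOM", 2, " CA", "", "MET", "A", 1, "", "10.123", "-4.560", "7.891"
)
record = parse_pdb_line(line)
```

Extra positional arguments are accepted and ignored, mirroring the shared PDB/CIF call signature described above.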

parse_structure_file()

Identifies the file type and parses the file line by line, storing relevant data for all C-alpha, C-beta, C1 and C3 atoms, for proteins and nucleic acids alike.

Return type:

None

Returns:

None

class molecular_simulations.analysis.ipSAE.ScoreCalculator(chains, chain_pair_type, n_residues, pdockq_cutoff=8.0, pae_cutoff=12.0, dist_cutoff=10.0)

Bases: object

Computes various model quality scores including: pDockQ, pDockQ2, LIS, ipTM and the ipSAE score.

Parameters:
  • chains (np.ndarray) – Array of chainIDs.

  • chain_pair_type (dict[str, str]) – Dictionary mapping of chainID to chain type.

  • n_residues (int) – Number of residues total in structure.

  • pdockq_cutoff (float) – Defaults to 8.0 Å.

  • pae_cutoff (float) – Defaults to 12.0 Å.

  • dist_cutoff (float) – Defaults to 10.0 Å.

compute_LIS(chain1, chain2)

Computes the Local Interaction Score (LIS), which is based on the subset of predicted aligned error (PAE) values that fall below a cutoff of 12. Values lie in the interval (0, 1] and can be interpreted as how accurate a fold is within the error cutoff: a mean error of 0 yields a LIS value of 1, and a mean error approaching 12 yields a LIS value approaching 0. Adapted from: https://doi.org/10.1101/2024.02.19.580970.

Parameters:
  • chain1 (str) – The string name of the first chain.

  • chain2 (str) – The string name of the second chain.

Returns:

The LIS value for the pair of chains.

Return type:

(float)
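
The relationship between mean error and LIS described above can be sketched in plain Python. This is a minimal illustration, not the method's implementation: `local_interaction_score` and `pae_block` are hypothetical names, the input is assumed to be the inter-chain block of the PAE matrix, and returning 0.0 when no value passes the cutoff is an assumption.

```python
def local_interaction_score(pae_block, cutoff=12.0):
    """Mean of (cutoff - PAE) / cutoff over inter-chain PAE values below the cutoff."""
    kept = [value for row in pae_block for value in row if value < cutoff]
    if not kept:
        return 0.0  # assumption: no residue pair falls under the cutoff
    return sum((cutoff - value) / cutoff for value in kept) / len(kept)


# Two of the four values are below the cutoff: 0.0 scores 1.0 and 6.0 scores 0.5.
pae_block = [[0.0, 6.0], [12.0, 30.0]]
lis = local_interaction_score(pae_block)
```

Values at or above the cutoff are dropped rather than scored as zero, which is why the result stays strictly positive whenever at least one pair passes.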

static compute_d0(L, pair_type)

Computes the d0 term per the following equation: $d_0 = \min(1.0, 1.24 * (L - 15)^{\frac{1}{3}} - 1.8)$

Parameters:
  • L (int) – Length of sequence up to 27 residues.

  • pair_type (str) – Whether or not chain is a nucleic acid.

Returns:

d0

Return type:

(float)

compute_ipTM_ipSAE(chain1, chain2)

Computes the ipTM and ipSAE scores for a given pair of chains. These operations are combined since they rely on very similar processing of the data.

Parameters:
  • chain1 (str) – The first chain to compare.

  • chain2 (str) – The second chain to compare.

Returns:

A tuple containing the ipTM and ipSAE scores respectively.

Return type:

(tuple[float, float])

compute_pDockQ_scores(chain1, chain2)

Computes both the pDockQ and pDockQ2 scores for the interface between two chains. pDockQ is dependent solely on the pLDDT matrix while pDockQ2 is dependent on both pLDDT and the PAE matrix.

Parameters:
  • chain1 (str) – The string name of the first chain.

  • chain2 (str) – The string name of the second chain.

Returns:

A tuple of the pDockQ and pDockQ2 scores respectively.

Return type:

(tuple[float, float])

compute_pTM = <numpy.vectorize object>

compute_scores(distances, pLDDT, PAE)

Based on the input distance, pLDDT and PAE matrices, computes the pairwise pDockQ, pDockQ2, LIS, ipTM and ipSAE scores.

Parameters:
  • distances – Pairwise distance matrix.

  • pLDDT – pLDDT matrix.

  • PAE – PAE matrix.

Return type:

None

Returns:

None

get_max_values()

Because some scores, such as ipSAE, are not symmetric (A->B != B->A), the maximum score over both directions is taken as the undirected score. This method goes through the internal dataframe and keeps only the rows with the maximal values.

Return type:

None

Returns:

None
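
The reduction from directed to undirected scores can be sketched without a dataframe. The version below works on a list of dictionaries. The `chain1`, `chain2` and `ipSAE` keys and the `keep_max_per_pair` helper are hypothetical, and the method itself operates on the internal dataframe instead.

```python
def keep_max_per_pair(rows, score_key="ipSAE"):
    """Keep, for each unordered chain pair, the row with the highest score."""
    best = {}
    for row in rows:
        # frozenset makes (A, B) and (B, A) map to the same key.
        pair = frozenset((row["chain1"], row["chain2"]))
        if pair not in best or row[score_key] > best[pair][score_key]:
            best[pair] = row
    return list(best.values())


rows = [
    {"chain1": "A", "chain2": "B", "ipSAE": 0.41},
    {"chain1": "B", "chain2": "A", "ipSAE": 0.57},
    {"chain1": "A", "chain2": "C", "ipSAE": 0.12},
]
undirected = keep_max_per_pair(rows)
```

Here the B->A row replaces the A->B row because its score is higher, while the A/C pair is kept unchanged.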

static pDockQ2_score(x)

Computes the pDockQ2 score per the following equation: $pDockQ2 = \frac{1.31}{1 + e^{-0.075 * (x - 84.733)}} + 0.005$

Details on the pDockQ2 score at: https://doi.org/10.1093/bioinformatics/btad424

Parameters:
  • x (float) – Mean pLDDT score scaled by mean PAE score.

Returns:

pDockQ2 score

Return type:

(float)
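
A minimal sketch of this logistic function, assuming the published form in which the 0.005 offset is added outside the fraction. The lowercase function name is hypothetical, chosen to avoid suggesting this is the library's own code.

```python
import math


def pdockq2_score(x):
    """Logistic fit: 1.31 / (1 + exp(-0.075 * (x - 84.733))) + 0.005."""
    return 1.31 / (1.0 + math.exp(-0.075 * (x - 84.733))) + 0.005


# At the midpoint x = 84.733 the exponential is 1, so the score is 1.31 / 2 + 0.005.
midpoint = pdockq2_score(84.733)
```

The score rises monotonically with x and saturates just below 1.315 for large inputs.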

static pDockQ_score(x)

Computes the pDockQ score per the following equation: $pDockQ = \frac{0.724}{1 + e^{-0.052 * (x - 152.611)}} + 0.018$

Details on the pDockQ score at: https://doi.org/10.1038/s41467-022-28865-w

Parameters:
  • x (float) – Mean pLDDT score scaled by the log10 number of residue pairs that meet pLDDT and distance cutoffs.

Returns:

pDockQ score

Return type:

(float)
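
A minimal sketch of this logistic function, assuming the published form in which the 0.018 offset is added outside the fraction. The lowercase function name is hypothetical.

```python
import math


def pdockq_score(x):
    """Logistic fit: 0.724 / (1 + exp(-0.052 * (x - 152.611))) + 0.018."""
    return 0.724 / (1.0 + math.exp(-0.052 * (x - 152.611))) + 0.018


# At the midpoint x = 152.611 the exponential is 1, so the score is 0.724 / 2 + 0.018.
midpoint = pdockq_score(152.611)
```

Because of the offset, the score never drops below 0.018, even for an interface with no qualifying residue pairs.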

permute_chains()

Helper function that builds every unique pair of distinct chainIDs. Self pairs such as (A, A) are excluded, and each pair is stored in one order only: if (A, B) is present, (B, A) is not.

Return type:

None

Returns:

None
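
Pairs with these properties are what `itertools.combinations` produces. The sketch below assumes the input is an array-like of per-residue chainIDs, so duplicates are removed first. `unique_chain_pairs` is a hypothetical helper, not the method itself.

```python
from itertools import combinations


def unique_chain_pairs(chain_ids):
    """Return every unordered pair of distinct chain IDs exactly once."""
    unique_ids = sorted(set(chain_ids))
    # combinations() never yields (A, A) and never yields both (A, B) and (B, A).
    return list(combinations(unique_ids, 2))


pairs = unique_chain_pairs(["A", "A", "B", "B", "C"])
```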

class molecular_simulations.analysis.ipSAE.ipSAE(structure_file, plddt_file, pae_file, out_path=None)

Bases: object

Computes the interaction prediction Score from Aligned Errors (ipSAE) for a model. Adapted from https://doi.org/10.1101/2025.02.10.637595. Only predictors that output pLDDT and PAE data are currently supported, which limits use to Boltz and AlphaFold.

Parameters:
  • structure_file (PathLike) – Path to PDB/CIF model.

  • plddt_file (PathLike) – Path to plddt npy file.

  • pae_file (PathLike) – Path to pae npy file.

  • out_path (PathLike | None) – Defaults to None. Path for outputs, or if None, will use the parent path from the plddt file.
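
A typical call sequence, based only on the constructor signature and the `run()` method documented here, might look like the sketch below. The `score_model` wrapper, the directory layout and all file names are hypothetical.

```python
from pathlib import Path


def score_model(model_dir):
    """Run ipSAE scoring on one predicted model stored in `model_dir`."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from molecular_simulations.analysis.ipSAE import ipSAE

    model_dir = Path(model_dir)
    scorer = ipSAE(
        structure_file=model_dir / "model.cif",  # hypothetical file names
        plddt_file=model_dir / "plddt.npy",
        pae_file=model_dir / "pae.npy",
        out_path=model_dir / "scores",
    )
    # run() parses the structure, computes the scores and saves them.
    scorer.run()
```

Leaving out `out_path` would fall back to the parent directory of the pLDDT file, as described above.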

load_PAE_file()

Loads PAE file and returns data.

Returns:

Array of PAE values.

Return type:

(np.ndarray)

load_pLDDT_file()

Loads pLDDT file and scales data by 100.

Returns:

Scaled pLDDT array.

Return type:

(np.ndarray)

parse_structure_file()

Runs parser to read in structure file and extract relevant details.

Return type:

None

Returns:

None

prepare_scorer()

Prepares scorer for computing various scores.

Return type:

None

Returns:

None

run()

Main logic of class. Parses structure file, computes distogram, unpacks pLDDT and PAE, feeds data to scorer and saves out scores.

Return type:

None

Returns:

None
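
The pairwise distance step can be illustrated in plain Python. This is a sketch only: `distance_matrix` is a hypothetical helper, the coordinates are made up, and the class may well compute its distances with NumPy instead.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
distogram = distance_matrix(coords)
```

The matrix is symmetric with a zero diagonal, so only residue pairs from different chains carry interface information.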

save_scores()

Saves scores dataframe to a parquet file.

Return type:

None

Returns:

None