molecular_simulations.simulate.mmpbsa module
- class molecular_simulations.simulate.mmpbsa.FileHandler(top, traj, path, sels, first, last, stride, cpptraj_binary)
Bases: object
Performs preprocessing for MM-PBSA runs and manages the paths to all input files. Also used by the MMPBSA class to write out various cpptraj input files.
- Parameters:
- property files: tuple[list[str]]
Returns a zip generator containing the output paths, topologies, trajectories and pdbs for each system. This is done to ensure we have the correct order for housekeeping reasons.
- prepare_topologies()
Slices out the sub-topologies for the desolvated complex, receptor, and ligand using cpptraj, since AMBER force-field files are difficult to work with otherwise (including with ParmEd).
- Return type:
None
- Returns:
None
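To illustrate the kind of cpptraj input this step could write, here is a minimal sketch. The helper name `strip_topology_input` and the file names are hypothetical, and the exact commands FileHandler emits are not shown in this documentation; the sketch only assumes the standard cpptraj commands `parm`, `parmstrip`, and `parmwrite`.

```python
def strip_topology_input(top: str, keep_mask: str, out_top: str) -> str:
    """Build cpptraj input that keeps only the atoms matching `keep_mask`.

    Hypothetical helper for illustration; not part of the documented API.
    """
    lines = [
        f"parm {top}",
        # parmstrip removes the atoms matching its mask, so the selection
        # we want to keep is negated.
        f"parmstrip !({keep_mask})",
        f"parmwrite out {out_top}",
        "quit",
    ]
    return "\n".join(lines)


# Example: keep residues 1-10 as the receptor topology.
receptor_input = strip_topology_input("system.prmtop", ":1-10", "receptor.prmtop")
print(receptor_input)
```

The resulting text could then be written to disk (for example with write_file) and passed to the cpptraj binary.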
- prepare_trajectories()
Converts the DCD trajectory to AMBER CRD format, which MM-P(G)BSA explicitly requires.
- Return type:
None
- Returns:
None
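A conversion of this kind might be driven by a cpptraj input like the one sketched below. This is an assumption-laden illustration, not the module's actual input: the helper name is hypothetical, and the mapping from `first`/`last` to cpptraj frame numbers assumes 0-based, inclusive indices on the Python side (cpptraj itself counts frames from 1 and accepts the keyword `last`).

```python
def convert_trajectory_input(top, traj, out_crd, first=0, last=-1, stride=1):
    """Build cpptraj input that rewrites a trajectory in AMBER CRD format.

    Hypothetical helper for illustration; not part of the documented API.
    """
    # Assumption: first/last are 0-based; cpptraj frames start at 1.
    start = first + 1
    # Assumption: -1 means "run to the end", which cpptraj spells `last`.
    stop = "last" if last == -1 else str(last + 1)
    return "\n".join([
        f"parm {top}",
        f"trajin {traj} {start} {stop} {stride}",
        # The trailing `crd` keyword requests the ASCII AMBER trajectory format.
        f"trajout {out_crd} crd",
        "run",
        "quit",
    ])


traj_input = convert_trajectory_input("complex.prmtop", "sim.dcd", "complex.crd", stride=5)
print(traj_input)
```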
- static write_file(lines, filepath)
Given an input of either a list of strings or a single string, write input to file. If a list, join by newline characters.
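The behavior described above can be sketched as follows. This is a standalone approximation written from the description, not the module's source; the temporary file name is illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Union


def write_file(lines: Union[str, list], filepath) -> None:
    """Write a string, or a list of strings joined by newlines, to a file."""
    if isinstance(lines, list):
        lines = "\n".join(lines)
    with open(filepath, "w") as fh:
        fh.write(lines)


# Usage: a list is joined by newline characters before writing.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cpptraj.in"
    write_file(["parm system.prmtop", "quit"], target)
    contents = target.read_text()
```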
- class molecular_simulations.simulate.mmpbsa.MMPBSA(top, dcd, selections, first_frame=0, last_frame=-1, stride=1, n_cpus=1, out='mmpbsa', solvent_probe=1.4, offset=0, gb_surften=0.0072, gb_surfoff=0.0, amberhome='', **kwargs)
Bases: MMPBSA_settings
This is an experiment in patience. What follows is a reconstruction of the various pieces of code that run MM-P(G)BSA from AMBER, written in a more digestible manner with actual documentation. Herein we have un-CLI’d what should never have been a CLI and piped together the correct pieces of the AmberTools ecosystem to perform MM-P(G)BSA and that alone.
Your trajectory must be concatenated into a single continuous trajectory; alternatively, you can run this serially by instantiating this class once for each trajectory you have. This also removes the requirement to parallelize with MPI, allowing the user to choose their own parallelization/scaling scheme.
- Parameters:
top (PathLike) – Input topology for a solvated system. Should match the input trajectory.
dcd (PathLike) – Input trajectory. Can be DCD format or MDCRD already.
selections (list[str]) – A list of residue ID selections for the receptor and ligand in that order. Should be formatted for cpptraj (e.g. :1-10).
first_frame (int) – Defaults to 0. The first frame of the input trajectory to begin the calculations on.
last_frame (int) – Defaults to -1. Optional final frame to cut trajectory at. If -1, acts as a flag to run the whole trajectory.
stride (int) – Defaults to 1. The number of frames to stride the trajectory by.
n_cpus (int) – Defaults to 1. Number of parallel processes.
out (str) – Defaults to 'mmpbsa'. The prefix name or path for output files.
solvent_probe (float) – Defaults to 1.4Å. The probe radius to use for SA calculations.
offset (int) – Defaults to 0Å. I don’t know what this does.
gb_surften (float) – Defaults to 0.0072.
gb_surfoff (float) – Defaults to 0.0.
- calculate_energy(pre, prm, trj, pdb, mdin, suf)
Runs mmpbsa_py_energy, an undocumented binary file which somehow mysteriously computes the energy of a system. This software is not only undocumented but is a binary which we cannot inspect ourselves.
- calculate_sasa(pre, prm, trj)
Runs the molsurf command in cpptraj to compute the SASA of a given system.
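For orientation, a cpptraj input that runs molsurf could look like the sketch below. The helper name and file names are hypothetical, and whether the module passes exactly these options is an assumption; the sketch relies only on the `probe` and `offset` keywords of the cpptraj `molsurf` command, which appear to correspond to the `solvent_probe` and `offset` parameters above.

```python
def molsurf_input(top, traj, out_dat, mask=":*", probe=1.4, offset=0.0):
    """Build cpptraj input that writes the per-frame molecular surface area.

    Hypothetical helper for illustration; not part of the documented API.
    """
    return "\n".join([
        f"parm {top}",
        f"trajin {traj}",
        # probe: solvent probe radius; offset: value added to the atomic radii.
        f"molsurf {mask} out {out_dat} probe {probe} offset {offset}",
        "run",
        "quit",
    ])


sasa_input = molsurf_input("complex.prmtop", "complex.crd", "complex_sasa.dat")
print(sasa_input)
```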
- run()
Main logic of MM-PBSA. Computes the SASA with molsurf in cpptraj, and the various energy terms for GB/PB using mmpbsa_py_energy from ambertools. Finally, parse outputs and collate into a neat form consisting of a json of raw data and a plain text file of the binding free energies.
- Return type:
None
- Returns:
None
- class molecular_simulations.simulate.mmpbsa.MMPBSA_settings(top, dcd, selections, first_frame=0, last_frame=-1, stride=1, n_cpus=1, out='mmpbsa', solvent_probe=1.4, offset=0, gb_surften=0.0072, gb_surfoff=0.0)
Bases: object
- Parameters:
- class molecular_simulations.simulate.mmpbsa.OutputAnalyzer(path, surface_tension=0.0072, sasa_offset=0.0, _tolerance=0.005)
Bases: object
Analyzes the outputs from an MM-PBSA run. Stores data in a Polars dataframe internally, and writes out data in the form of JSON/plain text.
- check_bonded_terms()
Performs a sanity check on the bonded terms which should perfectly cancel out (e.g. complex = receptor + ligand). If this is not the case something horrible has happened and we can’t trust the non-bonded energies either. Additionally sets a few terms we will need later such as the number of frames as given by the dataframe height and sqrt(n_frames).
- Return type:
None
- Returns:
None
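The cancellation check described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the class's implementation: the real method works on Polars dataframes, the function name is hypothetical, and the default tolerance simply mirrors the `_tolerance=0.005` argument in the class signature.

```python
import math


def bonded_terms_cancel(complex_e, receptor_e, ligand_e, tolerance=0.005):
    """Return True if complex - (receptor + ligand) is within tolerance
    for every frame. Hypothetical helper for illustration."""
    return all(
        abs(c - (r + l)) <= tolerance
        for c, r, l in zip(complex_e, receptor_e, ligand_e)
    )


# Per-frame bonded energies (made-up numbers for the example).
complex_bond = [150.25, 151.10]
receptor_bond = [100.00, 100.60]
ligand_bond = [50.25, 50.50]

ok = bonded_terms_cancel(complex_bond, receptor_bond, ligand_bond)

# Quantities reused later for summary statistics.
n_frames = len(complex_bond)
sqrt_n_frames = math.sqrt(n_frames)
```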
- compute_dG()
For each energy dataframe (GB/PB) compute the ∆G of binding by subtracting out relevant contributions in accordance with how this is done under the hood of the MMPBSA code.
- Return type:
None
- Returns:
None
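As a rough sketch of the subtraction involved, the per-frame binding energy can be taken as complex minus receptor minus ligand and then averaged. The function name is hypothetical, the inputs are made-up totals, and the use of the sample standard deviation for the standard error is an assumption; the actual method operates on Polars dataframes and on individual energy terms.

```python
import math
import statistics


def binding_free_energy(complex_e, receptor_e, ligand_e):
    """Return (mean, standard deviation, standard error) of the per-frame
    difference complex - receptor - ligand. Hypothetical helper."""
    delta = [c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e)]
    mean = statistics.fmean(delta)
    # Assumption: sample standard deviation; a single frame has no spread.
    std = statistics.stdev(delta) if len(delta) > 1 else 0.0
    sem = std / math.sqrt(len(delta))
    return mean, std, sem


mean_dg, std_dg, sem_dg = binding_free_energy(
    [-120.0, -122.0, -121.0],  # complex
    [-80.0, -81.0, -80.0],     # receptor
    [-20.0, -20.0, -21.0],     # ligand
)
```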
- generate_summary()
Summarizes all processed energy data into a single Polars dataframe and dumps it to a JSON file.
- Return type:
None
- Returns:
None
- parse_energy_file(file_contents, data, system)
Parses the contents of an energy calculation using a dictionary of energy terms to extract theory-level observables (e.g. EGB vs EPB).
- Parameters:
file_contents (list[str]) – A list of each line from an energy calculation.
data (dict[str, list]) – The relevant energy terms to be scraped from input.
system (str) – The name of the system which will be included as an additional kv pair in the returned dataframe. This ensures we can track which portion of the calculation we are accounting for (e.g. complex, receptor, ligand).
- Returns:
A Polars dataframe of shape (n_frames, n_calculations + system).
- Return type:
(pl.DataFrame)
- static parse_line(line)
Parses a line of mmpbsa_py_energy output to get the various energy terms and their values.
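One way such a line could be parsed is sketched below. The line layout (several `TERM = value` pairs per line, with term names that may contain spaces, such as `1-4 VDW`) is an assumption based on typical AMBER mdout energy blocks, and the regular expression is illustrative rather than the one the module uses.

```python
import re

# A term name, optional whitespace, "=", then a signed decimal number.
_TERM = re.compile(
    r"([A-Za-z0-9][A-Za-z0-9\- ]*?)\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"
)


def parse_line(line: str) -> dict:
    """Return a mapping of energy term name to value for one output line."""
    return {name.strip(): float(value) for name, value in _TERM.findall(line)}


example = " 1-4 VDW =       12.3000  1-4 EEL  =      -45.6000  RESTRAINT  =        0.0000"
terms = parse_line(example)
```

Lines that contain no `TERM = value` pairs simply yield an empty dictionary, which makes it easy to skip headers and separators.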
- pretty_print(dfs)
Ingests a list of Polars dataframes for GB and PB and prints their contents in a human-readable form to STDOUT. Also saves the energies to a plain text file called deltaG.txt.
- read_GB(_file, system)
Reads in the GB mdout files and returns a Polars dataframe of the values for each term for every frame. Also adds a system label to more easily compute summary statistics later.
- Parameters:
_file (PathLike) – Energy data file path.
system (str) – String label for which system we are processing (e.g. complex).
- Returns:
Polars dataframe containing the parsed energy data.
- Return type:
(pl.DataFrame)
- read_PB(_file, system)
Reads in the PB mdout files and returns a Polars dataframe of the values for each term for every frame. Also adds a system label to more easily compute summary statistics later.
- Parameters:
_file (PathLike) – Energy data file path.
system (str) – String label for which system we are processing (e.g. complex).
- Returns:
Polars dataframe containing the parsed energy data.
- Return type:
(pl.DataFrame)
- read_sasa(_file)
Reads in the results of the cpptraj SASA calculation and returns the per-frame SASA scaled by a hardcoded value for surface tension that is a mostly undocumented heuristic.
- Parameters:
_file (PathLike) – Path to a file containing the SASA data.
- Returns:
A numpy array of the per-frame rescaled SASA energies.
- Return type:
(np.ndarray)
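The rescaling can be sketched as below. Several things here are assumptions: the two-column layout of the cpptraj data file (a `#`-prefixed header followed by frame number and surface area), the linear form `surface_tension * SASA + sasa_offset`, and the function name. The defaults mirror the `surface_tension=0.0072` and `sasa_offset=0.0` arguments of OutputAnalyzer. The sketch returns a plain list to stay dependency-free, whereas the documented method returns a numpy array.

```python
def scale_sasa(text: str, surface_tension: float = 0.0072, sasa_offset: float = 0.0) -> list:
    """Convert per-frame SASA values into nonpolar energy contributions.

    Hypothetical helper for illustration; not part of the documented API.
    """
    energies = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header.
        if not line or line.startswith("#"):
            continue
        _frame, sasa = line.split()[:2]
        energies.append(surface_tension * float(sasa) + sasa_offset)
    return energies


example = "#Frame       MSURF_00001\n       1       1000.0\n       2       2000.0\n"
energies = scale_sasa(example)
```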