Docking Evaluation
Classes and pipelines for docking-based evaluation of generated molecules.
Docking Classes
AutoDock Vina docking evaluation pipeline.
VinaSmiles class adapted from Therapeutic Data Commons (TDC) [1].
Requires: vina, meeko; openbabel only if protonating ligands.
References
- shepherd_score.evaluations.docking.docking.embed_conformer_from_smiles_fixed(smiles, attempts=50, MMFF_optimize=True, random_seed=123456789)
Embed SMILES into a 3D RDKit mol with ETKDG and optional MMFF94.
- Parameters:
- Returns:
Molecule with 3D conformer.
- Return type:
Chem.Mol
- class shepherd_score.evaluations.docking.docking.VinaBase(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', path_to_bin='', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)
Bases: object
Base class for Vina scoring and docking.
- Parameters:
- __init__(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', path_to_bin='', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)
- Parameters:
receptor_pdbqt_file (str) – Path to receptor PDBQT file.
center (tuple of float, length 3) – Pocket center coordinates.
box_size (tuple of float, length 3) – Search box edge lengths.
pH (float, optional) – pH for protonation. Default is 7.4.
scorefunction (str, optional) – ‘vina’ or ‘ad4’. Default is ‘vina’.
num_processes (int, optional) – CPUs for scoring. Default is 4.
verbose (int, optional) – Vina verbosity (0 = silent). Default is 0.
protonate_method ({'openbabel', 'molscrub', 'chemaxon'}, optional) – Protonation method. Default is ‘molscrub’.
path_to_bin (str, optional) – Path to OpenBabel binaries. Default is ‘’.
cxcalc_exe (str or None, optional) – Path to cxcalc. Default is None.
molconvert_exe (str or None, optional) – Path to molconvert. Default is None.
chemaxon_license_path (str or None, optional) – Path to ChemAxon license. Default is None.
- load_ligand_from_smiles(ligand_smiles, protonate=False, return_all=False)
Load ligand from SMILES; optionally protonate and embed.
- load_ligand_from_sdf(sdf_file)
Load ligand from SDF; embed from SMILES if no conformer.
- Parameters:
sdf_file (str) – Path to SDF file.
- Returns:
Molecule with 3D coords.
- Return type:
Chem.Mol
- Raises:
ValueError – If SDF has no conformer and embedding fails.
- dock_ligand(ligand, output_file=None, exhaustiveness=8, n_poses=5)
Given a ligand, do a global optimization and return the best energy and optionally the pose.
- Parameters:
- Returns:
(total_energy, torsion_energy, docked_mol) in kcal/mol, or None on failure.
- Return type:
tuple or None
- optimize_ligand(ligand, center=False, max_steps=10000, output_file=None)
Locally optimize ligand pose in the binding site.
- Parameters:
ligand (Chem.Mol) – Ligand to optimize.
center (bool or tuple of float, optional) – If True, center to receptor box. If tuple (x,y,z), center there. If False, use current coords. Default is False.
max_steps (int or None, optional) – Max optimization steps. None uses Vina default. Default is 10000.
output_file (str or None, optional) – Path to save optimized pose. Default is None.
- Returns:
(total_energy, torsion_energy, optimized_mol) in kcal/mol.
- Return type:
- class shepherd_score.evaluations.docking.docking.VinaSmiles(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)
Bases: VinaBase
Docking from SMILES (embed + optional protonation). Adapted from TDC.
- Parameters:
- __init__(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)
- Parameters:
receptor_pdbqt_file (str) – Path to receptor PDBQT file.
center (tuple of float, length 3) – Pocket center coordinates.
box_size (tuple of float, length 3) – Search box edge lengths.
pH (float, optional) – pH for protonation. Default is 7.4.
scorefunction (str, optional) – ‘vina’ or ‘ad4’. Default is ‘vina’.
num_processes (int, optional) – CPUs for scoring. Default is 4.
verbose (int, optional) – Vina verbosity (0 = silent). Default is 0.
protonate_method ({'openbabel', 'molscrub', 'chemaxon'}, optional) – Protonation method. Default is ‘molscrub’.
cxcalc_exe (str or None, optional) – Path to cxcalc. Default is None.
molconvert_exe (str or None, optional) – Path to molconvert. Default is None.
chemaxon_license_path (str or None, optional) – Path to ChemAxon license. Default is None.
- __call__(ligand_smiles, output_file=None, exhaustiveness=8, n_poses=5, protonate=False, return_best_protomer=False)
Dock ligand SMILES in receptor; return best energy and pose.
- Parameters:
ligand_smiles (str) – SMILES of ligand to dock.
output_file (str or None, optional) – Path to save poses. Default is None.
exhaustiveness (int, optional) – Monte Carlo runs per pose. Default is 8.
n_poses (int, optional) – Number of poses to save. Default is 5.
protonate (bool, optional) – Protonate at instance pH. Default is False.
return_best_protomer (bool, optional) – If True, dock all protomers and return best by energy. Default is False. Returned SMILES may be different from the input SMILES due to protonation.
- Returns:
(energy in kcal/mol, docked Chem.Mol).
- Return type:
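The constructor and `__call__` signatures above suggest a usage pattern along the following lines. This is a minimal sketch, not the library's own example: `check_box` and `dock_one` are hypothetical helper names, the positive-edge check is an added sanity check rather than documented behavior, and the receptor path is whatever PDBQT file you supply. The import is deferred so the validation helper works even where vina and meeko are not installed.

```python
from typing import Optional, Sequence, Tuple


def check_box(center: Sequence[float],
              box_size: Sequence[float]) -> Tuple[tuple, tuple]:
    """Validate the search box before handing it to the docking class.

    Hypothetical helper: the length-3 requirement comes from the parameter
    docs; the positive-edge requirement is an assumption.
    """
    if len(center) != 3 or len(box_size) != 3:
        raise ValueError("center and box_size must each have length 3")
    if any(edge <= 0 for edge in box_size):
        raise ValueError("box_size edge lengths must be positive")
    return tuple(float(c) for c in center), tuple(float(b) for b in box_size)


def dock_one(smiles: str, receptor_pdbqt_file: str, center, box_size,
             output_file: Optional[str] = None):
    """Dock one SMILES string using the documented VinaSmiles signature."""
    # Imported lazily so check_box stays usable without vina/meeko installed.
    from shepherd_score.evaluations.docking.docking import VinaSmiles

    center, box_size = check_box(center, box_size)
    docker = VinaSmiles(receptor_pdbqt_file, center, box_size,
                        pH=7.4, scorefunction="vina", num_processes=4)
    # Documented return value: (energy in kcal/mol, docked Chem.Mol).
    return docker(smiles, output_file=output_file, exhaustiveness=8,
                  n_poses=5, protonate=False)
```

Calling `dock_one("CCO", "receptor.pdbqt", (15.614, 53.380, 15.455), (15, 15, 15))` would then run a single docking job, assuming the receptor file exists and the optional dependencies are available.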
Docking Pipelines
AutoDock Vina docking evaluation pipelines.
Requires: vina, meeko; openbabel only if protonating ligands.
- class shepherd_score.evaluations.docking.pipelines.DockingEvalPipeline(pdb_id, num_processes=4, docking_target_info_dict=None, verbose=0, path_to_bin='')
Bases: object
- Parameters:
- __init__(pdb_id, num_processes=4, docking_target_info_dict=None, verbose=0, path_to_bin='')
Constructor for docking evaluation pipeline.
Initializes VinaSmiles with receptor pdbqt.
- Parameters:
pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.
num_processes (int, optional) – Number of CPUs to use for scoring. Default is 4.
docking_target_info_dict (dict, optional) –
Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:
{"1iep": {"center": (15.614, 53.380, 15.455), "size": (15, 15, 15), "pdbqt": "path_to_file.pdbqt"}}
verbose (int, optional) – Level of verbosity from vina.Vina (0 is silent). Default is 0.
path_to_bin (str, optional) – Path to environment bin containing mk_prepare_ligand.py. Default is ‘’.
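A custom `docking_target_info_dict` is easy to get subtly wrong, so it can help to check it before constructing the pipeline. The sketch below is illustrative only: `validate_target_info` and `REQUIRED_KEYS` are hypothetical names, and the required fields are taken from the format shown above rather than from the library's source.

```python
REQUIRED_KEYS = ("center", "size", "pdbqt")  # fields shown in the documented format


def validate_target_info(info: dict) -> dict:
    """Check that every PDB ID entry carries the documented fields."""
    for pdb_id, entry in info.items():
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise KeyError(f"{pdb_id}: missing {missing}")
        if len(entry["center"]) != 3 or len(entry["size"]) != 3:
            raise ValueError(f"{pdb_id}: center and size need three values")
    return info


custom_targets = validate_target_info({
    "1iep": {
        "center": (15.614, 53.380, 15.455),
        "size": (15, 15, 15),
        "pdbqt": "path_to_file.pdbqt",  # placeholder path from the docs
    }
})
# The validated dict could then be passed as
# DockingEvalPipeline("1iep", docking_target_info_dict=custom_targets).
```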
- evaluate(smiles_ls, exhaustiveness=32, n_poses=1, protonate=False, save_poses_dir_path=None, verbose=False, num_workers=1, num_processes=4, return_best_protomer=False, *, mp_context='spawn')
Loop through supplied list of SMILES strings, dock, and collect energies.
- Parameters:
exhaustiveness (int, optional) – Number of Monte Carlo simulations to run per pose. Default is 32.
n_poses (int, optional) – Number of poses to save. Default is 1.
protonate (bool, optional) – Use protonation protocol. Default is False.
save_poses_dir_path (str or None, optional) – Path to directory to save docked poses. Default is None.
verbose (bool, optional) – Show tqdm progress bar for each SMILES. Default is False.
num_workers (int, optional) – Number of parallel worker processes. Only recommended if smiles_ls has more than 100 entries, due to the start-up overhead of new processes. Default is 1.
num_processes (int, optional) – Number of processes each worker uses internally for Vina. Constraint: num_workers * num_processes <= available CPUs. Default is 4.
mp_context ({'spawn', 'forkserver'}, optional) – Context for multiprocessing. Default is ‘spawn’.
return_best_protomer (bool, optional) – Default is False.
- Return type:
List of energies (affinities) in kcal/mol
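The CPU constraint and the 100-SMILES guideline for `evaluate` can be turned into a small budgeting helper. This is a sketch under stated assumptions: `max_workers` and `choose_workers` are hypothetical names, and `os.cpu_count()` is used as a stand-in for "available CPUs", which may overcount on shared or containerized machines.

```python
import os
from typing import Optional


def max_workers(num_processes: int, available_cpus: Optional[int] = None) -> int:
    """Largest num_workers satisfying num_workers * num_processes <= CPUs."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    # Always allow one worker, even if that oversubscribes a small machine.
    return max(1, available_cpus // num_processes)


def choose_workers(n_smiles: int, num_processes: int = 4,
                   available_cpus: Optional[int] = None) -> int:
    """Use parallel workers only for lists above the suggested size of 100."""
    if n_smiles <= 100:
        return 1
    return max_workers(num_processes, available_cpus)
```

The result could be passed as `num_workers=choose_workers(len(smiles_ls))` alongside the matching `num_processes` value.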
- evaluate_relax(mol_ls, center=False, max_steps=10000, save_poses_dir_path=None, verbose=False, num_workers=1, *, mp_context='spawn')
Loop through supplied list of mol objects, optimize, and collect energies.
- Parameters:
mol_ls (List[Chem.Mol]) – List of RDKit mol objects to relax.
center (bool or tuple of float, optional) – If a tuple, centers to those coordinates. If True, centers the ligand to the receptor’s center. If False, does not translate the ligand from its initial conformation. Default is False.
max_steps (int or None, optional) – Maximum number of steps to take in the optimization. If None, uses the default value of 10000. Default is 10000.
save_poses_dir_path (str or None, optional) – Path to directory to save optimized poses. Default is None.
verbose (bool, optional) – Show tqdm progress bar for each mol. Default is False.
num_workers (int, optional) – Number of parallel worker processes. Default is 1.
mp_context ({'spawn', 'forkserver'}, optional) – Context for multiprocessing. Default is ‘spawn’.
- Return type:
List of energies (affinities) in kcal/mol
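The three accepted forms of `center` in `evaluate_relax` can be made concrete with a small dispatch function. This is purely an illustration of the documented semantics, not the library's internal logic; `resolve_center` and the `Coords` alias are hypothetical names.

```python
from typing import Optional, Sequence, Tuple, Union

Coords = Tuple[float, float, float]


def resolve_center(center: Union[bool, Sequence[float]],
                   receptor_center: Coords) -> Optional[Coords]:
    """Return the translation target, or None to keep the input coordinates."""
    if center is False:
        return None  # ligand stays at its initial conformation
    if center is True:
        return receptor_center  # move ligand to the receptor box center
    if len(center) != 3:
        raise ValueError("center tuple must be (x, y, z)")
    x, y, z = (float(v) for v in center)
    return (x, y, z)
```

The booleans are tested with `is` before the tuple branch, so `True` and `False` are never mistaken for coordinates.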
- benchmark(exhaustiveness=32, n_poses=5, protonate=False, save_poses_dir_path=None)
Run benchmark with experimental ligands.
- Parameters:
exhaustiveness (int, optional) – Number of Monte Carlo simulations to run per pose. Default is 32.
n_poses (int, optional) – Number of poses to save. Default is 5.
protonate (bool, optional) – (De-)protonate ligand with OpenBabel at pH=7.4. Default is False.
save_poses_dir_path (str or None, optional) – Path to directory to save docked poses. Default is None.
- Returns:
Energies (affinities) in kcal/mol
- Return type:
float
- to_pandas(docked_mol_as_molblock=False, sort_by_energies=True, reset_index=True)
Convert the attributes of generated SMILES and the energies to a pd.DataFrame.
- Parameters:
- Returns:
Attributes for each evaluated sample.
- Return type:
pd.DataFrame
- shepherd_score.evaluations.docking.pipelines.run_docking_benchmark(save_dir_path, pdb_id, num_processes=4, docking_target_info_dict=None, protonate=False)
Run docking benchmark on experimental SMILES.
Uses an exhaustiveness of 32 and saves the top-30 poses to a specified location.
- Parameters:
save_dir_path (str) – Path to save docked poses to.
pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.
num_processes (int, optional) – Number of CPUs to use for scoring. Default is 4.
docking_target_info_dict (dict, optional) –
Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:
{"1iep": {"center": (15.614, 53.380, 15.455), "size": (15, 15, 15), "pdbqt": "path_to_file.pdbqt", "ligand": "SMILES string of experimental ligand"}}
protonate (bool, optional) – Whether to protonate ligands at a given pH. Requires the "pH" field to be filled out in docking_target_info_dict. Default is False.
- Return type:
None
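Because the benchmark needs a "ligand" entry, and a "pH" entry when `protonate=True`, a pre-flight check on each target entry can catch omissions early. A minimal sketch, assuming the field names given above; `check_benchmark_entry` is a hypothetical helper and the pH value of 7.4 is chosen only for illustration.

```python
def check_benchmark_entry(entry: dict, protonate: bool = False) -> None:
    """Raise if an entry lacks fields the benchmark description calls for."""
    required = ["center", "size", "pdbqt", "ligand"]
    if protonate:
        required.append("pH")  # documented as required when protonate=True
    missing = [key for key in required if key not in entry]
    if missing:
        raise KeyError(f"missing fields: {missing}")


entry = {
    "center": (15.614, 53.380, 15.455),
    "size": (15, 15, 15),
    "pdbqt": "path_to_file.pdbqt",
    "ligand": "SMILES string of experimental ligand",  # placeholder from the docs
}
check_benchmark_entry(entry)                 # passes without protonation
entry_with_ph = dict(entry, pH=7.4)          # pH value chosen for illustration
check_benchmark_entry(entry_with_ph, protonate=True)
```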
- shepherd_score.evaluations.docking.pipelines.run_docking_evaluation(atoms, positions, pdb_id, num_processes=4, docking_target_info_dict=None, exhaustiveness=32, n_poses=1, protonate=False, save_poses_dir_path=None, verbose=True, num_workers=1, *, mp_context='spawn')
Run docking evaluation; exhaustiveness defaults to 32.
- Parameters:
atoms (list) – List of np.ndarray (N,) of atomic numbers of the generated molecule or (N, M) one-hot encoding.
positions (list) – List of np.ndarray (N, 3) of coordinates for the generated molecule’s atoms.
pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.
num_processes (int, optional) – Number of CPUs to use for Autodock Vina. Default is 4.
docking_target_info_dict (dict, optional) –
Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:
{"1iep": {"center": (15.614, 53.380, 15.455), "size": (15, 15, 15), "pdbqt": "path_to_file.pdbqt"}}
exhaustiveness (int, optional) – Number of Monte Carlo simulations to run per pose. Default is 32.
n_poses (int, optional) – Number of poses to save. Default is 1.
protonate (bool, optional) – Use protonation protocol. Default is False.
save_poses_dir_path (str, optional) – Path to directory to save docked poses. Default is None.
verbose (bool, optional) – Show tqdm progress bar for each SMILES. Default is True.
num_workers (int, optional) – Number of parallel worker processes. Default is 1.
mp_context (str, optional) – Context for multiprocessing. One of ‘spawn’ or ‘forkserver’. Default is ‘spawn’.
- Returns:
Results are found in the buffer attribute {‘smiles’: energy} or in smiles and energies, which preserve the order of the provided atoms/positions as a list.
- Return type:
- shepherd_score.evaluations.docking.pipelines.run_docking_evaluation_from_smiles(smiles, pdb_id, num_processes=4, docking_target_info_dict=None, exhaustiveness=32, n_poses=1, protonate=False, save_poses_dir_path=None, verbose=True, num_workers=1, *, mp_context='spawn')
Run docking evaluation; exhaustiveness defaults to 32.
- Parameters:
smiles (list) – List of SMILES strings. These must be valid or None.
pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.
num_processes (int, optional) – Number of CPUs to use for Autodock Vina. Default is 4.
docking_target_info_dict (dict, optional) –
Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:
{"1iep": {"center": (15.614, 53.380, 15.455), "size": (15, 15, 15), "pdbqt": "path_to_file.pdbqt"}}
exhaustiveness (int, optional) – Number of Monte Carlo simulations to run per pose. Default is 32.
n_poses (int, optional) – Number of poses to save. Default is 1.
protonate (bool, optional) – Use protonation protocol. Default is False.
save_poses_dir_path (str, optional) – Path to directory to save docked poses. Default is None.
verbose (bool, optional) – Show tqdm progress bar for each SMILES. Default is True.
num_workers (int, optional) – Number of parallel worker processes. Default is 1.
mp_context (str, optional) – Context for multiprocessing. One of ‘spawn’ or ‘forkserver’. Default is ‘spawn’.
- Returns:
Results are found in the buffer attribute {‘smiles’: energy} or in smiles and energies, which preserve the order of the provided SMILES as a list.
- Return type:
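Since the order-preserving smiles and energies lists may contain None entries for invalid inputs, ranking the results usually needs a filtering step first. The helper below is a hedged sketch: `rank_by_energy` is a hypothetical name, and it assumes that molecules which were skipped or failed to dock appear as None in one or both lists, which the text above does not state explicitly for energies.

```python
from typing import List, Optional, Tuple


def rank_by_energy(smiles: List[Optional[str]],
                   energies: List[Optional[float]]) -> List[Tuple[str, float]]:
    """Pair order-preserving lists and sort from best (lowest) energy."""
    if len(smiles) != len(energies):
        raise ValueError("smiles and energies must have the same length")
    # Assumption: skipped or failed molecules are represented by None.
    pairs = [(s, e) for s, e in zip(smiles, energies)
             if s is not None and e is not None]
    return sorted(pairs, key=lambda pair: pair[1])
```

If the returned object exposes the smiles and energies attributes as described, the call would look like `rank_by_energy(result.smiles, result.energies)`, with lower (more negative) kcal/mol values listed first.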
Docking Targets
Module contains target information for docking evaluation.