Docking Evaluation

Classes and pipelines for docking-based evaluation of generated molecules.

Docking Classes

AutoDock Vina docking evaluation pipeline.

VinaSmiles class adapted from Therapeutic Data Commons (TDC) [1].

Requires: vina, meeko; openbabel only if protonating ligands.

References

shepherd_score.evaluations.docking.docking.embed_conformer_from_smiles_fixed(smiles, attempts=50, MMFF_optimize=True, random_seed=123456789)

Embed SMILES into a 3D RDKit mol with ETKDG and optional MMFF94.

Parameters:
  • smiles (str) – SMILES string.

  • attempts (int, optional) – Max embedding attempts. Default is 50.

  • MMFF_optimize (bool, optional) – Run MMFF94 optimization. Default is True.

  • random_seed (int, optional) – Random seed for embedding. Default is 123456789.

Returns:

Molecule with 3D conformer.

Return type:

Chem.Mol

class shepherd_score.evaluations.docking.docking.VinaBase(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', path_to_bin='', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)

Bases: object

Base class for Vina scoring and docking.

Parameters:
  • receptor_pdbqt_file (str)

  • center (Tuple[float])

  • box_size (Tuple[float])

  • pH (float)

  • scorefunction (str)

  • num_processes (int)

  • verbose (int)

  • protonate_method (Literal['openbabel', 'molscrub', 'chemaxon'])

  • path_to_bin (str)

  • cxcalc_exe (str | None)

  • molconvert_exe (str | None)

  • chemaxon_license_path (str | None)

__init__(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', path_to_bin='', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)
Parameters:
  • receptor_pdbqt_file (str) – Path to receptor PDBQT file.

  • center (tuple of float, length 3) – Pocket center coordinates.

  • box_size (tuple of float, length 3) – Search box edge lengths.

  • pH (float, optional) – pH for protonation. Default is 7.4.

  • scorefunction (str, optional) – ‘vina’ or ‘ad4’. Default is ‘vina’.

  • num_processes (int, optional) – CPUs for scoring. Default is 4.

  • verbose (int, optional) – Vina verbosity (0 = silent). Default is 0.

  • protonate_method ({'openbabel', 'molscrub', 'chemaxon'}, optional) – Protonation method. Default is ‘molscrub’.

  • path_to_bin (str, optional) – Path to OpenBabel binaries. Default is ‘’.

  • cxcalc_exe (str or None, optional) – Path to cxcalc. Default is None.

  • molconvert_exe (str or None, optional) – Path to molconvert. Default is None.

  • chemaxon_license_path (str or None, optional) – Path to ChemAxon license. Default is None.
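The center and box_size arguments define Vina's axis-aligned search box, with edge lengths in Å. As a minimal sketch, the hypothetical helpers below (box_bounds and inside_box are not part of shepherd_score) compute the box corners and check whether a set of ligand coordinates lies inside the box. This can serve as a quick sanity check before scoring a pose in place (center=False).

```python
from typing import Iterable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def box_bounds(center: Sequence[float], box_size: Sequence[float]) -> Tuple[Vec3, Vec3]:
    """Return the (min, max) corners of an axis-aligned box (hypothetical helper)."""
    if len(center) != 3 or len(box_size) != 3:
        raise ValueError("center and box_size must each have three components")
    lo = tuple(c - s / 2.0 for c, s in zip(center, box_size))
    hi = tuple(c + s / 2.0 for c, s in zip(center, box_size))
    return lo, hi


def inside_box(coords: Iterable[Sequence[float]], center: Sequence[float],
               box_size: Sequence[float]) -> bool:
    """True if every (x, y, z) coordinate lies within the search box."""
    lo, hi = box_bounds(center, box_size)
    return all(l <= x <= h for xyz in coords for x, l, h in zip(xyz, lo, hi))


# Box from the 1iep example in this page: center (15.614, 53.380, 15.455), 15 Å edges.
lo, hi = box_bounds((15.614, 53.380, 15.455), (15, 15, 15))
print(lo, hi)
```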

load_ligand_from_smiles(ligand_smiles, protonate=False, return_all=False)

Load a ligand from SMILES, optionally protonate it, and embed it in 3D.

Parameters:
  • ligand_smiles (str) – SMILES string.

  • protonate (bool, optional) – Protonate at instance pH. Default is False.

  • return_all (bool, optional) – If True and protonate=True, return all protomers. Default is False.

Returns:

RDKit mols with 3D conformers.

Return type:

list of Chem.Mol

load_ligand_from_sdf(sdf_file)

Load ligand from SDF; embed from SMILES if no conformer.

Parameters:

sdf_file (str) – Path to SDF file.

Returns:

Molecule with 3D coords.

Return type:

Chem.Mol

Raises:

ValueError – If SDF has no conformer and embedding fails.

dock_ligand(ligand, output_file=None, exhaustiveness=8, n_poses=5)

Dock a ligand with Vina's global search and return the best-scoring energies and pose; poses are also written to output_file if one is given.

Parameters:
  • ligand (Chem.Mol) – Ligand to dock.

  • output_file (str or None, optional) – Path to save poses. Default is None.

  • exhaustiveness (int, optional) – Number of independent Monte Carlo runs in Vina's global search. Default is 8.

  • n_poses (int, optional) – Number of poses to save. Default is 5.

Returns:

(total_energy, torsion_energy, docked_mol), with energies in kcal/mol, or None on failure.

Return type:

tuple or None

score_ligand(ligand, center=False)

Score ligand in current conformation (no optimization).

Parameters:
  • ligand (Chem.Mol) – Ligand to score.

  • center (bool or tuple of float, optional) – If True, center to receptor box. If tuple (x,y,z), center there. If False, use current coords. Default is False.

Returns:

(total_energy, torsion_energy) in kcal/mol.

Return type:

tuple of np.float64

optimize_ligand(ligand, center=False, max_steps=10000, output_file=None)

Locally optimize ligand pose in the binding site.

Parameters:
  • ligand (Chem.Mol) – Ligand to optimize.

  • center (bool or tuple of float, optional) – If True, center to receptor box. If tuple (x,y,z), center there. If False, use current coords. Default is False.

  • max_steps (int or None, optional) – Max optimization steps. None uses Vina default. Default is 10000.

  • output_file (str or None, optional) – Path to save optimized pose. Default is None.

Returns:

(total_energy, torsion_energy, optimized_mol), with energies in kcal/mol.

Return type:

tuple
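score_ligand and optimize_ligand (and evaluate_relax below) share the same center convention. The sketch below mirrors that documented behavior under one stated assumption: that "centering" means rigidly translating the ligand so its centroid lands on the target point. The helper recenter is hypothetical, and the library may use a different reference point.

```python
from typing import Sequence, Union

import numpy as np

CenterArg = Union[bool, Sequence[float]]


def recenter(coords: np.ndarray, center: CenterArg, box_center: Sequence[float]) -> np.ndarray:
    """Apply the documented `center` semantics to an (N, 3) coordinate array.

    False -> keep current coordinates; True -> move to the receptor box center;
    (x, y, z) -> move to that point. Assumes centroid-based rigid translation.
    """
    coords = np.asarray(coords, dtype=float)
    if center is False:
        return coords.copy()
    target = np.asarray(box_center if center is True else center, dtype=float)
    if target.shape != (3,):
        raise ValueError("center must be a bool or a length-3 tuple")
    # Rigid translation: internal geometry (bond lengths, angles) is unchanged.
    return coords - coords.mean(axis=0) + target
```

Under this assumption, scoring with center=True still evaluates the input conformation, only at a new position.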

save_pose_to_file(output_file, n_poses=1)

Write current pose(s) to file.

Parameters:
  • output_file (str) – Output path.

  • n_poses (int, optional) – Number of poses (only when state is ‘docked’). Default is 1.

class shepherd_score.evaluations.docking.docking.VinaSmiles(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)

Bases: VinaBase

Docking from SMILES (embed + optional protonation). Adapted from TDC.

Parameters:
  • receptor_pdbqt_file (str)

  • center (Tuple[float])

  • box_size (Tuple[float])

  • pH (float)

  • scorefunction (str)

  • num_processes (int)

  • verbose (int)

  • protonate_method (Literal['openbabel', 'molscrub', 'chemaxon'])

  • cxcalc_exe (str | None)

  • molconvert_exe (str | None)

  • chemaxon_license_path (str | None)

__init__(receptor_pdbqt_file, center, box_size, pH=7.4, scorefunction='vina', num_processes=4, verbose=0, *, protonate_method='molscrub', cxcalc_exe=None, molconvert_exe=None, chemaxon_license_path=None)
Parameters:
  • receptor_pdbqt_file (str) – Path to receptor PDBQT file.

  • center (tuple of float, length 3) – Pocket center coordinates.

  • box_size (tuple of float, length 3) – Search box edge lengths.

  • pH (float, optional) – pH for protonation. Default is 7.4.

  • scorefunction (str, optional) – ‘vina’ or ‘ad4’. Default is ‘vina’.

  • num_processes (int, optional) – CPUs for scoring. Default is 4.

  • verbose (int, optional) – Vina verbosity (0 = silent). Default is 0.

  • protonate_method ({'openbabel', 'molscrub', 'chemaxon'}, optional) – Protonation method. Default is ‘molscrub’.

  • cxcalc_exe (str or None, optional) – Path to cxcalc. Default is None.

  • molconvert_exe (str or None, optional) – Path to molconvert. Default is None.

  • chemaxon_license_path (str or None, optional) – Path to ChemAxon license. Default is None.

__call__(ligand_smiles, output_file=None, exhaustiveness=8, n_poses=5, protonate=False, return_best_protomer=False)

Dock ligand SMILES in receptor; return best energy and pose.

Parameters:
  • ligand_smiles (str) – SMILES of ligand to dock.

  • output_file (str or None, optional) – Path to save poses. Default is None.

  • exhaustiveness (int, optional) – Number of independent Monte Carlo runs in Vina's global search. Default is 8.

  • n_poses (int, optional) – Number of poses to save. Default is 5.

  • protonate (bool, optional) – Protonate at instance pH. Default is False.

  • return_best_protomer (bool, optional) – If True, dock all protomers and return the best by energy. Default is False. The returned molecule may differ from the input SMILES in its protonation state.

Returns:

(energy in kcal/mol, docked Chem.Mol).

Return type:

tuple
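With return_best_protomer=True, every protomer is docked and the lowest (most favorable) energy wins. A minimal sketch of that selection step, assuming a hypothetical dock_fn that returns (energy, pose) or None on failure, loosely following dock_ligand's None-on-failure convention:

```python
from typing import Any, Callable, Iterable, Optional, Tuple

DockResult = Optional[Tuple[float, Any]]  # (energy in kcal/mol, pose) or None


def best_protomer(protomers: Iterable[str],
                  dock_fn: Callable[[str], DockResult]) -> Optional[Tuple[str, float, Any]]:
    """Dock each protomer and keep the one with the lowest energy (hypothetical helper)."""
    best: Optional[Tuple[str, float, Any]] = None
    for smi in protomers:
        result = dock_fn(smi)
        if result is None:  # docking failed for this protomer; skip it
            continue
        energy, pose = result
        # Vina energies are affinities: more negative means more favorable.
        if best is None or energy < best[1]:
            best = (smi, energy, pose)
    return best
```

Because the winner is chosen among protomers, its protonation state need not match the input SMILES.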

Docking Pipelines

AutoDock Vina docking evaluation pipelines.

Requires: vina, meeko; openbabel only if protonating ligands.

class shepherd_score.evaluations.docking.pipelines.DockingEvalPipeline(pdb_id, num_processes=4, docking_target_info_dict=None, verbose=0, path_to_bin='')

Bases: object

Parameters:
  • pdb_id (str)

  • num_processes (int)

  • docking_target_info_dict (Dict | None)

  • verbose (int)

  • path_to_bin (str)

__init__(pdb_id, num_processes=4, docking_target_info_dict=None, verbose=0, path_to_bin='')

Constructor for docking evaluation pipeline.

Initializes VinaSmiles with the receptor PDBQT file.

Parameters:
  • pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.

  • num_processes (int, optional) – Number of CPUs to use for scoring. Default is 4.

  • docking_target_info_dict (dict, optional) –

    Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:

    {"1iep": {"center": (15.614, 53.380, 15.455),
              "size": (15, 15, 15),
              "pdbqt": "path_to_file.pdbqt"}}
    

  • verbose (int, optional) – Level of verbosity from vina.Vina (0 is silent). Default is 0.

  • path_to_bin (str, optional) – Path to environment bin containing mk_prepare_ligand.py. Default is ‘’.
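A custom docking_target_info_dict needs the three keys shown above for each target. The hypothetical validator below (not part of shepherd_score) checks that structure up front so a malformed entry fails early. The key names come from the documented format; the checks themselves are an assumption about what is worth validating.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("center", "size", "pdbqt")


def validate_target_info(info: Dict[str, Dict[str, Any]], pdb_id: str) -> None:
    """Raise if `info[pdb_id]` lacks the fields the documented format requires."""
    if pdb_id not in info:
        raise KeyError(f"{pdb_id!r} not found in docking_target_info_dict")
    entry = info[pdb_id]
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise KeyError(f"{pdb_id!r} is missing required keys: {missing}")
    for key in ("center", "size"):
        if len(entry[key]) != 3:
            raise ValueError(f"{key!r} for {pdb_id!r} must have three components")


custom = {"1iep": {"center": (15.614, 53.380, 15.455),
                   "size": (15, 15, 15),
                   "pdbqt": "path_to_file.pdbqt"}}
validate_target_info(custom, "1iep")  # passes silently
```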

evaluate(smiles_ls, exhaustiveness=32, n_poses=1, protonate=False, save_poses_dir_path=None, verbose=False, num_workers=1, num_processes=4, return_best_protomer=False, *, mp_context='spawn')

Loop through supplied list of SMILES strings, dock, and collect energies.

Parameters:
  • smiles_ls (List[str]) – List of SMILES to dock.

  • exhaustiveness (int, optional) – Number of independent Monte Carlo runs in Vina's global search. Default is 32.

  • n_poses (int, optional) – Number of poses to save. Default is 1.

  • protonate (bool, optional) – Use the protonation protocol. Default is False.

  • save_poses_dir_path (str or None, optional) – Path to directory in which to save docked poses. Default is None.

  • verbose (bool, optional) – Show a tqdm progress bar for each SMILES. Default is False.

  • num_workers (int, optional) – Number of parallel worker processes. Only recommended if smiles_ls has more than 100 entries, due to the start-up overhead of new processes. Default is 1.

  • num_processes (int, optional) – Number of processes each worker uses internally for Vina. Constraint: num_workers * num_processes <= available CPUs. Default is 4.

  • return_best_protomer (bool, optional) – If True, dock all protomers and keep the best by energy. Default is False.

  • mp_context ({'spawn', 'forkserver'}, optional) – Multiprocessing context. Default is ‘spawn’.

Returns:

Energies (affinities) in kcal/mol.

Return type:

list
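The num_workers/num_processes constraint and the 100-SMILES guideline can be combined into a small helper. This is a sketch built only on those two documented rules; choose_num_workers is hypothetical, and the 100-SMILES cutoff is a rough guideline rather than a hard limit.

```python
import os
from typing import Optional


def choose_num_workers(n_smiles: int, num_processes: int = 4,
                       cpu_count: Optional[int] = None) -> int:
    """Largest num_workers with num_workers * num_processes <= CPUs (at least 1)."""
    if num_processes < 1:
        raise ValueError("num_processes must be >= 1")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if n_smiles <= 100:
        # Start-up cost of new processes is unlikely to pay off for small batches.
        return 1
    return max(1, cpus // num_processes)


print(choose_num_workers(n_smiles=5000, num_processes=4))
```

Because mp_context defaults to 'spawn', worker processes re-import the calling module, so a script that calls evaluate with num_workers > 1 should keep that call under an `if __name__ == "__main__":` guard.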

evaluate_relax(mol_ls, center=False, max_steps=10000, save_poses_dir_path=None, verbose=False, num_workers=1, *, mp_context='spawn')

Loop through supplied list of mol objects, optimize, and collect energies.

Parameters:
  • mol_ls (List[Chem.Mol]) – List of RDKit mol objects to relax.

  • center (bool or tuple of float, optional) – If a tuple, centers the ligand at those coordinates. If True, centers the ligand at the receptor box center. If False, does not translate the ligand from its initial conformation. Default is False.

  • max_steps (int or None, optional) – Maximum number of optimization steps. If None, the Vina default is used. Default is 10000.

  • save_poses_dir_path (str or None, optional) – Path to directory in which to save optimized poses. Default is None.

  • verbose (bool, optional) – Show a tqdm progress bar for each mol. Default is False.

  • num_workers (int, optional) – Number of parallel worker processes. Default is 1.

  • mp_context ({'spawn', 'forkserver'}, optional) – Multiprocessing context. Default is ‘spawn’.

Returns:

Energies (affinities) in kcal/mol.

Return type:

list

benchmark(exhaustiveness=32, n_poses=5, protonate=False, save_poses_dir_path=None)

Run benchmark with experimental ligands.

Parameters:
  • exhaustiveness (int, optional) – Number of independent Monte Carlo runs in Vina's global search. Default is 32.

  • n_poses (int, optional) – Number of poses to save. Default is 5.

  • protonate (bool, optional) – (De-)protonate the ligand with OpenBabel at pH 7.4. Default is False.

  • save_poses_dir_path (str or None, optional) – Path to directory in which to save docked poses. Default is None.

Returns:

Energies (affinities) in kcal/mol.

Return type:

float

to_pandas(docked_mol_as_molblock=False, sort_by_energies=True, reset_index=True)

Convert the attributes of the generated SMILES and their energies to a pd.DataFrame.

Parameters:
  • docked_mol_as_molblock (bool, optional) – Whether to convert the docked mol to a molblock. Default is False.

  • sort_by_energies (bool, optional) – Whether to sort the dataframe by energies. Default is True.

  • reset_index (bool, optional) – Whether to reset the index. Default is True.

Returns:

Attributes for each evaluated sample.

Return type:

pd.DataFrame
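Lower (more negative) Vina energies indicate stronger predicted binding, so an ascending sort puts the best candidates first. The sketch below shows that ordering on plain Python data, assuming (not confirmed by this page) that the sort is ascending and that failed dockings carry an energy of None and go last. In pandas, the same effect comes from DataFrame.sort_values(..., na_position='last') followed by reset_index(drop=True). The helper sort_by_energy is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def sort_by_energy(smiles: Sequence[str],
                   energies: Sequence[Optional[float]]) -> List[Dict[str, object]]:
    """Return rows sorted by energy (ascending), with failed (None) entries last."""
    rows = [{"smiles": s, "energy": e} for s, e in zip(smiles, energies)]
    # Tuple key: False sorts before True, so real energies precede None.
    rows.sort(key=lambda r: (r["energy"] is None,
                             r["energy"] if r["energy"] is not None else 0.0))
    return rows
```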

to_pandas_relaxed(docked_mol_as_molblock=False, sort_by_energies=True, reset_index=True)

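Convert the attributes of the relaxed molecules and their energies to a pd.DataFrame.

Parameters:
  • docked_mol_as_molblock (bool, optional) – Whether to convert the docked mol to a molblock. Default is False.

  • sort_by_energies (bool, optional) – Whether to sort the dataframe by energies. Default is True.

  • reset_index (bool, optional) – Whether to reset the index. Default is True.

Returns:

Attributes for each relaxed sample.

Return type:

pd.DataFrame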

shepherd_score.evaluations.docking.pipelines.run_docking_benchmark(save_dir_path, pdb_id, num_processes=4, docking_target_info_dict=None, protonate=False)

Run docking benchmark on experimental SMILES.

Uses an exhaustiveness of 32 and saves the top-30 poses to a specified location.

Parameters:
  • save_dir_path (str) – Path to save docked poses to.

  • pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.

  • num_processes (int, optional) – Number of CPUs to use for scoring. Default is 4.

  • docking_target_info_dict (dict, optional) –

    Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:

    {"1iep": {"center": (15.614, 53.380, 15.455),
              "size": (15, 15, 15),
              "pdbqt": "path_to_file.pdbqt",
              "ligand": "SMILES string of experimental ligand"}}
    

  • protonate (bool, optional) – Whether to protonate ligands at a given pH. Requires the "pH" field to be set in docking_target_info_dict. Default is False.

Return type:

None
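Compared with the evaluation pipelines, a benchmark target entry needs the extra "ligand" key, plus a "pH" key when protonate=True. A hypothetical check (not part of shepherd_score) that lists any missing keys, based only on the documented format:

```python
from typing import Any, Dict, List


def missing_benchmark_keys(entry: Dict[str, Any], protonate: bool = False) -> List[str]:
    """List keys a benchmark target entry lacks, per the documented format."""
    required = ["center", "size", "pdbqt", "ligand"]
    if protonate:
        required.append("pH")  # protonation reads the pH from the target entry
    return [key for key in required if key not in entry]
```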

shepherd_score.evaluations.docking.pipelines.run_docking_evaluation(atoms, positions, pdb_id, num_processes=4, docking_target_info_dict=None, exhaustiveness=32, n_poses=1, protonate=False, save_poses_dir_path=None, verbose=True, num_workers=1, *, mp_context='spawn')

Run docking evaluation on generated molecules (atoms and positions), using an exhaustiveness of 32 by default.

Parameters:
  • atoms (list) – List of np.ndarray (N,) of atomic numbers of the generated molecule or (N, M) one-hot encoding.

  • positions (list) – List of np.ndarray (N, 3) of coordinates for the generated molecule’s atoms.

  • pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.

  • num_processes (int, optional) – Number of CPUs to use for AutoDock Vina. Default is 4.

  • docking_target_info_dict (dict, optional) –

    Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:

    {"1iep": {"center": (15.614, 53.380, 15.455),
              "size": (15, 15, 15),
              "pdbqt": "path_to_file.pdbqt"}}
    

  • exhaustiveness (int, optional) – Number of independent Monte Carlo runs in Vina's global search. Default is 32.

  • n_poses (int, optional) – Number of poses to save. Default is 1.

  • protonate (bool, optional) – Use protonation protocol. Default is False.

  • save_poses_dir_path (str, optional) – Path to directory to save docked poses. Default is None.

  • verbose (bool, optional) – Show tqdm progress bar for each SMILES. Default is True.

  • num_workers (int, optional) – Number of parallel worker processes. Default is 1.

  • mp_context (str, optional) – Context for multiprocessing. One of ‘spawn’ or ‘forkserver’. Default is ‘spawn’.

Returns:

The pipeline instance. Results are stored in its buffer attribute as {‘smiles’: energy} and in its smiles and energies attributes, which are lists in the same order as the provided atoms/positions.

Return type:

DockingEvalPipeline
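The atoms argument accepts either atomic numbers of shape (N,) or a one-hot encoding of shape (N, M). Mapping one-hot rows back to atomic numbers requires knowing which element each column stands for; the sketch below assumes a hypothetical element vocabulary, since the column order used by a given generative model is not specified here. to_atomic_numbers is a hypothetical helper.

```python
from typing import Sequence

import numpy as np


def to_atomic_numbers(atoms: np.ndarray, element_vocab: Sequence[int]) -> np.ndarray:
    """Return (N,) atomic numbers from (N,) numbers or an (N, M) one-hot array."""
    atoms = np.asarray(atoms)
    if atoms.ndim == 1:
        return atoms.astype(int)  # already atomic numbers
    if atoms.ndim == 2:
        if atoms.shape[1] != len(element_vocab):
            raise ValueError("one-hot width must match the element vocabulary")
        # Column with the largest value in each row selects the element.
        return np.asarray(element_vocab, dtype=int)[atoms.argmax(axis=1)]
    raise ValueError("atoms must have shape (N,) or (N, M)")


# Hypothetical vocabulary: column order H, C, N, O, F.
VOCAB = (1, 6, 7, 8, 9)
one_hot = np.eye(len(VOCAB))[[1, 1, 3, 0]]
print(to_atomic_numbers(one_hot, VOCAB))  # [6 6 8 1]
```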

shepherd_score.evaluations.docking.pipelines.run_docking_evaluation_from_smiles(smiles, pdb_id, num_processes=4, docking_target_info_dict=None, exhaustiveness=32, n_poses=1, protonate=False, save_poses_dir_path=None, verbose=True, num_workers=1, *, mp_context='spawn')

Run docking evaluation on a list of SMILES, using an exhaustiveness of 32 by default.

Parameters:
  • smiles (list) – List of SMILES strings. These must be valid or None.

  • pdb_id (str) – PDB ID of receptor. Natively only supports: 1iep, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11.

  • num_processes (int, optional) – Number of CPUs to use for AutoDock Vina. Default is 4.

  • docking_target_info_dict (dict, optional) –

    Dict holding minimum information needed for docking. Defaults to the built-in target info for the seven targets listed above. Custom dicts must follow the format:

    {"1iep": {"center": (15.614, 53.380, 15.455),
              "size": (15, 15, 15),
              "pdbqt": "path_to_file.pdbqt"}}
    

  • exhaustiveness (int, optional) – Number of independent Monte Carlo runs in Vina's global search. Default is 32.

  • n_poses (int, optional) – Number of poses to save. Default is 1.

  • protonate (bool, optional) – Use protonation protocol. Default is False.

  • save_poses_dir_path (str, optional) – Path to directory to save docked poses. Default is None.

  • verbose (bool, optional) – Show tqdm progress bar for each SMILES. Default is True.

  • num_workers (int, optional) – Number of parallel worker processes. Default is 1.

  • mp_context (str, optional) – Context for multiprocessing. One of ‘spawn’ or ‘forkserver’. Default is ‘spawn’.

Returns:

The pipeline instance. Results are stored in its buffer attribute as {‘smiles’: energy} and in its smiles and energies attributes, which are lists in the same order as the provided SMILES.

Return type:

DockingEvalPipeline
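Both evaluation functions return the pipeline, whose buffer maps SMILES to energies while smiles and energies keep the input order. One way to produce both views is sketched below with a hypothetical dock_fn, under the assumptions (not confirmed here) that None inputs yield None energies and that repeated SMILES are docked only once. collect_energies is a hypothetical helper.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def collect_energies(smiles: Sequence[Optional[str]],
                     dock_fn: Callable[[str], Optional[float]]
                     ) -> Tuple[Dict[str, Optional[float]], List[Optional[float]]]:
    """Return ({smiles: energy} buffer, energies in input order)."""
    buffer: Dict[str, Optional[float]] = {}
    energies: List[Optional[float]] = []
    for smi in smiles:
        if smi is None:  # invalid molecule: keep its slot so order is preserved
            energies.append(None)
            continue
        if smi not in buffer:  # dock each unique SMILES once
            buffer[smi] = dock_fn(smi)
        energies.append(buffer[smi])
    return buffer, energies
```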

Docking Targets

This module contains target information for docking evaluation.

Protein-Ligand Interactions