MoleculePair

MoleculePair holds a reference and a fit Molecule and provides methods for scoring and aligning them using shape, ESP, or pharmacophore interaction profiles.

class shepherd_score.container._core.MoleculePair(ref_mol, fit_mol, num_surf_points=None, density=None, do_center=False, device=-1)

Bases: object

Pair of Molecule objects to facilitate alignment.

__init__(ref_mol, fit_mol, num_surf_points=None, density=None, do_center=False, device=-1)

A pair of molecules: a reference molecule and a fit molecule that can be aligned to the reference. Several kinds of alignment are available:

  • Volumetric (with and without hydrogens)

  • Volumetric with partial charge weighting (with and without hydrogens)

  • Surface

  • Surface with electrostatic potential weighting

  • ShaEP scoring (esp-combo)

  • Pharmacophores (with various settings for using extended points rather than vectors)

Similarly, you can score the current alignment with surface, surface + ESP, and pharmacophore similarity.

Parameters:
  • ref_mol (Union[rdkit.Chem.rdchem.Mol, container.Molecule]) – Reference molecule. If an RDKit Mol object is provided, it will be converted to a Molecule object. If a Molecule object is given, it will NOT regenerate the surface.

  • fit_mol (Union[rdkit.Chem.rdchem.Mol, container.Molecule]) – Molecule to fit to the reference. If an RDKit Mol object is provided, it will be converted to a Molecule object. If a Molecule object is given, it will NOT regenerate the surface.

  • num_surf_points (Optional[int] (default = None)) – Number of surface points to sample if RDKit Mol objects are given. A value MUST be provided for surface or ESP alignment.

  • density (Optional[float] (default = None)) – Density of points to sample if RDKit Mol objects are given. An integer input for num_surf_points supersedes the density setting.

  • do_center (bool (default = False)) – Whether to initially align the molecule centers together. This setting is crucial: set it to True for global optimizations, and to False when scoring the current alignment or running a local alignment.

  • device (pytorch Device (default = -1)) – Device to use if you want to align with PyTorch downstream. Default places alignment computation on CPU.
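As a rough illustration of what do_center controls, the sketch below translates one coordinate set so that its center coincides with that of another before any optimization starts. It is a NumPy sketch under stated assumptions: the helper name center_fit_on_ref is hypothetical, and it uses the unweighted centroid of the coordinates rather than a mass-weighted center, so the library's internal centering may differ.

```python
import numpy as np

def center_fit_on_ref(ref_xyz, fit_xyz):
    """Translate fit_xyz so its centroid coincides with the centroid of ref_xyz.

    Hypothetical helper for illustration only. It uses the unweighted mean of
    the coordinates, not a mass-weighted center of mass.
    """
    ref_xyz = np.asarray(ref_xyz, dtype=float)
    fit_xyz = np.asarray(fit_xyz, dtype=float)
    # A pure translation: internal geometry of the fit molecule is unchanged.
    shift = ref_xyz.mean(axis=0) - fit_xyz.mean(axis=0)
    return fit_xyz + shift

# Toy coordinates (two "atoms" each), far apart before centering.
ref_xyz = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
fit_xyz = np.array([[5.0, 5.0, 5.0], [5.0, 7.0, 5.0]])
centered_fit = center_fit_on_ref(ref_xyz, fit_xyz)
```

Starting a global search from overlapping centers avoids initial poses with no overlap at all, which is presumably why do_center=True is recommended for global optimization but not for scoring an existing pose.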

align_with_vol(no_H=True, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using volumetric similarity.

The optimally aligned score is stored in self.sim_aligned_vol and the optimal SE(3) transformation in self.transform_vol. If no_H is True, ‘_noH’ is appended to both attribute names.

Parameters:
  • no_H (bool) – Whether to not include hydrogens in volumetric similarity. Default is True.

  • num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.

  • trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and 4 PCA-based initializations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.

  • lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.

  • max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.

  • use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.

  • use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.

  • verbose (bool, optional) – Print initial and final similarity scores with scores every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray
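The trans_init description implies a simple count of optimization starts, which affects run time. The following sketch restates that arithmetic; the function name count_initializations and the argument num_translation_centers are hypothetical and are not part of the documented API.

```python
def count_initializations(num_repeats=50, trans_init=False, num_translation_centers=0):
    """Number of starting poses implied by the trans_init description.

    Hypothetical helper: with trans_init, each translation center gets 10
    rotations, plus 5 starts (identity and 4 PCA-based) with aligned COMs.
    Without it, num_repeats rotations are used with aligned COMs.
    """
    if trans_init:
        return num_translation_centers * 10 + 5
    return num_repeats

# Default settings: 50 random rotations.
n_default = count_initializations()
# A reference molecule with 20 atoms used as translation centers.
n_trans = count_initializations(trans_init=True, num_translation_centers=20)
```

For a 20-atom reference this gives 205 starts rather than 50, so translation initialization can be noticeably more expensive for larger reference molecules.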

align_with_vol_esp(lam, no_H=True, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using volume similarity weighted by partial charge. Toggle the no_H parameter to score with or without hydrogens.

Typically lam=0.1 is used. The optimally aligned score is stored in self.sim_aligned_vol_esp and the optimal SE(3) transformation in self.transform_vol_esp. If no_H is True, ‘_noH’ is appended to both attribute names.

Parameters:
  • lam (float) – Partial charge weighting parameter.

  • no_H (bool) – Whether to not include hydrogens in volumetric similarity. Default is True.

  • num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.

  • trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and 4 PCA-based initializations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.

  • lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.

  • max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.

  • use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.

  • verbose (bool, optional) – Print initial and final similarity scores with scores every 100 steps. Default is False.

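  • use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.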

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray

align_with_surf(alpha, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using surface similarity.

Optimally aligned score found in self.sim_aligned_surf and the optimal SE(3) transformation is at self.transform_surf.

Parameters:
  • alpha (float) – Gaussian width parameter for overlap.

  • num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.

  • trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and 4 PCA-based initializations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.

  • lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.

  • max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.

  • use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.

  • use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.

  • verbose (bool, optional) – Print initial and final similarity scores with scores every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray

align_with_esp(alpha, lam=0.3, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using ESP+surface similarity. lam is scaled by (1e4/(4*55.263*np.pi))**2 for correct units.

Typically, lam=0.3 is used and is scaled internally.

Optimally aligned score found in self.sim_aligned_esp and the optimal SE(3) transformation is at self.transform_esp.

Parameters:
  • alpha (float) – Gaussian width parameter for overlap.

  • lam (float, optional) – Weighting factor for ESP scoring. Scaled internally. Default is 0.3.

  • num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.

  • trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and 4 PCA-based initializations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.

  • lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.

  • max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.

  • use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.

  • verbose (bool, optional) – Print initial and final similarity scores with scores every 100 steps. Default is False.

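  • use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.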

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray
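The internal scaling of lam quoted above can be reproduced with plain arithmetic. The sketch below assumes that "scaled by" means the user-supplied lam is multiplied by the quoted factor; the names LAM_SCALING and scale_lam are hypothetical and only illustrate the numbers involved.

```python
import math

# Factor quoted in the method description: (1e4 / (4 * 55.263 * pi)) ** 2.
LAM_SCALING = (1e4 / (4 * 55.263 * math.pi)) ** 2

def scale_lam(lam=0.3):
    """Return lam multiplied by the quoted unit-conversion factor.

    Hypothetical helper: it assumes the scaling is a simple multiplication.
    """
    return lam * LAM_SCALING

# The typical lam=0.3 becomes roughly 62.2 after scaling.
scaled_default = scale_lam()
```

The factor evaluates to about 207.35, so the typical lam=0.3 corresponds to an effective weight of about 62.2. Because the scaling is applied internally, callers pass the unscaled value.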

align_with_esp_combo(alpha, lam=0.001, probe_radius=1.0, esp_weight=0.5, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, verbose=False)

Align using ShaEP similarity score. If alpha is 0.81, then it automatically uses volumetric shape similarity. Otherwise, it uses surface shape similarity.

Optimally aligned score found in self.sim_aligned_esp_combo and the optimal SE(3) transformation is at self.transform_esp_combo.

Parameters:
  • alpha (float) – Gaussian width parameter for overlap.

  • lam (float, optional) – ESP weighting parameter. Default is 0.001.

  • probe_radius (float, optional) – Surface points found within vdW radii + probe radius will be masked out. Surface generation uses a probe radius of 1.2 by default (radius of hydrogen), so a slightly lower radius is used here to be more tolerant. Default is 1.0.

  • esp_weight (float, optional) – How much to weight shape vs esp_combo similarity ([0,1]). Default is 0.5.

  • num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.

  • trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and 4 PCA-based initializations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.

  • lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.

  • max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.

  • use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.

  • verbose (bool, optional) – Print initial and final similarity scores with scores every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray

align_with_pharm(similarity='tanimoto', extended_points=False, only_extended=False, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, verbose=False, use_vectorized=True, use_analytical=True)

Align fit_molec to ref_molec using pharmacophore similarity.

Optimally aligned score found in self.sim_aligned_pharm and the optimal SE(3) transformation is at self.transform_pharm.

Parameters:
  • similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) – Specifies what similarity function to use. Options are:

      • ‘tanimoto’ – symmetric scoring function.

      • ‘tversky’ – asymmetric; uses OpenEye’s formulation with 95% normalization by molec 1.

      • ‘tversky_ref’ – asymmetric; uses Pharao’s formulation with 100% normalization by molec 1.

      • ‘tversky_fit’ – asymmetric; uses Pharao’s formulation with 100% normalization by molec 2.

  • extended_points (bool, optional) – Whether to score HBA/HBD with gaussian overlaps of extended points. Default is False.

  • only_extended (bool, optional) – When extended_points is True, decide whether to only score the extended points (ignore anchor overlaps). Default is False.

  • num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.

  • trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s pharmacophores, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and 4 PCA-based initializations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.

  • lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.

  • max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.

  • use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.

  • verbose (bool, optional) – Print initial and final similarity scores with scores every 100 steps. Default is False.

  • use_vectorized (bool, optional) – Whether to use the vectorized version of the pharmacophore scoring function. This is only relevant if use_jax=True. Default is True.

  • use_analytical (bool, optional) – Whether to use the analytical version of the pharmacophore scoring function. Currently only implemented for PyTorch. Default is True.

Returns:

aligned_fit_anchors (np.ndarray) – Aligned coordinates of pharmacophore positions. Shape: (P, 3).

aligned_fit_vectors (np.ndarray) – Aligned coordinates of pharmacophore vectors. Shape: (P, 3).

Return type:

tuple

score_with_surf(alpha, use='np')

Score fit_molec to ref_molec using surface similarity given current alignment. By default it uses the numpy implementation.

Parameters:
  • alpha (float) – Gaussian width parameter for overlap.

  • use (str, optional) – Specifies what implementation to use. Default is ‘np’. Options are:

      • ‘np’ or ‘numpy’ – NumPy implementation.

      • ‘jax’ or ‘jnp’ – Jax implementation.

      • ‘torch’ or ‘pytorch’ – PyTorch implementation.

Returns:

score – Similarity score. Shape: (1,).

Return type:

np.ndarray
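To make the role of alpha concrete, the sketch below scores two point clouds with a Gaussian-overlap Tanimoto similarity. It assumes that each surface point carries an isotropic Gaussian exp(-alpha * |r - p|^2) and that the score is the Tanimoto ratio of summed pairwise overlaps; this is an assumption made for illustration, not a statement of the library's exact implementation, and both function names are hypothetical.

```python
import numpy as np

def gaussian_overlap(points_a, points_b, alpha):
    """Sum of pairwise overlap integrals between Gaussians of width alpha.

    For two Gaussians exp(-alpha*|r-a|^2) and exp(-alpha*|r-b|^2), the overlap
    integral is (pi / (2*alpha))**1.5 * exp(-alpha/2 * |a-b|^2).
    """
    diff = points_a[:, None, :] - points_b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    prefactor = (np.pi / (2.0 * alpha)) ** 1.5
    return np.sum(prefactor * np.exp(-0.5 * alpha * sq_dist))

def tanimoto_shape_score(ref_points, fit_points, alpha):
    """Tanimoto similarity built from self- and cross-overlaps (hypothetical)."""
    o_rf = gaussian_overlap(ref_points, fit_points, alpha)
    o_rr = gaussian_overlap(ref_points, ref_points, alpha)
    o_ff = gaussian_overlap(fit_points, fit_points, alpha)
    return o_rf / (o_rr + o_ff - o_rf)

rng = np.random.default_rng(0)
ref_points = rng.normal(size=(50, 3))

# Identical clouds score 1; shifting the fit cloud lowers the score.
score_same = tanimoto_shape_score(ref_points, ref_points, alpha=0.81)
score_shifted = tanimoto_shape_score(ref_points, ref_points + 1.0, alpha=0.81)
score_far = tanimoto_shape_score(ref_points, ref_points + 50.0, alpha=0.81)
```

A larger alpha makes each Gaussian narrower, so the score becomes more sensitive to small displacements between the two point clouds.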

score_with_esp(alpha, lam=0.3, use='np')

Score fit_molec to ref_molec using ESP+surface similarity given current alignment. lam is scaled by (1e4/(4*55.263*np.pi))**2 for correct units.

Typically lam = 0.3 is used and is scaled internally. By default it uses the numpy implementation.

Parameters:
  • alpha (float) – Gaussian width parameter for overlap.

  • lam (float, optional) – Weighting factor for ESP scoring. Default is 0.3.

  • use (str, optional) – Specifies what implementation to use. Default is ‘np’. Options are:

      • ‘np’ or ‘numpy’ – NumPy implementation.

      • ‘jax’ or ‘jnp’ – Jax implementation.

      • ‘torch’ or ‘pytorch’ – PyTorch implementation.

Returns:

score – Similarity score. Shape: (1,).

Return type:

np.ndarray

score_with_pharm(similarity='tanimoto', extended_points=False, only_extended=False, use='np')

Score fit_molec to ref_molec using pharmacophore similarity given current alignment. By default it uses the numpy implementation.

Parameters:
  • similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) – Specifies what similarity function to use. Options are:

      • ‘tanimoto’ – symmetric scoring function.

      • ‘tversky’ – asymmetric; uses OpenEye’s formulation with 95% normalization by molec 1.

      • ‘tversky_ref’ – asymmetric; uses Pharao’s formulation with 100% normalization by molec 1.

      • ‘tversky_fit’ – asymmetric; uses Pharao’s formulation with 100% normalization by molec 2.

  • extended_points (bool, optional) – Whether to score HBA/HBD with gaussian overlaps of extended points. Default is False.

  • only_extended (bool, optional) – When extended_points is True, decide whether to only score the extended points (ignore anchor overlaps). Default is False.

  • use (str, optional) – Specifies what implementation to use. Default is ‘np’. Options are:

      • ‘np’ or ‘numpy’ – NumPy implementation.

      • ‘jax’ or ‘jnp’ – Jax implementation.

      • ‘torch’ or ‘pytorch’ – PyTorch implementation.

Returns:

score – Similarity score. Shape: (1,).

Return type:

np.ndarray
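The four similarity options differ only in how the cross-overlap is normalized. The sketch below shows one plausible reading, given overlaps that have already been computed: o_rf (reference with fit), o_rr and o_ff (self-overlaps). The function names, the weight parameter sigma, and the mapping of "molec 1" to the reference molecule are assumptions made for illustration; the library's formulas may differ in detail.

```python
def tanimoto(o_rf, o_rr, o_ff):
    """Symmetric score: cross-overlap over the union-like denominator."""
    return o_rf / (o_rr + o_ff - o_rf)

def tversky(o_rf, o_rr, o_ff, sigma=0.95):
    """Asymmetric score weighting the two self-overlaps by sigma and 1 - sigma.

    Assumed mapping (hypothetical): sigma=0.95 for 'tversky',
    sigma=1.0 for 'tversky_ref', and sigma=0.0 for 'tversky_fit'.
    """
    return o_rf / (sigma * o_rr + (1.0 - sigma) * o_ff)

# Toy overlap values: the reference is "larger" than the fit molecule.
o_rf, o_rr, o_ff = 2.0, 4.0, 3.0

s_tanimoto = tanimoto(o_rf, o_rr, o_ff)
s_tversky = tversky(o_rf, o_rr, o_ff, sigma=0.95)
s_tversky_ref = tversky(o_rf, o_rr, o_ff, sigma=1.0)
s_tversky_fit = tversky(o_rf, o_rr, o_ff, sigma=0.0)
```

With these toy numbers the asymmetric scores are higher than the Tanimoto score because they normalize by only one molecule's self-overlap, which is useful when one molecule is expected to match only part of the other.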

get_transformed_mol_and_feats(se3_transform)

Get an RDKit mol object and applicable features with a transformation applied.

Parameters:

se3_transform (np.ndarray) – SE(3) transformation matrix. Shape: (4,4).

Returns:

transformed_mol (rdkit.Chem.Mol) – Molecule with transformed coordinates.

transformed_surf_pos (np.ndarray) – Transformed surface points. Shape: (N, 3).

transformed_pharm_ancs (np.ndarray) – Transformed pharmacophore anchor positions. Shape: (P, 3).

transformed_pharm_vecs (np.ndarray) – Transformed pharmacophore vector positions. Shape: (P, 3).

Return type:

tuple
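For readers unfamiliar with the (4, 4) SE(3) format, the sketch below applies such a matrix to an (N, 3) array of positions with NumPy. It assumes the common convention in which the upper-left 3x3 block is the rotation and the last column holds the translation, and it treats pharmacophore vectors as directions that are rotated but not translated. Both points are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def apply_se3_to_points(se3_transform, points):
    """Rotate and translate an (N, 3) array of positions."""
    rotation = se3_transform[:3, :3]
    translation = se3_transform[:3, 3]
    # Row-vector form of R @ p + t for every point p.
    return points @ rotation.T + translation

def apply_rotation_to_vectors(se3_transform, vectors):
    """Rotate an (N, 3) array of direction vectors (no translation)."""
    return vectors @ se3_transform[:3, :3].T

# 90-degree rotation about the z axis followed by a translation of (1, 2, 3).
se3_transform = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 3.0],
    [0.0,  0.0, 0.0, 1.0],
])

points = np.array([[1.0, 0.0, 0.0]])
moved_points = apply_se3_to_points(se3_transform, points)
rotated_vectors = apply_rotation_to_vectors(se3_transform, points)
```

Here the point (1, 0, 0) is rotated onto the y axis and then shifted to (1, 3, 3), while the same coordinates treated as a direction become (0, 1, 0).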

get_transformed_molecule(se3_transform)

Get a Molecule object for the fit molecule with the transformation applied to all applicable features.

Parameters:

se3_transform (np.ndarray) – SE(3) transformation matrix. Shape: (4,4).

Returns:

Molecule with transformed features.

Return type:

Molecule