MoleculePair
MoleculePair holds a reference and a fit Molecule and provides methods for scoring and aligning them using shape, ESP, or pharmacophore interaction profiles.
- class shepherd_score.container._core.MoleculePair(ref_mol, fit_mol, num_surf_points=None, density=None, do_center=False, device=-1)
Bases: object
Pair of Molecule objects to facilitate alignment.
- Parameters:
ref_mol (rdkit.Chem.rdchem.Mol | Molecule)
fit_mol (rdkit.Chem.rdchem.Mol | Molecule)
num_surf_points (int | None)
density (float | None)
do_center (bool)
- __init__(ref_mol, fit_mol, num_surf_points=None, density=None, do_center=False, device=-1)
A pair of molecules: a reference molecule and a fit molecule that can be aligned to the reference. The following alignments are available:
- Volumetric (with and without hydrogens)
- Volumetric with partial charge weighting (with and without hydrogens)
- Surface
- Surface with electrostatic potential (ESP) weighting
- ShaEP scoring (esp-combo)
- Pharmacophores (with various settings for using extended points rather than vectors)
Similarly, you can score the current alignment with surface, surface+ESP, and pharmacophore similarity.
- Parameters:
ref_mol (Union[rdkit.Chem.rdchem.Mol, container.Molecule]) – Reference molecule. If an RDKit Mol object is provided, it will be converted to a Molecule object. If a Molecule object is given, it will NOT regenerate the surface.
fit_mol (Union[rdkit.Chem.rdchem.Mol, container.Molecule]) – Molecule to fit to the reference. If an RDKit Mol object is provided, it will be converted to a Molecule object. If a Molecule object is given, it will NOT regenerate the surface.
num_surf_points (Optional[int] (default = None)) – Number of surface points to sample if RDKit Mol objects are given. A value MUST be provided for surface or ESP alignment.
density (Optional[float] (default = None)) – Density of points to sample if RDKit Mol objects are given. An integer input for num_surf_points supersedes density.
do_center (bool (default = False)) – THIS IS CRUCIAL. Whether to initially align the molecules' centers. For global optimization, set to True. For scoring the current alignment or for local alignment, set to False.
device (pytorch Device (default = -1)) – Device to use if you want to align with PyTorch downstream. The default places alignment computation on the CPU.
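To illustrate why do_center matters, the sketch below (NumPy only, with a hypothetical helper center_points that is not part of shepherd_score) translates each point cloud so its centroid sits at the origin. Centering changes the relative pose of the two molecules, so scores computed afterwards no longer describe their original alignment. Whether the library centers on the plain centroid or on a mass-weighted center is an assumption here.

```python
import numpy as np


def center_points(points: np.ndarray) -> np.ndarray:
    """Hypothetical helper: translate an (N, 3) point cloud so its centroid is at the origin."""
    return points - points.mean(axis=0, keepdims=True)


ref = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
fit = np.array([[5.0, 1.0, 0.0], [7.0, 1.0, 0.0]])  # same shape as ref, offset by (5, 1, 0)

# Before centering the two clouds are far apart; after centering they coincide,
# so any "current alignment" between them has been discarded.
offset_before = np.linalg.norm(ref.mean(axis=0) - fit.mean(axis=0))
ref_c, fit_c = center_points(ref), center_points(fit)
offset_after = np.linalg.norm(ref_c.mean(axis=0) - fit_c.mean(axis=0))
```

This is why do_center=True suits global optimization (the starting pose is irrelevant) while do_center=False is needed to score a pose you already have.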
- align_with_vol(no_H=True, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)
Align fit_molec to ref_molec using volumetric similarity.
The optimally aligned score is stored in self.sim_aligned_vol and the optimal SE(3) transformation in self.transform_vol. If no_H is True, ‘_noH’ is appended to both attribute names.
- Parameters:
no_H (bool) – Whether to exclude hydrogens from the volumetric similarity. Default is True.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 are the identity and 4 PCA-based alignments with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of optimization steps. Default is 200.
use_jax (bool, optional) – Whether to use JAX instead of PyTorch. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
verbose (bool, optional) – Print initial and final similarity scores, with scores every 100 steps. Default is False.
- Returns:
aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).
- Return type:
np.ndarray
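Volumetric similarity is typically built from overlaps of atom-centered Gaussians. The sketch below is a generic Gaussian-overlap Tanimoto score, not shepherd_score's own function: it assumes one isotropic Gaussian per atom with a shared exponent alpha (0.81 is borrowed from the value that triggers volumetric scoring in align_with_esp_combo), and the helper names gaussian_overlap and tanimoto_volume are hypothetical. The library may use different per-atom prefactors or normalization.

```python
import numpy as np


def gaussian_overlap(a: np.ndarray, b: np.ndarray, alpha: float) -> float:
    """Sum of pairwise overlap integrals between isotropic Gaussians with exponent alpha.

    The integral over R^3 of exp(-alpha|r - ri|^2) * exp(-alpha|r - rj|^2) is
    (pi / (2 * alpha))**1.5 * exp(-alpha / 2 * |ri - rj|^2).
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared distances
    return float(((np.pi / (2 * alpha)) ** 1.5 * np.exp(-0.5 * alpha * d2)).sum())


def tanimoto_volume(ref: np.ndarray, fit: np.ndarray, alpha: float = 0.81) -> float:
    """Overlap Tanimoto: O_rf / (O_rr + O_ff - O_rf), equal to 1 for identical poses."""
    o_rf = gaussian_overlap(ref, fit, alpha)
    o_rr = gaussian_overlap(ref, ref, alpha)
    o_ff = gaussian_overlap(fit, fit, alpha)
    return o_rf / (o_rr + o_ff - o_rf)


atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
same = tanimoto_volume(atoms, atoms)                                  # perfect overlap
shifted = tanimoto_volume(atoms, atoms + np.array([1.0, 0.0, 0.0]))  # misaligned copy
```

An alignment routine such as align_with_vol maximizes a score of this kind over SE(3) transformations of the fit coordinates.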
- align_with_vol_esp(lam, no_H=True, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)
Align fit_molec to ref_molec using volumetric similarity weighted by partial charge. Toggle the no_H parameter to score with or without hydrogens. Typically lam=0.1 is used. The optimally aligned score is stored in self.sim_aligned_vol_esp and the optimal SE(3) transformation in self.transform_vol_esp. If no_H is True, ‘_noH’ is appended to both attribute names.
- Parameters:
lam (float) – Partial charge weighting parameter.
no_H (bool) – Whether to exclude hydrogens from the volumetric similarity. Default is True.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 are the identity and 4 PCA-based alignments with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of optimization steps. Default is 200.
use_jax (bool, optional) – Whether to use JAX instead of PyTorch. Default is False.
verbose (bool, optional) – Print initial and final similarity scores, with scores every 100 steps. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
- Returns:
aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).
- Return type:
np.ndarray
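The trans_init description implies a simple count of starting poses. The hypothetical helper below (not a shepherd_score function) encodes that rule as the docstring states it, which makes the cost trade-off concrete; the real count only "scales as" this expression, so treat it as an estimate.

```python
from typing import Optional


def num_initializations(trans_init: bool, num_repeats: int = 50,
                        num_translation_centers: Optional[int] = None) -> int:
    """Hypothetical estimate of how many starting poses an alignment tries.

    With trans_init=True, fit_molec's COM is placed on each translation center
    (e.g. each ref_molec atom) with 10 rotations per center, plus 5 poses with
    aligned COMs (the identity and 4 PCA-based alignments). Otherwise,
    num_repeats random rotations are tried with aligned COMs.
    """
    if trans_init:
        if num_translation_centers is None:
            raise ValueError("num_translation_centers is required when trans_init=True")
        return num_translation_centers * 10 + 5
    return num_repeats


n_trans = num_initializations(True, num_translation_centers=20)  # 20-atom reference: 205
n_default = num_initializations(False)                           # default: 50
```

For a 20-atom reference, trans_init=True therefore tries about four times as many starting poses as the default num_repeats=50.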
- align_with_surf(alpha, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)
Align fit_molec to ref_molec using surface similarity.
The optimally aligned score is stored in self.sim_aligned_surf and the optimal SE(3) transformation in self.transform_surf.
- Parameters:
alpha (float) – Gaussian width parameter for overlap.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 are the identity and 4 PCA-based alignments with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of optimization steps. Default is 200.
use_jax (bool, optional) – Whether to use JAX instead of PyTorch. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
verbose (bool, optional) – Print initial and final similarity scores, with scores every 100 steps. Default is False.
- Returns:
aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).
- Return type:
np.ndarray
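The num_repeats parameter controls how many random SO(3) starting rotations are optimized. The docstring does not say how those rotations are drawn; the sketch below assumes uniform sampling via Shoemake's random unit quaternions, and random_rotations is a hypothetical helper rather than part of shepherd_score.

```python
import numpy as np


def random_rotations(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample n rotation matrices uniformly from SO(3) via random unit quaternions (Shoemake)."""
    u1, u2, u3 = rng.random((3, n))
    # Unit quaternions in (x, y, z, w) order; this construction is uniform on SO(3).
    x = np.sqrt(1 - u1) * np.sin(2 * np.pi * u2)
    y = np.sqrt(1 - u1) * np.cos(2 * np.pi * u2)
    z = np.sqrt(u1) * np.sin(2 * np.pi * u3)
    w = np.sqrt(u1) * np.cos(2 * np.pi * u3)
    # Standard quaternion-to-rotation-matrix conversion, stacked row by row.
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)], axis=-1),
        np.stack([2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)], axis=-1),
        np.stack([2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=1)  # shape (n, 3, 3)


rng = np.random.default_rng(0)
R = random_rotations(50, rng)  # one starting rotation per repeat (num_repeats=50)
points = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
rotated = points @ R[0].T      # apply the first starting rotation to (N, 3) points
```

Each starting rotation would then be refined by gradient steps (lr, max_num_steps), and the best final score kept.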
- align_with_esp(alpha, lam=0.3, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)
Align fit_molec to ref_molec using ESP+surface similarity. lam is scaled by (1e4/(4*55.263*np.pi))**2 for correct units. Typically, lam=0.3 is used and is scaled internally. The optimally aligned score is stored in self.sim_aligned_esp and the optimal SE(3) transformation in self.transform_esp.
- Parameters:
alpha (float) – Gaussian width parameter for overlap.
lam (float, optional) – Weighting factor for ESP scoring. Scaled internally. Default is 0.3.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 are the identity and 4 PCA-based alignments with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of optimization steps. Default is 200.
use_jax (bool, optional) – Whether to use JAX instead of PyTorch. Default is False.
verbose (bool, optional) – Print initial and final similarity scores, with scores every 100 steps. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
- Returns:
aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).
- Return type:
np.ndarray
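The sketch below shows one way an ESP term can weight a surface-overlap score. It makes two assumptions not stated above: that the scaling factor is applied multiplicatively to lam, and that ESP enters as a Gaussian penalty exp(-(phi_i - phi_j)**2 / lam_scaled) on each pair of surface points. The function esp_weighted_tanimoto is hypothetical and not the library's implementation; the constant (pi / (2 * alpha))**1.5 prefactor is omitted because it cancels in the Tanimoto ratio.

```python
import numpy as np

# Scaling factor quoted in the docstring; applied multiplicatively here (assumption).
ESP_LAM_SCALE = (1e4 / (4 * 55.263 * np.pi)) ** 2


def esp_weighted_tanimoto(ref_pts, ref_esp, fit_pts, fit_esp, alpha, lam=0.3):
    """Tanimoto score of ESP-weighted Gaussian overlaps between two surface point clouds."""
    lam_scaled = lam * ESP_LAM_SCALE

    def overlap(p, phi_p, q, phi_q):
        d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)           # squared distances
        esp_penalty = np.exp(-((phi_p[:, None] - phi_q[None, :]) ** 2) / lam_scaled)
        return float((np.exp(-0.5 * alpha * d2) * esp_penalty).sum())

    o_rf = overlap(ref_pts, ref_esp, fit_pts, fit_esp)
    o_rr = overlap(ref_pts, ref_esp, ref_pts, ref_esp)
    o_ff = overlap(fit_pts, fit_esp, fit_pts, fit_esp)
    return o_rf / (o_rr + o_ff - o_rf)


pts = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
esp = np.array([0.5, -1.0, 0.25])
same = esp_weighted_tanimoto(pts, esp, pts, esp, alpha=1.0)      # identical shape and ESP
flipped = esp_weighted_tanimoto(pts, esp, pts, -esp, alpha=1.0)  # same shape, opposite ESP
```

Because the ESP penalty only ever lowers overlaps, two poses with identical shape but mismatched potentials score below a pose where both agree.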
- align_with_esp_combo(alpha, lam=0.001, probe_radius=1.0, esp_weight=0.5, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, verbose=False)
Align fit_molec to ref_molec using the ShaEP similarity score. If alpha is 0.81, volumetric shape similarity is used automatically; otherwise, surface shape similarity is used.
The optimally aligned score is stored in self.sim_aligned_esp_combo and the optimal SE(3) transformation in self.transform_esp_combo.
- Parameters:
alpha (float) – Gaussian width parameter for overlap.
lam (float, optional) – ESP weighting parameter. Default is 0.001.
probe_radius (float, optional) – Surface points found within the vdW radii + probe radius are masked out. Surface generation uses a probe radius of 1.2 by default (the radius of hydrogen), so a slightly lower radius is used here to be more tolerant. Default is 1.0.
esp_weight (float, optional) – How much to weight shape vs. ESP similarity in the combined score ([0, 1]). Default is 0.5.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s atoms, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 are the identity and 4 PCA-based alignments with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use JAX instead of PyTorch. Default is False.
verbose (bool, optional) – Print initial and final similarity scores, with scores every 100 steps. Default is False.
- Returns:
aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).
- Return type:
np.ndarray (N, 3)
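The probe_radius rule can be written as a distance test. The helper below is hypothetical (not shepherd_score code) and simply takes atom positions and vdW radii; which molecule's atoms the library tests against is not specified here. The example shows the tolerance argument from the docstring: a point generated at vdW radius + 1.2 from its own atom survives a 1.0 cutoff.

```python
import numpy as np


def mask_buried_surface_points(surf_pts, atom_pos, vdw_radii, probe_radius=1.0):
    """Return a boolean mask, True for surface points kept.

    A point is masked out (False) if it lies within vdW radius + probe_radius
    of any atom.
    """
    d = np.linalg.norm(surf_pts[:, None, :] - atom_pos[None, :, :], axis=-1)  # (S, A)
    return np.all(d > vdw_radii[None, :] + probe_radius, axis=1)


atom_pos = np.array([[0.0, 0.0, 0.0]])
vdw_radii = np.array([1.5])
# One point buried at 2.0, one placed at vdW radius + 1.2 = 2.7 as surface generation would.
surf_pts = np.array([[2.0, 0.0, 0.0], [2.7, 0.0, 0.0]])
keep = mask_buried_surface_points(surf_pts, atom_pos, vdw_radii, probe_radius=1.0)
```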
- align_with_pharm(similarity='tanimoto', extended_points=False, only_extended=False, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, verbose=False, use_vectorized=True, use_analytical=True)
Align fit_molec to ref_molec using pharmacophore similarity.
The optimally aligned score is stored in self.sim_aligned_pharm and the optimal SE(3) transformation in self.transform_pharm.
- Parameters:
similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) – Specifies what similarity function to use. Options are: ‘tanimoto’ – symmetric scoring function; ‘tversky’ – asymmetric, uses OpenEye’s formulation (95% normalization by molecule 1); ‘tversky_ref’ – asymmetric, uses Pharao’s formulation (100% normalization by molecule 1); ‘tversky_fit’ – asymmetric, uses Pharao’s formulation (100% normalization by molecule 2).
extended_points (bool, optional) – Whether to score HBA/HBD with Gaussian overlaps of extended points. Default is False.
only_extended (bool, optional) – When extended_points is True, whether to score only the extended points (ignoring anchor overlaps). Default is False.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s pharmacophores, with 10 rotations for each translation, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 are the identity and 4 PCA-based alignments with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of optimization steps. Default is 200.
use_jax (bool, optional) – Whether to use JAX instead of PyTorch. Default is False.
verbose (bool, optional) – Print initial and final similarity scores, with scores every 100 steps. Default is False.
use_vectorized (bool, optional) – Whether to use the vectorized version of the pharmacophore scoring function. Only relevant if use_jax=True. Default is True.
use_analytical (bool, optional) – Whether to use the analytical version of the pharmacophore scoring function. Currently only implemented for PyTorch. Default is True.
- Returns:
- aligned_fit_anchors (np.ndarray)
Aligned coordinates of pharmacophore positions. Shape: (P, 3).
- aligned_fit_vectors (np.ndarray)
Aligned coordinates of pharmacophore vectors. Shape: (P, 3).
- Return type:
- score_with_surf(alpha, use='np')
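Score fit_molec to ref_molec using surface similarity given the current alignment. By default it uses the NumPy implementation.
- Parameters:
alpha (float) – Gaussian width parameter for overlap.
use (str, optional) – Specifies what implementation to use. Options are ‘np’ or ‘numpy’ (NumPy implementation), ‘jax’ or ‘jnp’ (JAX implementation), and ‘torch’ or ‘pytorch’ (PyTorch implementation). Default is ‘np’.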
- Returns:
score – Similarity score. Shape: (1,).
- Return type:
np.ndarray
- score_with_esp(alpha, lam=0.3, use='np')
Score fit_molec to ref_molec using ESP+surface similarity given the current alignment. lam is scaled by (1e4/(4*55.263*np.pi))**2 for correct units. Typically, lam=0.3 is used and is scaled internally. By default it uses the NumPy implementation.
- Parameters:
alpha (float) – Gaussian width parameter for overlap.
lam (float, optional) – Weighting factor for ESP scoring. Default is 0.3.
use (str, optional) – Specifies what implementation to use. Options are ‘np’ or ‘numpy’ (NumPy implementation), ‘jax’ or ‘jnp’ (JAX implementation), and ‘torch’ or ‘pytorch’ (PyTorch implementation). Default is ‘np’.
- Returns:
score – Similarity score. Shape: (1,).
- Return type:
np.ndarray
- score_with_pharm(similarity='tanimoto', extended_points=False, only_extended=False, use='np')
Score fit_molec to ref_molec using pharmacophore similarity given current alignment. By default it uses the numpy implementation.
- Parameters:
similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) – Specifies what similarity function to use. Options are: ‘tanimoto’ – symmetric scoring function; ‘tversky’ – asymmetric, uses OpenEye’s formulation (95% normalization by molecule 1); ‘tversky_ref’ – asymmetric, uses Pharao’s formulation (100% normalization by molecule 1); ‘tversky_fit’ – asymmetric, uses Pharao’s formulation (100% normalization by molecule 2).
extended_points (bool, optional) – Whether to score HBA/HBD with Gaussian overlaps of extended points. Default is False.
only_extended (bool, optional) – When extended_points is True, whether to score only the extended points (ignoring anchor overlaps). Default is False.
use (str, optional) – Specifies what implementation to use. Options are ‘np’ or ‘numpy’ (NumPy implementation), ‘jax’ or ‘jnp’ (JAX implementation), and ‘torch’ or ‘pytorch’ (PyTorch implementation). Default is ‘np’.
- Returns:
score – Similarity score. Shape: (1,).
- Return type:
np.ndarray
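The four similarity options differ only in how the cross-overlap is normalized. The hypothetical function below reads the descriptions above literally, taking "molecule 1" as the reference and "molecule 2" as the fit, and interpreting OpenEye's 95% normalization as weights 0.95 and 0.05 on the two self-overlaps. Both readings are assumptions, and pharm_similarity is not a shepherd_score function.

```python
def pharm_similarity(o_rf: float, o_rr: float, o_ff: float, similarity: str = "tanimoto") -> float:
    """Combine overlap terms into a similarity score.

    o_rf: overlap between reference and fit; o_rr, o_ff: self-overlaps of each.
    """
    if similarity == "tanimoto":     # symmetric
        return o_rf / (o_rr + o_ff - o_rf)
    if similarity == "tversky":      # 95% normalization by molecule 1 (reference)
        return o_rf / (0.95 * o_rr + 0.05 * o_ff)
    if similarity == "tversky_ref":  # 100% normalization by molecule 1 (reference)
        return o_rf / o_rr
    if similarity == "tversky_fit":  # 100% normalization by molecule 2 (fit)
        return o_rf / o_ff
    raise ValueError(f"unknown similarity: {similarity!r}")


# A small fit (self-overlap 5) that covers part of a larger reference (self-overlap 10).
scores = {s: pharm_similarity(4.0, 10.0, 5.0, s)
          for s in ("tanimoto", "tversky", "tversky_ref", "tversky_fit")}
```

In this example the small fit scores 0.8 under tversky_fit but only 0.4 under tversky_ref, which is why the asymmetric variants suit substructure-like queries.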
- get_transformed_mol_and_feats(se3_transform)
Get an RDKit Mol object and its applicable features with an SE(3) transformation applied.
- Parameters:
se3_transform (np.ndarray) – SE(3) transformation matrix. Shape: (4,4).
- Returns:
- transformed_mol (rdkit.Chem.Mol)
Molecule with transformed coordinates.
- transformed_surf_pos (np.ndarray)
Transformed surface points. Shape: (N, 3).
- transformed_pharm_ancs (np.ndarray)
Transformed pharmacophore anchor positions. Shape: (P, 3).
- transformed_pharm_vecs (np.ndarray)
Transformed pharmacophore vector positions. Shape: (P, 3).
- Return type:
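A (4, 4) SE(3) matrix, such as one stored in self.transform_vol (assuming it uses the same layout), is applied to (N, 3) coordinates as x' = R x + t. The sketch assumes the usual homogeneous convention (rotation in the upper-left 3x3 block, translation in the last column); apply_se3 is a hypothetical helper, not the library's code.

```python
import numpy as np


def apply_se3(se3_transform: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a (4, 4) homogeneous SE(3) matrix to (N, 3) points: x' = R x + t."""
    R, t = se3_transform[:3, :3], se3_transform[:3, 3]
    return points @ R.T + t  # row-vector form of R x + t


# 90-degree rotation about z followed by a translation of +1 along x.
T = np.eye(4)
T[:3, :3] = [[0.0, -1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0]]
T[:3, 3] = [1.0, 0.0, 0.0]

pts = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
out = apply_se3(T, pts)
```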