MoleculePair

MoleculePair holds a reference and a fit Molecule and provides methods for scoring and aligning them using shape, ESP, or pharmacophore interaction profiles.

Each alignment mode stores its result in an AlignmentResult. The result's fields are exposed through the transform_<mode> and sim_aligned_<mode> properties.

class shepherd_score.container._core.AlignmentResult(score=None, transform=<factory>)

Bases: object

Result of a single alignment mode: the optimal similarity score and SE(3) transform.

Parameters:

score (ndarray | None)
transform (ndarray)

score

Optimally aligned similarity score. None until an alignment is run.

Type: np.ndarray or None

transform

SE(3) transformation matrix, shape (4, 4). Defaults to the identity.

Type: np.ndarray
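The transform=<factory> default in the signature suggests that AlignmentResult is a dataclass whose transform field uses a default factory. The following is a minimal sketch under that assumption. AlignmentResultSketch is a hypothetical name and is not part of shepherd_score. It shows why a factory matters: each result gets its own identity matrix rather than sharing one mutable array.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class AlignmentResultSketch:
    """Hypothetical stand-in mirroring the documented AlignmentResult fields."""

    # Optimal similarity score; None until an alignment has been run.
    score: Optional[np.ndarray] = None
    # 4x4 homogeneous SE(3) matrix; default_factory builds a fresh identity per instance.
    transform: np.ndarray = field(default_factory=lambda: np.eye(4))


r = AlignmentResultSketch()
print(r.score is None, r.transform.shape)  # True (4, 4)
```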

class shepherd_score.container._core.MoleculePair(ref_mol, fit_mol, num_surf_points=None, density=None, do_center=False, device=-1)

Bases: object

Pair of Molecule objects to facilitate alignment.

Parameters:

ref_mol (rdkit.Chem.rdchem.Mol | Molecule)
fit_mol (rdkit.Chem.rdchem.Mol | Molecule)
num_surf_points (int | None)
density (float | None)
do_center (bool)

__init__(ref_mol, fit_mol, num_surf_points=None, density=None, do_center=False, device=-1)

A pair of molecules: a reference molecule and a fit molecule that can be aligned to the reference. The following alignments are supported:

Volumetric (with and without hydrogens)
Volumetric with partial charge weighting (with and without hydrogens)
Surface
Surface with electrostatic potential weighting
ShaEP scoring (esp-combo)
Pharmacophore (with various settings for using extended points rather than vectors)

Similarly, the current alignment can be scored with surface, surface+ESP, and pharmacophore similarity.

Parameters:

ref_mol (Union[rdkit.Chem.rdchem.Mol, container.Molecule]) – Reference molecule. An RDKit Mol object is converted to a Molecule object. A Molecule object is used as-is, and its surface is NOT regenerated.
fit_mol (Union[rdkit.Chem.rdchem.Mol, container.Molecule]) – Molecule to fit to the reference. An RDKit Mol object is converted to a Molecule object. A Molecule object is used as-is, and its surface is NOT regenerated.
num_surf_points (Optional[int] (default = None)) – Number of surface points to sample when RDKit Mol objects are given. A value MUST be provided for surface or ESP alignment.
density (Optional[float] (default = None)) – Density of surface points to sample when RDKit Mol objects are given. An integer num_surf_points takes precedence over density.
do_center (bool (default = False)) – Whether to initially move the two molecules' centers together. THIS IS CRUCIAL: set it to True for global optimization. Set it to False to score the current alignment or to run a local alignment.
device (PyTorch device (default = -1)) – Device to use if you want to align with PyTorch downstream. The default, -1, places the alignment computation on the CPU.
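Centering both molecules before scoring changes their relative position whenever their centers differ. As a result, leaving do_center=True when you only want to score an existing pose would evaluate a different pose. The sketch below shows what centering does to raw coordinates. center_coords is a hypothetical helper, and it assumes an unweighted centroid. The library may use a mass-weighted center of mass instead.

```python
import numpy as np


def center_coords(coords: np.ndarray) -> np.ndarray:
    """Translate an (N, 3) coordinate array so its centroid sits at the origin.

    Hypothetical helper: uses an unweighted centroid, which is an assumption.
    """
    coords = np.asarray(coords, dtype=float)
    return coords - coords.mean(axis=0)


ref = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
fit = np.array([[10.0, 5.0, 1.0], [12.0, 5.0, 1.0]])
# After centering, both point sets share the origin as their center,
# so their original relative offset (the "current pose") is discarded.
ref_c, fit_c = center_coords(ref), center_coords(fit)
print(ref_c.mean(axis=0), fit_c.mean(axis=0))
```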

property sim_aligned_esp

property sim_aligned_esp_combo

property sim_aligned_pharm

property sim_aligned_surf

property sim_aligned_vol

property sim_aligned_vol_esp

property sim_aligned_vol_esp_noH

property sim_aligned_vol_noH

property transform_esp

property transform_esp_combo

property transform_pharm

property transform_surf

property transform_vol

property transform_vol_esp

property transform_vol_esp_noH

property transform_vol_noH

align_with_vol(no_H=True, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using volumetric similarity.

The optimally aligned score is stored in self.sim_aligned_vol and the optimal SE(3) transformation in self.transform_vol. If no_H is True, both names carry a ‘_noH’ suffix (self.sim_aligned_vol_noH, self.transform_vol_noH).

Parameters:

no_H (bool) – Whether to exclude hydrogens from the volumetric similarity. Default is True.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation. The number of initializations is therefore (# translation centers * 10 + 5), where the extra 5 are the identity and 4 PCA orientations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
verbose (bool, optional) – Print the initial and final similarity scores, plus the score every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray
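num_repeats controls a multi-start optimization. Each run begins from a different random rotation, and presumably the best-scoring result is the one stored. The sketch below generates such starting poses as 4x4 SE(3) matrices. It assumes rotations are drawn as uniformly random unit quaternions, which may differ from how shepherd_score parameterizes SO(3). quat_to_rotmat and random_se3_inits are hypothetical helpers.

```python
import numpy as np


def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def random_se3_inits(num_repeats: int, seed: int = 0) -> np.ndarray:
    """Return (num_repeats, 4, 4) SE(3) matrices: random rotations, zero translation."""
    rng = np.random.default_rng(seed)
    # Normalized 4D Gaussian samples are uniformly distributed unit quaternions.
    quats = rng.standard_normal((num_repeats, 4))
    quats /= np.linalg.norm(quats, axis=1, keepdims=True)
    inits = np.tile(np.eye(4), (num_repeats, 1, 1))
    for i, q in enumerate(quats):
        inits[i, :3, :3] = quat_to_rotmat(q)
    return inits


inits = random_se3_inits(50)
print(inits.shape)  # (50, 4, 4)
```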

align_with_vol_esp(lam, no_H=True, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using volume similarity weighted by partial charge. Toggle the no_H parameter to score with or without hydrogens.

Typically lam=0.1 is used. The optimally aligned score is stored in self.sim_aligned_vol_esp and the optimal SE(3) transformation in self.transform_vol_esp. If no_H is True, both names carry a ‘_noH’ suffix (self.sim_aligned_vol_esp_noH, self.transform_vol_esp_noH).

Parameters:

lam (float) – Partial charge weighting parameter.
no_H (bool) – Whether to exclude hydrogens from the volumetric similarity. Default is True.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation. The number of initializations is therefore (# translation centers * 10 + 5), where the extra 5 are the identity and 4 PCA orientations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
verbose (bool, optional) – Print the initial and final similarity scores, plus the score every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray

align_with_surf(alpha, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using surface similarity.

Optimally aligned score found in self.sim_aligned_surf and the optimal SE(3) transformation is at self.transform_surf.

Parameters:

alpha (float) – Gaussian width parameter for overlap.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s center of mass (COM) is translated to each of ref_molec’s atoms, with 10 rotations for each translation. The number of initializations is therefore (# translation centers * 10 + 5), where the extra 5 are the identity and 4 PCA orientations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
verbose (bool, optional) – Print the initial and final similarity scores, plus the score every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray
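With trans_init=True, the number of starting poses depends on the size of the reference rather than on num_repeats. The following is a hypothetical helper that spells out the arithmetic from the trans_init description:

```python
def count_initializations(num_translation_centers: int, rotations_per_center: int = 10) -> int:
    """Starting poses when trans_init=True, following the documented formula.

    Each translation center contributes rotations_per_center poses. Five more come
    from the identity plus 4 PCA-based orientations with aligned COMs.
    """
    return num_translation_centers * rotations_per_center + 5


# e.g. a reference whose 20 atoms serve as translation centers
print(count_initializations(20))  # 205
```

Because this count grows linearly with the reference size, trans_init=True can be considerably more expensive than the fixed num_repeats=50 rotations for large molecules.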

align_with_esp(alpha, lam=0.3, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, use_analytical=True, verbose=False)

Align fit_molec to ref_molec using ESP+surface similarity. lam is scaled by (1e4/(4*55.263*np.pi))**2 for correct units.

Typically, lam=0.3 is used and is scaled internally.

Optimally aligned score found in self.sim_aligned_esp and the optimal SE(3) transformation is at self.transform_esp.

Parameters:

alpha (float) – Gaussian width parameter for overlap.
lam (float, optional) – Weighting factor for ESP scoring. Scaled internally. Default is 0.3.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s atoms, with 10 rotations for each translation. The number of initializations is therefore (# translation centers * 10 + 5), where the extra 5 are the identity and 4 PCA orientations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.
use_analytical (bool, optional) – Whether to use analytical gradients instead of PyTorch autograd. Ignored if use_jax=True. Default is True.
verbose (bool, optional) – Print the initial and final similarity scores, plus the score every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray
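You can check the unit conversion applied to lam directly. This sketch only evaluates the factor quoted in the description, (1e4/(4*55.263*np.pi))**2. ESP_LAM_SCALE and scale_esp_lam are hypothetical names, not library functions.

```python
import numpy as np

# Conversion factor quoted in the docstring; lam is multiplied by it internally.
ESP_LAM_SCALE = (1e4 / (4 * 55.263 * np.pi)) ** 2


def scale_esp_lam(lam: float) -> float:
    """Return the internally scaled lam for a user-facing value such as 0.3."""
    return lam * ESP_LAM_SCALE


print(round(scale_esp_lam(0.3), 1))  # 62.2
```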

align_with_esp_combo(alpha, lam=0.001, probe_radius=1.0, esp_weight=0.5, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, verbose=False)

Align using ShaEP similarity score. If alpha is 0.81, then it automatically uses volumetric shape similarity. Otherwise, it uses surface shape similarity.

Optimally aligned score found in self.sim_aligned_esp_combo and the optimal SE(3) transformation is at self.transform_esp_combo.

Parameters:

alpha (float) – Gaussian width parameter for overlap.
lam (float, optional) – ESP weighting parameter. Default is 0.001.
probe_radius (float, optional) – Surface points found within the vdW radii plus the probe radius are masked out. Surface generation uses a probe radius of 1.2 by default (the radius of hydrogen), so a slightly lower radius is used here to be more tolerant. Default is 1.0.
esp_weight (float, optional) – How much to weight shape vs. ESP similarity in the combined score, in [0, 1]. Default is 0.5.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s atoms, with 10 rotations for each translation. The number of initializations is therefore (# translation centers * 10 + 5), where the extra 5 are the identity and 4 PCA orientations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.
verbose (bool, optional) – Print the initial and final similarity scores, plus the score every 100 steps. Default is False.

Returns:

aligned_fit_points – Coordinates of transformed atoms. Shape: (N, 3).

Return type:

np.ndarray (N, 3)

align_with_pharm(similarity='tanimoto', extended_points=False, only_extended=False, num_repeats=50, trans_init=False, lr=0.1, max_num_steps=200, use_jax=False, verbose=False, use_vectorized=True, use_analytical=True)

Align fit_molec to ref_molec using pharmacophore similarity.

Optimally aligned score found in self.sim_aligned_pharm and the optimal SE(3) transformation is at self.transform_pharm.

Parameters:

similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) – Specifies which similarity function to use. Options are: ‘tanimoto’ (symmetric scoring function); ‘tversky’ (asymmetric, OpenEye’s formulation with 95% normalization by molecule 1); ‘tversky_ref’ (asymmetric, Pharao’s formulation with 100% normalization by molecule 1); ‘tversky_fit’ (asymmetric, Pharao’s formulation with 100% normalization by molecule 2).
extended_points (bool, optional) – Whether to score HBA/HBD with Gaussian overlaps of extended points. Default is False.
only_extended (bool, optional) – When extended_points is True, whether to score only the extended points (ignoring anchor overlaps). Default is False.
num_repeats (int, optional) – Number of different random initializations of SO(3) transformation parameters. Default is 50.
trans_init (bool, optional) – Apply translation initialization for alignment. fit_molec’s COM is translated to each of ref_molec’s pharmacophores, with 10 rotations for each translation. The number of initializations is therefore (# translation centers * 10 + 5), where the extra 5 are the identity and 4 PCA orientations with aligned COMs. If False, num_repeats rotations are done with aligned COMs. Default is False.
lr (float, optional) – Learning rate or step-size for optimization. Default is 0.1.
max_num_steps (int, optional) – Maximum number of steps to optimize over. Default is 200.
use_jax (bool, optional) – Whether to use Jax instead of PyTorch. Default is False.
verbose (bool, optional) – Print the initial and final similarity scores, plus the score every 100 steps. Default is False.
use_vectorized (bool, optional) – Whether to use the vectorized version of the pharmacophore scoring function. This is only relevant if use_jax=True. Default is True.
use_analytical (bool, optional) – Whether to use the analytical version of the pharmacophore scoring function. Currently only implemented for PyTorch. Default is True.

Returns:

aligned_fit_anchors (np.ndarray) – Aligned coordinates of pharmacophore positions. Shape: (P, 3).
aligned_fit_vectors (np.ndarray) – Aligned coordinates of pharmacophore vectors. Shape: (P, 3).

Return type:

tuple
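The four similarity options correspond to standard overlap-based formulas. The sketch below assumes that the cross- and self-overlaps (O_AB, O_AA, O_BB) have already been computed. It reads "95% normalization by molecule 1" as Tversky weights (0.95, 0.05), and "100%" as (1, 0) or (0, 1). That mapping is an interpretation of the description, and the overlap values are made up for illustration.

```python
def tanimoto(o_ab: float, o_aa: float, o_bb: float) -> float:
    """Symmetric Tanimoto similarity from overlap values."""
    return o_ab / (o_aa + o_bb - o_ab)


def tversky(o_ab: float, o_aa: float, o_bb: float, alpha: float, beta: float) -> float:
    """Asymmetric Tversky similarity; alpha weights molecule 1, beta molecule 2."""
    return o_ab / (alpha * o_aa + beta * o_bb)


# Hypothetical overlap values, for illustration only.
o_ab, o_aa, o_bb = 3.0, 5.0, 4.0
print(tanimoto(o_ab, o_aa, o_bb))             # 0.5
print(tversky(o_ab, o_aa, o_bb, 0.95, 0.05))  # 'tversky', ~0.606
print(tversky(o_ab, o_aa, o_bb, 1.0, 0.0))    # 'tversky_ref', 0.6
print(tversky(o_ab, o_aa, o_bb, 0.0, 1.0))    # 'tversky_fit', 0.75
```

The asymmetric variants reward a fit molecule that covers the reference (or vice versa) without penalizing extra features on the other side.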

score_with_surf(alpha, use='np')

Score fit_molec to ref_molec using surface similarity given current alignment. By default it uses the numpy implementation.

Parameters:

alpha (float) – Gaussian width parameter for overlap.
use (str, optional) – Specifies which implementation to use. Options are ‘np’ or ‘numpy’ (NumPy implementation), ‘jax’ or ‘jnp’ (Jax implementation), and ‘torch’ or ‘pytorch’ (PyTorch implementation). Default is ‘np’.

Returns:

score – Similarity score. Shape: (1,).

Return type:

np.ndarray
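Because alpha is described as a Gaussian width for overlap, one plausible form of the surface score is a Tanimoto ratio of Gaussian overlaps between the two point clouds. The score is evaluated at their current positions, with no optimization. The following NumPy sketch assumes a single width alpha on every point and unit weights. The exact kernel and normalization in shepherd_score may differ, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_overlap(a: np.ndarray, b: np.ndarray, alpha: float) -> float:
    """Sum of pairwise Gaussian overlaps between (N, 3) and (M, 3) point sets.

    The analytic overlap of two Gaussians exp(-alpha |r - p|^2) decays as
    exp(-alpha/2 * d^2); its constant prefactor cancels in the Tanimoto ratio.
    """
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-0.5 * alpha * sq_dists).sum())


def surface_tanimoto(ref_pts: np.ndarray, fit_pts: np.ndarray, alpha: float) -> float:
    """Tanimoto similarity of two point clouds at their current positions."""
    o_ab = gaussian_overlap(ref_pts, fit_pts, alpha)
    o_aa = gaussian_overlap(ref_pts, ref_pts, alpha)
    o_bb = gaussian_overlap(fit_pts, fit_pts, alpha)
    return o_ab / (o_aa + o_bb - o_ab)


rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 3))
print(surface_tanimoto(pts, pts, alpha=0.8))                 # 1.0
print(surface_tanimoto(pts, pts + [1.5, 0.0, 0.0], alpha=0.8))  # below 1.0
```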

score_with_esp(alpha, lam=0.3, use='np')

Score fit_molec to ref_molec using ESP+surface similarity given current alignment. lam is scaled by (1e4/(4*55.263*np.pi))**2 for correct units.

Typically lam = 0.3 is used and is scaled internally. By default it uses the numpy implementation.

Parameters:

alpha (float) – Gaussian width parameter for overlap.
lam (float, optional) – Weighting factor for ESP scoring. Default is 0.3.
use (str, optional) – Specifies which implementation to use. Options are ‘np’ or ‘numpy’ (NumPy implementation), ‘jax’ or ‘jnp’ (Jax implementation), and ‘torch’ or ‘pytorch’ (PyTorch implementation). Default is ‘np’.

Returns:

score – Similarity score. Shape: (1,).

Return type:

np.ndarray
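lam sets how strongly an electrostatic mismatch between nearby surface points lowers the score. The following is a sketch of one ESP-weighted overlap. It assumes that each pair's Gaussian overlap is multiplied by exp(-(phi_i - phi_j)^2 / lam). This functional form is an assumption for illustration only. In this sketch, lam is used directly, without the internal unit scaling described above. The helper names are hypothetical.

```python
import numpy as np


def esp_weighted_overlap(a, b, esp_a, esp_b, alpha, lam):
    """Gaussian overlap of (N, 3) and (M, 3) points, damped by ESP differences."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    esp_diff_sq = (esp_a[:, None] - esp_b[None, :]) ** 2
    return float((np.exp(-0.5 * alpha * sq_dists) * np.exp(-esp_diff_sq / lam)).sum())


def esp_tanimoto(ref_pts, fit_pts, ref_esp, fit_esp, alpha, lam):
    """Tanimoto ratio of ESP-weighted overlaps at the current positions."""
    o_ab = esp_weighted_overlap(ref_pts, fit_pts, ref_esp, fit_esp, alpha, lam)
    o_aa = esp_weighted_overlap(ref_pts, ref_pts, ref_esp, ref_esp, alpha, lam)
    o_bb = esp_weighted_overlap(fit_pts, fit_pts, fit_esp, fit_esp, alpha, lam)
    return o_ab / (o_aa + o_bb - o_ab)


rng = np.random.default_rng(1)
pts = rng.normal(size=(25, 3))
esp = rng.uniform(0.1, 1.0, size=25)
# Same geometry, opposite-sign potentials: the ESP term pulls the score below 1.
print(esp_tanimoto(pts, pts, esp, esp, alpha=0.8, lam=0.5))
print(esp_tanimoto(pts, pts, esp, -esp, alpha=0.8, lam=0.5))
```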

score_with_pharm(similarity='tanimoto', extended_points=False, only_extended=False, use='np')

Score fit_molec to ref_molec using pharmacophore similarity given current alignment. By default it uses the numpy implementation.

Parameters:

similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) – Specifies which similarity function to use. Options are: ‘tanimoto’ (symmetric scoring function); ‘tversky’ (asymmetric, OpenEye’s formulation with 95% normalization by molecule 1); ‘tversky_ref’ (asymmetric, Pharao’s formulation with 100% normalization by molecule 1); ‘tversky_fit’ (asymmetric, Pharao’s formulation with 100% normalization by molecule 2).
extended_points (bool, optional) – Whether to score HBA/HBD with Gaussian overlaps of extended points. Default is False.
only_extended (bool, optional) – When extended_points is True, whether to score only the extended points (ignoring anchor overlaps). Default is False.
use (str, optional) – Specifies which implementation to use. Options are ‘np’ or ‘numpy’ (NumPy implementation), ‘jax’ or ‘jnp’ (Jax implementation), and ‘torch’ or ‘pytorch’ (PyTorch implementation). Default is ‘np’.

Returns:

score – Similarity score. Shape: (1,).

Return type:

np.ndarray

get_transformed_mol_and_feats(se3_transform)

Get an RDKit Mol object and the applicable features with an SE(3) transformation applied.

Parameters:

se3_transform (np.ndarray) – SE(3) transformation matrix. Shape: (4,4).

Returns:

transformed_mol (rdkit.Chem.Mol) – Molecule with transformed coordinates.
transformed_surf_pos (np.ndarray) – Transformed surface points. Shape: (N, 3).
transformed_pharm_ancs (np.ndarray) – Transformed pharmacophore anchor positions. Shape: (P, 3).
transformed_pharm_vecs (np.ndarray) – Transformed pharmacophore vector positions. Shape: (P, 3).

Return type:

tuple
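Applying an SE(3) matrix to (N, 3) coordinates uses homogeneous coordinates. You append a column of ones, multiply by the 4x4 matrix, and drop the last column. apply_se3 is a hypothetical helper that sketches this step. If pharmacophore vectors are stored as unit directions rather than positions, only the rotation block transform[:3, :3] would apply to them. That is an assumption, because the description calls them "vector positions".

```python
import numpy as np


def apply_se3(transform: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 SE(3) matrix to (N, 3) points via homogeneous coordinates."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    return (homogeneous @ transform.T)[:, :3]


# 90-degree rotation about z, followed by a +1 shift along x.
T = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
print(apply_se3(T, np.array([[1.0, 0.0, 0.0]])))  # [[1. 1. 0.]]
```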

get_transformed_molecule(se3_transform)

Get a Molecule object for the fit molecule, with the transformation applied to all applicable features.

Parameters:

se3_transform (np.ndarray) – SE(3) transformation matrix. Shape: (4,4).

Returns:

Molecule with transformed features.

Return type:

Molecule
