Main Alignment Module

Alignment algorithms using Torch-based scoring functions.

shepherd_score.alignment.objective_ROCS_overlay(se3_params, ref_points, fit_points, alpha, precomputed_U=None)

Objective function to optimize ROCS overlay. Supports batched and non-batched inputs. If the inputs are batched, the loss is the average across the batch.

Parameters:
  • se3_params (torch.Tensor (batch, 7) or (7,)) – Parameters for SE(3) transformation. The first 4 values in the last dimension are quaternions of form (r,i,j,k) and the last 3 values of the last dimension are the translations in (x,y,z).

  • ref_points (torch.Tensor (batch, N, 3) or (N,3)) – Reference points. To optimize against the same ref_points with a batch of different se3_params, try using torch.Tensor.repeat((batch, 1, 1)).

  • fit_points (torch.Tensor (batch, M, 3) or (M,3)) – Set of points to which SE(3) transformations are applied to maximize shape similarity with ref_points. To optimize the same fit_points with a batch of different se3_params, try using torch.Tensor.repeat((batch, 1, 1)).

  • alpha (float) – Gaussian width parameter used in scoring function.

  • precomputed_U (torch.Tensor | None)

Returns:

loss – 1 - average(Tanimoto score).

Return type:

torch.Tensor (1,)
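
The loss described above can be illustrated with a small dependency-free sketch. It assumes a first-order Gaussian overlap kernel of the form exp(-alpha/2 * d^2) (a constant prefactor would cancel in the Tanimoto ratio) and normalizes the quaternion before use. The helper names are hypothetical, and this is not the library's implementation, which operates on torch tensors.

```python
import math

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion in (r, i, j, k) order (normalized first)."""
    r, i, j, k = q
    n = math.sqrt(r * r + i * i + j * j + k * k)
    r, i, j, k = r / n, i / n, j / n, k / n
    return [
        [1 - 2 * (j * j + k * k), 2 * (i * j - k * r), 2 * (i * k + j * r)],
        [2 * (i * j + k * r), 1 - 2 * (i * i + k * k), 2 * (j * k - i * r)],
        [2 * (i * k - j * r), 2 * (j * k + i * r), 1 - 2 * (i * i + j * j)],
    ]

def apply_se3(se3_params, points):
    """Rotate by the first 4 values (quaternion), then translate by the last 3."""
    R = quat_to_rotmat(se3_params[:4])
    t = se3_params[4:]
    return [[sum(R[a][b] * p[b] for b in range(3)) + t[a] for a in range(3)]
            for p in points]

def gaussian_overlap(pts_a, pts_b, alpha):
    """Assumed kernel: sum over all point pairs of exp(-alpha/2 * squared distance)."""
    return sum(
        math.exp(-0.5 * alpha * sum((x - y) ** 2 for x, y in zip(p, q)))
        for p in pts_a for q in pts_b
    )

def tanimoto_loss(se3_params, ref_points, fit_points, alpha):
    """1 - Tanimoto score between ref_points and the transformed fit_points."""
    moved = apply_se3(se3_params, fit_points)
    v_ab = gaussian_overlap(ref_points, moved, alpha)
    v_aa = gaussian_overlap(ref_points, ref_points, alpha)
    v_bb = gaussian_overlap(moved, moved, alpha)
    return 1.0 - v_ab / (v_aa + v_bb - v_ab)

ref = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]
fit = [[p[0] + 1.0, p[1], p[2]] for p in ref]      # ref shifted by +1 along x
identity = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # no rotation, no translation
shift_back = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]  # undo the +1 shift

loss_identity = tanimoto_loss(identity, ref, fit, alpha=1.0)
loss_aligned = tanimoto_loss(shift_back, ref, fit, alpha=1.0)
```

With these toy inputs the loss drops to zero once the translation undoes the shift, which is the quantity an optimizer over se3_params would drive down.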

shepherd_score.alignment.score_ROCS_overlay_with_avoid(ref_points, fit_points, alpha, fit_points_for_avoid, avoid_points, avoid_min_dist, avoid_weight, precomputed_U=None)

See objective_ROCS_overlay_with_avoid for parameter descriptions.

Parameters:
  • ref_points (torch.Tensor)

  • fit_points (torch.Tensor)

  • alpha (float)

  • fit_points_for_avoid (torch.Tensor)

  • avoid_points (torch.Tensor)

  • avoid_min_dist (float)

  • avoid_weight (float)

  • precomputed_U (torch.Tensor | None)

Return type:

torch.Tensor

shepherd_score.alignment.objective_ROCS_overlay_with_avoid(se3_params, ref_points, fit_points, alpha, fit_points_for_avoid, avoid_points, avoid_min_dist, avoid_weight, precomputed_U=None)

Objective function to optimize ROCS overlay with an additional penalty for overlapping avoid_points. Supports batched and non-batched inputs. If the inputs are batched, the loss is the average across the batch.

Parameters:
  • se3_params (torch.Tensor (batch, 7) or (7,)) – Parameters for SE(3) transformation. The first 4 values in the last dimension are quaternions of form (r,i,j,k) and the last 3 values of the last dimension are the translations in (x,y,z).

  • ref_points (torch.Tensor (batch, N, 3) or (N,3)) – Reference points. To optimize against the same ref_points with a batch of different se3_params, try using torch.Tensor.repeat((batch, 1, 1)).

  • fit_points (torch.Tensor (batch, M, 3) or (M,3)) – Set of points to which SE(3) transformations are applied to maximize shape similarity with ref_points. To optimize the same fit_points with a batch of different se3_params, try using torch.Tensor.repeat((batch, 1, 1)).

  • alpha (float) – Gaussian width parameter used in scoring function.

  • fit_points_for_avoid (torch.Tensor (M,3)) – Set of points to which the SE(3) transformation is applied before they are compared to avoid_points.

  • avoid_points (torch.Tensor (K,3) (default=None)) – If not None, these are points that are used in an additional term in the objective function to penalize overlap with these points.

  • avoid_min_dist (float (default=2.0)) – Minimum distance with no penalization between fit_points_for_avoid and avoid_points.

  • avoid_weight (float (default=1.0)) – Weight for the avoid_points term in the scoring function.

  • precomputed_U (torch.Tensor | None)

Returns:

loss – 1 - (average(Tanimoto score of fit_points to ref_points) - avoid_weight * average(hard-sphere overlap of fit_points_for_avoid to avoid_points)).

Return type:

torch.Tensor (1,)
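
How the avoid term enters the loss can be sketched as follows. The exact form of the hard-sphere overlap is not given here, so this sketch assumes it is the fraction of fit_points_for_avoid that lie closer than avoid_min_dist to any avoid point, and that the weighted penalty is subtracted from the Tanimoto score before taking 1 minus the result. Both function names are hypothetical.

```python
import math

def hard_sphere_overlap(fit_points_for_avoid, avoid_points, avoid_min_dist=2.0):
    """Assumed penalty: fraction of fit points within avoid_min_dist of any avoid point."""
    if not fit_points_for_avoid or not avoid_points:
        return 0.0
    clashes = 0
    for p in fit_points_for_avoid:
        nearest = min(math.dist(p, q) for q in avoid_points)
        if nearest < avoid_min_dist:
            clashes += 1
    return clashes / len(fit_points_for_avoid)

def loss_with_avoid(tanimoto, overlap_penalty, avoid_weight=1.0):
    """loss = 1 - (Tanimoto - avoid_weight * penalty); larger clashes raise the loss."""
    return 1.0 - (tanimoto - avoid_weight * overlap_penalty)

fit_pts = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
avoid = [[0.5, 0.0, 0.0]]

# Only the first fit point is closer than 2.0 to the avoid point.
penalty = hard_sphere_overlap(fit_pts, avoid, avoid_min_dist=2.0)
loss = loss_with_avoid(tanimoto=0.8, overlap_penalty=penalty, avoid_weight=1.0)
```

Under this reading, setting avoid_weight to 0 recovers the plain 1 - Tanimoto loss.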

shepherd_score.alignment.objective_ROCS_esp_overlay(se3_params, ref_points, fit_points, ref_charges, fit_charges, alpha, lam, precomputed_U=None)

Objective function to optimize ESP-weighted ROCS overlay. Supports batched and non-batched inputs. If the inputs are batched, the loss is the average across the batch.

Parameters:
  • se3_params (torch.Tensor (batch, 7) or (7,)) – Parameters for SE(3) transformation. The first 4 values in the last dimension are quaternions of form (r,i,j,k) and the last 3 values of the last dimension are the translations in (x,y,z).

  • ref_points (torch.Tensor (batch, N, 3) or (N,3)) – Reference points.

  • fit_points (torch.Tensor (batch, M, 3) or (M,3)) – Set of points to apply SE(3) transformations to maximize shape similarity with ref_points.

  • ref_charges (torch.Tensor (batch, N) or (N,)) – Electrostatic potential at the corresponding ref_points coordinates.

  • fit_charges (torch.Tensor (batch, M) or (M,)) – Electrostatic potential at the corresponding fit_points coordinates.

  • alpha (float) – Gaussian width parameter used in scoring function.

  • lam (float) – Scaling term for charges used in the exponential kernel of the ESP scoring function.

  • precomputed_U (torch.Tensor | None)

Returns:

loss – 1 - mean(ESP Tanimoto score).

Return type:

torch.Tensor (1,)
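
The roles of alpha and lam can be illustrated with a small sketch. It assumes the ESP-weighted kernel is the Gaussian shape term exp(-alpha/2 * d^2) multiplied by exp(-(q_a - q_b)^2 / lam), which is one plausible reading of "exponential kernel" above. The function names are hypothetical and the library's exact kernel may differ.

```python
import math

def esp_overlap(pts_a, pts_b, esp_a, esp_b, alpha, lam):
    """Assumed kernel: shape Gaussian damped by the squared ESP difference over lam."""
    total = 0.0
    for pa, ea in zip(pts_a, esp_a):
        for pb, eb in zip(pts_b, esp_b):
            d2 = sum((x - y) ** 2 for x, y in zip(pa, pb))
            total += math.exp(-0.5 * alpha * d2) * math.exp(-((ea - eb) ** 2) / lam)
    return total

def esp_tanimoto(ref_points, fit_points, ref_charges, fit_charges, alpha, lam):
    """Tanimoto ratio built from the ESP-weighted overlaps."""
    v_ab = esp_overlap(ref_points, fit_points, ref_charges, fit_charges, alpha, lam)
    v_aa = esp_overlap(ref_points, ref_points, ref_charges, ref_charges, alpha, lam)
    v_bb = esp_overlap(fit_points, fit_points, fit_charges, fit_charges, alpha, lam)
    return v_ab / (v_aa + v_bb - v_ab)

pts = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]

# Identical geometry and ESP values: perfect score.
same = esp_tanimoto(pts, pts, [0.3, -0.3], [0.3, -0.3], alpha=1.0, lam=0.3)
# Identical geometry but swapped ESP signs: the score drops.
flipped = esp_tanimoto(pts, pts, [0.3, -0.3], [-0.3, 0.3], alpha=1.0, lam=0.3)
```

In this sketch a smaller lam penalizes ESP mismatches more sharply, while a very large lam reduces the score to the pure shape Tanimoto.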

shepherd_score.alignment.objective_esp_combo_score_overlay(se3_params, ref_centers_w_H, fit_centers_w_H, ref_centers, fit_centers, ref_points, fit_points, ref_partial_charges, fit_partial_charges, ref_surf_esp, fit_surf_esp, ref_radii, fit_radii, alpha, lam, probe_radius, esp_weight)

Objective for ESP combo score. Handles broadcasting for ref_* inputs. fit_* inputs are expected to be repeated if se3_params is batched.

Parameters:
  • se3_params (torch.Tensor)

  • ref_centers_w_H (torch.Tensor)

  • fit_centers_w_H (torch.Tensor)

  • ref_centers (torch.Tensor)

  • fit_centers (torch.Tensor)

  • ref_points (torch.Tensor)

  • fit_points (torch.Tensor)

  • ref_partial_charges (torch.Tensor)

  • fit_partial_charges (torch.Tensor)

  • ref_surf_esp (torch.Tensor)

  • fit_surf_esp (torch.Tensor)

  • ref_radii (torch.Tensor)

  • fit_radii (torch.Tensor)

  • alpha (float)

  • lam (float)

  • probe_radius (float)

  • esp_weight (float)

Return type:

torch.Tensor

shepherd_score.alignment.objective_pharm_overlay(se3_params, ref_pharms, fit_pharms, ref_anchors, fit_anchors, ref_vectors, fit_vectors, similarity='tanimoto', extended_points=False, only_extended=False, precomputed_self_overlaps=None)

Objective function to optimize pharmacophore overlay. Supports batched and non-batched inputs. If the inputs are batched, the loss is the average across the batch.

Parameters:
  • se3_params (torch.Tensor (batch, 7) or (7,)) – Parameters for SE(3) transformation. The first 4 values in the last dimension are quaternions of form (r,i,j,k) and the last 3 values of the last dimension are the translations in (x,y,z).

  • ref_anchors (torch.Tensor (batch, N, 3) or (N,3)) – Reference anchors. To optimize against the same ref_anchors with a batch of different se3_params, try using torch.Tensor.repeat((batch, 1, 1)).

  • fit_anchors (torch.Tensor (batch, M, 3) or (M,3)) – Set of anchors to which SE(3) transformations are applied to maximize similarity with ref_anchors. To optimize the same fit_anchors with a batch of different se3_params, try using torch.Tensor.repeat((batch, 1, 1)).

  • ref_pharms (torch.Tensor)

  • fit_pharms (torch.Tensor)

  • ref_vectors (torch.Tensor)

  • fit_vectors (torch.Tensor)

  • similarity (Literal['tanimoto', 'tversky', 'tversky_ref', 'tversky_fit'])

  • extended_points (bool)

  • only_extended (bool)

  • precomputed_self_overlaps (Tuple[torch.Tensor, torch.Tensor] | None)

Returns:

loss – 1 - mean(pharmacophore similarity score).

Return type:

torch.Tensor (1,)

shepherd_score.alignment.crippen_align(ref_rdmol, fit_rdmol)

Align fit_rdmol with respect to ref_rdmol with rdkit’s Crippen Alignment algorithm.

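Parameters:
  • ref_rdmol (rdkit.Chem.rdchem.Mol) – Reference molecule.

  • fit_rdmol (rdkit.Chem.rdchem.Mol) – Molecule to align to ref_rdmol.

Returns: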

aligned_fit_rdmol – Fit molecule with new aligned coordinates.

Return type:

rdkit.Chem.rdchem.Mol

shepherd_score.alignment.optimize_ROCS_overlay(ref_points, fit_points, alpha, *, fit_points_for_avoid=None, avoid_points=None, avoid_min_dist=2.0, avoid_weight=1.0, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize the alignment of fit_points with respect to ref_points using SE(3) transformations that maximize the Gaussian overlap score.

If num_repeats is 1, the initial guess for alignment is an identity rotation and aligned COMs. If num_repeats is 5 or greater, four initial guesses are aligned using principal components.

Parameters:
  • ref_points (torch.Tensor (N,3)) – Reference points.

  • fit_points (torch.Tensor (M,3)) – Set of points to apply SE(3) transformations to maximize shape similarity with ref_points.

  • alpha (float) – Gaussian width parameter used in scoring function.

  • fit_points_for_avoid (torch.Tensor (M,3)) – Set of points to which the SE(3) transformation is applied before they are compared to avoid_points.

  • avoid_points (torch.Tensor (K,3) (default=None)) – If not None, these are points that are used in an additional term in the objective function to penalize overlap with these points.

  • avoid_min_dist (float (default=2.0)) – Minimum distance with no penalization between fit_points_for_avoid and avoid_points.

  • avoid_weight (float (default=1.0)) – Weight for the avoid_points term in the scoring function.

  • num_repeats (int (default=50)) – Number of different random initializations of SE(3) transformation parameters.

  • trans_centers (torch.Tensor (P, 3) (default=None)) – Locations to which fit_points’ center of mass is translated as initial guesses for optimization. At each translation center, 10 rotations are also sampled, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and the 4 PCA alignments with aligned COMs. If None, then num_repeats rotations are done with aligned COMs.

  • lr (float (default=0.1)) – Learning rate or step size for optimization.

  • max_num_steps (int (default=200)) – Maximum number of steps to optimize over.

  • verbose (bool (default=False)) – Print initial and final similarity scores with scores every 100 steps.

Returns:

  • aligned_points (torch.Tensor (M,3)) – The transformed point cloud for fit_points using the optimized SE(3) transformation for alignment with ref_points.

  • SE3_transform (torch.Tensor (4,4)) – Optimized SE(3) transformation matrix used to obtain aligned_points from fit_points.

  • score (torch.Tensor (1,)) – Tanimoto shape similarity score for the optimal transformation.

Return type:

tuple
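
The initialization counts described for num_repeats and trans_centers amount to simple arithmetic. The helper below is hypothetical and only restates the parameter descriptions; it is not taken from the library.

```python
def num_initializations(num_repeats=50, num_trans_centers=None):
    """Number of starting poses implied by the parameter descriptions.

    Without translation centers, num_repeats starting rotations are used
    (with COMs aligned). With P translation centers, 10 rotations are
    sampled per center, plus 5 COM-aligned starts (identity + 4 PCA).
    """
    if num_trans_centers is None:
        return num_repeats
    return num_trans_centers * 10 + 5

default_starts = num_initializations()                       # no trans_centers
three_centers = num_initializations(num_trans_centers=3)     # 3 * 10 + 5
```

Because every starting pose is optimized for up to max_num_steps steps, the cost grows roughly linearly with this count.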

shepherd_score.alignment.optimize_ROCS_overlay_analytical(ref_points, fit_points, alpha, *, fit_points_for_avoid=None, avoid_points=None, avoid_min_dist=2.0, avoid_weight=1.0, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize shape alignment using analytical gradients instead of autograd.

Same interface and behavior as optimize_ROCS_overlay, but uses hand-derived analytical gradients with a manual Adam optimizer, eliminating PyTorch autograd overhead.

Parameters:
  • ref_points (torch.Tensor (N,3))

  • fit_points (torch.Tensor (M,3))

  • alpha (float)

  • fit_points_for_avoid (torch.Tensor (M2,3) or None) – Points to penalize for overlap with avoid_points. Defaults to fit_points if None.

  • avoid_points (torch.Tensor (K,3) or None) – Fixed points to avoid overlapping with.

  • avoid_min_dist (float) – Distance threshold for avoid penalty.

  • avoid_weight (float) – Weight of the avoid penalty term.

  • num_repeats (int)

  • trans_centers (torch.Tensor or None)

  • lr (float)

  • max_num_steps (int)

  • verbose (bool)

Return type:

tuple of (aligned_points, SE3_transform, score)
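
To illustrate what pairing a hand-derived gradient with a manually written Adam update looks like, here is a toy sketch. It minimizes the mean squared distance between translated fit points and reference points over a translation only, a much simpler objective than the library's overlap score. The function names and the beta1, beta2, and eps values (the commonly used Adam defaults) are assumptions; lr and max_num_steps mirror the documented defaults.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, max_num_steps=200,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain-Python Adam loop driven by a user-supplied analytical gradient."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimate
    for step in range(1, max_num_steps + 1):
        g = grad_fn(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1 - beta1 ** step)  # bias correction
            v_hat = v[i] / (1 - beta2 ** step)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

ref = [[1.0, 2.0, 0.0], [2.0, 2.0, 0.0]]
fit = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

def grad(t):
    # Hand-derived: d/dt mean_n ||fit_n + t - ref_n||^2 = 2 * mean_n (fit_n + t - ref_n)
    return [2.0 * sum(f[a] + t[a] - r[a] for f, r in zip(fit, ref)) / len(fit)
            for a in range(3)]

# The exact minimizer is the translation (1, 2, 0).
t_opt = adam_minimize(grad, [0.0, 0.0, 0.0])
```

No computation graph is built or differentiated here; each step costs one gradient evaluation plus a few arithmetic updates, which is the kind of overhead reduction the analytical variants aim for.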

shepherd_score.alignment.optimize_ROCS_esp_overlay(ref_points, fit_points, ref_charges, fit_charges, alpha, lam, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize the alignment of fit_points with respect to ref_points using SE(3) transformations that maximize the electrostatic-weighted Gaussian overlap score.

Parameters:
  • ref_points (torch.Tensor (N,3)) – Reference points.

  • fit_points (torch.Tensor (M,3)) – Set of points to apply SE(3) transformations to maximize shape similarity with ref_points.

  • ref_charges (torch.Tensor (batch, N) or (N,)) – Electrostatic potential at the corresponding ref_points coordinates.

  • fit_charges (torch.Tensor (batch, M) or (M,)) – Electrostatic potential at the corresponding fit_points coordinates.

  • alpha (float) – Gaussian width parameter used in scoring function.

  • lam (float) – Scaling term for charges used in the exponential kernel of the ESP scoring function.

  • num_repeats (int (default=50)) – Number of different random initializations of SE(3) transformation parameters.

  • trans_centers (torch.Tensor (P, 3) (default=None)) – Locations to which fit_points’ center of mass is translated as initial guesses for optimization. At each translation center, 10 rotations are also sampled, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and the 4 PCA alignments with aligned COMs. If None, then num_repeats rotations are done with aligned COMs.

  • lr (float (default=0.1)) – Learning rate or step size for optimization.

  • max_num_steps (int (default=200)) – Maximum number of steps to optimize over.

  • verbose (bool (default=False)) – Print initial and final similarity scores with scores every 100 steps.

Returns:

  • aligned_points (torch.Tensor (M,3)) – The transformed point cloud for fit_points using the optimized SE(3) transformation for alignment with ref_points.

  • SE3_transform (torch.Tensor (4,4)) – Optimized SE(3) transformation matrix used to obtain aligned_points from fit_points.

  • score (torch.Tensor (1,)) – Tanimoto shape similarity score for the optimal transformation.

Return type:

tuple

shepherd_score.alignment.optimize_ROCS_esp_overlay_analytical(ref_points, fit_points, ref_charges, fit_charges, alpha, lam, *, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize ESP alignment using analytical gradients instead of autograd.

Same interface and behavior as optimize_ROCS_esp_overlay, but uses hand-derived analytical gradients with a manual Adam optimizer.

Parameters:
  • ref_points (torch.Tensor (N,3))

  • fit_points (torch.Tensor (M,3))

  • ref_charges (torch.Tensor (N,))

  • fit_charges (torch.Tensor (M,))

  • alpha (float)

  • lam (float) – Pre-scaled lam (e.g. LAM_SCALING * lam_user).

  • num_repeats (int)

  • trans_centers (torch.Tensor or None)

  • lr (float)

  • max_num_steps (int)

  • verbose (bool)

Return type:

tuple of (aligned_points, SE3_transform, score)

shepherd_score.alignment.optimize_esp_combo_score_overlay(ref_centers_w_H, fit_centers_w_H, ref_centers, fit_centers, ref_points, fit_points, ref_partial_charges, fit_partial_charges, ref_surf_esp, fit_surf_esp, ref_radii, fit_radii, alpha, lam, probe_radius=1.0, esp_weight=0.5, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize alignment using ESP combo score.

Parameters:
  • ref_centers_w_H (torch.Tensor)

  • fit_centers_w_H (torch.Tensor)

  • ref_centers (torch.Tensor)

  • fit_centers (torch.Tensor)

  • ref_points (torch.Tensor)

  • fit_points (torch.Tensor)

  • ref_partial_charges (torch.Tensor)

  • fit_partial_charges (torch.Tensor)

  • ref_surf_esp (torch.Tensor)

  • fit_surf_esp (torch.Tensor)

  • ref_radii (torch.Tensor)

  • fit_radii (torch.Tensor)

  • alpha (float)

  • lam (float)

  • probe_radius (float)

  • esp_weight (float)

  • num_repeats (int)

  • trans_centers (torch.Tensor | None)

  • lr (float)

  • max_num_steps (int)

  • verbose (bool)

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

shepherd_score.alignment.optimize_pharm_overlay(ref_pharms, fit_pharms, ref_anchors, fit_anchors, ref_vectors, fit_vectors, similarity='tanimoto', extended_points=False, only_extended=False, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize the alignment of fit_anchors with respect to ref_anchors using SE(3) transformations that maximize the pharmacophore overlap score.

Parameters:
  • ref_pharms (torch.Tensor (N,)) – Indices reflecting the pharmacophore types of the reference molecule.

  • fit_pharms (torch.Tensor (M,)) – Indices reflecting the pharmacophore types of the fit molecule.

  • ref_anchors (torch.Tensor (N,3)) – Reference pharmacophore positions (anchors).

  • fit_anchors (torch.Tensor (M,3)) – Set of anchors to align to the reference pharmacophores.

  • ref_vectors (torch.Tensor (batch, N, 3) or (N,3)) – Unit vectors relative to the reference anchors.

  • fit_vectors (torch.Tensor (batch, M, 3) or (M,3)) – Unit vectors relative to the fit anchors.

  • similarity (str from ('tanimoto', 'tversky', 'tversky_ref', 'tversky_fit')) –

    Specifies which similarity function to use.

    'tanimoto' – symmetric scoring function.
    'tversky' – asymmetric; uses OpenEye’s formulation, with 95% normalization by molecule 1.
    'tversky_ref' – asymmetric; uses Pharao’s formulation, with 100% normalization by molecule 1.
    'tversky_fit' – asymmetric; uses Pharao’s formulation, with 100% normalization by molecule 2.

  • extended_points (bool) – Whether to score HBA/HBD with Gaussian overlaps of extended points.

  • only_extended (bool) – When extended_points is True, whether to score only the extended points (ignoring anchor overlaps).

  • num_repeats (int (default=50)) – Number of different random initializations of SE(3) transformation parameters.

  • trans_centers (torch.Tensor (P, 3) (default=None)) – Locations to which the fit molecule’s center of mass is translated as initial guesses for optimization. At each translation center, 10 rotations are also sampled, so the number of initializations scales as (# translation centers * 10 + 5), where the 5 comes from the identity and the 4 PCA alignments with aligned COMs. If None, then num_repeats rotations are done with aligned COMs.

  • lr (float (default=0.1)) – Learning rate or step size for optimization.

  • max_num_steps (int (default=200)) – Maximum number of steps to optimize over.

  • verbose (bool (default=False)) – Print initial and final similarity scores with scores every 100 steps.

Returns:

  • aligned_points (torch.Tensor (M,3)) – The transformed fit_anchors using the optimized SE(3) transformation for alignment with ref_anchors.

  • aligned_vectors (torch.Tensor (M,3)) – The transformed fit_vectors using the optimized SO(3) transformation (rotation only) for alignment with the reference.

  • SE3_transform (torch.Tensor (4,4)) – Optimized SE(3) transformation matrix used to obtain aligned_points from fit_anchors.

  • score (torch.Tensor (1,)) – Similarity score, under the chosen similarity function, for the optimal transformation.

Return type:

tuple
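
The four similarity options can be compared with a small sketch that takes precomputed overlaps as input. It assumes that molecule 1 is the reference, that "95% normalization" means a 0.95/0.05 weighting of the two self-overlaps, and that "100% normalization" means dividing by a single self-overlap. The function name is hypothetical and the library's formulas may differ in detail.

```python
def similarity_from_overlaps(o_ab, o_aa, o_bb, similarity="tanimoto"):
    """Combine the cross-overlap o_ab with self-overlaps o_aa (ref) and o_bb (fit)."""
    if similarity == "tanimoto":
        # Symmetric: swapping reference and fit gives the same score.
        return o_ab / (o_aa + o_bb - o_ab)
    if similarity == "tversky":
        # Assumed 95% / 5% weighting of the reference and fit self-overlaps.
        return o_ab / (0.95 * o_aa + 0.05 * o_bb)
    if similarity == "tversky_ref":
        # Normalized entirely by the reference self-overlap.
        return o_ab / o_aa
    if similarity == "tversky_fit":
        # Normalized entirely by the fit self-overlap.
        return o_ab / o_bb
    raise ValueError(f"unknown similarity: {similarity!r}")

# Toy overlaps: a small reference (o_aa=4) partly covered by a larger fit (o_bb=8).
scores = {
    name: similarity_from_overlaps(2.0, 4.0, 8.0, name)
    for name in ("tanimoto", "tversky", "tversky_ref", "tversky_fit")
}
```

In this toy case the reference-normalized variants score higher than Tanimoto because the larger fit molecule is not penalized for the parts that extend beyond the reference.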

shepherd_score.alignment.optimize_pharm_overlay_analytical(ref_pharms, fit_pharms, ref_anchors, fit_anchors, ref_vectors, fit_vectors, similarity='tanimoto', extended_points=False, only_extended=False, num_repeats=50, trans_centers=None, lr=0.1, max_num_steps=200, verbose=False)

Optimize pharmacophore alignment using analytical gradients instead of autograd.

Same interface and behavior as optimize_pharm_overlay, but uses hand-derived analytical gradients with PyTorch’s Adam optimizer, eliminating PyTorch autograd overhead.

Supports similarity='tanimoto', 'tversky', 'tversky_ref', and 'tversky_fit', and extended_points=True.

Parameters:
  • ref_pharms (torch.Tensor (N,))

  • fit_pharms (torch.Tensor (M,))

  • ref_anchors (torch.Tensor (N,3))

  • fit_anchors (torch.Tensor (M,3))

  • ref_vectors (torch.Tensor (N,3))

  • fit_vectors (torch.Tensor (M,3))

  • similarity (str)

  • extended_points (bool)

  • only_extended (bool)

  • num_repeats (int)

  • trans_centers (torch.Tensor or None)

  • lr (float)

  • max_num_steps (int)

  • verbose (bool)

Return type:

tuple of (aligned_anchors, aligned_vectors, SE3_transform, score)
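
The difference between transforming anchors with SE(3) and vectors with SO(3) can be sketched in a few lines. The sketch assumes the (4,4) matrix follows the usual homogeneous layout, with the rotation in the upper-left 3x3 block and the translation in the last column; the helper name is hypothetical.

```python
def transform_anchors_and_vectors(se3_transform, anchors, vectors):
    """Anchors are rotated and translated; direction vectors are only rotated."""
    R = [row[:3] for row in se3_transform[:3]]  # upper-left 3x3 rotation block
    t = [row[3] for row in se3_transform[:3]]   # last-column translation
    moved_anchors = [
        [sum(R[a][b] * p[b] for b in range(3)) + t[a] for a in range(3)]
        for p in anchors
    ]
    rotated_vectors = [
        [sum(R[a][b] * v[b] for b in range(3)) for a in range(3)]
        for v in vectors
    ]
    return moved_anchors, rotated_vectors

# 90-degree rotation about z followed by a translation of +1 along x.
T = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
new_anchors, new_vectors = transform_anchors_and_vectors(
    T, anchors=[[1.0, 0.0, 0.0]], vectors=[[1.0, 0.0, 0.0]]
)
```

Leaving the translation out for the vectors keeps them unit length and anchored relative to their pharmacophore positions.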