Evaluation#

Classes and pipelines for evaluating generated 3D conformers.

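The evaluation classes accept timeout_minutes at construction to cap the wall time of each xTB call, and the pipelines accept it in evaluate() to cap the time spent on each molecule. ConditionalEval and ConditionalEvalPipeline accept priority_pharm_indices for Tversky scoring on a subset of pharmacophores.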

Evaluation Classes#

Evaluation pipeline classes for generated molecules.

class shepherd_score.evaluations.evaluate.evals.ConfEval(atoms, positions, solvent=None, num_processes=1, timeout_minutes=None)[source]#

Bases: object

Generated conformer evaluation pipeline

Parameters:

atoms (ndarray)
positions (ndarray)
solvent (str | None)
num_processes (int)
timeout_minutes (float | None)

__init__(atoms, positions, solvent=None, num_processes=1, timeout_minutes=None)[source]#

Base class for evaluation of a single generated conformer.

Checks validity with RDKit pipeline and xTB single point calculation and optimization. Calculates 2D graph properties for valid molecules.

Automatically aligns the relaxed structure to the original structure via RDKit RMSD alignment.

Parameters:

atoms (np.ndarray) – Array of shape (N,) of atomic numbers of the generated molecule or (N, M) one-hot encoding.
positions (np.ndarray) – Array of shape (N, 3) of coordinates for the generated molecule's atoms.
solvent (str, optional) – Solvent type for xTB relaxation.
num_processes (int, optional) – Number of processors to use for xTB relaxation and RDKit RMSD alignment. Default is 1.
timeout_minutes (float, optional) – Per-xtb-call timeout in minutes. If exceeded, the xtb subprocess is killed and the corresponding step (single point or relaxation) is treated as failed. Default is None (no timeout).
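
The per-call timeout described above can be pictured with the standard library alone. The sketch below is not the library's implementation: `run_with_timeout` is a hypothetical helper, and a short Python process stands in for an xtb call. It converts minutes to seconds, lets `subprocess.run` kill the child when the limit is exceeded, and reports the step as failed.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_minutes=None):
    """Run `cmd`; return True on success, False on failure or timeout.

    Hypothetical helper for illustration, not part of shepherd_score.
    """
    # None means "no timeout", matching the documented default.
    timeout_s = None if timeout_minutes is None else timeout_minutes * 60.0
    try:
        # subprocess.run kills the child process if the timeout expires,
        # then raises TimeoutExpired.
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return False  # treat the step as failed
    return result.returncode == 0


# A Python one-liner stands in for the xtb executable here.
fast = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

fast_ok = run_with_timeout(fast, timeout_minutes=1.0)
slow_ok = run_with_timeout(slow, timeout_minutes=0.005)  # 0.3 s limit
```

Returning a boolean rather than raising keeps a batch evaluation running when one molecule stalls, which matches the "treated as failed" behaviour described for timeout_minutes.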

to_pandas()[source]#

Convert the stored attributes to a pd.Series (for global attributes).

Returns:: Series holding the attributes in an easy-to-visualize form.
Return type:: pd.Series

class shepherd_score.evaluations.evaluate.evals.ConsistencyEval(atoms, positions, surf_points=None, surf_esp=None, pharm_feats=None, pharm_multi_vector=None, solvent=None, probe_radius=1.2, num_processes=1, timeout_minutes=None)[source]#

Bases: ConfEval

Evaluation of the consistency between jointly generated molecules’ features, measured in terms of similarity scores.

Parameters:

atoms (ndarray)
positions (ndarray)
surf_points (ndarray | None)
surf_esp (ndarray | None)
pharm_feats (Tuple[ndarray, ndarray, ndarray] | None)
pharm_multi_vector (bool | None)
solvent (str | None)
probe_radius (float)
num_processes (int)
timeout_minutes (float | None)

__init__(atoms, positions, surf_points=None, surf_esp=None, pharm_feats=None, pharm_multi_vector=None, solvent=None, probe_radius=1.2, num_processes=1, timeout_minutes=None)[source]#

Consistency evaluation class for jointly generated molecule and features.

Uses 3D similarity scoring functions. Inherits from ConfEval so that it can first run a conformer evaluation on the generated molecule.

Must supply atoms and positions AND at least one of the features necessary for similarity scoring.

Notes

Important assumptions:

Gaussian width parameter (alpha) for surface similarity was fitted to a probe radius of 1.2 Å.
ESP weighting parameter (lam) for electrostatic similarity is set to 0.3, which was tested under the above assumption.

Parameters:

atoms (np.ndarray) – Array of shape (N,) of atomic numbers of the generated molecule or (N, M) one-hot encoding.
positions (np.ndarray) – Array of shape (N, 3) of coordinates for the generated molecule’s atoms.
surf_points (np.ndarray, optional) – Array of shape (M, 3) of generated surface point cloud.
surf_esp (np.ndarray, optional) – Array of shape (M,) of generated electrostatic potential on surface.
pharm_feats (tuple, optional) – Tuple of (pharm_types, pharm_ancs, pharm_vecs) where pharm_types is (P,) type of pharmacophore defined by shepherd_score.score.constants.P_TYPES, pharm_ancs is (P, 3) anchor positions, and pharm_vecs is (P, 3) unit vectors relative to anchor.
pharm_multi_vector (bool, optional) – Whether to represent Aro/HBA/HBD pharmacophores with multiple vectors or a single vector.
solvent (str, optional) – Solvent type for xTB relaxation.
probe_radius (float, optional) – Radius of probe atom used to generate solvent accessible surface. Default is 1.2 (vdW radius of hydrogen).
num_processes (int, optional) – Number of processors to use for xTB relaxation. Default is 1.
timeout_minutes (float, optional) – Per-xtb-call timeout in minutes. If exceeded, the xtb subprocess is killed and the corresponding step (single point or relaxation) is treated as failed. Default is None (no timeout).

class shepherd_score.evaluations.evaluate.evals.ConditionalEval(ref_molec, atoms, positions, condition, num_surf_points=400, pharm_multi_vector=None, priority_pharm_indices=None, solvent=None, num_processes=1, timeout_minutes=None)[source]#

Bases: ConfEval

Evaluation of conditionally generated molecules’ quality and similarity.

Parameters:

ref_molec (Molecule)
atoms (ndarray)
positions (ndarray)
condition (str)
num_surf_points (int)
pharm_multi_vector (bool | None)
priority_pharm_indices (list | None)
solvent (str | None)
num_processes (int)
timeout_minutes (float | None)

__init__(ref_molec, atoms, positions, condition, num_surf_points=400, pharm_multi_vector=None, priority_pharm_indices=None, solvent=None, num_processes=1, timeout_minutes=None)[source]#

Evaluation pipeline for conditionally-generated molecules.

Inherits from ConfEval so that it can first run a conformer evaluation on the generated molecule.

Notes

Important assumptions:

Gaussian width parameter (alpha) for surface similarity assumes a probe radius of 1.2 Å.
ESP weighting parameter (lam) for electrostatic similarity is set to 0.3, which was tested under the above assumption.

Parameters:

ref_molec (Molecule) – Molecule object of reference/target molecule. Must contain the representation that was used for conditioning.
atoms (np.ndarray) – Array of shape (N,) of atomic numbers of the generated molecule or (N, M) one-hot encoding.
positions (np.ndarray) – Array of shape (N, 3) of coordinates for the generated molecule’s atoms.
condition (str) – Condition that the molecule was conditioned on. One of ‘surface’, ‘esp’, ‘pharm’, or ‘all’. Used for alignment. Choose ‘esp’ or ‘all’ if you want to compute ESP-aligned scores for other profiles.
num_surf_points (int, optional) – Number of surface points to sample for similarity scoring. Default is 400.
pharm_multi_vector (bool, optional) – Whether to represent Aro/HBA/HBD pharmacophores with multiple vectors or a single vector.
priority_pharm_indices (list of int, optional) – Indices (into ref_molec pharmacophore arrays) of “priority” pharmacophores. When provided, two additional Tversky ('tversky_ref') scores are computed after the full-set pharm alignment: one for the priority subset and one for the non-priority complement subset, each scored against the full pharmacophore set of the aligned generated molecule. Requires condition to be 'pharm' or 'all' and pharm_multi_vector to be a bool. Must satisfy 0 < len(priority_pharm_indices) < N_pharm.
solvent (str, optional) – Solvent type for xTB relaxation.
num_processes (int, optional) – Number of processors to use for xTB relaxation. Default is 1.
timeout_minutes (float, optional) – Per-xtb-call timeout in minutes. If exceeded, the xtb subprocess is killed and the corresponding step (single point or relaxation) is treated as failed. Default is None (no timeout).
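
The constraint on priority_pharm_indices and the split into a priority subset and its complement can be sketched as follows. `split_priority_indices` is a hypothetical helper written for illustration, not a function of the library; removing duplicate indices and rejecting out-of-range ones are assumptions of this sketch.

```python
def split_priority_indices(priority_pharm_indices, n_pharm):
    """Return (priority, non_priority) index lists for n_pharm pharmacophores.

    Hypothetical helper mirroring the documented constraint
    0 < len(priority_pharm_indices) < N_pharm.
    """
    # Assumption: duplicate indices are collapsed before checking.
    priority = sorted(set(priority_pharm_indices))
    if not 0 < len(priority) < n_pharm:
        raise ValueError("need 0 < len(priority_pharm_indices) < N_pharm")
    if priority[0] < 0 or priority[-1] >= n_pharm:
        raise ValueError("indices must refer to reference pharmacophores")
    priority_set = set(priority)
    # The complement holds every reference pharmacophore not marked priority.
    non_priority = [i for i in range(n_pharm) if i not in priority_set]
    return priority, non_priority


priority, non_priority = split_priority_indices([0, 3], n_pharm=5)
print(priority, non_priority)  # [0, 3] [1, 2, 4]
```

The strict inequalities guarantee that both subsets are non-empty, so each of the two additional Tversky scores has something to score.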

Evaluation Pipelines#

Evaluation pipeline classes for generated molecules.

class shepherd_score.evaluations.evaluate.pipelines.UnconditionalEvalPipeline(generated_mols, solvent=None)[source]#

Bases: object

Unconditional evaluation pipeline

Parameters:

generated_mols (List[Tuple[ndarray, ndarray]])
solvent (str | None)

__init__(generated_mols, solvent=None)[source]#

Evaluation pipeline for a list of unconditionally generated molecules.

Parameters:

generated_mols (List[Tuple[np.ndarray, np.ndarray]]) – List containing tuple of np.ndarrays holding atomic numbers (N,) and corresponding positions (N, 3).
solvent (str, optional) – Implicit solvent model to use for xtb relaxation.

evaluate(num_processes=1, num_workers=1, verbose=False, timeout_minutes=None, *, mp_context='spawn')[source]#

Run the evaluation pipeline.

Parameters:

num_processes (int, optional) – Number of processors to use for xtb relaxation. Default is 1.
num_workers (int, optional) – Number of parallel worker processes. Constraint: num_workers * num_processes <= available CPUs. Only recommended if generated_mols contains more than 100 molecules, because starting new processes has overhead. If num_workers > 1, multiprocessing is used, and not much is gained by setting num_processes > 1 in this case. Default is 1.
verbose (bool, optional) – Whether to print tqdm progress bar. Default is False.
timeout_minutes (float, optional) – Per-molecule timeout in minutes. If exceeded, the evaluation is terminated and a failed result is recorded. Useful for skipping molecules where xtb relaxation is unlikely to converge. Default is None (no timeout).
mp_context ({'spawn', 'forkserver'}, optional) – Context for multiprocessing. 'spawn' is recommended for most cases. Default is 'spawn'.

Returns:

Updates the class attributes in place.

Return type:

None
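
The CPU constraint on num_workers and num_processes can be checked before calling evaluate(). This is a minimal sketch using only the standard library; `check_cpu_budget` is a hypothetical helper, and whether the pipeline performs such a check itself is not stated here.

```python
import os


def check_cpu_budget(num_workers, num_processes, available_cpus=None):
    """Return True if num_workers * num_processes fits in the CPU count."""
    if available_cpus is None:
        # os.cpu_count() may return None on some platforms.
        available_cpus = os.cpu_count() or 1
    return num_workers * num_processes <= available_cpus


print(check_cpu_budget(num_workers=4, num_processes=1, available_cpus=8))  # True
print(check_cpu_budget(num_workers=4, num_processes=4, available_cpus=8))  # False
```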

get_attr(obj, attr)[source]#

Gets an attribute of obj via its string name. If the attribute is None, returns np.nan.

Parameters:: attr (str)
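
The described behaviour amounts to a getattr call with a None-to-NaN substitution. In this sketch `math.nan` stands in for `np.nan` to avoid a NumPy dependency, and the `Result` class with its attribute names is hypothetical.

```python
import math


class Result:
    """Hypothetical per-molecule result with one missing attribute."""

    def __init__(self):
        self.energy = -5.07
        self.rmsd = None  # e.g. a step that failed


def get_attr(obj, attr):
    """Look up `attr` by name; map None to NaN so it fits numeric columns."""
    value = getattr(obj, attr)
    return math.nan if value is None else value


result = Result()
energy = get_attr(result, "energy")
rmsd = get_attr(result, "rmsd")
```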

get_frac_valid()[source]#: Fraction of generated molecules that were valid.

get_frac_valid_post_opt()[source]#: Fraction of generated molecules that were valid after relaxation.

get_frac_consistent_graph()[source]#: Fraction of generated molecules that were consistent before and after relaxation.

get_frac_unique()[source]#: Fraction of unique SMILES extracted pre-optimization in the generated set.

get_frac_unique_post_opt()[source]#: Fraction of unique SMILES extracted post-optimization in the generated set.

get_diversity(post_opt=False)[source]#

Get average molecular graph diversity and similarity matrix.

Computes average molecular graph diversity (average dissimilarity) as defined by GenBench3D (arXiv:2407.04424) and the Tanimoto similarity matrix of fingerprints.

Parameters:

post_opt (bool, optional) – Whether to use post-optimization fingerprints. Default is False.

Returns:

avg_diversity (float or None) – Average diversity in range [0, 1] where 1 is more diverse (more dissimilar). Returns None if no valid molecules.
similarity_matrix (np.ndarray or None) – Similarity matrix of shape (N, N). Returns None if no valid molecules.

Return type:

Tuple[float, ndarray]
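
As a rough illustration of average diversity, the sketch below takes one minus the Tanimoto similarity and averages it over distinct pairs. Fingerprints are modelled as Python sets of on-bit indices. The fingerprint type, the averaging convention, and the handling of fewer than two molecules are assumptions of this sketch, not details confirmed for the library.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def average_diversity(fingerprints):
    """Mean of (1 - Tanimoto) over distinct pairs, or None if fewer than 2."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return None
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints: the first two overlap, the third is disjoint from both.
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
diversity = average_diversity(fps)
```

Here the pairwise dissimilarities are 0.5, 1.0, and 1.0, so the average is 2.5 / 3.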

to_pandas()[source]#

Convert the stored attributes to a pd.Series (for global attributes) and pd.DataFrame (for attributes relevant to every instance).

Returns:: pd.Series of global attributes and pd.DataFrame of attributes for each evaluated sample.
Return type:: Tuple
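
A typical call sequence for the unconditional pipeline, using only the signatures documented above, might look like the sketch below. `check_generated_mols` and `run_unconditional_eval` are hypothetical wrappers, the 5-minute timeout is an arbitrary choice, and the import is deferred because the run itself needs shepherd_score and an xtb installation.

```python
def check_generated_mols(generated_mols):
    """Check each entry is an (atoms, positions) pair with matching lengths."""
    for atoms, positions in generated_mols:
        if len(atoms) != len(positions):
            raise ValueError("atoms and positions must have the same length N")
        if any(len(xyz) != 3 for xyz in positions):
            raise ValueError("positions must have shape (N, 3)")
    return len(generated_mols)


def run_unconditional_eval(generated_mols, solvent=None, timeout_minutes=5.0):
    """Evaluate generated molecules; needs shepherd_score and xtb installed."""
    # Imported lazily so this sketch can be loaded without the package.
    from shepherd_score.evaluations.evaluate.pipelines import (
        UnconditionalEvalPipeline,
    )

    check_generated_mols(generated_mols)
    pipeline = UnconditionalEvalPipeline(
        generated_mols=generated_mols, solvent=solvent
    )
    # evaluate() returns None and updates the pipeline's attributes in place.
    pipeline.evaluate(
        num_processes=1, num_workers=1, timeout_minutes=timeout_minutes
    )
    global_attrs, per_sample = pipeline.to_pandas()  # (pd.Series, pd.DataFrame)
    return pipeline.get_frac_valid(), global_attrs, per_sample
```

The pipeline expects each entry of generated_mols to be a tuple of np.ndarrays holding atomic numbers (N,) and positions (N, 3); the length checks in the helper also work on plain nested lists.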

class shepherd_score.evaluations.evaluate.pipelines.ConditionalEvalPipeline(ref_molec, generated_mols, condition, num_surf_points=400, pharm_multi_vector=None, solvent=None, priority_pharm_indices=None)[source]#

Bases: object

Evaluation pipeline for conditionally generated molecules.

Parameters:

ref_molec (Molecule)
generated_mols (List[Tuple[ndarray, ndarray]])
condition (str)
num_surf_points (int)
pharm_multi_vector (bool | None)
solvent (str | None)
priority_pharm_indices (list | None)

__init__(ref_molec, generated_mols, condition, num_surf_points=400, pharm_multi_vector=None, solvent=None, priority_pharm_indices=None)[source]#

Initialize attributes for conditional evaluation pipeline.

Parameters:

ref_molec (Molecule) – Reference/target molecule object that was used for conditioning. Must contain the 3D representation that was used for conditioning (i.e., shape, ESP, or pharmacophores).
generated_mols (List[Tuple[np.ndarray, np.ndarray]]) – List containing tuple of np.ndarrays holding atomic numbers (N,) and corresponding positions (N, 3).
condition (str) – Condition the molecule was conditioned on, one of 'surface', 'esp', 'pharm', 'all'. Used for alignment.
num_surf_points (int, optional) – Number of surface points to sample for similarity scoring. Must match the number of surface points in ref_molec. Default is 400.
pharm_multi_vector (bool, optional) – Whether to represent Aro/HBA/HBD pharmacophores with multiple vectors or a single vector. Choose whatever was used during joint generation; the settings for ref_molec should match.
solvent (str, optional) – Solvent type for xtb relaxation.
priority_pharm_indices (list of int, optional) – Indices (into ref_molec pharmacophore arrays) of “priority” pharmacophores. When provided, two additional Tversky ('tversky_ref') scores are computed after the full-set pharm alignment: one for the priority subset (sims_pharm_priority_target_relax_optimal) and one for the non-priority complement subset (sims_pharm_nonpriority_target_relax_optimal). Requires condition to be 'pharm' or 'all' and pharm_multi_vector to be a bool.

evaluate(num_processes=1, num_workers=1, verbose=False, timeout_minutes=None, *, mp_context='spawn')[source]#

Run conditional evaluation on every generated molecule.

Parameters:

num_processes (int, optional) – Number of processors to use for xtb relaxation. Default is 1.
num_workers (int, optional) – Number of workers to use for multiprocessing. If num_workers > 1, multiprocessing is used, and not much is gained by setting num_processes > 1. There is an associated overhead of starting up new processes and doing score evaluations. Default is 1.
verbose (bool, optional) – Whether to display tqdm progress bar. Default is False.
timeout_minutes (float, optional) – Per-molecule timeout in minutes. If exceeded, the evaluation is terminated and a failed result is recorded. Default is None (no timeout).
mp_context ({'spawn', 'forkserver'}, optional) – Context for multiprocessing. 'spawn' is recommended for most cases. Default is 'spawn'.

Returns:

Updates the class attributes in place.

Return type:

None

resampling_surf_scores()[source]#

Capture distribution of similarity scores caused by resampling surface.

Returns:

surf_scores (np.ndarray or None) – Surface similarity scores from resampling, or None if not relevant.
esp_scores (np.ndarray or None) – Surface ESP scores from resampling, or None if not relevant.

Return type:

ndarray | None

get_attr(obj, attr)[source]#

Gets an attribute of obj via its string name. If the attribute is None, returns np.nan.

Parameters:: attr (str)

get_frac_valid()[source]#: Fraction of generated molecules that were valid.

get_frac_valid_post_opt()[source]#: Fraction of generated molecules that were valid after relaxation.

get_frac_consistent_graph()[source]#: Fraction of generated molecules that were consistent before and after relaxation.

get_frac_unique()[source]#: Fraction of unique SMILES extracted pre-optimization in the generated set.

get_frac_unique_post_opt()[source]#: Fraction of unique SMILES extracted post-optimization in the generated set.

get_diversity()[source]#

Get average molecular graph diversity with respect to target.

Returns:: Average diversity in range [0, 1] where 1 is more diverse (more dissimilar).
Return type:: float

to_pandas()[source]#

Convert the stored attributes to a pd.Series (for global attributes) and pd.DataFrame (for attributes relevant to every instance).

Returns:: pd.Series of global attributes and pd.DataFrame of attributes for each evaluated sample.
Return type:: Tuple

shepherd_score.evaluations.evaluate.pipelines.resample_surf_scores(ref_molec, num_samples=20, eval_surf=True, eval_esp=True, lam_scaled=62.20604814099848)[source]#

Get baseline scores by resampling the surface.

Parameters:

ref_molec (Molecule) – Reference molecule object.
num_samples (int, optional) – Number of times to resample the surface. Default is 20.
eval_surf (bool, optional) – Whether to evaluate surface similarity. Default is True.
eval_esp (bool, optional) – Whether to evaluate ESP similarity. Default is True.
lam_scaled (float, optional) – Scaled lambda parameter for ESP scoring. Default is 0.3 * LAM_SCALING.

Returns:

surf_scores (np.ndarray or None) – Surface similarity scores from resampling, or None if not relevant.
esp_scores (np.ndarray or None) – ESP similarity scores from resampling, or None if not relevant.

Return type:

Tuple[ndarray | None]
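
The default lam_scaled is documented as 0.3 * LAM_SCALING, so the scaling constant implied by the signature can be backed out with one division. This is only arithmetic on the numbers shown above; the variable names are chosen for illustration.

```python
DEFAULT_LAM_SCALED = 62.20604814099848  # default from the signature above
LAM = 0.3                               # ESP weighting parameter (lam)

# Since lam_scaled = LAM * LAM_SCALING, the implied constant is:
implied_lam_scaling = DEFAULT_LAM_SCALED / LAM
print(round(implied_lam_scaling, 4))  # 207.3535
```

To test a different ESP weighting, the same relation suggests passing lam_scaled as the new lam multiplied by this constant.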

class shepherd_score.evaluations.evaluate.pipelines.ConsistencyEvalPipeline(generated_mols, generated_surf_points=None, generated_surf_esp=None, generated_pharm_feats=None, probe_radius=1.2, pharm_multi_vector=None, solvent=None, random_molblock_charges=None)[source]#

Bases: UnconditionalEvalPipeline

Evaluation pipeline for generated molecules with consistency check.

Parameters:

generated_mols (List[Tuple[ndarray, ndarray]])
generated_surf_points (List[ndarray] | None)
generated_surf_esp (List[ndarray] | None)
generated_pharm_feats (List[Tuple[ndarray, ndarray, ndarray]] | None)
probe_radius (float)
pharm_multi_vector (bool | None)
solvent (str | None)
random_molblock_charges (List[Tuple] | None)

__init__(generated_mols, generated_surf_points=None, generated_surf_esp=None, generated_pharm_feats=None, probe_radius=1.2, pharm_multi_vector=None, solvent=None, random_molblock_charges=None)[source]#

Initialize attributes for consistency evaluation pipeline.

Parameters:

generated_mols (List[Tuple[np.ndarray, np.ndarray]]) – List containing tuple of np.ndarrays holding atomic numbers (N,) and corresponding positions (N, 3).
generated_surf_points (List[np.ndarray], optional) – List containing all surface point clouds of shape (M, 3).
generated_surf_esp (List[np.ndarray], optional) – List containing corresponding ESP values of shape (M,) for the generated_surf_points.
generated_pharm_feats (List[Tuple[np.ndarray, np.ndarray, np.ndarray]], optional) –
List of tuples containing:
- generated_pharm_types : np.ndarray (P,) pharmacophore types as ints.
- generated_pharm_ancs : np.ndarray (P, 3) pharm anchor coordinates.
- generated_pharm_vecs : np.ndarray (P, 3) pharmacophore unit vectors relative to the anchors.
probe_radius (float, optional) – Probe radius used for solvent accessible surface. Default is 1.2.
pharm_multi_vector (bool, optional) – Whether to represent Aro/HBA/HBD pharmacophores with multiple vectors or a single vector when generated_pharm_feats is used. Choose whatever was used during joint generation; the settings for ref_molec should match.
solvent (str, optional) – Solvent type for xtb relaxation.
random_molblock_charges (List[Tuple], optional) – Contains molblock_charges to randomly select from, and align with (re-)generated sample.

evaluate(num_processes=1, num_workers=1, verbose=False, timeout_minutes=None, *, mp_context='spawn')[source]#

Run consistency evaluation on every generated molecule.

Parameters:

num_processes (int, optional) – Number of processors to use for xtb relaxation. Default is 1.
num_workers (int, optional) – Number of workers to use for multiprocessing. If num_workers > 1, multiprocessing is used, and not much is gained by setting num_processes > 1 in this case. There is an associated overhead of starting up new processes and doing score evaluations. Default is 1.
verbose (bool, optional) – Whether to display tqdm progress bar. Default is False.
timeout_minutes (float, optional) – Per-molecule timeout in minutes. If exceeded, the evaluation is terminated and a failed result is recorded. Default is None (no timeout).
mp_context ({'spawn', 'forkserver'}, optional) – Context for multiprocessing. 'spawn' is recommended for most cases. Default is 'spawn'.

Returns:

Updates the class attributes in place.

Return type:

None

resampling_surf_scores(consis_eval, num_samples=20)[source]#

Capture distribution of similarity scores caused by resampling surface.

Parameters:

consis_eval (ConsistencyEval) – ConsistencyEval object to check similarity scores caused by resampling.
num_samples (int, optional) – Number of times to resample surface and score. Default is 20.

Returns:

surf_scores (np.ndarray or None) – Surface similarity scores from resampling, or None if not relevant.
esp_scores (np.ndarray or None) – ESP similarity scores from resampling, or None if not relevant.

Return type:

Tuple[ndarray | None]

static resampling_upper_bounds(consis_eval, num_samples=5, num_surf_points=None)[source]#

Compute upper bound of similarity score from stochastic surface sampling.

The upper bound is computed as the mean similarity between pairwise comparisons of resampled surfaces.

Parameters:

consis_eval (ConsistencyEval) – ConsistencyEval object to evaluate.
num_samples (int, optional) – Number of samples to use for computing the upper bound. Default is 5.
num_surf_points (int, optional) – Number of surface points to sample. If None, uses the value from consis_eval.

Returns:

upper_bound_surf (float or None) – Surface similarity upper bound, or None if not applicable.
upper_bound_esp (float or None) – ESP similarity upper bound, or None if not applicable.

Return type:

Tuple[float | None]
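
The upper bound described above is a mean over pairwise comparisons of independently resampled surfaces. The sketch below shows that averaging step only. The similarity function is a toy stand-in (overlap of sampled point indices), not the surface or ESP similarity used by the library, and all names are hypothetical.

```python
import random
from itertools import combinations


def mean_pairwise_similarity(samples, similarity):
    """Average `similarity` over all distinct pairs; None if fewer than 2."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return None
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def toy_similarity(a, b):
    """Toy stand-in: fraction of shared points between two samples."""
    return len(a & b) / len(a | b)


rng = random.Random(0)
population = range(1000)  # stand-in for candidate surface points
# Five resamplings of 400 points each, echoing num_samples=5.
samples = [frozenset(rng.sample(population, 400)) for _ in range(5)]
upper_bound = mean_pairwise_similarity(samples, toy_similarity)
```

Because two resamplings of the same surface never coincide exactly, the mean falls below 1, which is why it serves as a practical ceiling for scores that depend on stochastic surface sampling.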

to_pandas()[source]#

Convert the stored attributes to a pd.Series (for global attributes) and pd.DataFrame (for attributes relevant to every instance).

Returns:: pd.Series of global attributes and pd.DataFrame of attributes for each evaluated sample.
Return type:: Tuple
