Pharmacophore Utilities

Modules for pharmacophore feature extraction and manipulation.

Pharmacophore Extraction

Generate pharmacophores from an RDKit conformer.

Parts of code adapted from Francois Berenger / Tsuda Lab and RDKit.

shepherd_score.pharm_utils.pharmacophore.pattern_of_smarts(s)

shepherd_score.pharm_utils.pharmacophore.find_hydrophobes(mol, cluster_hydrophobic=True)

Find hydrophobes and cluster them.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit Mol object with a conformer.

  • cluster_hydrophobic (bool, optional) – Whether to cluster hydrophobic atoms that fall within 2 Å of each other. Default is True.

Returns:

List of tuples containing the coordinates of each hydrophobe.

Return type:

list
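The clustering idea behind cluster_hydrophobic can be illustrated with a minimal sketch. It assumes single-linkage grouping with an inclusive 2 Å cutoff and one centroid per group; the actual rule used by find_hydrophobes is not documented here, and cluster_points is a hypothetical helper, not part of shepherd_score.

```python
import math


def cluster_points(points, cutoff=2.0):
    """Group 3D points by single linkage and return one centroid per group.

    Hypothetical helper: the real clustering rule in find_hydrophobes may differ.
    """
    clusters = []  # each cluster is a list of indices into `points`
    for i, p in enumerate(points):
        # Collect every existing cluster that has a member within the cutoff of p.
        hits = [c for c in clusters
                if any(math.dist(p, points[j]) <= cutoff for j in c)]
        merged = [i]
        for c in hits:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)

    centroids = []
    for c in clusters:
        n = len(c)
        centroids.append(tuple(sum(points[j][k] for j in c) / n for k in range(3)))
    return centroids


# Two atoms 1.5 A apart merge into one hydrophobe; the distant atom stays alone.
centroids = cluster_points([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)])
```

Returning tuples mirrors the documented return value, a list of coordinate tuples.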

shepherd_score.pharm_utils.pharmacophore.get_pharmacophores_dict(mol, multi_vector=True, exclude=[], check_access=False, scale=1.0)

Get the positions of pharmacophore anchors and their associated unit vectors.

Returns a dictionary. Adapted from rdkit.Chem.Features.ShowFeats.ShowMolFeats.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit Mol object with a conformer.

  • multi_vector (bool, optional) – Whether to represent pharmacophores with multiple vectors. Default is True.

  • exclude (list, optional) – List of atom indices to exclude as HBDs. Default is [].

  • check_access (bool, optional) – Whether to check that HBDs/HBAs are accessible at the molecular surface. Default is False.

  • scale (float, optional) – Length of the vector in Angstroms. Default is 1.0.

Returns:

Dictionary with format {'FeatureName': {'P': [(anchor coord), ...], 'V': [(rel. vec), ...]}}.

Return type:

dict

shepherd_score.pharm_utils.pharmacophore.get_pharmacophores(mol, multi_vector=True, exclude=[], check_access=False, scale=1.0)

Get the identity, anchor positions, and relative unit vectors for each pharmacophore.

Pharmacophore ordering for indexing: (‘Acceptor’, ‘Donor’, ‘Aromatic’, ‘Hydrophobe’, ‘Cation’, ‘Anion’, ‘ZnBinder’)

Notes

The check_access parameter currently tests whether interaction points, sampled from the surface of a sphere of radius 1.8 Å centered on the acceptor/donor atom, fall outside the solvent-accessible surface defined by the vdW radius + 0.8 Å of the neighboring atoms. This works for buried acceptors/donors but may be prone to false positives. For example, CN(C)C would have its sole HBA rejected. Other approaches, such as buried volume, should be considered in the future.
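The accessibility test described in this note can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: points are placed with a Fibonacci lattice, an atom counts as accessible if any sampled point lies outside every padded neighbor sphere, and the names fibonacci_sphere and is_accessible are hypothetical.

```python
import math


def fibonacci_sphere(center, radius, n=100):
    """Return n roughly evenly spaced points on a sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # z runs from just below 1 to just above -1
        r = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        phi = golden_angle * i
        points.append((center[0] + radius * r * math.cos(phi),
                       center[1] + radius * r * math.sin(phi),
                       center[2] + radius * z))
    return points


def is_accessible(atom_xyz, neighbors, probe_radius=1.8, pad=0.8, n=100):
    """neighbors is a list of (xyz, vdw_radius) pairs for the surrounding atoms.

    Assumption: one sampled point outside all padded spheres is enough.
    """
    for pt in fibonacci_sphere(atom_xyz, probe_radius, n):
        if all(math.dist(pt, xyz) > vdw + pad for xyz, vdw in neighbors):
            return True
    return False


# A single bonded neighbor leaves the far side of the atom exposed.
exposed = is_accessible((0.0, 0.0, 0.0), [((1.5, 0.0, 0.0), 1.7)])
# A very large sphere covering the atom leaves no sampled point outside.
buried = is_accessible((0.0, 0.0, 0.0), [((0.0, 0.0, 0.0), 3.0)])
```

With several bulky neighbors at bonding distance, their padded spheres can cover the whole 1.8 Å shell, which is one way the false rejections mentioned in the note can arise.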

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit Mol object with conformer.

  • multi_vector (bool, optional) – Whether to represent pharmacophores with multiple vectors. Default is True.

  • exclude (list, optional) – List of hydrogen indices to exclude as HBDs. Default is [].

  • check_access (bool, optional) – Whether to check that HBDs/HBAs are accessible at the molecular surface. Default is False.

  • scale (float, optional) – Length of a pharmacophore vector in Angstroms. Default is 1.0.

Returns:

  • X (np.ndarray) – Identity of pharmacophore corresponding to the indexing order, shape (N,).

  • P (np.ndarray) – Anchor positions of each pharmacophore, shape (N, 3).

  • V (np.ndarray) – Unit vectors relative to the anchor positions, shape (N, 3). Adding P and V gives the position of each vector’s extended point.

Return type:

Tuple[ndarray, ndarray, ndarray]
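As a rough illustration of how the three return values relate, the sketch below maps each entry of X to its name using the ordering listed above and computes the extended point P + V. It assumes X holds integer indices into that ordering; describe is a hypothetical helper, and the input values are made up for the example rather than produced by get_pharmacophores.

```python
# Ordering copied from the documentation of get_pharmacophores.
PHARM_TYPES = ('Acceptor', 'Donor', 'Aromatic', 'Hydrophobe',
               'Cation', 'Anion', 'ZnBinder')


def describe(X, P, V):
    """Return (name, anchor, extended point) for each pharmacophore.

    Works on any sequences of matching length, including NumPy arrays.
    """
    rows = []
    for x, p, v in zip(X, P, V):
        tip = tuple(a + b for a, b in zip(p, v))  # anchor plus relative vector
        rows.append((PHARM_TYPES[int(x)], tuple(p), tip))
    return rows


# Made-up values: one acceptor pointing along +z and one donor pointing along +x.
rows = describe([0, 1],
                [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
                [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)])
```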

Pharmacophore Vectors

Generate the vector features for pharmacophores from an RDKit conformer.

Adapted from RDKit (rdkit/rdkit).

Changed to return anchor position and relative unit vector.

shepherd_score.pharm_utils.pharmvec.GetAromaticFeatVects(conf, featAtoms, featLoc, return_both=False, scale=1.0)

Compute the direction vector for an aromatic feature.

Changed: returns only one vector by default; the result is processed later for visualization and scoring.

Parameters:
  • conf – A conformer.

  • featAtoms (list) – List of atom IDs that make up the feature.

  • featLoc – Location of the aromatic feature, specified as a Point3D.

  • return_both (bool, optional) – Whether to return both vectors or just one. Default is False.

  • scale (float, optional) – Size of the direction vector. Default is 1.0.

Returns:

(list of anchor position(s) as RDKit Point3D, list of relative unit vector(s) as RDKit Point3D)

Return type:

tuple
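One common way to obtain a direction for an aromatic feature is to take the normal of the ring plane. The sketch below assumes the normal is built from the cross product of two center-to-atom vectors; ring_normals is a hypothetical helper working on plain coordinate tuples rather than RDKit objects.

```python
import math


def ring_normals(center, atom_a, atom_b, scale=1.0):
    """Return the two opposite ring-plane normals, each of length `scale`."""
    u = [a - c for a, c in zip(atom_a, center)]
    w = [b - c for b, c in zip(atom_b, center)]
    # Cross product u x w is perpendicular to the plane spanned by u and w.
    n = (u[1] * w[2] - u[2] * w[1],
         u[2] * w[0] - u[0] * w[2],
         u[0] * w[1] - u[1] * w[0])
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("center and the two atoms are collinear")
    up = tuple(scale * x / length for x in n)
    down = tuple(-x for x in up)
    return up, down


# A ring lying in the xy-plane has normals along +z and -z.
up, down = ring_normals((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Returning both normals corresponds to the return_both=True case; keeping only the first corresponds to the single-vector default.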

shepherd_score.pharm_utils.pharmvec.GetDonorFeatVects(conf, featAtoms, scale=1.0, exclude=[])

Get vectors for hydrogen bond donors in the direction of the hydrogens.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with a conformer.

  • featAtoms (list) – List containing the RDKit Atom object of the atom attributed as a donor.

  • scale (float, optional) – Length of direction vector. Default is 1.0.

  • exclude (list, optional) – List of atom indices that should not be included as a donatable H. Default is [].

Returns:

(list of anchor position(s) as RDKit Point3D or None, list of relative unit vector(s) as RDKit Point3D or None, list of neighboring hydrogens or None)

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetAcceptorFeatVects(conf, featAtoms, scale=1.0)

Get the anchor positions and relative unit vectors of an acceptor atom.

Assumes HBAs are only O and N, as defined by smarts_features.fdef. If the HBA is neither, the atom is assumed to have one lone pair.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with a conformer.

  • featAtoms (list) – List containing RDKit Atom object of atom attributed as an acceptor.

  • scale (float, optional) – Length of direction vector. Default is 1.0.

Returns:

(list of anchor position(s) as RDKit Point3D or [None], list of relative unit vector(s) as RDKit Point3D or [None])

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetHalogenFeatVects(conf, featAtoms, scale=1.0)

Get the anchor positions and relative unit vectors of a halogen atom. Assumes only one connection.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with a conformer.

  • featAtoms (list) – List containing the RDKit Atom object of the atom attributed as a halogen.

  • scale (float, optional) – Length of direction vector. Default is 1.0.

Returns:

(list of anchor position(s) as RDKit Point3D or [None], list of relative unit vector(s) as RDKit Point3D or [None])

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetDonor1FeatVects_single(conf, featAtoms, scale=1.0)

Get the direction vectors for Donor of type 1. Made to generate a single vector representation.

This is a donor with one heavy atom neighbor. It is not clear where we should be putting the direction vector for this. It should probably be a cone. In this case we will just use the direction vector from the donor atom to the heavy atom.

Changed: the vector is conditioned on the number of hydrogens:

  • If 1 hydrogen, the vector points in the direction of the hydrogen.

  • If 2 hydrogens, the vector points in the direction bisecting the two hydrogens.

  • If 3 hydrogens, the vector points in the direction of the bond.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with conformer.

  • featAtoms (list) – List of atoms that are part of the feature.

  • scale (float, optional) – Length of the direction vector. Default is 1.0.

Returns:

(anchor position as RDKit Point3D or None, relative unit vector(s) as RDKit Point3D or None, list of hydrogen RDKit Atom objects)

Return type:

tuple
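The hydrogen-count conditioning described for GetDonor1FeatVects_single can be sketched on plain coordinates. Only the one- and two-hydrogen cases are shown; the three-hydrogen case is left out because the sign convention of the bond direction is not spelled out here. donor_vector is a hypothetical helper, not the library function.

```python
import math


def _unit(v):
    """Normalize a 3-vector (assumes non-zero length)."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def donor_vector(donor, hydrogens, scale=1.0):
    """Return a single direction for a donor with one or two hydrogens."""
    bonds = [_unit(tuple(h - d for h, d in zip(hyd, donor))) for hyd in hydrogens]
    if len(bonds) == 1:
        direction = bonds[0]                       # point along the single D-H bond
    elif len(bonds) == 2:
        # Sum of the two unit bond vectors bisects the H-D-H angle.
        direction = _unit(tuple(a + b for a, b in zip(*bonds)))
    else:
        return None                                # other cases not covered in this sketch
    return tuple(scale * x for x in direction)


one_h = donor_vector((0.0, 0.0, 0.0), [(0.0, 0.0, 2.0)])
two_h = donor_vector((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

Normalizing each bond before summing keeps the bisector independent of the individual D-H bond lengths.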

shepherd_score.pharm_utils.pharmvec.GetDonor2FeatVects_single(conf, featAtoms, scale=1.0)

Get the direction vectors for Donor of type 2. Made to generate a single vector representation.

This is a donor with two heavy atoms as neighbors. The atom may or may not have hydrogens on it. These are the neighbor situations considered here:

  • Case 1: Two heavy atoms and two hydrogens. We will assume an sp3 arrangement here.

  • Case 2: Two heavy atoms and one hydrogen. This can be either sp2 or sp3.

  • Case 3: Two heavy atoms and no hydrogens.

Changed: the vector is conditioned on the number of hydrogens:

  • Case 1: Point in the direction bisecting the two hydrogens.

  • Case 2: Point in the direction of the hydrogen.

  • Case 3: No changes.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with conformer.

  • featAtoms (list) – List of atoms that are part of the feature.

  • scale (float, optional) – Length of the direction vector. Default is 1.0.

Returns:

(anchor position as RDKit Point3D or None, relative unit vector(s) as RDKit Point3D or None, list of hydrogen RDKit Atom objects)

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetDonor3FeatVects_single(conf, featAtoms, scale=1.0)

Get the direction vectors for Donor of type 3. Made to generate a single vector representation.

This is a donor with three heavy atoms as neighbors. We will assume a tetrahedral arrangement of these neighbors, so the direction we are seeking is the remaining fourth arm of the sp3 arrangement.

Changed: returns the anchor and relative unit vector as a tuple.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with conformer.

  • featAtoms (list) – List of atoms that are part of the feature.

  • scale (float, optional) – Length of the direction vector. Default is 1.0.

Returns:

(anchor position as RDKit Point3D or None, relative unit vector(s) as RDKit Point3D or None, list of hydrogen RDKit Atom objects)

Return type:

tuple
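The "fourth arm" of a tetrahedral center with three heavy-atom neighbors can be estimated as the direction opposite the sum of the three unit bond vectors. The sketch below shows that construction on plain coordinates; fourth_arm is a hypothetical helper, and whether the library computes the direction in exactly this way is an assumption.

```python
import math


def fourth_arm(center, neighbors, scale=1.0):
    """Return the direction opposite the summed unit vectors to the neighbors."""
    total = [0.0, 0.0, 0.0]
    for nbr in neighbors:
        bond = [n - c for n, c in zip(nbr, center)]
        length = math.sqrt(sum(x * x for x in bond))
        for k in range(3):
            total[k] += bond[k] / length   # accumulate unit bond vectors
    norm = math.sqrt(sum(x * x for x in total))
    if norm < 1e-8:
        return None  # bond vectors cancel (planar, symmetric): no unique direction
    return tuple(-scale * x / norm for x in total)


# Neighbors along +x, +y and +z leave the open arm pointing along (-1, -1, -1).
arm = fourth_arm((0.0, 0.0, 0.0),
                 [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])
```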

shepherd_score.pharm_utils.pharmvec.GetAcceptor1FeatVects_single(conf, featAtoms, scale=1.0)

Get the direction vectors for Acceptor of type 1 (single vector representation).

This is an acceptor with one heavy atom neighbor. There are two possibilities:

  • The bond to the heavy atom is a single bond (e.g. CO): We use the inversion of this bond direction and mark it as a ‘cone’.

  • The bond to the heavy atom is a double bond (e.g. C=O): We have two possible directions except in some special cases (e.g. SO2) where we use bond direction.

Notes

Modified to condition on the number of hydrogens with methanamine fix:

  • Case 1: If one hydrogen, the vector points opposite to the bisector of the acute angle formed by the heavy atom, the acceptor, and the hydrogen. If two hydrogens, assume sp3 and project in that lone-pair direction. If not tetrahedral, return None.

  • Case 2: Return the bisecting vector of the two lone-pairs.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with conformer.

  • featAtoms (list) – List of atoms that are part of the feature.

  • scale (float, optional) – Length of the direction vector. Default is 1.0.

Returns:

(anchor position as RDKit Point3D or None, relative unit vector(s) as RDKit Point3D or None)

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetAcceptor2FeatVects_single(conf, featAtoms, scale=1.0)

Get the direction vectors for Acceptor of type 2. Made to generate a single vector representation.

This is the acceptor with two adjacent heavy atoms. We will special-case a few things here. If the acceptor atom is an oxygen, we will assume sp3 hybridization, and the acceptor directions (two of them) reflect that configuration. Otherwise, the direction vector lies in the plane of the neighboring heavy atoms.

Changed: only one vector is generated. Rather than generating two vectors for an sp3 oxygen, just the average vector is kept.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with conformer.

  • featAtoms (list) – List of atoms that are part of the feature.

  • scale (float, optional) – Length of the direction vector. Default is 1.0.

Returns:

(anchor position as RDKit Point3D or None, relative unit vector(s) as RDKit Point3D or None)

Return type:

tuple
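To see why averaging the two sp3 lone-pair directions of an oxygen gives an in-plane vector, the sketch below builds both lone pairs from the two neighbor positions and averages them. It assumes an ideal tetrahedral angle between the lone pairs; sp3_lone_pairs and average_direction are hypothetical helpers on plain coordinates, not the library's code.

```python
import math

TETRAHEDRAL_ANGLE = math.acos(-1.0 / 3.0)  # about 109.47 degrees


def _unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sp3_lone_pairs(acceptor, nbr1, nbr2):
    """Return two unit lone-pair directions for an acceptor with two neighbors."""
    u1 = _unit(tuple(n - a for n, a in zip(nbr1, acceptor)))
    u2 = _unit(tuple(n - a for n, a in zip(nbr2, acceptor)))
    in_plane = _unit(tuple(-(a + b) for a, b in zip(u1, u2)))  # away from both neighbors
    normal = _unit(_cross(u1, u2))                             # perpendicular to the plane
    half = TETRAHEDRAL_ANGLE / 2.0
    c, s = math.cos(half), math.sin(half)
    # Tilt the in-plane vector above and below the plane by half the angle.
    lp1 = tuple(c * i + s * n for i, n in zip(in_plane, normal))
    lp2 = tuple(c * i - s * n for i, n in zip(in_plane, normal))
    return lp1, lp2


def average_direction(v1, v2):
    """Return the unit vector along the mean of two directions."""
    return _unit(tuple((a + b) / 2.0 for a, b in zip(v1, v2)))


lp1, lp2 = sp3_lone_pairs((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
avg = average_direction(lp1, lp2)
```

The out-of-plane components cancel in the average, so the single-vector result coincides with the in-plane direction used for non-oxygen acceptors in this construction.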

shepherd_score.pharm_utils.pharmvec.GetAcceptor3FeatVects_single(conf, featAtoms, scale=1.0)

Get the direction vectors for Acceptor of type 3. Made to generate a single vector representation.

This is an acceptor with three heavy atoms as neighbors. We will assume a tetrahedral arrangement of these neighbors, so the direction we are seeking is the remaining fourth arm of the sp3 arrangement.

Changed: returns the anchor and relative unit vector as a tuple.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with conformer.

  • featAtoms (list) – List of atoms that are part of the feature.

  • scale (float, optional) – Length of the direction vector. Default is 1.0.

Returns:

(anchor position as RDKit Point3D or None, relative unit vector(s) as RDKit Point3D or None)

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetAcceptor1FeatVects(conf, featAtoms, scale=1.0)

Get the direction vectors for Acceptor of type 1 (multi-vector representation).

This is an acceptor with one heavy atom neighbor. There are two possibilities:

  • The bond to the heavy atom is a single bond (e.g. CO): We use the inversion of this bond direction and mark it as a ‘cone’.

  • The bond to the heavy atom is a double bond (e.g. C=O): We have two possible directions except in some special cases (e.g. SO2) where we use bond direction.

Notes

Modified to change return format, with fixes for methanamine and two vectors for hydroxyls.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with a conformer.

  • featAtoms (list) – List containing RDKit Atom object of atom attributed as an acceptor.

  • scale (float, optional) – Length of direction vector. Default is 1.0.

Returns:

(list of anchor position(s) as RDKit Point3D or None, list of relative unit vector(s) as RDKit Point3D or None)

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetAcceptor2FeatVects(conf, featAtoms, scale=1.0)

Get the direction vectors for Acceptor of type 2 (multi-vector representation).

This is the acceptor with two adjacent heavy atoms. We will special-case a few things here. If the acceptor atom is an oxygen, we will assume sp3 hybridization, and the acceptor directions (two of them) reflect that configuration. Otherwise, the direction vector lies in the plane of the neighboring heavy atoms.

Changed: return format.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with a conformer.

  • featAtoms (list) – List containing the RDKit Atom object of the atom attributed as an acceptor.

  • scale (float, optional) – Length of direction vector. Default is 1.0.

Returns:

(list of anchor position(s) as RDKit Point3D or None, list of relative unit vector(s) as RDKit Point3D or None)

Return type:

tuple

shepherd_score.pharm_utils.pharmvec.GetAcceptor3FeatVects(conf, featAtoms, scale=1.0)

Get the direction vectors for Acceptor of type 3. Made to generate a single vector representation.

This is an acceptor with three heavy atoms as neighbors. We will assume a tetrahedral arrangement of these neighbors, so the direction we are seeking is the remaining fourth arm of the sp3 arrangement.

Changed: return format.

Parameters:
  • conf (Chem.Mol) – RDKit Mol object with a conformer.

  • featAtoms (list) – List containing the RDKit Atom object of the atom attributed as an acceptor.

  • scale (float, optional) – Length of direction vector. Default is 1.0.

Returns:

(list of anchor position(s) as RDKit Point3D or None, list of relative unit vector(s) as RDKit Point3D or None)

Return type:

tuple