Conformer Generation

Functions for generating and optimizing molecular conformers.

Covers conformer embedding with ETKDG, clustering, and geometry optimization with MMFF94 or xTB.

Requires an xtb installation with command-line access. See https://xtb-docs.readthedocs.io/en/latest/setup.html for installation instructions.

shepherd_score.conformer_generation.set_thread_limits(num_threads)

Temporarily set threading environment variables.

Parameters:

num_threads (int)
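
The entry above does not say which variables are set or how the previous values are restored. As a minimal sketch of the general pattern, assuming a context-manager design and an assumed list of variable names (both hypothetical, not taken from the library), a temporary thread limit could look like this:

```python
import os
from contextlib import contextmanager

# Assumed variable names for illustration; the library may set a different list.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


@contextmanager
def temporary_thread_limits(num_threads):
    """Set thread-count environment variables, then restore the old values."""
    # Remember the current value (or None if unset) of every variable we touch.
    previous = {name: os.environ.get(name) for name in THREAD_ENV_VARS}
    try:
        for name in THREAD_ENV_VARS:
            os.environ[name] = str(num_threads)
        yield
    finally:
        # Restore the saved state, removing variables that were not set before.
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```

Restoring in a `finally` block means the original environment comes back even if the wrapped calculation raises.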

shepherd_score.conformer_generation.update_mol_coordinates(mol, coordinates)

Update the coordinates of a 3D RDKit mol object with a new set of coordinates.

Parameters:
  • mol (Chem.Mol) – RDKit mol object with 3D coordinates to be replaced.

  • coordinates (list or array-like) – List/array of new [x, y, z] coordinates.

Returns:

RDKit mol object with updated 3D coordinates.

Return type:

Chem.Mol

shepherd_score.conformer_generation.read_multi_xyz_file(file_dir)

Read an xyz file that potentially contains multiple structures.

Parameters:

file_dir (str) – Path to .xyz file.

Returns:

  • all_coordinates (list) – List of lists containing the coordinates of each structure in the xyz file.

  • all_elements (list) – List of lists containing the element types of each atom in each structure.
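
To illustrate the shape of the two returned lists, here is a minimal sketch of a multi-structure parser. It assumes the standard XYZ layout (atom count, comment line, then one `element x y z` line per atom) and works on a string rather than a file path. `parse_multi_xyz` is a hypothetical name, not the library's implementation.

```python
def parse_multi_xyz(text):
    """Parse concatenated XYZ frames into (all_coordinates, all_elements)."""
    lines = text.splitlines()
    all_coordinates, all_elements = [], []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames
            continue
        num_atoms = int(lines[i].split()[0])
        # Line i + 1 is the comment line; atom records start at i + 2.
        atom_lines = lines[i + 2 : i + 2 + num_atoms]
        if len(atom_lines) != num_atoms:
            raise ValueError("Truncated XYZ frame: fewer atom lines than declared")
        elements, coords = [], []
        for line in atom_lines:
            fields = line.split()
            elements.append(fields[0])
            coords.append([float(value) for value in fields[1:4]])
        all_elements.append(elements)
        all_coordinates.append(coords)
        i += 2 + num_atoms
    return all_coordinates, all_elements
```

Each entry of `all_coordinates` is a list of `[x, y, z]` triples for one structure, and the entry at the same index in `all_elements` holds the matching element symbols.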

shepherd_score.conformer_generation.embed_conformer(mol, attempts=50, MMFF_optimize=False, random_seed=-1)

Embed a mol object into a 3D RDKit mol object with ETKDG (and optional MMFF94).

Parameters:
  • mol (Chem.Mol) – RDKit Mol object.

  • attempts (int, optional) – Number of embedding attempts. Default is 50.

  • MMFF_optimize (bool, optional) – Whether to optimize embedded conformer with MMFF94. Default is False.

  • random_seed (int, optional) – Seed for RDKit’s EmbedMolecule. Use -1 for no fixed seed; any other value must be positive. Default is -1.

Returns:

RDKit mol object with 3D coordinates, or None if embedding fails.

Return type:

Chem.Mol or None

shepherd_score.conformer_generation.embed_conformer_from_smiles(smiles, attempts=50, MMFF_optimize=False, random_seed=-1)

Embed a SMILES into a 3D RDKit mol object with ETKDG (and optionally MMFF94).

Parameters:
  • smiles (str) – SMILES string of molecule.

  • attempts (int, optional) – Number of embedding attempts. Default is 50.

  • MMFF_optimize (bool, optional) – Whether to optimize embedded conformer with MMFF94. Default is False.

  • random_seed (int, optional) – Seed for RDKit’s EmbedMolecule. Use -1 for no fixed seed; any other value must be positive. Default is -1.

Returns:

RDKit mol object with 3D coordinates, or None if embedding fails.

Return type:

Chem.Mol or None

shepherd_score.conformer_generation.conf_to_mol(mol, conf_id)

Convert a conformer of an RDKit mol object into its own RDKit mol object.

Parameters:
  • mol (Chem.Mol) – RDKit mol object with multiple conformers.

  • conf_id (int) – ID of conformer to be converted into its own mol object.

Returns:

Mol object with only 1 conformer (the selected conformer).

Return type:

Chem.Mol

shepherd_score.conformer_generation.generate_conformer_ensemble(mol_3d, num_confs=100, num_threads=4, threshold=0.25, num_opt_steps=200)

Use the ETKDG algorithm to embed multiple conformers from a given 3D conformer template.

Optionally optimizes each embedded conformer with MMFF94.

Parameters:
  • mol_3d (Chem.Mol) – RDKit mol object with 3D coordinates.

  • num_confs (int, optional) – Maximum number of conformers to be embedded with ETKDG. Default is 100.

  • num_threads (int, optional) – Number of processors to be used in parallel when embedding conformers. Default is 4.

  • threshold (float, optional) – RMSD threshold used to eliminate redundant conformers after ETKDG embedding. Default is 0.25.

  • num_opt_steps (int, optional) – Number of MMFF94 optimization steps. Default is 200.

Returns:

List of mol objects, each containing 1 (unique) conformer.

Return type:

list

shepherd_score.conformer_generation.cluster_conformers_butina(conformers, threshold=0.2, num_max_conformers=100)

Cluster a list of conformers by their pairwise RMSD with the Butina clustering algorithm.

Parameters:
  • conformers (list) – List of RDKit mol objects containing conformers of a common molecule to be clustered.

  • threshold (float, optional) – Initial RMSD threshold for clustering. Default is 0.2.

  • num_max_conformers (int, optional) – Maximum number of conformers in the final clustered ensemble. Default is 100.

Returns:

List of int indices of the centroids of each cluster, to be indexed into conformers.

Return type:

list
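
For readers unfamiliar with Butina clustering, the core step can be sketched on a plain distance matrix. This is an illustration under assumptions, not the library's code: `butina_clusters` is a hypothetical helper, it takes precomputed pairwise distances instead of RDKit mol objects, and it applies a single fixed threshold. The word "Initial" in the `threshold` description suggests the library may adjust the threshold to respect `num_max_conformers`; that logic is not shown here.

```python
def butina_clusters(dist, threshold):
    """Group items by a symmetric distance matrix; each cluster lists its centroid first."""
    n = len(dist)
    # Neighbors of i are all other items within the distance threshold.
    neighbors = [
        [j for j in range(n) if j != i and dist[i][j] <= threshold]
        for i in range(n)
    ]
    # Visit items from most to fewest neighbors (stable sort keeps index order on ties).
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # The item becomes a centroid and claims its still-unassigned neighbors.
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy example: five points on a line stand in for five conformers.
positions = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(a - b) for b in positions] for a in positions]
centroids = [cluster[0] for cluster in butina_clusters(dist, threshold=0.15)]
```

The centroid indices play the same role as the list this function returns: they index back into the original list of conformers.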

shepherd_score.conformer_generation.optimize_conformer_with_xtb(conformer, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))

Use external calls to GFN2-xTB (command line) to optimize a conformer geometry.

Parameters:
  • conformer (Chem.Mol) – RDKit mol object containing 3D coordinates.

  • solvent (str, optional) – Implicit solvent to be used during optimization. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.

  • num_cores (int, optional) – Number of CPU cores to be used in the xTB geometry optimization. Default is 1.

  • charge (int, optional) – Molecular charge. Default is 0.

  • temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.

Returns:

(xtb_mol, energy, charges) - tuple of optimized RDKit mol object, xTB energy (in Hartrees), and partial charges (in e-).

Return type:

tuple

shepherd_score.conformer_generation.optimize_conformer_with_xtb_from_xyz_block(xyz_block, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))

Use external calls to GFN2-xTB (command line) to optimize coordinates from an xyz block.

Parameters:
  • xyz_block (str) – String of an xyz block.

  • solvent (str, optional) – Implicit solvent to be used during optimization. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.

  • num_cores (int, optional) – Number of CPU cores to be used in the xTB geometry optimization. Default is 1.

  • charge (int, optional) – Molecular charge. Default is 0.

  • temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.

Returns:

(xtb_xyz_block, energy, charges) - tuple of optimized xyz block string, xTB energy (in Hartrees), and partial charges (in e-).

Return type:

tuple
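
The exact layout expected for `xyz_block` is not spelled out above. Assuming the standard XYZ convention (atom count, comment line, then one `element x y z` line per atom), a block can be assembled from element symbols and coordinates as in the sketch below. `make_xyz_block` is a hypothetical helper, not part of the library.

```python
def make_xyz_block(elements, coordinates, comment=""):
    """Build a single-structure XYZ block string from symbols and [x, y, z] triples."""
    if len(elements) != len(coordinates):
        raise ValueError("elements and coordinates must have the same length")
    # Header: atom count on the first line, free-form comment on the second.
    lines = [str(len(elements)), comment]
    for element, (x, y, z) in zip(elements, coordinates):
        lines.append(f"{element} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Approximate water geometry in angstroms, for illustration only.
water_block = make_xyz_block(
    ["O", "H", "H"],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.96], [0.93, 0.0, -0.24]],
)
```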

shepherd_score.conformer_generation.charges_from_single_point_conformer_with_xtb(conformer, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))

Compute atomic partial charges from a single point xTB calculation of a provided conformer.

Uses external calls to GFN2-xTB (command line).

Parameters:
  • conformer (Chem.Mol) – RDKit mol object containing 3D coordinates.

  • solvent (str, optional) – Implicit solvent to be used during calculation. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.

  • num_cores (int, optional) – Number of CPU cores to be used in the xTB calculation. Default is 1.

  • charge (int, optional) – Molecular charge. Default is 0.

  • temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.

Returns:

List of partial charges for each atom (in e-).

Return type:

list

shepherd_score.conformer_generation.single_point_xtb_from_xyz(xyz_block, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))

Compute energy and atomic partial charges from a single point xTB calculation.

Uses external calls to GFN2-xTB (command line).

Parameters:
  • xyz_block (str) – String of xyz block representing a molecule.

  • solvent (str, optional) – Implicit solvent to be used during calculation. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.

  • num_cores (int, optional) – Number of CPU cores to be used in the xTB calculation. Default is 1.

  • charge (int, optional) – Molecular charge. Default is 0.

  • temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.

Returns:

  • energy (float) – xTB energy in Hartrees.

  • charges (list) – List of partial charges for each atom (in e-).
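
Energies in Hartree are awkward to compare between conformers, so they are often converted to relative energies in kcal/mol. The sketch below shows that conversion; `relative_energies_kcal` is a hypothetical helper, and the sample energies are made-up numbers, not xTB output.

```python
# 1 Hartree expressed in kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474


def relative_energies_kcal(energies_hartree):
    """Return each energy relative to the lowest one, converted to kcal/mol."""
    lowest = min(energies_hartree)
    return [(energy - lowest) * HARTREE_TO_KCAL_PER_MOL for energy in energies_hartree]


# Made-up energies for three conformers of one molecule.
relative = relative_energies_kcal([-5.070, -5.068, -5.069])
```

The lowest-energy conformer gets a relative energy of zero, and a 0.002 Hartree gap corresponds to roughly 1.26 kcal/mol.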

shepherd_score.conformer_generation.optimize_conformer_ensemble_with_xtb(conformers, solvent=None, num_processes=1, num_workers=1, charge=0, temp_dir=PosixPath('/tmp'), verbose=False)

Run GFN2-xTB geometry optimization on a list of conformers.

Parameters:
  • conformers (list) – List of RDKit Mol objects (with 3D coordinates) to be optimized.

  • solvent (str, optional) – Implicit solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.

  • num_processes (int, optional) – Number of CPU cores used per xTB optimization. Default is 1.

  • num_workers (int, optional) – Number of parallel workers (processes) to distribute conformers across. Ensure num_workers * num_processes <= available CPUs to avoid oversubscription. Default is 1.

  • charge (int, optional) – Molecular charge. RDKit will be used to compute the formal charge if len(conformers) > 1. Default is 0.

  • temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.

  • verbose (bool, optional) – Show a simple progress bar in single-process mode. Default is False.

Returns:

(conformers_opt, energies_opt, charges_opt) - tuple of lists containing optimized conformers, their energies, and partial charges.

Return type:

tuple
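
The oversubscription rule for `num_workers` and `num_processes` can be checked before launching a run. The following sketch uses only the standard library; both helper names are hypothetical and not part of the package.

```python
import os


def fits_cpu_budget(num_workers, num_processes, available_cpus=None):
    """Return True if num_workers * num_processes does not exceed the CPU count."""
    if available_cpus is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        available_cpus = os.cpu_count() or 1
    return num_workers * num_processes <= available_cpus


def max_workers_for(num_processes, available_cpus=None):
    """Largest worker count that keeps the product within the CPU budget (at least 1)."""
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    return max(1, available_cpus // num_processes)
```

For example, on an 8-CPU machine with `num_processes=4`, at most two workers stay within budget.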

shepherd_score.conformer_generation.generate_opt_conformers_xtb(smiles, charge=0, solvent=None, MMFF_optimize=True, num_processes=1, num_workers=1, temp_dir=PosixPath('/tmp'), verbose=False, num_confs=1000)

Generate a conformer ensemble with RDKit, then relax it with xTB.

Parameters:
  • smiles (str) – SMILES string of the molecule.

  • charge (int, optional) – Molecular charge. Default is 0.

  • solvent (str, optional) – Implicit solvent to be used during optimization. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.

  • MMFF_optimize (bool, optional) – Optimize RDKit embedded molecules with MMFF94. Default is True.

  • num_processes (int, optional) – Number of CPU cores to be used in the xTB geometry optimization. Default is 1.

  • num_workers (int, optional) – Number of parallel workers (processes) to distribute conformers across. Ensure num_workers * num_processes <= available CPUs to avoid oversubscription. Default is 1.

  • temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.

  • verbose (bool, optional) – Toggle tqdm progress bar. Default is False.

  • num_confs (int, optional) – Number of conformers to initially generate. Default is 1000.

Returns:

  • clustered_conformers_xtb (list) – List of RDKit conformers after xTB relaxation and clustering.

  • clustered_energies_xtb (list) – List of energies for associated conformers.

  • clustered_charges_xtb (list) – List of partial charges for associated conformers.
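
The three returned lists are parallel: index i in each refers to the same conformer. Whether they come back ordered by energy is not stated above, so a caller who needs that ordering may want to sort them together. A minimal sketch, with `sort_by_energy` as a hypothetical helper and placeholder strings standing in for mol objects:

```python
def sort_by_energy(conformers, energies, charges):
    """Reorder three parallel lists so that energies are ascending."""
    # Compute the permutation once, then apply it to every list.
    order = sorted(range(len(energies)), key=energies.__getitem__)
    return (
        [conformers[i] for i in order],
        [energies[i] for i in order],
        [charges[i] for i in order],
    )


# Placeholder data for illustration only.
sorted_confs, sorted_energies, sorted_charges = sort_by_energy(
    ["conf_a", "conf_b", "conf_c"],
    [-1.0, -3.0, -2.0],
    [[0.1], [0.2], [0.3]],
)
```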

shepherd_score.conformer_generation.generate_opt_conformers(smiles, MMFF_optimize=True, verbose=False, num_confs=1000)

Generate optimized conformers with RDKit (ETKDG embedding with optional MMFF94 relaxation).

Parameters:
  • smiles (str) – SMILES string of the molecule.

  • MMFF_optimize (bool, optional) – Optimize RDKit embedded molecules with MMFF94. Default is True.

  • verbose (bool, optional) – Toggle tqdm progress bar. Default is False.

  • num_confs (int, optional) – Number of conformers to initially generate. Default is 1000.

Returns:

List of clustered RDKit conformers after RDKit embedding and optional MMFF relaxation.

Return type:

list