Conformer Generation
Functions for generating and optimizing molecular conformers.
Handles anything related to generating conformers with xTB or MMFF94.
Requires an xTB installation that can be called from the command line. See https://xtb-docs.readthedocs.io/en/latest/setup.html for installation instructions.
- shepherd_score.conformer_generation.set_thread_limits(num_threads)
Temporarily set threading environment variables.
- Parameters:
num_threads (int)
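The reference does not say which environment variables are affected or how they are restored. As a minimal sketch of the general technique, assuming a context-manager design and a hypothetical set of common threading variables (neither is confirmed by this page), a temporary limit could look like this:

```python
import os
from contextlib import contextmanager

# Assumption: a typical set of threading variables; the library may use a different set.
_THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


@contextmanager
def thread_limits(num_threads):
    """Set threading variables inside a with-block, then restore the old values."""
    saved = {name: os.environ.get(name) for name in _THREAD_VARS}
    try:
        for name in _THREAD_VARS:
            os.environ[name] = str(num_threads)
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                # The variable did not exist before, so remove it again.
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

Restoring in a `finally` clause means the previous values come back even if the wrapped code raises.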
- shepherd_score.conformer_generation.update_mol_coordinates(mol, coordinates)
Update the coordinates of a 3D RDKit mol object with a new set of coordinates.
- Parameters:
mol (Chem.Mol) – RDKit mol object with 3D coordinates to be replaced.
coordinates (list or array-like) – List/array of new [x, y, z] coordinates.
- Returns:
RDKit mol object with updated 3D coordinates.
- Return type:
Chem.Mol
- shepherd_score.conformer_generation.read_multi_xyz_file(file_dir)
Read an xyz file that potentially contains multiple structures.
- Parameters:
file_dir (str) – Path to .xyz file.
- Returns:
all_coordinates (list) – List of lists containing the coordinates of each structure in the xyz file.
all_elements (list) – List of lists containing the element types of each atom in each structure.
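To illustrate the return layout, here is a minimal sketch of a multi-structure XYZ parser. It assumes the standard XYZ layout (atom count, comment line, then one atom per line) and works on a string rather than a path. The name `parse_multi_xyz` is hypothetical and this is not the library's implementation.

```python
def parse_multi_xyz(text):
    """Split concatenated XYZ frames into per-structure coordinates and elements."""
    lines = text.splitlines()
    all_coordinates, all_elements = [], []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames
            continue
        num_atoms = int(lines[i].split()[0])
        # Line i + 1 is the comment line; atom records start at i + 2.
        atom_lines = lines[i + 2 : i + 2 + num_atoms]
        if len(atom_lines) != num_atoms:
            raise ValueError("truncated XYZ frame")
        elements, coords = [], []
        for line in atom_lines:
            symbol, x, y, z = line.split()[:4]
            elements.append(symbol)
            coords.append([float(x), float(y), float(z)])
        all_coordinates.append(coords)
        all_elements.append(elements)
        i += 2 + num_atoms
    return all_coordinates, all_elements
```

Both returned lists have one entry per structure, so `all_coordinates[k][a]` and `all_elements[k][a]` describe the same atom.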
- shepherd_score.conformer_generation.embed_conformer(mol, attempts=50, MMFF_optimize=False, random_seed=-1)
Embed a mol object into a 3D RDKit mol object with ETKDG (and optional MMFF94).
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
attempts (int, optional) – Number of embedding attempts. Default is 50.
MMFF_optimize (bool, optional) – Whether to optimize embedded conformer with MMFF94. Default is False.
random_seed (int, optional) – Seed for RDKit’s EmbedMolecule. -1 means no seed, otherwise must be positive.
- Returns:
RDKit mol object with 3D coordinates, or None if embedding fails.
- Return type:
Chem.Mol or None
- shepherd_score.conformer_generation.embed_conformer_from_smiles(smiles, attempts=50, MMFF_optimize=False, random_seed=-1)
Embed a SMILES into a 3D RDKit mol object with ETKDG (and optionally MMFF94).
- Parameters:
smiles (str) – SMILES string of molecule.
attempts (int, optional) – Number of embedding attempts. Default is 50.
MMFF_optimize (bool, optional) – Whether to optimize embedded conformer with MMFF94. Default is False.
random_seed (int, optional) – Seed for RDKit’s EmbedMolecule. -1 means no seed, otherwise must be positive.
- Returns:
RDKit mol object with 3D coordinates, or None if embedding fails.
- Return type:
Chem.Mol or None
- shepherd_score.conformer_generation.conf_to_mol(mol, conf_id)
Convert a conformer of a RDKit mol object into its own RDKit mol object.
- Parameters:
mol (Chem.Mol) – RDKit mol object with multiple conformers.
conf_id (int) – ID of conformer to be converted into its own mol object.
- Returns:
Mol object with only 1 conformer (the selected conformer).
- Return type:
Chem.Mol
- shepherd_score.conformer_generation.generate_conformer_ensemble(mol_3d, num_confs=100, num_threads=4, threshold=0.25, num_opt_steps=200)
Use ETKDG algorithm to embed multiple conformers from a given 3D conformer template.
Optionally optimizes each embedded conformer with MMFF94.
- Parameters:
mol_3d (Chem.Mol) – RDKit mol object with 3D coordinates.
num_confs (int, optional) – Maximum number of conformers to be embedded with ETKDG. Default is 100.
num_threads (int, optional) – Number of processors to be used in parallel when embedding conformers. Default is 4.
threshold (float, optional) – RMSD threshold used to eliminate redundant conformers after ETKDG embedding. Default is 0.25.
num_opt_steps (int, optional) – Number of MMFF94 optimization steps. Default is 200.
- Returns:
List of mol objects, each containing 1 (unique) conformer.
- Return type:
list
- shepherd_score.conformer_generation.cluster_conformers_butina(conformers, threshold=0.2, num_max_conformers=100)
Cluster a list of conformers by their pairwise RMSD with Butina Clustering algorithm.
- Parameters:
conformers (list) – List of rdkit mol objects containing conformers of a common molecule to be clustered.
threshold (float, optional) – Initial RMSD threshold for clustering. Default is 0.2.
num_max_conformers (int, optional) – Maximum number of conformers in the final clustered ensemble. Default is 100.
- Returns:
List of int indices of the centroids of each cluster, to be indexed into conformers.
- Return type:
list
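The following sketch shows how Butina-style clustering selects centroids from a precomputed pairwise distance matrix (RMSD values in this context). It is pure Python for illustration and not the library's code. The threshold-raising loop, the `step` parameter, and both function names are assumptions, suggested only by the words "initial" and "maximum" in the parameter descriptions.

```python
def butina_centroids(dist, threshold):
    """Return centroid indices from Butina-style clustering of a square distance matrix."""
    unassigned = set(range(len(dist)))
    centroids = []
    while unassigned:
        # Neighbors of i: unassigned points within the threshold (i itself included).
        neighbors = {
            i: [j for j in unassigned if dist[i][j] <= threshold]
            for i in unassigned
        }
        # The point with the most neighbors becomes the next centroid;
        # ties go to the lowest index.
        center = min(unassigned, key=lambda i: (-len(neighbors[i]), i))
        centroids.append(center)
        unassigned -= set(neighbors[center])
    return centroids


def cluster_with_cap(dist, threshold=0.2, num_max=100, step=0.1):
    """Raise the threshold until at most num_max clusters remain (step is hypothetical)."""
    if num_max < 1 or step <= 0:
        raise ValueError("num_max must be >= 1 and step must be > 0")
    centroids = butina_centroids(dist, threshold)
    while len(centroids) > num_max:
        threshold += step
        centroids = butina_centroids(dist, threshold)
    return centroids
```

The returned indices can be used to pick representatives, for example `[conformers[i] for i in centroids]`.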
- shepherd_score.conformer_generation.optimize_conformer_with_xtb(conformer, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))
Use external calls to GFN2-xTB (command line) to optimize a conformer geometry.
- Parameters:
conformer (Chem.Mol) – RDKit mol object containing 3D coordinates.
solvent (str, optional) – Implicit solvent to be used during optimization. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.
num_cores (int, optional) – Number of CPU cores to be used in the xTB geometry optimization. Default is 1.
charge (int, optional) – Molecular charge. Default is 0.
temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.
- Returns:
(xtb_mol, energy, charges) - tuple of optimized RDKit mol object, xTB energy (in Hartrees), and partial charges (in e-).
- Return type:
tuple
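For orientation, the parameters above map naturally onto xtb command-line options. The sketch below only assembles an argument list and does not run xtb. The helper name is hypothetical, and the choice of `--alpb` over `--gbsa` for implicit solvation is an assumption, since this page does not state which flag the library passes.

```python
def build_xtb_opt_command(xyz_name, solvent=None, num_cores=1, charge=0):
    """Assemble an argument list for a GFN2-xTB geometry optimization."""
    cmd = [
        "xtb", xyz_name,
        "--opt",                 # geometry optimization
        "--gfn", "2",            # GFN2-xTB Hamiltonian
        "--chrg", str(charge),   # total molecular charge
        "-P", str(num_cores),    # number of parallel threads
    ]
    if solvent is not None:
        # Assumption: ALPB implicit solvation; xtb also accepts --gbsa.
        cmd += ["--alpb", solvent]
    return cmd
```

A list like this can be passed to `subprocess.run(cmd, cwd=work_dir, capture_output=True, text=True)`, with `work_dir` being a scratch directory so that xtb's output files do not collide between runs.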
- shepherd_score.conformer_generation.optimize_conformer_with_xtb_from_xyz_block(xyz_block, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))
Use external calls to GFN2-xTB (command line) to optimize coordinates from an xyz block.
- Parameters:
xyz_block (str) – String of an xyz block.
solvent (str, optional) – Implicit solvent to be used during optimization. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.
num_cores (int, optional) – Number of CPU cores to be used in the xTB geometry optimization. Default is 1.
charge (int, optional) – Molecular charge. Default is 0.
temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.
- Returns:
(xtb_xyz_block, energy, charges) - tuple of optimized xyz block string, xTB energy (in Hartrees), and partial charges (in e-).
- Return type:
tuple
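An xyz block is a plain string: the atom count, a comment line, then one `element x y z` record per atom. As a small sketch (the helper name `make_xyz_block` is hypothetical), one can be built from element symbols and coordinates like this:

```python
def make_xyz_block(elements, coordinates, comment=""):
    """Format element symbols and [x, y, z] coordinates as an XYZ block string."""
    if len(elements) != len(coordinates):
        raise ValueError("elements and coordinates must have the same length")
    lines = [str(len(elements)), comment]
    for symbol, (x, y, z) in zip(elements, coordinates):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"
```

When starting from an RDKit molecule, `Chem.MolToXYZBlock(mol)` produces the same kind of string.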
- shepherd_score.conformer_generation.charges_from_single_point_conformer_with_xtb(conformer, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))
Compute atomic partial charges from a single point xTB calculation of a provided conformer.
Uses external calls to GFN2-xTB (command line).
- Parameters:
conformer (Chem.Mol) – RDKit mol object containing 3D coordinates.
solvent (str, optional) – Implicit solvent to be used during calculation. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.
num_cores (int, optional) – Number of CPU cores to be used in the xTB calculation. Default is 1.
charge (int, optional) – Molecular charge. Default is 0.
temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.
- Returns:
List of partial charges for each atom (in e-).
- Return type:
list
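Atomic partial charges should sum to the total molecular charge passed as `charge`, which gives a cheap sanity check on the returned list. The helper below is a hypothetical sketch, and the tolerance is an arbitrary choice meant to absorb rounding in printed charges.

```python
import math


def charges_match_total(partial_charges, charge=0, tol=1e-2):
    """Return True if the partial charges sum to the molecular charge within tol."""
    return math.isclose(sum(partial_charges), charge, abs_tol=tol)
```

A failed check usually points to a wrong `charge` argument or to a charge list that belongs to a different structure.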
- shepherd_score.conformer_generation.single_point_xtb_from_xyz(xyz_block, solvent=None, num_cores=1, charge=0, temp_dir=PosixPath('/tmp'))
Compute energy and atomic partial charges from a single point xTB calculation.
Uses external calls to GFN2-xTB (command line).
- Parameters:
xyz_block (str) – String of xyz block representing a molecule.
solvent (str, optional) – Implicit solvent to be used during calculation. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.
num_cores (int, optional) – Number of CPU cores to be used in the xTB calculation. Default is 1.
charge (int, optional) – Molecular charge. Default is 0.
temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.
- Returns:
energy (float) – xTB energy in Hartrees.
charges (list) – List of partial charges for each atom (in e-).
- shepherd_score.conformer_generation.optimize_conformer_ensemble_with_xtb(conformers, solvent=None, num_processes=1, num_workers=1, charge=0, temp_dir=PosixPath('/tmp'), verbose=False)
GFN2-xTB geometry optimization for a list of conformers.
- Parameters:
conformers (list) – List of RDKit Mol objects (with 3D coordinates) to be optimized.
solvent (str, optional) – Implicit solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.
num_processes (int, optional) – Number of CPU cores used per xTB optimization. Default is 1.
num_workers (int, optional) – Number of parallel workers (processes) to distribute conformers across. Ensure num_workers * num_processes <= available CPUs to avoid oversubscription. Default is 1.
charge (int, optional) – Molecular charge. RDKit will be used to compute the formal charge if len(conformers) > 1. Default is 0.
temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.
verbose (bool, optional) – Show a simple progress bar in single-process mode. Default is False.
- Returns:
(conformers_opt, energies_opt, charges_opt) - tuple of lists containing optimized conformers, their energies, and partial charges.
- Return type:
tuple
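The constraint num_workers * num_processes <= available CPUs can be turned into a small helper for choosing `num_workers`. This is a sketch with a hypothetical name, and it relies on `os.cpu_count()`, which reports logical CPUs and may exceed what a job scheduler actually grants.

```python
import os


def max_workers_for(num_processes, available_cpus=None):
    """Largest num_workers with num_workers * num_processes <= available CPUs."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    # Always allow one worker, even if a single job already
    # requests more cores than are available.
    return max(1, available_cpus // num_processes)
```

On an 8-CPU machine with `num_processes=2`, this gives 4 workers. If `num_processes` exceeds the CPU count, the helper still returns 1, so that case remains oversubscribed.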
- shepherd_score.conformer_generation.generate_opt_conformers_xtb(smiles, charge=0, solvent=None, MMFF_optimize=True, num_processes=1, num_workers=1, temp_dir=PosixPath('/tmp'), verbose=False, num_confs=1000)
Generate conformer ensemble with RDKit then relax with xTB.
- Parameters:
smiles (str) – SMILES string of the molecule.
charge (int, optional) – Molecular charge. Default is 0.
solvent (str, optional) – Implicit solvent to be used during optimization. Must be a solvent supported by xTB (https://xtb-docs.readthedocs.io/en/latest/gbsa.html). Default is None.
MMFF_optimize (bool, optional) – Optimize RDKit embedded molecules with MMFF94. Default is True.
num_processes (int, optional) – Number of CPU cores to be used in the xTB geometry optimization. Default is 1.
num_workers (int, optional) – Number of parallel workers (processes) to distribute conformers across. Ensure num_workers * num_processes <= available CPUs to avoid oversubscription. Default is 1.
temp_dir (str or Path, optional) – Temporary directory for I/O. Default is the system temporary directory.
verbose (bool, optional) – Toggle tqdm progress bar. Default is False.
num_confs (int, optional) – Number of conformers to initially generate. Default is 1000.
- Returns:
clustered_conformers_xtb (list) – List of rdkit conformers after xTB relaxation and clustering.
clustered_energies_xtb (list) – List of energies for associated conformers.
clustered_charges_xtb (list) – List of partial charges for associated conformers.
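Assuming the returned energies are in Hartrees, as for the single-conformer xTB functions, the sketch below converts them to energies relative to the lowest conformer in kcal/mol and to Boltzmann weights. The function names and the default temperature of 298.15 K are illustrative choices, not part of the library.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.509474
GAS_CONSTANT_KCAL = 0.0019872041  # kcal / (mol * K)


def relative_energies_kcal(energies_hartree):
    """Energies relative to the lowest-energy conformer, in kcal/mol."""
    lowest = min(energies_hartree)
    return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


def boltzmann_weights(rel_kcal, temperature=298.15):
    """Normalized Boltzmann populations from relative energies in kcal/mol."""
    factors = [math.exp(-e / (GAS_CONSTANT_KCAL * temperature)) for e in rel_kcal]
    total = sum(factors)
    return [f / total for f in factors]
```

Sorting conformers by relative energy, or discarding those above a chosen energy window, follows directly from the first function.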
- shepherd_score.conformer_generation.generate_opt_conformers(smiles, MMFF_optimize=True, verbose=False, num_confs=1000)
Generate optimal conformers with RDKit (MMFF94).
- Parameters:
smiles (str) – SMILES string of the molecule.
MMFF_optimize (bool, optional) – Optimize RDKit embedded molecules with MMFF94. Default is True.
verbose (bool, optional) – Toggle tqdm progress bar. Default is False.
num_confs (int, optional) – Number of conformers to initially generate. Default is 1000.
- Returns:
List of clustered rdkit conformers after RDKit embedding and optional MMFF relaxation.
- Return type:
list