Evaluation Utilities

Utility functions supporting the evaluation modules.

Data Conversion

Helper functions to convert different data types.

shepherd_score.evaluations.utils.convert_data.write_xyz_file(atomic_numbers, positions, path_to_file=None)

Writes an xyz file of an atomistic structure, given np.ndarray of atomic numbers and coordinates.

Parameters:

atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)
positions (np.ndarray of shape (N,3) containing atomic coordinates)
path_to_file (str specifying file path -- e.g. path_to_file = 'examples/molecule.xyz'. If None, then no output file is written.)

Returns:

xyz block

Return type:

str
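For orientation, the following is a minimal sketch of what an xyz block looks like when built from atomic numbers and coordinates. It is not the shepherd_score implementation: the helper name `xyz_block_sketch`, the small `SYMBOLS` table, and the six-decimal formatting are assumptions made for illustration. It only shows the standard XYZ layout (atom count, comment line, then one line per atom).

```python
# Hypothetical helper for illustration; not part of shepherd_score.
# Minimal symbol table; a real table would cover all elements.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 15: "P", 16: "S", 17: "Cl"}


def xyz_block_sketch(atomic_numbers, positions, comment=""):
    """Build an XYZ-format string: atom count, comment line, one line per atom."""
    if len(atomic_numbers) != len(positions):
        raise ValueError("atomic_numbers and positions must have the same length")
    lines = [str(len(atomic_numbers)), comment]
    for number, (x, y, z) in zip(atomic_numbers, positions):
        # The decimal precision here is an arbitrary choice for the sketch.
        lines.append(f"{SYMBOLS[int(number)]} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Water: one oxygen and two hydrogens (coordinates in angstroms).
block = xyz_block_sketch(
    [8, 1, 1],
    [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
)
print(block)
```

The same loop works with np.ndarray inputs of shape (N,) and (N,3), since iterating over an (N,3) array yields one coordinate row per atom.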

shepherd_score.evaluations.utils.convert_data.write_xyz_file_with_dummy(atomic_numbers, positions, path_to_file=None)

Writes an xyz file of an atomistic structure, given np.ndarray of atomic numbers and coordinates. Accounts for the presence of dummy atoms.

Parameters:

atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)
positions (np.ndarray of shape (N,3) containing atomic coordinates)
path_to_file (str specifying file path -- e.g. path_to_file = 'examples/molecule.xyz'. If None, then no output file is written.)

Returns:

xyz : str : xyz block
dummy_atom_pos : np.ndarray : positions of dummy atoms

Return type:

Tuple

shepherd_score.evaluations.utils.convert_data.get_xyz_content(atomic_numbers, positions)

Get the xyz block of an atomistic structure.

Parameters:

atomic_numbers (ndarray)
positions (ndarray)

Return type:

str

shepherd_score.evaluations.utils.convert_data.get_xyz_content_with_dummy(atomic_numbers, positions)

Get the xyz block of an atomistic structure and remove dummy atoms from the xyz block.

Parameters:

atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)
positions (np.ndarray of shape (N,3) containing atomic coordinates)

Returns:

xyz : str : xyz block (without dummy atoms)
dummy_atom_pos : np.ndarray : positions of dummy atoms

Return type:

Tuple
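The separation of dummy atoms from real atoms can be pictured with the sketch below. How shepherd_score marks a dummy atom is not stated here, so the sketch assumes, purely for illustration, that dummy atoms carry atomic number 0. The helper name `split_dummy_atoms` and the `DUMMY_Z` constant are hypothetical.

```python
# Hypothetical helper for illustration; not part of shepherd_score.
DUMMY_Z = 0  # ASSUMPTION: dummy atoms are marked with atomic number 0


def split_dummy_atoms(atomic_numbers, positions, dummy_z=DUMMY_Z):
    """Separate real atoms from dummy atoms, keeping the dummy positions."""
    real_numbers, real_positions, dummy_positions = [], [], []
    for number, pos in zip(atomic_numbers, positions):
        if int(number) == dummy_z:
            dummy_positions.append(tuple(pos))
        else:
            real_numbers.append(int(number))
            real_positions.append(tuple(pos))
    return real_numbers, real_positions, dummy_positions


# A carbon, a dummy atom, and a hydrogen.
numbers, coords, dummies = split_dummy_atoms(
    [6, 0, 1],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.1, 0.0)],
)
print(numbers, dummies)
```

Only the real atoms would then be written to the xyz block, while the dummy positions are returned alongside it as the second element of the tuple.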

shepherd_score.evaluations.utils.convert_data.extract_mol_from_xyz_block(xyz_block, charge=0, verbose=False)

Attempts to extract a mol object from an xyz block.

Assumes that the xyz structure has hydrogens included explicitly.

Parameters:

xyz_block (str containing atomistic structure in xyz format.)
charge (int specifying the expected (overall) charge of the structure.)
verbose (bool indicating whether to print error statements upon extraction failure)

Return type:

rdkit.Chem.rdchem.Mol object if successful, None otherwise

shepherd_score.evaluations.utils.convert_data.get_mol_from_atom_pos(atoms, positions)

Try to get an RDKit mol object from atom and coordinate arrays.

Parameters:

atoms (np.ndarray (N,) of atomic numbers of the generated molecule, or (N,M) one-hot encoding.)
positions (np.ndarray (N,3) of coordinates for the generated molecule's atoms.)

Returns:

mol : Chem.Mol or None
charge : int : overall charge of molecule
xyz_block : str

Return type:

Tuple
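The atoms argument accepts either atomic numbers of shape (N,) or a one-hot encoding of shape (N,M). The sketch below shows one way such an input could be normalized to atomic numbers. The column order in `ATOM_TYPES` and the helper name `decode_atoms` are assumptions for illustration; the actual mapping used by shepherd_score is not documented here.

```python
# Hypothetical helper for illustration; not part of shepherd_score.
# ASSUMPTION: one-hot columns correspond to H, C, N, O in this order.
ATOM_TYPES = [1, 6, 7, 8]


def decode_atoms(atoms, atom_types=ATOM_TYPES):
    """Return atomic numbers from either atomic numbers or one-hot rows."""
    decoded = []
    for row in atoms:
        try:
            values = list(row)  # a one-hot row of length M
        except TypeError:
            decoded.append(int(row))  # already a scalar atomic number
            continue
        # Index of the largest entry, i.e. the "hot" column.
        hot_index = max(range(len(values)), key=lambda i: values[i])
        decoded.append(atom_types[hot_index])
    return decoded


one_hot = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
from_one_hot = decode_atoms(one_hot)
from_numbers = decode_atoms([6, 8, 1])
print(from_one_hot, from_numbers)
```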

shepherd_score.evaluations.utils.convert_data.get_smiles_from_atom_pos(atoms, positions)

Try to get a SMILES string from atom and coordinate arrays.

Parameters:

atoms (np.ndarray (N,) of atomic numbers of the generated molecule, or (N,M) one-hot encoding.)
positions (np.ndarray (N,3) of coordinates for the generated molecule's atoms.)

Return type:

SMILES str or None

shepherd_score.evaluations.utils.convert_data.load_npz_to_df(npz_path, file_id)

Loads a single npz file and returns a dataframe in which zero-dimensional arrays are expanded (repeated) across rows. This works specifically for files generated by ConditionalEvalPipeline.

Parameters:

npz_path (Path | str)
file_id (bool)

Return type:

DataFrame

shepherd_score.evaluations.utils.convert_data.collate_npz_files(npz_files, include_file_id)

Collates all npz files into a single dataframe.

Parameters:

npz_files (list of file paths)
include_file_id (bool. Whether to include a column called "file_id" that groups together rows that came from the same file.)

Returns:

Dataframe in which each row is a sample and each column is a property; any 0d arrays are repeated across rows.

Return type:

pd.DataFrame
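The row layout described above (one row per sample, with 0d values repeated) can be sketched in plain Python. This is not the library's implementation: it uses dictionaries and lists in place of npz files and a dataframe, and the names `expand_to_rows` and `collate_records`, as well as the use of an integer index as the "file_id" value, are assumptions for illustration.

```python
# Hypothetical helpers for illustration; not part of shepherd_score.
def expand_to_rows(record, file_id=None):
    """Turn {property: scalar or per-sample list} into a list of row dicts."""
    lengths = {len(v) for v in record.values() if isinstance(v, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("per-sample properties must share one length")
    n_rows = lengths.pop() if lengths else 1
    rows = []
    for i in range(n_rows):
        # Per-sample values are indexed; scalar (0d) values are repeated.
        row = {
            key: (value[i] if isinstance(value, (list, tuple)) else value)
            for key, value in record.items()
        }
        if file_id is not None:
            row["file_id"] = file_id
        rows.append(row)
    return rows


def collate_records(records, include_file_id):
    """Concatenate the rows of several records, optionally tagging the origin."""
    rows = []
    for index, record in enumerate(records):
        # ASSUMPTION: the position in the input list serves as the file id.
        rows.extend(expand_to_rows(record, index if include_file_id else None))
    return rows


records = [
    {"target": "CCO", "score": [0.1, 0.2]},
    {"target": "CCN", "score": [0.3]},
]
rows = collate_records(records, include_file_id=True)
print(rows)
```

Here the scalar "target" value plays the role of a 0d array: it appears once in the input but is copied onto every row that comes from the same record.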
