Evaluation Utilities
Utility functions supporting the evaluation modules.
Data Conversion
Helper functions to convert different data types.
- shepherd_score.evaluations.utils.convert_data.write_xyz_file(atomic_numbers, positions, path_to_file=None)
Writes an xyz file of an atomistic structure, given np.ndarray of atomic numbers and coordinates.
- Parameters:
atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)
positions (np.ndarray of shape (N,3) containing atomic coordinates)
path_to_file (str specifying file path -- e.g. path_to_file = 'examples/molecule.xyz'. If None, then no output file is written.)
- Returns:
xyz block
- Return type:
str
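As a rough illustration of what an xyz block looks like, the sketch below builds one by hand: the atom count, a comment line, then one line per atom with an element symbol and three coordinates. The helper name `build_xyz_block`, the tiny `SYMBOLS` table, and the six-decimal formatting are assumptions made for this example; they are not taken from shepherd_score, and plain Python sequences stand in for the np.ndarray inputs.

```python
# Hypothetical helper for illustration only; not the shepherd_score implementation.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}  # small subset of the periodic table


def build_xyz_block(atomic_numbers, positions, comment=""):
    """Return an xyz-format string: atom count, comment line, one line per atom."""
    if len(atomic_numbers) != len(positions):
        raise ValueError("atomic_numbers and positions must have the same length")
    lines = [str(len(atomic_numbers)), comment]
    for number, (x, y, z) in zip(atomic_numbers, positions):
        # Assumed formatting: element symbol followed by fixed-precision coordinates
        lines.append(f"{SYMBOLS[int(number)]} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Water: one oxygen and two hydrogens
water_numbers = [8, 1, 1]
water_positions = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
xyz_block = build_xyz_block(water_numbers, water_positions)
```

With the documented function, the same inputs would presumably be passed as arrays of shape (N,) and (N,3), with path_to_file left as None when only the string is needed.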
- shepherd_score.evaluations.utils.convert_data.write_xyz_file_with_dummy(atomic_numbers, positions, path_to_file=None)
Writes an xyz file of an atomistic structure, given np.ndarray of atomic numbers and coordinates. Accounts for the presence of dummy atoms.
- Parameters:
atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)
positions (np.ndarray of shape (N,3) containing atomic coordinates)
path_to_file (str specifying file path -- e.g. path_to_file = 'examples/molecule.xyz'. If None, then no output file is written.)
- Returns:
xyz : str : xyz block
dummy_atom_pos : np.ndarray : positions of dummy atoms
- Return type:
Tuple
- shepherd_score.evaluations.utils.convert_data.get_xyz_content(atomic_numbers, positions)
Get the xyz block of an atomistic structure.
- shepherd_score.evaluations.utils.convert_data.get_xyz_content_with_dummy(atomic_numbers, positions)
Get the xyz block of an atomistic structure and remove dummy atoms from the xyz block.
- Parameters:
atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)
positions (np.ndarray of shape (N,3) containing atomic coordinates)
- Returns:
xyz : str : xyz block (without dummy atoms)
dummy_atom_pos : np.ndarray : positions of dummy atoms
- Return type:
Tuple
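The dummy-atom handling described above amounts to partitioning the atoms into real and dummy sets before the xyz block is written. The sketch below shows one way to do that, assuming dummy atoms are flagged with atomic number 0; that convention, the constant `DUMMY_ATOMIC_NUMBER`, and the helper `split_dummy_atoms` are illustrative assumptions, not confirmed details of the library.

```python
DUMMY_ATOMIC_NUMBER = 0  # assumption: dummy atoms are flagged with atomic number 0


def split_dummy_atoms(atomic_numbers, positions):
    """Separate real atoms from dummy atoms, preserving the original order."""
    real_numbers, real_positions, dummy_positions = [], [], []
    for number, position in zip(atomic_numbers, positions):
        if int(number) == DUMMY_ATOMIC_NUMBER:
            # Dummy atoms are kept aside and excluded from the xyz block
            dummy_positions.append(tuple(position))
        else:
            real_numbers.append(int(number))
            real_positions.append(tuple(position))
    return real_numbers, real_positions, dummy_positions


numbers = [6, 0, 1]
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.09, 0.0)]
real_numbers, real_positions, dummy_positions = split_dummy_atoms(numbers, coords)
```

The real atoms would then go into the xyz block, while the dummy positions are returned alongside it, mirroring the documented tuple.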
- shepherd_score.evaluations.utils.convert_data.extract_mol_from_xyz_block(xyz_block, charge=0, verbose=False)
Attempts to extract a mol object from an xyz block.
Assumes that the xyz structure has hydrogens included explicitly.
- Parameters:
xyz_block (str containing atomistic structure in xyz format.)
charge (int specifying the expected (overall) charge of the structure.)
verbose (bool indicating whether to print error statements upon extraction failure)
- Return type:
rdkit.Chem.rdchem.Mol object if successful, None otherwise
- shepherd_score.evaluations.utils.convert_data.get_mol_from_atom_pos(atoms, positions)
Try to get an RDKit mol object from atom and coordinate arrays.
- Parameters:
atoms (np.ndarray (N,) of atomic numbers of the generated molecule, or (N,M) one-hot encoding.)
positions (np.ndarray (N,3) of coordinates for the generated molecule's atoms.)
- Returns:
mol : Chem.Mol or None
charge : int overall charge of molecule
xyz_block : str
- Return type:
Tuple
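Because atoms may arrive either as atomic numbers or as an (N,M) one-hot encoding, a normalization step is needed before a structure can be written out. The sketch below decodes one-hot rows by taking the index of the largest entry. The column order in `ELEMENT_ORDER` and the helper `one_hot_to_atomic_numbers` are hypothetical; the actual column-to-element mapping depends on how the molecules were generated and is not specified here.

```python
# Hypothetical column order for the one-hot encoding; the real mapping is model-specific.
ELEMENT_ORDER = [1, 6, 7, 8, 9]  # H, C, N, O, F


def one_hot_to_atomic_numbers(atoms):
    """Accept atomic numbers or (N, M) one-hot rows and return atomic numbers."""
    decoded = []
    for row in atoms:
        if isinstance(row, (list, tuple)):
            # One-hot row: the index of the largest entry selects the element
            index = max(range(len(row)), key=row.__getitem__)
            decoded.append(ELEMENT_ORDER[index])
        else:
            # Already an atomic number
            decoded.append(int(row))
    return decoded


from_one_hot = one_hot_to_atomic_numbers([[0, 1, 0, 0, 0], [0, 0, 0, 1, 0]])
from_numbers = one_hot_to_atomic_numbers([6, 8])
```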
- shepherd_score.evaluations.utils.convert_data.get_smiles_from_atom_pos(atoms, positions)
Try to get a SMILES string from atom and coordinate arrays.
- Parameters:
atoms (np.ndarray (N,) of atomic numbers of the generated molecule, or (N,M) one-hot encoding.)
positions (np.ndarray (N,3) of coordinates for the generated molecule's atoms.)
- Return type:
SMILES str or None
- shepherd_score.evaluations.utils.convert_data.load_npz_to_df(npz_path, file_id)
Loads a single npz file and returns a DataFrame in which zero-dimensional arrays are expanded to every row. This works specifically for files generated by ConditionalEvalPipeline.
- shepherd_score.evaluations.utils.convert_data.collate_npz_files(npz_files, include_file_id)
Collates all npz files into a single DataFrame.
- Parameters:
npz_files (list of file paths)
include_file_id (bool. Whether to include a column called "file_id" that groups together rows that came from the same file.)
- Returns:
Each row is a sample and each column is a property; any 0d arrays are repeated on every row.
- Return type:
pd.DataFrame
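The row layout described above (one row per sample, scalar values repeated) can be sketched without npz files or pandas. In the example below, plain dictionaries stand in for loaded files and a Python scalar stands in for a 0d array. The helpers `expand_record` and `collate_records`, the sample property names, and the use of an integer index as "file_id" are all assumptions for illustration; the library's actual file contents and identifier scheme are not documented here.

```python
def expand_record(record, file_id=None):
    """Turn one file's mapping of property -> values into per-sample rows.

    Scalar values (the analogue of 0d arrays) are repeated on every row.
    """
    lengths = {len(v) for v in record.values() if isinstance(v, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("per-sample properties must all have the same length")
    n_rows = lengths.pop() if lengths else 1
    rows = []
    for i in range(n_rows):
        row = {
            key: (value[i] if isinstance(value, (list, tuple)) else value)
            for key, value in record.items()
        }
        if file_id is not None:
            row["file_id"] = file_id  # groups rows that came from the same file
        rows.append(row)
    return rows


def collate_records(records, include_file_id=True):
    """Concatenate the expanded rows of every record into one table."""
    rows = []
    for file_id, record in enumerate(records):
        rows.extend(expand_record(record, file_id if include_file_id else None))
    return rows


# Two stand-in "files": a scalar property plus a per-sample property each
records = [
    {"target": "mol_a", "score": [0.9, 0.7]},
    {"target": "mol_b", "score": [0.4]},
]
table = collate_records(records)
```

A list of row dictionaries like `table` maps directly onto a DataFrame, which is the form the documented functions return.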