Evaluation Utilities

Utility functions supporting the evaluation modules.

Data Conversion

Helper functions to convert different data types.

shepherd_score.evaluations.utils.convert_data.write_xyz_file(atomic_numbers, positions, path_to_file=None)

Writes an xyz file for an atomistic structure, given np.ndarray inputs of atomic numbers and coordinates, and returns the xyz block as a string.

Parameters:
  • atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)

  • positions (np.ndarray of shape (N,3) containing atomic coordinates)

  • path_to_file (str specifying file path -- e.g. path_to_file = 'examples/molecule.xyz'. If None, then no output file is written.)

Returns:

xyz block

Return type:

str
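As an illustration of the xyz block described above, here is a minimal standalone sketch. It is not the library's implementation: `xyz_block_sketch` and the small `SYMBOLS` table are hypothetical names introduced here, and plain Python lists stand in for the np.ndarray inputs. It assumes the common xyz layout of an atom count line, a comment line, and one `symbol x y z` line per atom.

```python
# Hypothetical, deliberately small element table; a real implementation
# would cover the whole periodic table.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 15: "P", 16: "S", 17: "Cl"}


def xyz_block_sketch(atomic_numbers, positions, path_to_file=None):
    """Build an xyz block and optionally write it to path_to_file."""
    if len(atomic_numbers) != len(positions):
        raise ValueError("atomic_numbers and positions must have the same length")

    # Line 1: number of atoms. Line 2: comment line (left blank here).
    lines = [str(len(atomic_numbers)), ""]
    for z, (x, y, zc) in zip(atomic_numbers, positions):
        lines.append(f"{SYMBOLS[int(z)]} {float(x):.6f} {float(y):.6f} {float(zc):.6f}")
    xyz = "\n".join(lines) + "\n"

    # Mirror the documented behaviour: only write a file when a path is given.
    if path_to_file is not None:
        with open(path_to_file, "w") as fh:
            fh.write(xyz)
    return xyz


# Water, with approximate coordinates in Angstrom.
water = xyz_block_sketch(
    [8, 1, 1],
    [[0.0, 0.0, 0.117], [0.0, 0.757, -0.469], [0.0, -0.757, -0.469]],
)
```

Because `path_to_file` defaults to None, the call above only returns the string and writes nothing to disk.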

shepherd_score.evaluations.utils.convert_data.write_xyz_file_with_dummy(atomic_numbers, positions, path_to_file=None)

Writes an xyz file of an atomistic structure, given np.ndarray of atomic numbers and coordinates. Accounts for the presence of dummy atoms.

Parameters:
  • atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)

  • positions (np.ndarray of shape (N,3) containing atomic coordinates)

  • path_to_file (str specifying file path -- e.g. path_to_file = 'examples/molecule.xyz'. If None, then no output file is written.)

Returns:
  • xyz (str: xyz block)

  • dummy_atom_pos (np.ndarray: positions of dummy atoms)

Return type:

Tuple

shepherd_score.evaluations.utils.convert_data.get_xyz_content(atomic_numbers, positions)

Get the xyz block of an atomistic structure.

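Parameters:
  • atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)

  • positions (np.ndarray of shape (N,3) containing atomic coordinates)

Return type: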

str

shepherd_score.evaluations.utils.convert_data.get_xyz_content_with_dummy(atomic_numbers, positions)

Get the xyz block of an atomistic structure and remove dummy atoms from the xyz block.

Parameters:
  • atomic_numbers (np.ndarray of shape (N,) containing atomic numbers)

  • positions (np.ndarray of shape (N,3) containing atomic coordinates)

Returns:
  • xyz (str: xyz block, without dummy atoms)

  • dummy_atom_pos (np.ndarray: positions of dummy atoms)

Return type:

Tuple
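The split between real and dummy atoms can be sketched as follows. This is an illustration only: it assumes dummy atoms are marked with atomic number 0, which this page does not state, and `split_dummy_atoms`, `DUMMY_ATOMIC_NUMBER`, and `SYMBOLS` are hypothetical names. Plain lists stand in for the np.ndarray inputs and outputs.

```python
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}  # hypothetical, truncated table
DUMMY_ATOMIC_NUMBER = 0  # assumption: dummy atoms are encoded as atomic number 0


def split_dummy_atoms(atomic_numbers, positions):
    """Return (xyz block without dummy atoms, list of dummy-atom positions)."""
    real_lines = []
    dummy_positions = []
    for z, pos in zip(atomic_numbers, positions):
        x, y, zc = (float(c) for c in pos)
        if int(z) == DUMMY_ATOMIC_NUMBER:
            # Dummy atoms are kept out of the xyz block and reported separately.
            dummy_positions.append([x, y, zc])
        else:
            real_lines.append(f"{SYMBOLS[int(z)]} {x:.6f} {y:.6f} {zc:.6f}")

    # The atom count on the first line reflects real atoms only.
    xyz = "\n".join([str(len(real_lines)), ""] + real_lines) + "\n"
    return xyz, dummy_positions


xyz, dummies = split_dummy_atoms(
    [6, 0, 1],
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.09]],
)
```

Here `xyz` describes two atoms (C and H), while `dummies` holds the single dummy position.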

shepherd_score.evaluations.utils.convert_data.extract_mol_from_xyz_block(xyz_block, charge=0, verbose=False)

Attempts to extract a mol object from an xyz block.

Assumes that the xyz structure has hydrogens included explicitly.

Parameters:
  • xyz_block (str containing atomistic structure in xyz format.)

  • charge (int specifying the expected (overall) charge of the structure.)

  • verbose (bool indicating whether to print error statements upon extraction failure)

Return type:

rdkit.Chem.rdchem.Mol object if successful, None otherwise
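One way to recover a molecule from an xyz block with explicit hydrogens is RDKit's own bond perception. The sketch below is an assumption about how such an extraction could be done, not a description of this function's internals; `mol_from_xyz_sketch` is a hypothetical name. It relies on `Chem.MolFromXYZBlock` and `rdDetermineBonds.DetermineBonds`, which require a reasonably recent RDKit (2022.09 or later), and returns None when RDKit is unavailable or perception fails.

```python
try:
    from rdkit import Chem
    from rdkit.Chem import rdDetermineBonds
except ImportError:  # RDKit missing, or too old to provide rdDetermineBonds
    Chem = None


def mol_from_xyz_sketch(xyz_block, charge=0, verbose=False):
    """Return an RDKit Mol built from an xyz block, or None on failure."""
    if Chem is None:
        if verbose:
            print("RDKit with rdDetermineBonds is not available")
        return None
    try:
        mol = Chem.MolFromXYZBlock(xyz_block)  # atoms and coordinates only, no bonds
        if mol is None:
            raise ValueError("could not parse xyz block")
        # Perceive connectivity and bond orders in place, given the overall charge.
        rdDetermineBonds.DetermineBonds(mol, charge=charge)
        return mol
    except Exception as err:
        if verbose:
            print(f"Extraction failed: {err}")
        return None


# Methane with explicit hydrogens (C-H distance of about 1.09 Angstrom).
methane_xyz = "\n".join([
    "5",
    "",
    "C 0.000 0.000 0.000",
    "H 0.629 0.629 0.629",
    "H -0.629 -0.629 0.629",
    "H -0.629 0.629 -0.629",
    "H 0.629 -0.629 -0.629",
]) + "\n"
mol = mol_from_xyz_sketch(methane_xyz, charge=0)
```

Returning None instead of raising matches the documented contract above, so callers can filter out failed extractions with a simple `is None` check.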

shepherd_score.evaluations.utils.convert_data.get_mol_from_atom_pos(atoms, positions)

Try to get a RDKit mol object from atom and coordinate arrays.

Parameters:
  • atoms (np.ndarray (N,) of atomic numbers of the generated molecule, or (N,M) one-hot encoding.)

  • positions (np.ndarray (N,3) of coordinates for the generated molecule's atoms.)

Returns:
  • mol (Chem.Mol or None)

  • charge (int: overall charge of molecule)

  • xyz_block (str)

Return type:

Tuple
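Since `atoms` may be given either as atomic numbers or as a one-hot encoding, the two forms need to be reduced to atomic numbers before an xyz block can be built. The sketch below shows one way to do that. The column-to-element mapping `ELEMENT_VOCAB` is purely hypothetical (this page does not specify the ordering the library expects), as is the helper name `atomic_numbers_from_atoms`; plain lists stand in for np.ndarray inputs.

```python
# Hypothetical vocabulary: column i of the one-hot encoding maps to ELEMENT_VOCAB[i].
ELEMENT_VOCAB = [1, 6, 7, 8, 9]  # H, C, N, O, F


def atomic_numbers_from_atoms(atoms):
    """Accept (N,) atomic numbers or (N, M) one-hot rows; return atomic numbers."""
    decoded = []
    for row in atoms:
        if hasattr(row, "__len__"):
            # One-hot row: take the index of the largest entry (argmax).
            idx = max(range(len(row)), key=lambda i: row[i])
            decoded.append(ELEMENT_VOCAB[idx])
        else:
            # Already an atomic number.
            decoded.append(int(row))
    return decoded


from_numbers = atomic_numbers_from_atoms([6, 1, 1])
from_one_hot = atomic_numbers_from_atoms([[0, 1, 0, 0, 0], [1, 0, 0, 0, 0]])
```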

shepherd_score.evaluations.utils.convert_data.get_smiles_from_atom_pos(atoms, positions)

Try to get a SMILES string from atom and coordinate arrays.

Parameters:
  • atoms (np.ndarray (N,) of atomic numbers of the generated molecule, or (N,M) one-hot encoding.)

  • positions (np.ndarray (N,3) of coordinates for the generated molecule's atoms.)

Return type:

SMILES str or None

shepherd_score.evaluations.utils.convert_data.load_npz_to_df(npz_path, file_id)

Loads a single npz file and returns a DataFrame in which zero-dimensional arrays are expanded to every row. This works specifically for files generated by ConditionalEvalPipeline.

Parameters:
Return type:

DataFrame

shepherd_score.evaluations.utils.convert_data.collate_npz_files(npz_files, include_file_id)

Collates all npz files into a single DataFrame.

Parameters:
  • npz_files (list of file paths)

  • include_file_id (bool. Whether to include a column called "file_id" that groups together rows that came from the same file.)

Returns:

Each row is a sample and each column is a property; any 0d array is repeated on every row from its file.

Return type:

pd.DataFrame
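The row layout described above can be illustrated without touching npz files. In this sketch, plain dicts stand in for the contents of loaded npz files: list values play the role of per-sample arrays and scalar values play the role of 0d arrays. The helper names `expand_record` and `collate_records`, the example property names, and the use of the file's position in the list as its "file_id" are all assumptions made for illustration; the real functions return a pd.DataFrame.

```python
def expand_record(record, file_id=None):
    """Turn one file's contents into a list of row dicts, repeating scalar values."""
    lengths = {len(v) for v in record.values() if isinstance(v, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("per-sample properties must all have the same length")
    n_rows = lengths.pop() if lengths else 1

    rows = []
    for i in range(n_rows):
        # Per-sample values are indexed; scalar (0d-like) values are repeated.
        row = {k: (v[i] if isinstance(v, (list, tuple)) else v) for k, v in record.items()}
        if file_id is not None:
            row["file_id"] = file_id
        rows.append(row)
    return rows


def collate_records(records, include_file_id):
    """Concatenate the rows of every record, optionally tagging their origin."""
    rows = []
    for file_id, record in enumerate(records):
        rows.extend(expand_record(record, file_id if include_file_id else None))
    return rows


records = [
    {"smiles": ["CCO", "CC"], "target": "aspirin"},  # two samples, one scalar property
    {"smiles": ["C"], "target": "caffeine"},  # one sample
]
rows = collate_records(records, include_file_id=True)
```

With `include_file_id=True`, rows from the first record carry `file_id` 0 and the row from the second carries `file_id` 1, so samples can later be grouped by their file of origin.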

Protein & Ligand Preparation