-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a list of oemols for PointMutationExecutor input instead of files #746
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominicrufa can you check my comments below to make sure they are correct? after that, @glass-w can make the changes
ligand_n_atoms = ligand_md_topology.n_atoms | ||
|
||
elif isinstance(ligand_input, list): # list of oemol object(s) | ||
ligand_mol = ligand_input[0] # assume only one molecule in the list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're assuming there is only molecule in the list, let's just allow the options for ligand_input to be a .pdb, .sdf or oemol object (passing a list thats always one elements feels unnecessary). You should be able to check if the ligand_input
is an instance of oechem.OEMol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't know if reading a small molecule as a .pdb is safe, come to think of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the small molecule should not be passed as a pdb. if ligand_input
is a protein, it should be passed as a pdb. this will be noted in function description (it's in one of my suggestions in this PR)
ligand_n_atoms = ligand_md_topology.n_atoms | ||
|
||
elif isinstance(ligand_input, list): # list of oemol object(s) | ||
ligand_mol = ligand_input[0] # assume only one molecule in the list | ||
molecules.append(Molecule.from_openeye(ligand_mol, allow_undefined_stereo=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ligand_mol = generate_unique_atom_names(ligand_mol)
is necessary to include before this line.. did you see any errors when you ran with this version of PointMutationExecutor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wait, i just noticed generate_unique_atom_names
is redundant.
it's called here:
perses/perses/utils/openeye.py
Line 354 in f640bf8
molecule = generate_unique_atom_names(molecule) |
so it doesn't need to be called again after
createMolFromSDF
@@ -116,7 +116,7 @@ def __init__(self, | |||
if phase == vacuum, then the complex will not be solvated with water; else, it will be solvated with tip3p | |||
conduct_endstate_validation : bool, default True | |||
whether to conduct an endstate validation of the hybrid topology factory | |||
ligand_file : str, default None | |||
ligand_input : str or list of oemol objects, default None | |||
path to ligand of interest (i.e. small molecule or protein); .sdf or .pdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change this to:
path to ligand of interest: .pdb for protein and .sdf or oemol for small molecule
if isinstance(ligand_input, str): | ||
if ligand_input.endswith('.sdf'): # small molecule | ||
ligand_mol = createOEMolFromSDF(ligand_input, index=ligand_index) | ||
ligand_mol = generate_unique_atom_names(ligand_mol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is redundant, as `generate_unique_atom_names is called here:
perses/perses/utils/openeye.py
Line 354 in f640bf8
molecule = generate_unique_atom_names(molecule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominicrufa can we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. remove in the relative_point_mutation_setup.py
, you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes in relative_point_mutation_setup.py
@glass-w I noticed your script above that generates the oemol does not include the functionality covered by these lines in |
Instead of Internally, we can use |
is there anything lossy between oemols and OFF molecule objects? if not, then this is perfectly fine and can be added here. there probably should be a test written somewhere here. |
I believe some non-chemical information, such as SD tags, would be lost. But atom ordering and geometry should be preserved. Is there any particular information you're worried about being lost, @dominicrufa? If so, we can check (or you can just ask on the OFF slack). |
I didn't get any errors when running the above script and the
Yes the geometries looked fine when visualising.
Yes this is a key issue when using OE3DToInternalStereo. It seems that for a subset of these ligands running I agree that if a user were to prepare their own |
Yes I think this would be an idea for the future. Perhaps we can raise this as an issue and generate a separate PR? Right now this change is needed for functionality with this set of ligands. |
@@ -116,8 +116,8 @@ def __init__(self, | |||
if phase == vacuum, then the complex will not be solvated with water; else, it will be solvated with tip3p | |||
conduct_endstate_validation : bool, default True | |||
whether to conduct an endstate validation of the hybrid topology factory | |||
ligand_file : str, default None | |||
path to ligand of interest (i.e. small molecule or protein); .sdf or .pdb | |||
ligand_input : str or list of oemol objects, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glass-w list of oemol objects
should be changed to oemol
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good catch!
@glass-w have you re-run your example script above after making the changes today? just want to make sure that changing |
@zhang-ivy yes I've re-run in a batch job using this functionality: from perses.app.relative_point_mutation_setup import PointMutationExecutor
from perses.utils.openeye import generate_unique_atom_names
import oechem
from simtk import unit
ifs = oechem.oemolistream()
ifs.open('./ligand.sdf')
mol_list = [oechem.OEMol(mol) for mol in ifs.GetOEMols()]
mol = mol_list[0]
if len([atom.GetName() for atom in mol.GetAtoms()]) > len(set([atom.GetName() for atom in mol.GetAtoms()])):
mol = generate_unique_atom_names(mol)
oechem.OEAssignAromaticFlags(mol, oechem.OEAroModelOpenEye)
oechem.OEAssignHybridization(mol)
oechem.OEAddExplicitHydrogens(mol)
oechem.OEPerceiveChiral(mol)
protein_file = './protein.pdb'
x = PointMutationExecutor(
protein_file,
'1',
'510', # an example residue number
'VAL', # an example mutation
ligand_input=mol,
ionic_strength=0.15 * unit.molar,
small_molecule_forcefields='openff-1.2.0'
) it runs and produces all of the relvant output files. Again the above script is for my use case but for general |
i think this is sufficient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
This PR adapts the input of
PointMutationExecutor
to allow for a list of oemol objects to be used instead. Both @schallerdavid and I were having difficulty parsing ligand.sdf
files containing a nitrogen that was perceived as needing to be tetrahedral at this line inperses.utils.openeye
. This PR allows the user to bypass internal oemol preparation.Changed:
PointMutationExecutor
now takes inputs:.sdf
,.pdb
,[oemol]
.ligand_file
renamed toligand_input
to be more general.An example script of how to prepare the
[oemol]
object is below, input names changed from my use case: