Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommended practice of finding bonded hydrogen from hydrogen bond donor #2521

Open
xiki-tempula opened this issue Feb 10, 2020 · 5 comments

Comments

@xiki-tempula
Copy link
Contributor

Currently, in the water bridge analysis module, the user needs to give the name for the hydrogen bond donor heavy atom and the hydrogens were found via a simple distance search. I'm interested in using the bonded information if possible for this step. However, I'm a man of gro and xtc and I'm not quite sure of how much information a topology file can provide.

I have checked @p-j-smith 's implementation and found that it is done by finding the bonded atom associated with the hydrogens. However, in water bridge analysis, the name of hydrogen bond donor heavy atom is provided instead of the hydrogen bond donor hydrogens. Thus, I need to find the hydrogen atoms associated with the atom and exclude the non-hydrogen atoms.

I can think of two implementations.

hydrogens = [atom for atom in heavy_atom.bonded_atoms if atom.name[0] == 'H']

or

hydrogens = [atom for atom in heavy_atom.bonded_atoms if atom.type == 'H']

Technically, the two implementations should be the same. Since I'm not very familiar with the universe where topology information is available and the type of files where the user might supply, I'm wondering if one implementation might be more robust and less error-prone than the other?

This issue is related to #2120.

@lilyminium
Copy link
Member

Checking names is probably better.

>>> import MDAnalysis as mda
>>> from MDAnalysis.tests.datafiles import *
>>> u = mda.Universe(TPR)
>>> u.atoms.types
array(['opls_287', 'opls_290', 'opls_290', ..., 'opls_407', 'opls_407',
       'opls_407'], dtype=object)
>>> u.atoms.names
array(['N', 'H1', 'H2', ..., 'NA', 'NA', 'NA'], dtype=object)

You could also check the mass and charge like HydrogenBondAnalysis does (#2516). That is more agnostic than checking names/types if your format has that info available.

@xiki-tempula
Copy link
Contributor Author

xiki-tempula commented Feb 10, 2020

@lilyminium Thank you very much for the advice. I will use name instead. I mainly use charmm and amber so type usually works for me.

I'm not too worried about the mass and charge as this module operates in a fashion consistent with the old hydrogen bond analysis, which only checks the name. Thus, unless the user supplies the name for the hydrogen bond donor, in which case the user should know what they are doing, the default hydrogen bond donor atom name is consistent with the pdb amino acid name convention.

I will try to see if I can import some function from the new HydrogenBondAnalysis to do the job

@richardjgowers
Copy link
Member

Just to throw a spanner in the works, I'd check via mass, then we offload the problem of identifying hydrogens to guess_masses :)

@orbeckst
Copy link
Member

Well, if we correctly identified elements then I would use these.

@xiki-tempula
Copy link
Contributor Author

I will probably work on this a bit later, once we have the element attribute in place. In PR #2519, the hydrogen bond donor lookup is done through a dictionary looking after the first frame. Thus, this new implementation should offer at most a 10s gain for a 100k atom system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants