-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hbonds analysis writing the digit 3 from TIP3 onto acceptor/donor resid #801
Comments
So what does |
here it is...thank you! In [16]: u.atoms.resnames
Out[16]:
array(['GLN', 'GLN', 'GLN', ..., 'CLA', 'CLA', 'CLA'],
dtype='|S4')
In [17]: u.atoms.resnames[:500]
Out[17]:
array(['GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN',
'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN',
'GLN', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA',
'ALA', 'ALA', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU',
'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU',
'GLU', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ALA', 'ALA',
'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'LYS',
'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
'LYS', 'LYS', 'LYS', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP',
'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'LEU', 'LEU', 'LEU',
'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU',
'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'GLN', 'GLN',
'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN',
'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'ALA', 'ALA', 'ALA',
'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'VAL', 'VAL',
'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'ILE', 'ILE', 'ILE', 'ILE',
'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE',
'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'ASP',
'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP',
'ASP', 'ASP', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'THR', 'THR', 'THR',
'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
'THR', 'THR', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY',
'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
'THR', 'THR', 'THR', 'THR', 'THR', 'LEU', 'LEU', 'LEU', 'LEU',
'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU',
'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'THR', 'THR', 'THR',
'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
'THR', 'THR', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU',
'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLY',
'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'ARG', 'ARG', 'ARG',
'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
'ARG', 'ARG', 'ARG', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'GLY', 'GLY', 'GLY', 'GLY',
'GLY', 'GLY', 'GLY', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
'VAL', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'ASP', 'ASP', 'ASP',
'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP',
'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE',
'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE',
'ILE', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'GLY',
'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'PHE', 'PHE', 'PHE',
'PHE', 'PHE', 'PHE', 'PHE', 'PHE'],
dtype='|S4')
In [18]: u.atoms.resnames[:5000]
Out[18]:
array(['GLN', 'GLN', 'GLN', ..., 'TIP3', 'TIP3', 'TIP3'],
dtype='|S4') |
Another important piece of information from the email exchange (and @samuel3181 said the results for 0.14.0 are identical to the one below for 0.8.1):
|
@samuel3181 , can you try to reproduce the problem with some of the MDAnalysis test data files that also contain TIP3P, in particular, try
because
Alternatively, try Please post the python code that you are using to reproduce the problem, something along the lines of import MDAnalysis as mda
import MDAnalysis.analysis.hbonds
from MDAnalysis.tests.datafiles import waterPSF, waterDCD
u = mda.Universe(waterPSF, waterDCD)
# ... If you can reproduce your problem with some of the data files then we can easily fix it. |
Hi Oliver, import MDAnalysis as mda
from MDAnalysis import *
import MDAnalysis.analysis.hbonds
import cPickle
import numpy
f1=open("../hbonds/MDAnalysis_tests_hbonds_TIP.txt",'w')
f1.write("#TIME donor_idx acceptor_idx donor_resnm donor_resid donor_atom acceptor_resnm acceptor_resid acceptor_atom distance angle")
f1.write("\n")
for x in range(1,5): #25
from MDAnalysis.tests.datafiles import waterPSF, waterDCD
u=Universe(waterPSF,waterDCD)
p1 = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u,"resname TIP3","resname TIP3", distance=3.0, angle=120.0)
p1.run()
p1.generate_table()
p1.save_table(filename="../hbonds/hbond_test_tip.pickle")
table1=cPickle.load(open("../hbonds/hbond_test_tip.pickle"))
for i in range(0,len(table1)):
f1.write(str(table1[i][0])) # gives individual elements and convert to degrees
f1.write(" ")
f1.write(strf1.write(" ")
f1.write(str(table1[i][2]))
f1.write(" ")
f1.write(str(table1[i][3]))
f1.write(" ")
f1.write(str(table1[i][4]))
f1.write(" ")
f1.write(str(table1[i][5]))
f1.write(" ")
f1.write(str(table1[i][6]))
f1.write(" ")
f1.write(str(table1[i][7]))
f1.write(" ")
f1.write(str(table1[i][8]))
f1.write(" ")
f1.write(str(table1[i][9]))
f1.write(" ")
f1.write(str(table1[i][10]))
f1.write(" ")
f1.write("\n")(table1[i][1])) In [12]: u.atoms.resnames
Out[12]:
array(['TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3',
'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3'],
dtype='|S4') output - Looks like the issue persists with a different data set...
|
Oh, I forgot to post the results from PSF_TRICLINIC and DCD_TRICLINIC
|
I tested it with the latest develop and get the same issue: import MDAnalysis as mda
import MDAnalysis.analysis.hbonds
from MDAnalysis.tests.datafiles import waterPSF, waterDCD
u = mda.Universe(waterPSF, waterDCD)
p1 = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u,"resname TIP3","resname TIP3",
distance=3.0, angle=120.0)
p1.run()
p1.generate_table()
print(p1.table[:3]) which shows that the "3" from
Compare to the actual atoms: In [28]: list(u.atoms[p1.table.donor_idx - 1][:3])
Out[28]:
[<Atom 9: H2 of type HT of resname TIP3, resid 8 and segid WAT>,
<Atom 11: H1 of type HT of resname TIP3, resid 15 and segid WAT>,
<Atom 15: H2 of type HT of resname TIP3, resid 21 and segid WAT>] (NOTE: For some stupid reason, we store 1-based atom indices instead of proper 0-based indices --- and the documention is p***-poor, I had to look at the source to find frame_results.append(
[h.index + 1, a.index + 1, '{0!s}{1!s}:{2!s}'.format(h.resname, repr(h.resid), h.name),
'{0!s}{1!s}:{2!s}'.format(a.resname, repr(a.resid), a.name), dist, angle]) Argh! Maybe we want to change this or at least document... need issue for that.) |
Thank you Oliver! I will keep checking for the progress. |
@samuel3181 wrote:
Yes, that's the point to start. It could be a problem in how the There's another data structure named |
Hi Oliver, |
@samuel3181 , anyone interested can contribute code. You clone the repository, make changes, and submit a pull request (PR). The PR gets reviewed, you fix anything that is flagged, and your code is merged (integrated) and you have become and author of MDAnalysis. See Contributing Code on the wiki. And you can always ask for help on the developer mailing list or in the issue tracker. |
Regarding option 3 in #807: please add a comment there; here it will be forgotten. Thanks for voicing your opinion, feedback from interested users is really important |
Sorry, haven't had time to look at this. If you find any hints on what goes wrong, just add notes here. You can also link directly to lines in the code https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/analysis/hbonds/hbond_analysis.py#L929 (for instance, this link points to HydrogenBondAnalysis.generate_table()). |
The problem is that we are storing frame_results.append(
[h.index + 1, a.index + 1, '{0!s}{1!s}:{2!s}'.format(h.resname, repr(h.resid), h.name),
'{0!s}{1!s}:{2!s}'.format(a.resname, repr(a.resid), a.name), dist, angle]) and in out[cursor] = (t, donor_idx, acceptor_idx) + parse_residue(donor) + \
parse_residue(acceptor) + (distance, angle) Unfortunately, >>> from MDAnalysis.lib.util import parse_residue
>>> atom_identifier = "{0!s}{1!s}:{2!s}".format('TIP3', repr(8), 'O')
>>> parts = parse_residue(atom_identifier)
>>> print(atom_identifier)
TIP38:O
>>> print(parts)
('TIP', 38, 'O') So, Is this something that could benefit from the new topology system #363? (@richardjgowers, @dotsdl ?) |
A workaround for the time being is to use the data in |
@orbeckst it just looks like a problem with the concatenation of the strings (which is a bizarre way to store this data?). We could add TIP3 as a special case to |
Thank you for the pointers Oliver! I was getting git installed and setting up the ssh key to fork a copy of MDAnalysis. Could i still try to take a stab at this defect concurrently as others try to work on this? Thanks so much |
Hi Sam, By all means, work on it! In order to fix this in a backward compatible way I would replace the current timeseries data structure with something simpler that only records the atom indices instead of the other detailed information (and call it something else). Then I would make timeseries a cached property that is generated in the old format when someone wants it. Similarly, the table can then also be generated from the correct underlying data when needed. Oliver Oliver Beckstein Am Apr 21, 2016 um 5:59 schrieb samuel3181 notifications@github.com:
|
Hi all,
Supposed to work? @orbeckst So far as I can tell there are no references to TIP3 in adk.psf I am going to work on the assumption that this is another error in the docs. |
Well, if PSF and DCD are the ones from the tests then no, because there’s no water in the trajectory. However, GRO and XTC should work, only the water residuname is SOL (I think). Oliver Beckstein * orbeckst@gmx.net |
…drogenBondAnalysis (issue #801) - fixes #801 - analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses donor and acceptor names and is not tripped up by resnames that end in numbers, such as TIP3 - store timeseries as _timeseries with donor and acceptor atom identifiers as tuples instead of strings (avoids the use of fragile lib.util.parse_residue()) - made timeseries a property that is generated on the fly so that old code does not break, but it is not cached (to avoid memory consumption for big trajectories) and so users should cache themselves if needed - added tests - rewrote parts of the docs and added notes on use of timeseries - added DEPRECATION warning for timeseries in 1.0 - updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in 0.16.0 but we forgot)
…drogenBondAnalysis (issue #801) - fixes #801 - analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses donor and acceptor names and is not tripped up by resnames that end in numbers, such as TIP3 - store timeseries as _timeseries with donor and acceptor atom identifiers as tuples instead of strings (avoids the use of fragile lib.util.parse_residue()) - made timeseries a property that is generated on the fly so that old code does not break, but it is not cached (to avoid memory consumption for big trajectories) and so users should cache themselves if needed - added tests - rewrote parts of the docs and added notes on use of timeseries - updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in 0.16.0 but we forgot)
…drogenBondAnalysis (issue #801) - fixes #801 - analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses donor and acceptor names and is not tripped up by resnames that end in numbers, such as TIP3 - store timeseries as _timeseries with donor and acceptor atom identifiers as tuples instead of strings (avoids the use of fragile lib.util.parse_residue()) - made timeseries a property that is generated on the fly so that old code does not break, but it is not cached (to avoid memory consumption for big trajectories) and so users should cache themselves if needed - added tests - rewrote parts of the docs and added notes on use of timeseries - updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in 0.16.0 but we forgot)
#1339) * test for correct TIP3P resname in hbond analysis table (#801) * Fixed incorrect handling of residue names with trailing numbers in HydrogenBondAnalysis (issue #801) - fixes #801 - analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses donor and acceptor names and is not tripped up by resnames that end in numbers, such as TIP3 - store timeseries as _timeseries with donor and acceptor atom identifiers as tuples instead of strings (avoids the use of fragile lib.util.parse_residue()) - made timeseries a property that is generated on the fly so that old code does not break, but it is not cached (to avoid memory consumption for big trajectories) and so users should cache themselves if needed - added tests - rewrote parts of the docs and added notes on use of timeseries - updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in 0.16.0 but we forgot) * hbond analysis doc fixes/improvements - mostly numpy style (whenever possible): Apparently, napoleon does not like a single Notes and See Also section, need to use reST. - named the 1-based indices "idx" in the docs. - added example for analysis - describe convenience analysis functions - how to use pandas * HydrogneBondAnalysis: internal clean up Use atom.index instead of atom.index+1 internally and for debug output. * Hbond analysis: fixed docs (stated wrong defaults for updating) * hbond analysis: addressed review comments
link to issue on MDAnalysis discussion groups
https://groups.google.com/d/msg/mdnalysis-discussion/JVMNkH0DLR8/LWxoH_kfJwAJ
Expected behaviour
Writing Hbond information such as donor/acceptor id as seen in PSF files
Actual behaviour
I am able to extract several details (like donor acceptor idx, angles, distances.) I do see the digit 3 from the donor or acceptor name TIP3 (for waters) get prefixed onto the folowing column (please see below). The resnames in the PDB and PSF files are "TIP3" as opposed to :TIP 3". I also scrolled down the PSF/PDB files all the way to the bottom to make sure that the column which contains "TIP3" and the adjacent column does have a space. Not sure why MDAnalysis would write that in the output. This becomes a problem because, when I am trying to extract the identities of the waters associated with my protein I get a 3 added to the actual number defining the TIP3.
Code to reproduce the behaviour
Currently version of MDAnalysis:
(run
python -c "import MDAnalysis as mda; print(mda.__version__)"
)The text was updated successfully, but these errors were encountered: