Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hbonds analysis writing the digit 3 from TIP3 onto acceptor/donor resid #801

Closed
samuel3181 opened this issue Mar 23, 2016 · 20 comments
Closed

Comments

@samuel3181
Copy link

link to issue on MDAnalysis discussion groups
https://groups.google.com/d/msg/mdnalysis-discussion/JVMNkH0DLR8/LWxoH_kfJwAJ

Expected behaviour

Writing Hbond information such as donor/acceptor id as seen in PSF files

Actual behaviour

I am able to extract several details (like donor acceptor idx, angles, distances.) I do see the digit 3 from the donor or acceptor name TIP3 (for waters) get prefixed onto the folowing column (please see below). The resnames in the PDB and PSF files are "TIP3" as opposed to :TIP 3". I also scrolled down the PSF/PDB files all the way to the bottom to make sure that the column which contains "TIP3" and the adjacent column does have a space. Not sure why MDAnalysis would write that in the output. This becomes a problem because, when I am trying to extract the identities of the waters associated with my protein I get a 3 added to the actual number defining the TIP3.

Code to reproduce the behaviour

import MDAnalysis as mda
import MDAnalysis
from MDAnalysis import *
import MDAnalysis.analysis.hbonds
import cPickle
import numpy

# u = mda.Universe(top, trj)
f1=open("../hbonds_TIP.txt",'w')


f1.write("#TIME donor_idx acceptor_idx donor_resnm donor_resid donor_atom acceptor_resnm acceptor_resid acceptor_atom distance angle")
f1.write("\n")

for x in range(1,5): #25
        dcd_name='2frames.dcd'
        u=Universe("../step3_pbcsetup.psf",dcd_name)
        p1 = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u,"protein","resname TIP3", distance=3.0, angle=120.0)
        p1.run()
        p1.generate_table()
        p1.save_table(filename="../hbond_test_tip.pickle")
        table1=cPickle.load(open("../3skx_open/hbond_test_tip.pickle"))

        for i in range(0,len(table1)):
                f1.write(str(table1[i][0])) # gives individual elements and convert to degrees
                f1.write(" ")
                f1.write(str(table1[i][1]))
                f1.write(" ")
                f1.write(str(table1[i][2]))
                f1.write(" ")
                f1.write(str(table1[i][3]))
                f1.write(" ")
                f1.write(str(table1[i][4]))
                f1.write(" ")
                f1.write(str(table1[i][5]))
                f1.write(" ")
                f1.write(str(table1[i][6]))
                f1.write(" ")
                f1.write(str(table1[i][7]))
                f1.write(" ")
                f1.write(str(table1[i][8]))
                f1.write(" ")
                f1.write(str(table1[i][9]))
                f1.write(" ")
                f1.write(str(table1[i][10]))
                f1.write(" ")
                f1.write("\n")

....

Currently version of MDAnalysis:

(run python -c "import MDAnalysis as mda; print(mda.__version__)")

@richardjgowers
Copy link
Member

So what does u.atoms.resnames give you?

@samuel3181
Copy link
Author

here it is...thank you!

In [16]: u.atoms.resnames
Out[16]: 
array(['GLN', 'GLN', 'GLN', ..., 'CLA', 'CLA', 'CLA'], 
      dtype='|S4')

In [17]: u.atoms.resnames[:500]
Out[17]: 
array(['GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN',
       'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN',
       'GLN', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA',
       'ALA', 'ALA', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU',
       'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU',
       'GLU', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
       'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
       'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ALA', 'ALA',
       'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'LYS',
       'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
       'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
       'LYS', 'LYS', 'LYS', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP',
       'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'LEU', 'LEU', 'LEU',
       'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU',
       'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'GLN', 'GLN',
       'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN',
       'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'GLN', 'ALA', 'ALA', 'ALA',
       'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'VAL', 'VAL',
       'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
       'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'ILE', 'ILE', 'ILE', 'ILE',
       'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE',
       'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'ASP',
       'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP',
       'ASP', 'ASP', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
       'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS',
       'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'LYS', 'THR', 'THR', 'THR',
       'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
       'THR', 'THR', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY',
       'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
       'THR', 'THR', 'THR', 'THR', 'THR', 'LEU', 'LEU', 'LEU', 'LEU',
       'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU',
       'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'LEU', 'THR', 'THR', 'THR',
       'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
       'THR', 'THR', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU',
       'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLU', 'GLY',
       'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'ARG', 'ARG', 'ARG',
       'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
       'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG', 'ARG',
       'ARG', 'ARG', 'ARG', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'PHE', 'GLY', 'GLY', 'GLY', 'GLY',
       'GLY', 'GLY', 'GLY', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
       'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
       'VAL', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'THR',
       'THR', 'THR', 'THR', 'THR', 'THR', 'THR', 'ASP', 'ASP', 'ASP',
       'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP', 'ASP',
       'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE',
       'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE', 'ILE',
       'ILE', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL',
       'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'VAL', 'GLY',
       'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'GLY', 'PHE', 'PHE', 'PHE',
       'PHE', 'PHE', 'PHE', 'PHE', 'PHE'], 
      dtype='|S4')

In [18]: u.atoms.resnames[:5000]
Out[18]: 
array(['GLN', 'GLN', 'GLN', ..., 'TIP3', 'TIP3', 'TIP3'], 
      dtype='|S4')

@orbeckst
Copy link
Member

Another important piece of information from the email exchange (and @samuel3181 said the results for 0.14.0 are identical to the one below for 0.8.1):

u=Universe(PSF,DCD)

>>> print(MDAnalysis.__version__)
0.8.1



>>> print(water.residues[:10])
<ResidueGroup [<Residue 'TIP3', 1>, <Residue 'TIP3', 3>, <Residue 'TIP3', 5>, <Residue 'TIP3', 7>, <Residue 'TIP3', 9>, <Residue 'TIP3', 11>, <Residue 'TIP3', 13>, <Residue 'TIP3', 15>, <Residue 'TIP3', 17>, <Residue 'TIP3', 19>]>

Lines from the PSF file:

      4021 WATA     1        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4022 WATA     1        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4023 WATA     1        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4024 WATA     3        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4025 WATA     3        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4026 WATA     3        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4027 WATA     5        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4028 WATA     5        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4029 WATA     5        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4030 WATA     7        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4031 WATA     7        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02

      4804 SOLV     1        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4805 SOLV     1        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4806 SOLV     1        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4807 SOLV     2        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4808 SOLV     2        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4809 SOLV     2        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4810 SOLV     3        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4811 SOLV     3        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4812 SOLV     3        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4813 SOLV     4        TIP3     OH2         3  -0.834000       15.9994           0   0.00000     -0.301140E-02
      4814 SOLV     4        TIP3     H1          1   0.417000       1.00800           0   0.00000     -0.301140E-02
      4815 SOLV     4        TIP3     H2          1   0.417000       1.00800           0   0.00000     -0.301140E-02

@orbeckst
Copy link
Member

@samuel3181 , can you try to reproduce the problem with some of the MDAnalysis test data files that also contain TIP3P, in particular, try

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import waterPSF, waterDCD

because waterPSF contains lines such as

      15 !NATOM
       1 WAT  5    TIP3 OH2  OT    -0.834000       15.9994           0
       2 WAT  5    TIP3 H1   HT     0.417000        1.0080           0
       3 WAT  5    TIP3 H2   HT     0.417000        1.0080           0
       4 WAT  7    TIP3 OH2  OT    -0.834000       15.9994           0
       5 WAT  7    TIP3 H1   HT     0.417000        1.0080           0
       6 WAT  7    TIP3 H2   HT     0.417000        1.0080           0
       7 WAT  8    TIP3 OH2  OT    -0.834000       15.9994           0

Alternatively, try PSF_TRICLINIC and DCD_TRICLINIC (they contain 125 waters and should show some H-bonding).

Please post the python code that you are using to reproduce the problem, something along the lines of

import MDAnalysis as mda
import MDAnalysis.analysis.hbonds
from MDAnalysis.tests.datafiles import waterPSF, waterDCD

u = mda.Universe(waterPSF, waterDCD)
# ...

If you can reproduce your problem with some of the data files then we can easily fix it.

@samuel3181
Copy link
Author

Hi Oliver,
I tried the first option and this is what I am seeing...

import MDAnalysis as mda
from MDAnalysis import *
import MDAnalysis.analysis.hbonds
import cPickle
import numpy

f1=open("../hbonds/MDAnalysis_tests_hbonds_TIP.txt",'w')


f1.write("#TIME donor_idx acceptor_idx donor_resnm donor_resid donor_atom acceptor_resnm acceptor_resid acceptor_atom distance angle")
f1.write("\n")

for x in range(1,5): #25
        from MDAnalysis.tests.datafiles import waterPSF, waterDCD
        u=Universe(waterPSF,waterDCD)
        p1 = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u,"resname TIP3","resname TIP3", distance=3.0, angle=120.0)
        p1.run()
        p1.generate_table()
        p1.save_table(filename="../hbonds/hbond_test_tip.pickle")
        table1=cPickle.load(open("../hbonds/hbond_test_tip.pickle"))

        for i in range(0,len(table1)):
                f1.write(str(table1[i][0])) # gives individual elements and convert to degrees
                f1.write(" ")
                f1.write(strf1.write(" ")
                f1.write(str(table1[i][2]))
                f1.write(" ")
                f1.write(str(table1[i][3]))
                f1.write(" ")
                f1.write(str(table1[i][4]))
                f1.write(" ")
                f1.write(str(table1[i][5]))
                f1.write(" ")
                f1.write(str(table1[i][6]))
                f1.write(" ")
                f1.write(str(table1[i][7]))
                f1.write(" ")
                f1.write(str(table1[i][8]))
                f1.write(" ")
                f1.write(str(table1[i][9]))
                f1.write(" ")
                f1.write(str(table1[i][10]))
                f1.write(" ")
                f1.write("\n")(table1[i][1]))
In [12]: u.atoms.resnames
Out[12]: 
array(['TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3',
       'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3', 'TIP3'], 
      dtype='|S4')

output - Looks like the issue persists with a different data set...


#TIME donor_idx acceptor_idx donor_resnm donor_resid donor_atom acceptor_resnm acceptor_resid acceptor_atom distance angle
0.0 9 13 TIP 38 H2 TIP 321 OH2 2.07548332214 138.471969604
0.0 11 4 TIP 315 H1 TIP 37 OH2 1.95483160019 150.489547729
0.0 15 10 TIP 321 H2 TIP 315 OH2 1.81008863449 157.037582397
0.0200000000596 9 13 TIP 38 H2 TIP 321 OH2 2.16670084 165.639282227
0.0200000000596 11 4 TIP 315 H1 TIP 37 OH2 1.95273351669 145.66973877
0.0200000000596 15 10 TIP 321 H2 TIP 315 OH2 1.73379433155 168.516952515
0.0400000001193 9 13 TIP 38 H2 TIP 321 OH2 2.48292899132 139.893661499
0.0400000001193 11 4 TIP 315 H1 TIP 37 OH2 1.78535282612 165.448654175
0.0400000001193 15 10 TIP 321 H2 TIP 315 OH2 1.807538867 161.226196289
0.0600000001789 9 13 TIP 38 H2 TIP 321 OH2 2.57859063148 152.799087524
0.0600000001789 11 4 TIP 315 H1 TIP 37 OH2 1.74403131008 174.421768188
0.0600000001789 15 10 TIP 321 H2 TIP 315 OH2 1.86083495617 162.850708008
0.0800000002385 9 13 TIP 38 H2 TIP 321 OH2 2.66911864281 158.354324341
0.0800000002385 11 4 TIP 315 H1 TIP 37 OH2 1.83738911152 150.355743408
0.0800000002385 15 10 TIP 321 H2 TIP 315 OH2 1.8890235424 160.65083313
0.100000000298 9 13 TIP 38 H2 TIP 321 OH2 2.74517846107 138.393447876
0.100000000298 11 4 TIP 315 H1 TIP 37 OH2 1.87105560303 150.649490356
0.100000000298 15 10 TIP 321 H2 TIP 315 OH2 1.86695420742 159.56439209
0.120000000358 9 13 TIP 38 H2 TIP 321 OH2 2.79278540611 121.193733215
0.120000000358 11 4 TIP 315 H1 TIP 37 OH2 1.8583688736 167.711288452
0.120000000358 15 10 TIP 321 H2 TIP 315 OH2 1.80249166489 174.608947754
0.140000000417 11 4 TIP 315 H1 TIP 37 OH2 1.8793040514 166.983398438
0.140000000417 15 10 TIP 321 H2 TIP 315 OH2 1.81333422661 159.858963013
0.160000000477 8 13 TIP 38 H1 TIP 321 OH2 2.36031007767 122.898033142
0.160000000477 11 4 TIP 315 H1 TIP 37 OH2 1.85728180408 165.319534302
0.160000000477 15 10 TIP 321 H2 TIP 315 OH2 1.82592153549 155.935836792
0.180000000537 8 13 TIP 38 H1 TIP 321 OH2 1.9011015892 154.341995239
0.180000000537 11 4 TIP 315 H1 TIP 37 OH2 1.84500610828 151.789459229
0.180000000537 15 10 TIP 321 H2 TIP 315 OH2 1.83791613579 157.846481323

@samuel3181
Copy link
Author

Oh, I forgot to post the results from PSF_TRICLINIC and DCD_TRICLINIC
Here it is ,,,,they still have the same aliasing effect....

#TIME donor_idx acceptor_idx donor_resnm donor_resid donor_atom acceptor_resnm acceptor_resid acceptor_atom distance angle
0.0 2 169 TIP 31 H1 TIP 357 OH2 2.1812813282 122.447937012
0.0 3 142 TIP 31 H2 TIP 348 OH2 2.34604525566 142.771743774
0.0 3 328 TIP 31 H2 TIP 3110 OH2 2.74227809906 127.493614197
0.0 5 97 TIP 32 H1 TIP 333 OH2 1.67874491215 169.701599121
0.0 6 10 TIP 32 H2 TIP 34 OH2 2.2600133419 128.895172119
0.0 6 16 TIP 32 H2 TIP 36 OH2 2.36733198166 139.175811768
0.0 8 322 TIP 33 H1 TIP 3108 OH2 2.29818582535 132.527496338
0.0 9 244 TIP 33 H2 TIP 382 OH2 2.12786602974 150.010223389
0.0 11 13 TIP 34 H1 TIP 35 OH2 2.96735048294 148.360931396
0.0 12 97 TIP 34 H2 TIP 333 OH2 2.02071690559 167.535858154
0.0 14 301 TIP 35 H1 TIP 3101 OH2 1.96005547047 164.00440979
0.0 14 316 TIP 35 H1 TIP 3106 OH2 2.77286911011 127.391159058
0.0 15 130 TIP 35 H2 TIP 344 OH2 2.01135373116 137.454803467
0.0 17 10 TIP 36 H1 TIP 34 OH2 1.94057536125 153.257217407
0.0 18 112 TIP 36 H2 TIP 338 OH2 1.74056231976 173.660324097
0.0 20 169 TIP 37 H1 TIP 357 OH2 1.9749622345 152.005126953
0.0 21 94 TIP 37 H2 TIP 332 OH2 1.90088307858 156.38482666
0.0 23 64 TIP 38 H1 TIP 322 OH2 1.91663682461 154.783218384
0.0 24 25 TIP 38 H2 TIP 39 OH2 1.76101863384 163.711090088
0.0 27 121 TIP 39 H2 TIP 341 OH2 1.81751167774 165.689575195
0.0 29 64 TIP 310 H1 TIP 322 OH2 2.99959397316 129.776947021
0.0 32 250 TIP 311 H1 TIP 384 OH2 1.96491932869 161.571655273
0.0 33 40 TIP 311 H2 TIP 314 OH2 2.04032707214 162.863861084
0.0 35 178 TIP 312 H1 TIP 360 OH2 1.90591323376 146.325469971
0.0 41 223 TIP 314 H1 TIP 375 OH2 1.92644774914 166.94241333
0.0 42 175 TIP 314 H2 TIP 359 OH2 2.09352254868 170.348937988
0.0 44 85 TIP 315 H1 TIP 329 OH2 1.97960209846 165.118484497
0.0 45 52 TIP 315 H2 TIP 318 OH2 1.79222106934 162.048934937
0.0 47 40 TIP 316 H1 TIP 314 OH2 1.89716815948 171.168731689

@orbeckst
Copy link
Member

I tested it with the latest develop and get the same issue:

import MDAnalysis as mda
import MDAnalysis.analysis.hbonds
from MDAnalysis.tests.datafiles import waterPSF, waterDCD

u = mda.Universe(waterPSF, waterDCD)
p1 = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u,"resname TIP3","resname TIP3", 
        distance=3.0, angle=120.0)
p1.run()
p1.generate_table()
print(p1.table[:3])

which shows that the "3" from TIP3 was transferred to the resid in the table output:

rec.array([ (0.0, 9, 13, u'TIP', 38, u'H2', u'TIP', 321, u'OH2', 2.0754833221435547, 138.4719696044922),
 (0.0, 11, 4, u'TIP', 315, u'H1', u'TIP', 37, u'OH2', 1.954831600189209, 150.4895477294922),
 (0.0, 15, 10, u'TIP', 321, u'H2', u'TIP', 315, u'OH2', 1.8100886344909668, 157.03758239746094)],
          dtype=[('time', '<f8'), ('donor_idx', '<i8'), ('acceptor_idx', '<i8'), ('donor_resnm', '<U4'), ('donor_resid', '<i8'), ('donor_atom', '<U4'), ('acceptor_resnm', '<U4'), ('acceptor_resid', '<i8'), ('acceptor_atom', '<U4'), ('distance', '<f8'), ('angle', '<f8')])

Compare to the actual atoms:

In [28]: list(u.atoms[p1.table.donor_idx - 1][:3])
Out[28]:
[<Atom 9: H2 of type HT of resname TIP3, resid 8 and segid WAT>,
 <Atom 11: H1 of type HT of resname TIP3, resid 15 and segid WAT>,
 <Atom 15: H2 of type HT of resname TIP3, resid 21 and segid WAT>]

(NOTE: For some stupid reason, we store 1-based atom indices instead of proper 0-based indices --- and the documention is p***-poor, I had to look at the source to find

frame_results.append(
                                    [h.index + 1, a.index + 1, '{0!s}{1!s}:{2!s}'.format(h.resname, repr(h.resid), h.name),
                                        '{0!s}{1!s}:{2!s}'.format(a.resname, repr(a.resid), a.name), dist, angle])

Argh! Maybe we want to change this or at least document... need issue for that.)

@samuel3181
Copy link
Author

Thank you Oliver! I will keep checking for the progress.

@orbeckst
Copy link
Member

@samuel3181 wrote:

I would very much to dig into the code myself. So should I look in the hbond_analysis.py file?

Yes, that's the point to start. It could be a problem in how the generate_table() method parses residues into fields.

There's another data structure named timeseries which gives a big, messy dictionary. Have a look at it and see if this also contains wrong resids.

@samuel3181
Copy link
Author

Hi Oliver,
I have been keeping track of the discussion in the next thread about the issue. I think option 3 makes sense. is it incumbent upon me to make these changes in time for the next release? Or only developers are allowed to make changes to source code..
best wishes

@orbeckst
Copy link
Member

orbeckst commented Apr 5, 2016

@samuel3181 , anyone interested can contribute code. You clone the repository, make changes, and submit a pull request (PR). The PR gets reviewed, you fix anything that is flagged, and your code is merged (integrated) and you have become and author of MDAnalysis. See Contributing Code on the wiki. And you can always ask for help on the developer mailing list or in the issue tracker.

@orbeckst
Copy link
Member

orbeckst commented Apr 5, 2016

Regarding option 3 in #807: please add a comment there; here it will be forgotten. Thanks for voicing your opinion, feedback from interested users is really important

@orbeckst
Copy link
Member

Sorry, haven't had time to look at this. If you find any hints on what goes wrong, just add notes here.

You can also link directly to lines in the code https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/analysis/hbonds/hbond_analysis.py#L929 (for instance, this link points to HydrogenBondAnalysis.generate_table()).

@orbeckst
Copy link
Member

orbeckst commented Apr 21, 2016

The problem is that we are storing residue atom information as a string (see line 887)

frame_results.append(
          [h.index + 1, a.index + 1, '{0!s}{1!s}:{2!s}'.format(h.resname, repr(h.resid), h.name),
          '{0!s}{1!s}:{2!s}'.format(a.resname, repr(a.resid), a.name), dist, angle])

and in generate_tables() we then try to reverse-parse the string for residue information (line 970) using parse_residue():

 out[cursor] = (t, donor_idx, acceptor_idx) + parse_residue(donor) + \
                       parse_residue(acceptor) + (distance, angle)

Unfortunately, parse_residue() fails if the last character in the residue name is a number such as "TIP3":

>>> from MDAnalysis.lib.util import parse_residue
>>> atom_identifier = "{0!s}{1!s}:{2!s}".format('TIP3', repr(8), 'O')
>>> parts = parse_residue(atom_identifier)
>>> print(atom_identifier)
TIP38:O
>>> print(parts)
('TIP', 38, 'O')

So, parse_residue() is not sufficiently robust. We should really be storing the data in full resname/resid detail (or even as actual Atom instances)?

Is this something that could benefit from the new topology system #363? (@richardjgowers, @dotsdl ?)

@orbeckst
Copy link
Member

A workaround for the time being is to use the data in HydrogenBondAnalysis.timeseries (the format is documented in the docs).

@richardjgowers
Copy link
Member

@orbeckst it just looks like a problem with the concatenation of the strings (which is a bizarre way to store this data?). We could add TIP3 as a special case to parse_residue.. but really the analysis module shouldn't squash the data and then try and unsquash it.

@samuel3181
Copy link
Author

Thank you for the pointers Oliver! I was getting git installed and setting up the ssh key to fork a copy of MDAnalysis. Could i still try to take a stab at this defect concurrently as others try to work on this? Thanks so much

@orbeckst
Copy link
Member

Hi Sam,

By all means, work on it!

In order to fix this in a backward compatible way I would replace the current timeseries data structure with something simpler that only records the atom indices instead of the other detailed information (and call it something else). Then I would make timeseries a cached property that is generated in the old format when someone wants it. Similarly, the table can then also be generated from the correct underlying data when needed.

Oliver

Oliver Beckstein
email: orbeckst@gmail.com

Am Apr 21, 2016 um 5:59 schrieb samuel3181 notifications@github.com:

Thank you for the pointers Oliver! I was getting git installed and setting up the ssh key to fork a copy of MDAnalysis. Could i still try to take a stab at this defect concurrently as others try to work on this? Thanks so much


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@jdetle
Copy link
Contributor

jdetle commented May 3, 2016

Hi all,
Is

u = MDA.Universe(PSF,DCD)
h = hb.HydrogenBondAnalysis(u,'protein','resname TIP3', distance=3.0, angle=120.0)
h.run()

Supposed to work? @orbeckst

So far as I can tell there are no references to TIP3 in adk.psf

I am going to work on the assumption that this is another error in the docs.

@orbeckst
Copy link
Member

orbeckst commented May 3, 2016

On 2 May, 2016, at 19:56, John Detlefs notifications@github.com wrote:

Hi all,
Is

h = hb.HydrogenBondAnalysis(u,'protein','resname TIP3', distance=3.0, angle=120.0)
h.run()

Supposed to work? 

Well, if PSF and DCD are the ones from the tests then no, because there’s no water in the trajectory.

However, GRO and XTC should work, only the water residuname is SOL (I think).

Oliver Beckstein * orbeckst@gmx.net
skype: orbeckst * orbeckst@gmail.com

orbeckst added a commit that referenced this issue May 10, 2017
…drogenBondAnalysis (issue #801)

- fixes #801
- analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses
  donor and acceptor names and is not tripped up by resnames that end in numbers, such
  as TIP3
- store timeseries as _timeseries with donor and acceptor atom identifiers as
  tuples instead of strings (avoids the use of fragile lib.util.parse_residue())
- made timeseries a property that is generated on the fly so that old code does not break, but
  it is not cached (to avoid memory consumption for big trajectories) and so users should cache
  themselves if needed
- added tests
- rewrote parts of the docs and added notes on use of timeseries
- added DEPRECATION warning for timeseries in 1.0
- updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in
  0.16.0 but we forgot)
orbeckst added a commit that referenced this issue May 11, 2017
…drogenBondAnalysis (issue #801)

- fixes #801
- analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses
  donor and acceptor names and is not tripped up by resnames that end in numbers, such
  as TIP3
- store timeseries as _timeseries with donor and acceptor atom identifiers as
  tuples instead of strings (avoids the use of fragile lib.util.parse_residue())
- made timeseries a property that is generated on the fly so that old code does not break, but
  it is not cached (to avoid memory consumption for big trajectories) and so users should cache
  themselves if needed
- added tests
- rewrote parts of the docs and added notes on use of timeseries
- updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in
  0.16.0 but we forgot)
orbeckst added a commit that referenced this issue May 12, 2017
…drogenBondAnalysis (issue #801)

- fixes #801
- analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses
  donor and acceptor names and is not tripped up by resnames that end in numbers, such
  as TIP3
- store timeseries as _timeseries with donor and acceptor atom identifiers as
  tuples instead of strings (avoids the use of fragile lib.util.parse_residue())
- made timeseries a property that is generated on the fly so that old code does not break, but
  it is not cached (to avoid memory consumption for big trajectories) and so users should cache
  themselves if needed
- added tests
- rewrote parts of the docs and added notes on use of timeseries
- updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in
  0.16.0 but we forgot)
kain88-de pushed a commit that referenced this issue May 14, 2017
#1339)

* test for correct TIP3P resname in hbond analysis table (#801)

* Fixed incorrect handling of residue names with trailing numbers in HydrogenBondAnalysis (issue #801)

- fixes #801
- analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses
  donor and acceptor names and is not tripped up by resnames that end in numbers, such
  as TIP3
- store timeseries as _timeseries with donor and acceptor atom identifiers as
  tuples instead of strings (avoids the use of fragile lib.util.parse_residue())
- made timeseries a property that is generated on the fly so that old code does not break, but
  it is not cached (to avoid memory consumption for big trajectories) and so users should cache
  themselves if needed
- added tests
- rewrote parts of the docs and added notes on use of timeseries
- updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in
  0.16.0 but we forgot)

* hbond analysis doc fixes/improvements

- mostly numpy style (whenever possible):
  Apparently, napoleon does not like a single Notes and See Also section, need to
  use reST.
- named the 1-based indices "idx" in the docs.
- added example for analysis
  - describe convenience analysis functions
  - how to use pandas

* HydrogneBondAnalysis: internal clean up

Use atom.index instead of atom.index+1 internally and for debug output.

* Hbond analysis: fixed docs (stated wrong defaults for updating)

* hbond analysis: addressed review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants