hbond analysis: fixed incorrect residue handling with trailing numbers #1339

orbeckst · 2017-05-10T00:24:24Z

Fixes #801

Changes made in this Pull Request:

HydrogenBondAnalysis now correctly parses residue names with trailing numbers in the normalized structured array table
changed the data format of the result timeseries and store it as the new attribute _timeseries
created a managed attribute timeseries that reproduces the previous behavior (but might come at a speed penalty because it is deliberately not cached)
updated deprecation warnings for 1-based indices: now to be removed in 0.17.0 (because we forgot to remove them in 0.16.0)
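
The managed attribute described in this list can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `HBondResults` and the sample record are hypothetical, and the legacy string layout `<resname><resid>:<name>` is an assumption. It shows a property that rebuilds the legacy view from `_timeseries` on every access, which is why callers that need the result repeatedly should keep their own reference.

```python
class HBondResults(object):
    """Hypothetical stand-in for the analysis class (illustration only)."""

    def __init__(self, timeseries):
        # New internal format: donor and acceptor are (resname, resid, name) tuples.
        self._timeseries = timeseries

    @staticmethod
    def _reformat_hb(hb, atomformat="{0[0]!s}{0[1]!s}:{0[2]!s}"):
        # Assumed legacy layout: tuples collapsed into "<resname><resid>:<name>".
        return (list(hb[:4])
                + [atomformat.format(hb[4]), atomformat.format(hb[5])]
                + list(hb[6:]))

    @property
    def timeseries(self):
        # Rebuilt on every access; deliberately not cached.
        return [[self._reformat_hb(hb) for hb in hframe]
                for hframe in self._timeseries]


# One frame containing one hydrogen bond (sample values are made up).
frame = [[2, 11, 1, 10, ("TIP3", 1, "H1"), ("ALA", 2, "O"), 2.9, 165.0]]
results = HBondResults([frame])
legacy = results.timeseries  # keep a reference if it is needed more than once
```

Because the property returns a freshly built list each time, holding on to `legacy` avoids repeating the conversion for large trajectories.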

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

orbeckst · 2017-05-10T18:22:40Z

Will fix the doc build error https://travis-ci.org/MDAnalysis/mdanalysis/jobs/230575555

EDIT: no idea why napoleon barfs on a lone Note heading; it also does not like a lone See Also – maybe because it is a property annotation??

Anyway, need to change back to reST.

orbeckst · 2017-05-10T21:38:14Z

Only the numpy dev test fails. I'll rebase the whole thing later and then ask for reviews.

orbeckst · 2017-05-11T20:07:20Z

Could someone please review? The travis failure is due to the stuff discussed in #1334 and is harmless.

kain88-de · 2017-05-12T07:16:14Z

package/MDAnalysis/analysis/hbonds/hbond_analysis.py


+      .. deprecated:: 0.15.0


Can you add a developer note somewhere on how to remove these things? I must have missed it when looking for deprecations to remove before the last release.

Ok, added the following

# REMOVAL (DEPRECATION) instructions for 0.17.0: "1-based indices" (aka "idx" vs 0-based "index")
# - remove the warning
# - replace 'deprecated 0.15.0' with 'versionchanged 0.17.0' (and adjust text)
# - update docs for "timeseries" by removing all "idx (1-based)" entries:
#   one hydrogen bond should now look like
#     <donor index (0-based)>, <acceptor index (0-based)>, <donor string>, <acceptor string>,
#     <distance>, <angle>
# - update docs for "table": removed *_idx and renumber
# - in run(): removed the 1-based indices (h.index + 1, a.index + 1) from frame_results:
#   for instance, should now read (two occurrences!)
#     frame_results.append(
#         [h.index, a.index,
#          (h.resname, h.resid, h.name),
#          (a.resname, a.resid, a.name),
#          dist, angle])
# - generate_table(), count_*(), timesteps_by_type():
#   remove any donor_idx, acceptor_idx variables and fields.
# - update tests (change indices in analysis/test_hbonds.py)
# - update any docs in this file that mention "1-based"
#
# I suggest to keep this removal a separate commit. [@orbeckst]
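
To illustrate the record shapes these instructions refer to, here is a small sketch. The helper name `drop_one_based` and the sample values are hypothetical and not part of the PR; the sketch only shows how an 8-field record carrying the 1-based `idx` entries maps onto the 6-field record expected after the removal.

```python
def drop_one_based(hb):
    """Return a hydrogen bond record without the two leading 1-based indices.

    Input layout (current):  [donor_idx, acceptor_idx, donor_index, acceptor_index,
                              donor, acceptor, distance, angle]
    Output layout (planned): [donor_index, acceptor_index,
                              donor, acceptor, distance, angle]
    """
    donor_idx, acceptor_idx, donor_index, acceptor_index = hb[:4]
    # Sanity check: the 1-based entries are redundant with the 0-based ones.
    if (donor_idx, acceptor_idx) != (donor_index + 1, acceptor_index + 1):
        raise ValueError("1-based and 0-based indices disagree")
    return list(hb[2:])


current = [2, 11, 1, 10, ("TIP3", 1, "H1"), ("ALA", 2, "O"), 2.9, 165.0]
planned = drop_one_based(current)
```

The consistency check is only there to make the redundancy explicit; it is not something the instructions ask for.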

kain88-de · 2017-05-12T07:17:27Z

package/MDAnalysis/analysis/hbonds/hbond_analysis.py

-            *selection2* handles donors and acceptors: If *selection1* contains
-            'both' then *selection2* will also contain *both*. If *selection1*
-            is set to 'donor' then *selection2* is 'acceptor' (and vice versa).
+            value for `selection1_type` automatically determines how


You need double backticks. Single ones result in italic formatting. So far we have always opted for monospace code formatting with double backticks for variable names.

I haven't changed this yet because this is how numpy docs are written (IIRC) and it is consistent with the Napoleon formatting.

kain88-de · 2017-05-12T07:19:20Z

package/MDAnalysis/analysis/hbonds/hbond_analysis.py

+           and they are targeted for removal in 0.17.0.
+
+        """
+        return [[self._reformat_hb(hb) for hb in hframe] for hframe in self._timeseries]


Is the fix for the problem here?

The real fix is

frame_results.append(
    [h.index + 1, a.index + 1, h.index, a.index,
     (h.resname, h.resid, h.name),
     (a.resname, a.resid, a.name),
     dist, angle])

where we store atom information as the tuples (h.resname, h.resid, h.name) and (a.resname, a.resid, a.name) instead of a mangled name. These unambiguous data are then stored as _timeseries. The timeseries attribute is generated from them to look like the previous output. Importantly, the other methods (generate_table, count_*, timesteps_by_type) now all use the unambiguous data to generate their own data structures. Using unambiguous time series data fixes the issue.
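
Why a mangled name is ambiguous can be shown with a short sketch. The regular expression below is a hypothetical stand-in for a string-based parser, not the actual `lib.util.parse_residue()`, and it assumes the legacy string has the form `<resname><resid>:<name>`.

```python
import re

# Hypothetical parser: letters, then digits, then ":" and the atom name.
MANGLED = re.compile(r"(?P<resname>[A-Za-z]+)(?P<resid>\d+):(?P<name>\w+)")


def parse_mangled(label):
    """Split a "<resname><resid>:<name>" string back into its parts."""
    m = MANGLED.match(label)
    if m is None:
        raise ValueError("cannot parse {0!r}".format(label))
    return m.group("resname"), int(m.group("resid")), m.group("name")


atom = ("TIP3", 1, "H1")             # unambiguous tuple
label = "{0}{1}:{2}".format(*atom)   # mangled string "TIP31:H1"
recovered = parse_mangled(label)     # the trailing 3 is absorbed into the resid
```

Once the resname and resid are concatenated, no parser can tell `TIP3` + `1` apart from `TIP` + `31`, whereas the tuple keeps the boundary explicit.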

The _reformat_hb() is just a way to return the same timeseries as in previous versions.

Do you think that this needs to be commented?

(In principle we could just store atom indices

No need to comment this. But a hint about this for a review with lots of docs changes would be nice next time.

kain88-de · 2017-05-12T07:19:57Z

package/MDAnalysis/analysis/hbonds/hbond_analysis.py

@@ -1083,47 +1228,52 @@ def save_table(self, filename="hbond_table.pickle"):
            self.generate_table()
        cPickle.dump(self.table, open(filename, 'wb'), protocol=cPickle.HIGHEST_PROTOCOL)

+    def _has_timeseries(self):
+        has_timeseries = (self._timeseries is not None)


No parentheses needed.

I like them for readability but I am happy to remove them – if you didn't care you wouldn't have taken the time to comment.

…drogenBondAnalysis (issue #801)
- fixes #801
- analysis.hbonds.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses donor and acceptor names and is not tripped up by resnames that end in numbers, such as TIP3
- store timeseries as _timeseries with donor and acceptor atom identifiers as tuples instead of strings (avoids the use of fragile lib.util.parse_residue())
- made timeseries a property that is generated on the fly so that old code does not break, but it is not cached (to avoid memory consumption for big trajectories) and so users should cache themselves if needed
- added tests
- rewrote parts of the docs and added notes on use of timeseries
- updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in 0.16.0 but we forgot)

- mostly numpy style (whenever possible): Apparently, napoleon does not like a single Notes and See Also section, need to use reST.
- named the 1-based indices "idx" in the docs.
- added example for analysis
- describe convenience analysis functions
- how to use pandas

Use atom.index instead of atom.index+1 internally and for debug output.

orbeckst · 2017-05-12T23:52:38Z

Feel free to squash.

orbeckst added the Component-Analysis label May 10, 2017

orbeckst added this to the 0.16.x milestone May 10, 2017

orbeckst added the needs review label May 10, 2017

orbeckst force-pushed the hbond-fixes branch from 28dc1dd to 46e40c1 Compare May 11, 2017 00:32

orbeckst requested review from kain88-de and jbarnoud May 11, 2017 20:06

kain88-de reviewed May 12, 2017


orbeckst added 6 commits May 12, 2017 14:36

test for correct TIP3P resname in hbond analysis table (#801)

c84cb4b

HydrogenBondAnalysis: internal clean up

2cf191a

Use atom.index instead of atom.index+1 internally and for debug output.

Hbond analysis: fixed docs (stated wrong defaults for updating)

245560e

hbond analysis: addressed review comments

ba47e1f

orbeckst force-pushed the hbond-fixes branch from 46e40c1 to ba47e1f Compare May 12, 2017 22:58

kain88-de approved these changes May 14, 2017


kain88-de merged commit 1f9a31f into develop May 14, 2017

kain88-de deleted the hbond-fixes branch May 14, 2017 09:37

IAlibay mentioned this pull request Feb 2, 2020

Deprecation of _reformat_hb in hbonds/hbond_analysis.py for v1.0 #2492

Closed
