Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hbond analysis: fixed incorrect residue handling with trailing numbers #1339

Merged
merged 6 commits into from
May 14, 2017

Conversation

orbeckst
Copy link
Member

Fixes #801

Changes made in this Pull Request:

  • HydrogenBondAnalysis now correctly parses residue names with trailing numbers in the normalized structured array table
  • changed data format of result timeseries and store as new attribute _timeseries
  • created a managed attribute timeseries that reproduces the previous behavior (but might come at speed penalty because it is not cached on purpose)
  • updated deprecation warnings for 1-based indices: now to be removed in 0.17.0 (because we forgot to remove in 0.16.0)

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@orbeckst
Copy link
Member Author

orbeckst commented May 10, 2017

Will fix the doc build error https://travis-ci.org/MDAnalysis/mdanalysis/jobs/230575555

EDIT: no idea why napoleon barffs on a lone Note heading; it also does not like a lone See Also – maybe because it is a property annotation??

Anyway, need to change back to reST.

@orbeckst
Copy link
Member Author

Only the numpy dev test fails. I'll rebase the whole thing later and then ask for reviews.

@orbeckst
Copy link
Member Author

Could someone please review? The travis failure is due to the stuff discussed in #1334 and is harmless.


.. deprecated:: 0.15.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you somewhere add a developer note how to remove these things. I must have missed it looking for deprecations to remove before the last release

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added the following

        # REMOVAL (DEPRECATION) instructions for 0.17.0: "1-based indices" (aka "idx" vs 0-based "index")
        # - remove the warning
        # - replace 'deprecated 0.15.0' with 'versionchanged 0.17.0' (and adjust text)
        # - update docs for "timeseries" by removing all "idx (1-based)" entries:
        #   one hydrogen bond should now look like
        #      <donor index (0-based)>, <acceptor index (0-based)>, <donor string>, <acceptor string>,
        #      <distance>, <angle>
        # - update docs for "table": removed *_idx and renumber
        # - in run(): removed the 1-based indices (h.index + 1, a.index + 1) from frame_results:
        #   for instance, should now read (two occurences!)
        #        frame_results.append(
        #                            [h.index, a.index,
        #                             (h.resname, h.resid, h.name),
        #                             (a.resname, a.resid, a.name),
        #                              dist, angle])
        # - generate_table(), count_*(), timesteps_by_type():
        #   remove any donor_idx, acceptor_idx variables and fields.
        # - update tests (change indices in analysis/test_hbonds.py)
        # - update any docs in this file that mention "1-based"
        #
        # I suggest to keep this removal a separate commit. [@orbeckst]

*selection2* handles donors and acceptors: If *selection1* contains
'both' then *selection2* will also contain *both*. If *selection1*
is set to 'donor' then *selection2* is 'acceptor' (and vice versa).
value for `selection1_type` automatically determines how
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need double backticks. The single ones will result in a itallic formatting. We have so far always opted for a monospace-coding formatting with double backticks for variable names

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't changed this yet because this is how numpy docs are written (IIRC) and it is consistent with the Napoleon formatting.

and they are targeted for removal in 0.17.0.

"""
return [[self._reformat_hb(hb) for hb in hframe] for hframe in self._timeseries]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is here the fix for the problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real fix is

                                frame_results.append(
                                    [h.index + 1, a.index + 1, h.index, a.index,
                                    (h.resname, h.resid, h.name),
                                    (a.resname, a.resid, a.name),
                                    dist, angle])

where we store atom information as a tuple (h.resname, h.resid, h.name) and (a.resname, a.resid, a.name) instead of a mangled name. These unambiguous data are then stored as _timeseries. The timeseries is generated to look like the previous output. Importantly, the other methods (generate_table, count_*, timeseries_by_type) now all use the unambiguous data to generate their own data structures. Using unambiguous time series data fixes the issue.

The _reformat_hb() is just a way to return the same timeseries as in previous versions.

Do you think that this needs to be commented?

(In principle we could just store atom indices

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to comment this. But a hint about this for a review with lots of docs changes would be nice next time.

@@ -1083,47 +1228,52 @@ def save_table(self, filename="hbond_table.pickle"):
self.generate_table()
cPickle.dump(self.table, open(filename, 'wb'), protocol=cPickle.HIGHEST_PROTOCOL)

def _has_timeseries(self):
has_timeseries = (self._timeseries is not None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no braces needed

Copy link
Member Author

@orbeckst orbeckst May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like them for readability but I am happy to remove them – if you didn't care you wouldn't have taken the time to comment.

orbeckst added 6 commits May 12, 2017 14:36
…drogenBondAnalysis (issue #801)

- fixes #801
- analysis.hbond.hbond_analysis.HydrogenBondAnalysis now correctly stores and parses
  donor and acceptor names and is not tripped up by resnames that end in numbers, such
  as TIP3
- store timeseries as _timeseries with donor and acceptor atom identifiers as
  tuples instead of strings (avoids the use of fragile lib.util.parse_residue())
- made timeseries a property that is generated on the fly so that old code does not break, but
  it is not cached (to avoid memory consumption for big trajectories) and so users should cache
  themselves if needed
- added tests
- rewrote parts of the docs and added notes on use of timeseries
- updated DEPRECATION warning for 1-based indices to 0.17.0 (should have been removed in
  0.16.0 but we forgot)
- mostly numpy style (whenever possible):
  Apparently, napoleon does not like a single Notes and See Also section, need to
  use reST.
- named the 1-based indices "idx" in the docs.
- added example for analysis
  - describe convenience analysis functions
  - how to use pandas
Use atom.index instead of atom.index+1 internally and for debug output.
@orbeckst
Copy link
Member Author

Feel free to squash.

@kain88-de kain88-de merged commit 1f9a31f into develop May 14, 2017
@kain88-de kain88-de deleted the hbond-fixes branch May 14, 2017 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants