Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace distance search with capped_distance in water bridge analysis #2480

Merged
merged 6 commits into from
Feb 9, 2020

Conversation

xiki-tempula
Copy link
Contributor

@xiki-tempula xiki-tempula commented Jan 29, 2020

Changes made in this Pull Request:
Changes the distance calculation to capped_distance in water bridge analysis.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@xiki-tempula xiki-tempula mentioned this pull request Jan 29, 2020
6 tasks
@codecov
Copy link

codecov bot commented Jan 29, 2020

Codecov Report

❗ No coverage uploaded for pull request base (develop@5f4915c). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop    #2480   +/-   ##
==========================================
  Coverage           ?   90.28%           
==========================================
  Files              ?      169           
  Lines              ?    23200           
  Branches           ?     3002           
==========================================
  Hits               ?    20945           
  Misses             ?     1654           
  Partials           ?      601

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f4915c...8ec5951. Read the comment docs.

@xiki-tempula
Copy link
Contributor Author

I will add some test to bump the coverage up

@xiki-tempula
Copy link
Contributor Author

@richardjgowers Hi Richard, I have bumped the coverage. I wonder if your mind does a code review? If you felt it is ok. I will modify the changelog.

@xiki-tempula xiki-tempula changed the title Replace distance search with capped_distance Replace distance search with capped_distance in water bridge analysis Jan 31, 2020
)
)
hbond_indices = np.where(angles > self.angle)[0]
for index in hbond_indices:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of making a nice numpy array then iterating over it (rather than slicing efficiently) is a bit ugly. Ideally you could make some sort of numpy structured array and slice into it. Everything you're loading into the result array can be obtained through slicing their origin array with hbond_indices, so you could grab the data with n columns operations rather than nrows.

But maybe that's beyond the scope of this PR, this will work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. I will make it into a PR after this one is merged.

Copy link
Contributor Author

@xiki-tempula xiki-tempula Feb 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After given it some thought, I'm not quite sure of the representation that you are talking about.
I will start with a list of indices (hydrogen bond donor heavy atom, hydrogen bond donor hydrogen, hydrogen bond acceptor), which is an array of rough size of (1000,3).
Then I will do the distance search, which gives the distance and shrink the number of indexes significantly. Thus, we have an array of (100,4), where the last column is distance.
Then I will do the angle search, which will also shrink the number of indexes and gives the final result of (10,5), where the last column is the angle.

To make a structured array, we need to know the array size in advance.
Do you suggest that I shall make a structured array of (1000,5) and then slice through it? to eventually get to (10,5).
Or is it better to reconstructed an array ar each step?
(1000,3) > (100,3)+(100,1) > (10,4)+(10,1)
Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know the size of hbond_indices, so rather than have a loop iteration for each entry, you could create a structured array of size hbond_indices and fill that. Not too important though

@xiki-tempula
Copy link
Contributor Author

xiki-tempula commented Feb 5, 2020

@richardjgowers Thank you for the review. I guess the major problem in this PR is that I need to represent some atom group as a list of atoms indexes. To avoid the confusion that some atom groups are represented as indexes while others are represented as atom groups. I made that decision that atom groups should be represented as list of indexes, which changed some idioms.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiki-tempula has this been benchmarked at all?

@xiki-tempula
Copy link
Contributor Author

xiki-tempula commented Feb 7, 2020

@richardjgowers Yes, it has been benchmarked.
the order or water bridge versus the new and old implementation

N new (sec) old (sec)
0 1 1.3
1 6 12
2 21 40
3 52 95
4 123 205

EDIT: reformatted data as table — @orbeckst

@orbeckst
Copy link
Member

orbeckst commented Feb 7, 2020

Does this mean that the new implementation is much slower?

@xiki-tempula
Copy link
Contributor Author

@orbeckst Sorry for the confusion. This is the amount of time which is required to run the same task. The new implementation runs twice as fast as the old one.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiki-tempula can you put in a quick CHANGELOG entry so that you get included in the release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants