-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace distance search with capped_distance in water bridge analysis #2480
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2480 +/- ##
==========================================
Coverage ? 90.28%
==========================================
Files ? 169
Lines ? 23200
Branches ? 3002
==========================================
Hits ? 20945
Misses ? 1654
Partials ? 601 Continue to review full report at Codecov.
|
I will add some test to bump the coverage up |
@richardjgowers Hi Richard, I have bumped the coverage. I wonder if your mind does a code review? If you felt it is ok. I will modify the changelog. |
) | ||
) | ||
hbond_indices = np.where(angles > self.angle)[0] | ||
for index in hbond_indices: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern of making a nice numpy array then iterating over it (rather than slicing efficiently) is a bit ugly. Ideally you could make some sort of numpy structured array and slice into it. Everything you're loading into the result array can be obtained through slicing their origin array with hbond_indices
, so you could grab the data with n columns operations rather than nrows.
But maybe that's beyond the scope of this PR, this will work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. I will make it into a PR after this one is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After given it some thought, I'm not quite sure of the representation that you are talking about.
I will start with a list of indices (hydrogen bond donor heavy atom, hydrogen bond donor hydrogen, hydrogen bond acceptor), which is an array of rough size of (1000,3).
Then I will do the distance search, which gives the distance and shrink the number of indexes significantly. Thus, we have an array of (100,4), where the last column is distance.
Then I will do the angle search, which will also shrink the number of indexes and gives the final result of (10,5), where the last column is the angle.
To make a structured array, we need to know the array size in advance.
Do you suggest that I shall make a structured array of (1000,5) and then slice through it? to eventually get to (10,5).
Or is it better to reconstructed an array ar each step?
(1000,3) > (100,3)+(100,1) > (10,4)+(10,1)
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know the size of hbond_indices, so rather than have a loop iteration for each entry, you could create a structured array of size hbond_indices and fill that. Not too important though
@richardjgowers Thank you for the review. I guess the major problem in this PR is that I need to represent some atom group as a list of atoms indexes. To avoid the confusion that some atom groups are represented as indexes while others are represented as atom groups. I made that decision that atom groups should be represented as list of indexes, which changed some idioms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiki-tempula has this been benchmarked at all?
@richardjgowers Yes, it has been benchmarked.
EDIT: reformatted data as table — @orbeckst |
Does this mean that the new implementation is much slower? |
@orbeckst Sorry for the confusion. This is the amount of time which is required to run the same task. The new implementation runs twice as fast as the old one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiki-tempula can you put in a quick CHANGELOG entry so that you get included in the release notes.
Changes made in this Pull Request:
Changes the distance calculation to capped_distance in water bridge analysis.
PR Checklist