Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Cython port of contact map functions #375

Merged
merged 16 commits into from
Aug 5, 2015

Conversation

kain88-de
Copy link
Member

I want to fix #264. The non PBC version is already implemented in Cython and the tests are passing as well. The Cython-implementation is also faster using the 1Q0W protein.

Porting the PBC version I noticed that there are no tests for it. I added a test and the test suite is still passing. But I'm not sure if the test I have added is run. Btw I'm new to nose is there a way to specify a subset of tests that I want to run instead of executing the whole test-suite all the time?

If someone can help me get the test-cases for the PBC-case running I will also port the PBC version.


Here is a short description how I run the test-suite since I couldn't find up-to-date information how to set-up a dev environment or run unit-tests.

For testing I'm using the approach defined in the travis.yml, but I installed all dependencies via pip.
After each change in either the package or testsuite folder I reinstalled them with pip install --upgrade <folder>/

Simple Cython implementation. Instead of the other functions in
distances.pyx I have used pure cython for clarity. The loop is converted
to pure C except for saving the sparse matrix element.
@richardjgowers
Copy link
Member

Hey,

This looks like a good start.

To run a subset of tests, from a command line in the directory testsuite/MDAnalysisTests you can do

# run tests from a specific file
nosetests test_analysis.py

# or specify the name of the class to run
nosetests test_analysis.py:TestContactMatrix

More details here:
https://github.com/MDAnalysis/mdanalysis/wiki/UnitTests

This should then let you see the two tests that fail on Travis (scroll to bottom):
https://travis-ci.org/MDAnalysis/mdanalysis/builds/73811170

If you're not sure a test is running, the quickest way to check is to change the reference state (your res) and see if the test then fails.

Let us know if you need any more help

The test are currently only run for the dense arrays.
@kain88-de kain88-de force-pushed the cython-contact-map branch from 1886597 to af65ed5 Compare August 3, 2015 10:45
@kain88-de
Copy link
Member Author

Ok I got the new test running and it passes like it should.

The current sparse test is failing though due to an error in the cython code. My next problem is that installing the changes with pip install --upgrade packages/ isn't updating the cython-generated C-module. Is there a special way how I should trigger the cython-compiler?

Sorry for all the questions but this is badly/not documented in the wiki.

@richardjgowers
Copy link
Member

My workflow is to run setup.py install --user from the package/ directory, I'm not sure if this is the same as your method.

If that doesn't do it, deleting src/numtools/distances.c should force it to regenerate the C module from the pyx you wrote.

Due to a typo I calculated the distance between points on different axes
instead of on the same one.
@kain88-de
Copy link
Member Author

Yeah got it working now. Installing cython in the virtual-env and removing the c file did it.

I have noticed that the return values between the numpy and sparse matrix are inconsistent. For the numpy array the diagonal of the contact-matrix is true. But for sparse matrices the diagonal is set to false.

Personally I would prefer if if the diagonal elements were always true independent of the chosen return-type.

@richardjgowers
Copy link
Member

It's not documented either way, so we can go with the diagonal being True (update the docstring and tests) unless anyone objects.

The only thing in the package that uses contact_matrix current is analysis/leaflet, and I think all this does is build a networkx graph from the contacts, I'm not sure how the diagonal would affect this? @orbeckst

To do the PBC versions, it might be a good idea to import minimum_image from calc_distances.h and use that. I think you'll need to change your x y z into a single array for that function.

Simple port. I'm currently not using any of the helper functions defined
in calc_distances.h
@kain88-de kain88-de force-pushed the cython-contact-map branch from 8f88666 to 2b92835 Compare August 3, 2015 13:26
@kain88-de
Copy link
Member Author

Btw is the progressbar feature still required? Without it it should be straight forward to add these functions to calc_distances.h which will make adding openmp support later easier.

@richardjgowers
Copy link
Member

I wasn't too sure how easy it'd be to fill a np sparse matrix in C though?

@orbeckst
Copy link
Member

orbeckst commented Aug 3, 2015

The progressbar feature is not really necessary inside the cython code; feel free to get rid of it, especially if it makes the code cleaner.

Only calculate the distances for the upper triangle contact matrix. The
other entries are already determined through the symmetry. This also
sets the contacts along the diagonal automatically to true.This should
save about half of the operations.
This makes the behavior consistent regardless of the chosen return type.
The refactored tests don't depend on any external test-data anymore.
This has two nice effects. Giving the coordinates explicitely makes it
easy to see where a contact is and the tests now run over a 100 times
faster. Plus everything looks a lot more cleaned up.
I reordered some variable assigment and calculate the squares directely
instead of using `pow`. This can lead to performance improvements on
some platforms.
@kain88-de kain88-de force-pushed the cython-contact-map branch from b95dadb to ac0acb0 Compare August 4, 2015 07:36
@kain88-de
Copy link
Member Author

I added some changes

  • removed progress bar
  • only calculate upper triangle matrix and use symmetry
  • refactored test to be nicer to read and easier to understand.
  • set diagonal elements to True also for sparse matrices

@richardjgowers richardjgowers added this to the 0.11 milestone Aug 4, 2015
@richardjgowers
Copy link
Member

Ok this is looking good, nice catch with the reflection.

There's actually a minor bug in the code you've translated, (was present before so not your fault), I've left a line comment, if you can fix that up then we're good to go.

Other minor things to do:

  • add yourself to AUTHORS
  • add a line to CHANGELOG detailing the fix to the diagonal return type

y = xyz[i, 1] - xyz[j, 1]
z = xyz[i, 2] - xyz[j, 2]

if abs(x) > box_half[0]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only add/subtract a single box_view from each dimension, but it needs to do it multiple times where necessary.

So if box_half = 10 then x = 1 = 11 == 21 = 31, and x = 1 = -9 = -19

You can either import the minimum_image function from calc_distances.h which does this, or inline that into here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I always assume that particles are in one of the neighboring images and then wonder that happened when this is not the case. I will first add this to the unit-tests and then fix it.

Two points with PBC can be further apart then one image. The test now
checks that as well. The numpy pbc function already passes this. The
sparse version still assumes that a point is maximally in a neighboring
image.
Now also points that are several images apart are handled correctly. All
tests are passing.
@kain88-de
Copy link
Member Author

PBC are handled correctly with minium-image now.

@kain88-de
Copy link
Member Author

Btw should I regenerate the documentation or is this done automatically with a git-hook when this PR is merged?

@orbeckst
Copy link
Member

orbeckst commented Aug 4, 2015

Docs are not generated automatically (yet). There is a page in the wiki outlining the procedure but in the end someone with write access has to merge the develop branch with docs into the gh-pages branch.

Oliver Beckstein
email: orbeckst@gmail.com
skype: orbeckst

Am Aug 4, 2015 um 2:55 schrieb kain88-de notifications@github.com:

Btw should I regenerate the documentation or is this done automatically with a git-hook when this PR is merged?


Reply to this email directly or view it on GitHub.

@kain88-de
Copy link
Member Author

Docs are updated now. Sphinx has rebuild almost all the doc pages but I have only included the one affected by this PR.

@orbeckst
Copy link
Member

orbeckst commented Aug 4, 2015

I had a quick look and it seems fine but I let @richardjgowers do the final approval and merge.

Thanks @kain88-de !

richardjgowers added a commit that referenced this pull request Aug 5, 2015
WIP: Cython port of contact map functions
@richardjgowers richardjgowers merged commit d043dd1 into MDAnalysis:develop Aug 5, 2015
@kain88-de kain88-de deleted the cython-contact-map branch September 12, 2015 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port weave functions to Cython
3 participants