Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Dihedral and Ramachandran Analysis #1997

Merged
merged 20 commits into from
Jul 25, 2018
Merged

Conversation

hfmull
Copy link
Contributor

@hfmull hfmull commented Jul 18, 2018

Changes made in this Pull Request:

  • added new MDAnalysis.analysis.dihedrals module
  • use AnalysisBase for new Ramachandran class
  • new dihedrals_calc function

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • [n/a] Issue raised/referenced?

err_msg="error: dihedral angles should "
"match test values")

def test_ramachandran_identical_frames(self, universe, tmpdir):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to have this test? It was included because I was learning from the test_rms.py file which had it, but I am not sure of the exact reason to include it and @orbeckst isn't either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed here.

Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your first contribution. It already looks good. Most of my comments are about style and it would be nice if you address them.

"""

dihedrals = [atomgroup.dihedral for atomgroup in atomgroups]
angles = [dih.value() for dih in dihedrals]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a list not a numpy array. I would make this function a one liner

return np.array([ag.dihedral.value() for ag in atomgroups])

atoms for residues for which phi and psi are calculated
start : int, optional
starting frame, default None becomes 0.
stop : int, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we moving those to the run function @orbeckst

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right, of course – I asked @hfmull to start from analysis.rms.RMSD and forgot that it uses soon-to-be changed call signatures (#1463).

Do we need to keep init start/stop/step for new additions like this one and deprecate them (#1979) right away?? I would say no (don't teach the user something you want them to forget) but I am open to alternative opinions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still keeping them here makes this behave exactly as any other class. Kind of nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to not allow them here if we want them in run

res_min = np.min(self.atomgroup.universe.select_atoms("protein").residues)
res_max = np.max(self.atomgroup.universe.select_atoms("protein").residues)
if any([(residue < res_min or residue > res_max) for residue in self.residues]):
raise IndexError("Selection exceeds protein length")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this possible? Literally how. I would need to get an atom that is not in the universe topology.

I like to check things but I can't think of a case where this would ever be triggered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.atomgroup.residues contains residues which aren't in protein I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solvent molecules are listed as residues in the topology files (at least in the .gro files I've been using) so it possible to have a residue group that contains solvent molecules. The idea was to give more feedback on what the problem actually is rather than being confused when phi_selection() or psi_selection() returned None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But using ag.issubset(protein) is easier and more robust for the test anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I agree with @orbeckst to use set-semantics because it's easier to understand and more robust. I would also update the warning message to Found atoms outside of protein. Only atoms inside of a 'protein' selection can be used to calculate dihedrals.

res_max = np.max(self.atomgroup.universe.select_atoms("protein").residues)
if any([(residue < res_min or residue > res_max) for residue in self.residues]):
raise IndexError("Selection exceeds protein length")
elif any([residue == (res_min or res_max) for residue in self.residues]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that does what you think it does. I'm sure you want res == res_min or res == res_max for res in residues. The parenthesis should be evaluated first resulting in the test residue == True as the test condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. I'll fix that on the next commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also make use of set operations here

protein = ag.universe.select_atoms('protein')
...
elif protein.residues[[0, -1]].issubset(self.residues)

It's more robust like @orbeckst has suggested above. I know the set operations are not well documented but they do make such comparisons easier.

warnings.warn("Cannot determine phi and psi angles for the first or last residues")

self.phi_atoms = [residue.phi_selection() for residue in self.residues
if res_min < residue < res_max]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you already checked that this can never happen. You should be fine I don't see the reason for the if here as long as the previous checks really do work. See comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for that conditional is that the function does not allow selections that exceed the protein but it still allows selections that include the first and last residues. However, because it doesn't make sense to calculate the dihedrals for those, they are excluded. The warning just acts as a signal so the user knows that they included those residues in their selection, but that they are not included in the list of angles. This is just to avoid confusion if the number of residues they listed does not match the number of angle sets given.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I understand now. you can pre remove them in the selection.

protein = ag.universe.select_atoms('protein')
# remove first and last residue if part of selection
residues = residues.difference(protein.residues[[0, -1]])

https://www.mdanalysis.org/docs/documentation_pages/core/groups.html
Scroll down to the list of operations.

if title is not None:
ax.set_title(title)
for angles in self.angles:
ax.plot(angles[0],angles[1], color=color,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better function would be scatter here. It doesn't need the line plot styling.

ax.scatter(angles[0], angles[1], **kwargs)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this into a one-line and only call into the plotting routine once. Might make a speed difference if you have a lot of frames.

a = self.angles.reshape((np.prod(self.angles.shape[:2]), 2))
ax.scatter(a[0], a[1], **kwargs)

ax.set_xticks(range(-180,181,60))
ax.set_yticks(range(-180,181,60))
ax.set_xlabel(r"$\phi$ (deg)")
ax.set_ylabel(r"$\psi$ (deg)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a more convenient ax.set(xlabel=..., ylabel=..., xticks=..., yticks=...) function available.

err_msg="error: dihedral angles should "
"match test values")

def test_ramachandran_identical_frames(self, universe, tmpdir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed here.

@@ -0,0 +1,57 @@
import numpy as np
from numpy.testing import assert_array_almost_equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use assert_almost_equal. It does some more equality checks then assert_array_almost_equal

be added to the current active axes.
color : string, optional
Color used for markers in the plot; the default color is 'black'.
marker : string, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a **kwargs argument to the function. Anything passed to it will automatically be passed to the plot. This gives the greatest freedom for the user.

@kain88-de
Copy link
Member

Oh and please add this to the documentation generated for the users. @orbeckst can show you how (I assume you are working in his lab)

@orbeckst
Copy link
Member

@kain88-de @richardjgowers thanks for the comments.

@hfmull is doing a REU in my lab, with the goal to work more on PMDA. We use the dihedral featurization as the initial introduction (and we hopefully get a faster dihedral featurizer out of this MDAnalysis/pmda#45 and perhaps other PMDA analysis blocks).

We just discussed a number of the points that you raised and @hfmull will respond to them as he works through the PR.

"""

dihedrals = [atomgroup.dihedral for atomgroup in atomgroups]
angles = [dih.value() for dih in dihedrals]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't wrong, but it would be better to use lib.distances.calc_dihedrals and do them all at once

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that works?

Great!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@orbeckst orbeckst Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardjgowers does dih.value() take PBC into account by default?

EDIT: yes. Dihedral.value has pbc=True by default.

lib.distances.calc_dihedrals will want a box from AtomGroup.dimensions so that you get the same results with old and new dihedral calculation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hfmull to use calc_dihedrals you will need "vertical atomgroups", ie 4 atomgroups, one representing the first atom in each dihedral, the next the second atom in each dihedral. I'd make these in the __init__ of your class. Then you can call calc_dihedrals(ag1.positions, ag2.positions, ag3.positions, ag4.positions, box=ag1.dimensions)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good but see my additional comments (in addition to what @kain88-de and @richardjgowers said).

Also

  • tests for any exceptions/warnings
  • add yourself to AUTHORS
  • add stubs to docs (in analysis_modules.rst and new file analysis/dihedrals.rst) and add some text to the top of the dihedrals.py file.


from MDAnalysis.analysis.base import AnalysisBase

def dihedral_calc(atomgroups):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussion below converged on eliminating this function and using lib.distances.calc_dihedrals.

@@ -415,5 +415,8 @@

GSD = resource_filename(__name__, 'data/example.gsd')

DihedralsArray = resource_filename(__name__, 'data/adk_oplsaa_dihedrals.npy')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to add the to __all__ (I only learned that from the test that failed!)

def _conclude(self):
self.angles = np.array(self.angles)

def plot(self, ax=None, color='k', marker='s', title=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method needs a test that checks that it produces an Axes instance. Something like

ax = rama.plot()
assert isinstance(ax, plt.matplotlib.Axes)

(or whatever the actual instance is – see type(ax) ... or see other tests, try git grep Axes)

"""
if ax is None:
ax = plt.gca()
ax.axis([-180,180,-180,180])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do people plot the allowed and marginally allowed regions in "standard" Ramachandran plots? Is there a simple function (something like a spline for a density that gives the correct contours) or a dataset that we could use?

It would be nice if we had a helper function to plot them; this method could then use the helper to plot these regions in the background but anyone else could use it, too.

(Not a requirement for this PR but something to think about.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a data set from this website that was taken from this paper that I was able to make a decent reference graph of. I just don't know where I should put the data need to graph it and where I should cite it. This is just an example of what it looks like with the reference.
rama_ref_demo

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking very good, I only found one small thing and you still have to add the files to the docs.

@@ -0,0 +1,105 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the standard header – copy from the top of rms.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add from __future__ import absolute_import as the very first import with any import from six.moves directly below it. Also include the range import from six.moves

protein = self.atomgroup.universe.select_atoms("protein").residues

if not residues.issubset(protein):
raise IndexError("Found atoms outside of protein. Only atoms "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a value error here. The IndexError is when you use the square bracket operator.

Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good. Just some very minor details left.

I'm already giving my approval. Once you satisfied @orbeckst he can merge this PR/

ax.set(xticks=range(-180,181,60), yticks=range(-180,181,60),
xlabel=r"$\phi$ (deg)", ylabel=r"$\psi$ (deg)")
a = self.angles.reshape(np.prod(self.angles.shape[:2]), 2)
ax.scatter(a[:,0], a[:,1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add **kwargs to the scatter function as an argument.

@@ -0,0 +1,39 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copyright header should also be added here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint will also issue a warning here to add from __future__ import absolute_import as the top most import

@codecov
Copy link

codecov bot commented Jul 20, 2018

Codecov Report

Merging #1997 into develop will increase coverage by 0.02%.
The diff coverage is 97.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop   #1997      +/-   ##
==========================================
+ Coverage    88.48%   88.5%   +0.02%     
==========================================
  Files          142     143       +1     
  Lines        17202   17249      +47     
  Branches      2635    2646      +11     
==========================================
+ Hits         15221   15267      +46     
  Misses        1385    1385              
- Partials       596     597       +1
Impacted Files Coverage Δ
package/MDAnalysis/analysis/dihedrals.py 97.82% <97.82%> (ø)
package/MDAnalysis/lib/distances.py 87.22% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7cfc27...193e503. Read the comment docs.


import matplotlib.pyplot as plt
fig = plt.figure(figsize(5,5))
ax = fig.add_subplot(111)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a more concise way (and my preferred one)

fig, ax = plt.subplots()

It can also create a grid directly and change the figure size. Also, the line for creating a figure is not valid syntax.
I assume you change the figure size to get an equal aspect ratio. There is a better function called plt.figaspect you can use. As an argument, it accepts an aspect ratio and returns sensible figure sizes. So you can instead write

fig, ax = plt.subplots(figsize=plt.figaspect(1))

can be accessed using :meth:`Ramachandran.angles`::

fig = plt.figure(figsize(5, 5))
ax = fig.add_subplot(111)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above


from MDAnalysis.analysis.dihedrals import Ramachandran
R = Ramachandran(r)
R.run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferred is R = Ramachandran(r).run() as a one-liner. The run method returns the object itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's originally what I was doing but I've been using rms.py as a reference so I thought that was the convention. I'll be sure to change it back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might still be in older modules. We changed the return value of run only after the analysis class was introduced.

:members:
:inherited-members:

.. attribute:: angles
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to have this as documentation in the class? @orbeckst is the information now rendered twice?

@orbeckst
Copy link
Member

orbeckst commented Jul 21, 2018 via email

@kain88-de
Copy link
Member

kain88-de commented Jul 21, 2018 via email

Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the last thing I found

then a warning will be raised and they will be removed from the list of
residues, but the analysis will still run.

Run the analysis with :meth:`Ramachandran.run()`, whcih stores the results in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I missed that. I'll probably go through the docstring one last time to catch anything else like that.

@kain88-de
Copy link
Member

@orbeckst this should be done now

@orbeckst
Copy link
Member

orbeckst commented Jul 24, 2018 via email

@kain88-de kain88-de dismissed orbeckst’s stale review July 25, 2018 05:45

indicated the PR can be merged

@kain88-de kain88-de merged commit b50c6f0 into MDAnalysis:develop Jul 25, 2018
@kain88-de
Copy link
Member

Congratulations to your first PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants