Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Dihedral and Ramachandran Analysis #1997

Merged
merged 20 commits into from
Jul 25, 2018
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions package/MDAnalysis/analysis/dihedrals.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the standard header – copy from the top of rms.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add from __future__ import absolute_import as the very first import with any import from six.moves directly below it. Also include the range import from six.moves

import matplotlib.pyplot as plt
import warnings

from MDAnalysis.analysis.base import AnalysisBase

def dihedral_calc(atomgroups):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussion below converged on eliminating this function and using lib.distances.calc_dihedrals.

"""Calculates phi and psi angles for a list of AtomGroups over trajectory.

Parameters
----------
atomgroups : list of AtomGroups
must be a list of one or more AtomGroups containing 5 atoms in the
correct order (i.e. C-N-CA-C-N)

Returns
-------
angles : numpy.ndarray
An array of time steps which contain (phi,psi) for all AtomGroups.
"""

dihedrals = [atomgroup.dihedral for atomgroup in atomgroups]
angles = [dih.value() for dih in dihedrals]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a list not a numpy array. I would make this function a one liner

return np.array([ag.dihedral.value() for ag in atomgroups])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't wrong, but it would be better to use lib.distances.calc_dihedrals and do them all at once

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that works?

Great!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@orbeckst orbeckst Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardjgowers does dih.value() take PBC into account by default?

EDIT: yes. Dihedral.value has pbc=True by default.

lib.distances.calc_dihedrals will want a box from AtomGroup.dimensions so that you get the same results with old and new dihedral calculation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hfmull to use calc_dihedrals you will need "vertical atomgroups", ie 4 atomgroups, one representing the first atom in each dihedral, the next the second atom in each dihedral. I'd make these in the __init__ of your class. Then you can call calc_dihedrals(ag1.positions, ag2.positions, ag3.positions, ag4.positions, box=ag1.dimensions)


return angles


class Ramachandran(AnalysisBase):
"""Calculate phi and psi dihedral angles of specified residues.

Note
----
Run the analysis with :meth:`Ramachandran.run()`, which stores the results
in the array :attr:`Ramachandran.angles`. A axes object can be obtained
with :meth: `Ramachandran.run().plot()`.

If the residue selection is beyond the scope of the protein, then an error
will be raised. If the residue selection includes the first or last residue
then a warning will be raised, and the final array of angles will not
include those residues.

"""
def __init__(self, atomgroup, **kwargs):
r"""Parameters
----------
atomgroup : Atomgroup
atoms for residues for which phi and psi are calculated
start : int, optional
starting frame, default None becomes 0.
stop : int, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we moving those to the run function @orbeckst

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right, of course – I asked @hfmull to start from analysis.rms.RMSD and forgot that it uses soon-to-be changed call signatures (#1463).

Do we need to keep init start/stop/step for new additions like this one and deprecate them (#1979) right away?? I would say no (don't teach the user something you want them to forget) but I am open to alternative opinions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still keeping them here makes this behave exactly as any other class. Kind of nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to not allow them here if we want them in run

Frame index to stop analysis. Default: None becomes
n_frames. Iteration stops *before* this frame number,
which means that the trajectory would be read until the end.
step : int, optional
step between frames, default None becomes 1.

"""
super(Ramachandran, self).__init__(atomgroup.universe.trajectory, **kwargs)
self.atomgroup = atomgroup

def _prepare(self):
self.residues = self.atomgroup.residues
res_min = np.min(self.atomgroup.universe.select_atoms("protein").residues)
res_max = np.max(self.atomgroup.universe.select_atoms("protein").residues)
if any([(residue < res_min or residue > res_max) for residue in self.residues]):
raise IndexError("Selection exceeds protein length")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this possible? Literally how. I would need to get an atom that is not in the universe topology.

I like to check things but I can't think of a case where this would ever be triggered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.atomgroup.residues contains residues which aren't in protein I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solvent molecules are listed as residues in the topology files (at least in the .gro files I've been using) so it possible to have a residue group that contains solvent molecules. The idea was to give more feedback on what the problem actually is rather than being confused when phi_selection() or psi_selection() returned None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But using ag.issubset(protein) is easier and more robust for the test anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I agree with @orbeckst to use set-semantics because it's easier to understand and more robust. I would also update the warning message to Found atoms outside of protein. Only atoms inside of a 'protein' selection can be used to calculate dihedrals.

elif any([residue == (res_min or res_max) for residue in self.residues]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that does what you think it does. I'm sure you want res == res_min or res == res_max for res in residues. The parenthesis should be evaluated first resulting in the test residue == True as the test condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. I'll fix that on the next commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also make use of set operations here

protein = ag.universe.select_atoms('protein')
...
elif protein.residues[[0, -1]].issubset(self.residues)

It's more robust like @orbeckst has suggested above. I know the set operations are not well documented but they do make such comparisons easier.

warnings.warn("Cannot determine phi and psi angles for the first or last residues")

self.phi_atoms = [residue.phi_selection() for residue in self.residues
if res_min < residue < res_max]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you already checked that this can never happen. You should be fine I don't see the reason for the if here as long as the previous checks really do work. See comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for that conditional is that the function does not allow selections that exceed the protein but it still allows selections that include the first and last residues. However, because it doesn't make sense to calculate the dihedrals for those, they are excluded. The warning just acts as a signal so the user knows that they included those residues in their selection, but that they are not included in the list of angles. This is just to avoid confusion if the number of residues they listed does not match the number of angle sets given.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I understand now. you can pre remove them in the selection.

protein = ag.universe.select_atoms('protein')
# remove first and last residue if part of selection
residues = residues.difference(protein.residues[[0, -1]])

https://www.mdanalysis.org/docs/documentation_pages/core/groups.html
Scroll down to the list of operations.

self.psi_atoms = [residue.psi_selection() for residue in self.residues
if res_min < residue < res_max]

self.angles = []

def _single_frame(self):
self.phi_angles = dihedral_calc(self.phi_atoms)
self.psi_angles = dihedral_calc(self.psi_atoms)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keep those around and attach them to self. I would rather make them normal variables that are not added to the class.

self.angles.append((self.phi_angles,self.psi_angles))

def _conclude(self):
self.angles = np.array(self.angles)

def plot(self, ax=None, color='k', marker='s', title=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is better suited as a free function that takes a Ramachandran class as first argument. Also you should allow for arbitrary kwargs and then call the plotting function ax.plot(...., **kwargs). This allows the user to select his own styling and better integrate the plots.

Copy link
Member

@orbeckst orbeckst Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the other analysis classes are using plot from an external function. And at least I think it makes sense to keep it as a method, along the lines of "the method knows how to plot the data". Most people will initially use it to get some quick results, then perhaps use it to plot into their own ax (that's important) and with custom kwargs (also important), and then finally just reverse-engineer the code to see what they need for their own plotting. I feel it is useful to have Ramachandran.plot() as it immediately tells you what you can do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would omit all styling kwargs (color, marker, title) and just pass the **kwargs to plot. Then let the user do the styling, should work just fine with scatter plot as @kain88-de is suggesting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method needs a test that checks that it produces an Axes instance. Something like

ax = rama.plot()
assert isinstance(ax, plt.matplotlib.Axes)

(or whatever the actual instance is – see type(ax) ... or see other tests, try git grep Axes)

"""Plots data into standard ramachandran plot. Each time step in
self.angles is plotted onto the same graph.

Parameters
----------
ax : :class:`matplotlib.axes.Axes`
If no `ax` is supplied or set to ``None`` then the plot will
be added to the current active axes.
color : string, optional
Color used for markers in the plot; the default color is 'black'.
marker : string, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a **kwargs argument to the function. Anything passed to it will automatically be passed to the plot. This gives the greatest freedom for the user.

Marker used in plot; the default marker is 'square'.
title : string, optional
Title of axes object; the default `None` leaves plot without a
title.

Returns
-------
ax : :class:`~matplotlib.axes.Axes`
Axes with the plot, either `ax` or the current axes.

"""
if ax is None:
ax = plt.gca()
ax.axis([-180,180,-180,180])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do people plot the allowed and marginally allowed regions in "standard" Ramachandran plots? Is there a simple function (something like a spline for a density that gives the correct contours) or a dataset that we could use?

It would be nice if we had a helper function to plot them; this method could then use the helper to plot these regions in the background but anyone else could use it, too.

(Not a requirement for this PR but something to think about.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a data set from this website that was taken from this paper that I was able to make a decent reference graph of. I just don't know where I should put the data need to graph it and where I should cite it. This is just an example of what it looks like with the reference.
rama_ref_demo

ax.axhline(0, color='k', lw=1)
ax.axvline(0, color='k', lw=1)
ax.set_xticks(range(-180,181,60))
ax.set_yticks(range(-180,181,60))
ax.set_xlabel(r"$\phi$ (deg)")
ax.set_ylabel(r"$\psi$ (deg)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a more convenient ax.set(xlabel=..., ylabel=..., xticks=..., yticks=...) function available.

if title is not None:
ax.set_title(title)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't to that. Let the user decide them self

for angles in self.angles:
ax.plot(angles[0],angles[1], color=color,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better function would be scatter here. It doesn't need the line plot styling.

ax.scatter(angles[0], angles[1], **kwargs)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this into a one-line and only call into the plotting routine once. Might make a speed difference if you have a lot of frames.

a = self.angles.reshape((np.prod(self.angles.shape[:2]), 2))
ax.scatter(a[0], a[1], **kwargs)

marker=marker, linestyle='')
57 changes: 57 additions & 0 deletions testsuite/MDAnalysisTests/analysis/test_dihedrals.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copyright header should also be added here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint will also issue a warning here to add from __future__ import absolute_import as the top most import

from numpy.testing import assert_array_almost_equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use assert_almost_equal. It does some more equality checks then assert_array_almost_equal

import pytest
import os

import MDAnalysis as mda
from MDAnalysisTests.datafiles import GRO, XTC, DihedralsArray, GLYDihedralsArray
from MDAnalysis.analysis import dihedrals


class TestRamachandran(object):

@pytest.fixture()
def universe(self):
return mda.Universe(GRO, XTC)

def test_ramachandran(self, universe):
rama = dihedrals.Ramachandran(universe.select_atoms("protein")).run()
test_rama = np.load(DihedralsArray)

assert_array_almost_equal(rama.angles, test_rama, 5,
err_msg="error: dihedral angles should "
"match test values")

def test_ramachandran_single_frame(self, universe):
rama = dihedrals.Ramachandran(universe.select_atoms("protein"),
start=5, stop=6).run()
test_rama = [np.load(DihedralsArray)[5]]

assert_array_almost_equal(rama.angles, test_rama, 5,
err_msg="error: dihedral angles should "
"match test values")

def test_ramachandran_identical_frames(self, universe, tmpdir):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to have this test? It was included because I was learning from the test_rms.py file which had it, but I am not sure of the exact reason to include it and @orbeckst isn't either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed here.


outfile = os.path.join(str(tmpdir), "dihedrals.xtc")

# write a dummy trajectory of all the same frame
with mda.Writer(outfile, universe.atoms.n_atoms) as W:
for _ in range(universe.trajectory.n_frames):
W.write(universe)

universe = mda.Universe(GRO, outfile)
rama = dihedrals.Ramachandran(universe.select_atoms("protein")).run()
test_rama = [np.load(DihedralsArray)[0] for ts in universe.trajectory]

assert_array_almost_equal(rama.angles, test_rama, 5,
err_msg="error: dihedral angles should "
"match test values")

def test_ramachandran_residue_selections(self, universe):
rama = dihedrals.Ramachandran(universe.select_atoms("resname GLY")).run()
test_rama = np.load(GLYDihedralsArray)

assert_array_almost_equal(rama.angles, test_rama, 5,
err_msg="error: dihedral angles should "
"match test values")
Binary file not shown.
Binary file not shown.
3 changes: 3 additions & 0 deletions testsuite/MDAnalysisTests/datafiles.py
Original file line number Diff line number Diff line change
Expand Up @@ -415,5 +415,8 @@

GSD = resource_filename(__name__, 'data/example.gsd')

DihedralsArray = resource_filename(__name__, 'data/adk_oplsaa_dihedrals.npy')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to add the to __all__ (I only learned that from the test that failed!)

GLYDihedralsArray = resource_filename(__name__, 'data/adk_oplsaa_GLY_dihedrals.npy')

# This should be the last line: clean up namespace
del resource_filename