Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Transformations from closure into class #2859

Merged
merged 33 commits into from
Sep 8, 2020

Conversation

yuxuanzhuang
Copy link
Contributor

@yuxuanzhuang yuxuanzhuang commented Jul 20, 2020

Fixes #2860

Changes made in this Pull Request:

  • refactor transformations from closure into class

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jul 20, 2020

Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-06 17:15:25 UTC

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good idea but it's important not to be doing validity checks every frame

package/MDAnalysis/transformations/fit.py Show resolved Hide resolved
@yuxuanzhuang
Copy link
Contributor Author

yuxuanzhuang commented Jul 20, 2020

So for most parts I just moved code around, the only change is PositionAverager, where it now gets the same result when calling the same timestep, and does not update the coord_array. (to suit we call u.trajectory[frame] during pickling.

trans = PositionAverager(3, check_reset=True)
u = mda.Universe(TPR, XTC)
u.trajectory.add_transformations(trans)
u.trajectory[3]
print(u.atoms.positions[0])
u.trajectory[3]
print(u.atoms.positions[0])
u.trajectory[3]
print(u.atoms.positions[0])
  • Before
    [27.95 20.195 13.645]
    [55.9 40.39 27.29]
    [55.9 40.39 27.29]
    /home/scottzhuang/mdanalysis/package/MDAnalysis/transformations/positionaveraging.py:177: Warning: Cannot average position for non sequentialiterations. Averager will be reset.
    warnings.warn('Cannot average position for non sequential'
  • Now
    [53.96 41.975002 29.420002]
    [53.96 41.975002 29.420002]
    [53.96 41.975002 29.420002]

@codecov
Copy link

codecov bot commented Jul 20, 2020

Codecov Report

Merging #2859 into develop will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #2859   +/-   ##
========================================
  Coverage    93.01%   93.02%           
========================================
  Files          187      187           
  Lines        24940    24962   +22     
  Branches      3261     3261           
========================================
+ Hits         23198    23220   +22     
  Misses        1694     1694           
  Partials        48       48           
Impacted Files Coverage Δ
package/MDAnalysis/core/universe.py 97.81% <100.00%> (+<0.01%) ⬆️
package/MDAnalysis/transformations/__init__.py 100.00% <100.00%> (ø)
package/MDAnalysis/transformations/fit.py 100.00% <100.00%> (ø)
...ge/MDAnalysis/transformations/positionaveraging.py 100.00% <100.00%> (ø)
package/MDAnalysis/transformations/rotate.py 100.00% <100.00%> (ø)
package/MDAnalysis/transformations/translate.py 100.00% <100.00%> (ø)
package/MDAnalysis/transformations/wrap.py 100.00% <100.00%> (ø)
transformations/__init__.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2b7bcb...0655275. Read the comment docs.

@orbeckst
Copy link
Member

There's a segfault in a pickle transformation test https://travis-ci.com/github/MDAnalysis/mdanalysis/jobs/365340906 – do you have an idea what's happening there?

@orbeckst
Copy link
Member

@yuxuanzhuang
Copy link
Contributor Author

I am not sure why it's happening, but the reason is that I define a universe (PDB, XTC) at the top level of the file (line 44). I guess some specific version of python/numpy/? is not quite happy with that.

=================================== ERRORS ====================================
_ ERROR collecting MDAnalysisTests/parallelism/test_pickle_transformation.py __
MDAnalysisTests\parallelism\test_pickle_transformation.py:44: in <module>
    u = mda.Universe(TPR, XTC)
C:\hostedtoolcache\windows\Python\3.7.8\x64\lib\site-packages\mdanalysis-2.0.0.dev0-py3.7-win-amd64.egg\MDAnalysis\core\universe.py:386: in __init__
    in_memory_step=in_memory_step, **kwargs)
C:\hostedtoolcache\windows\Python\3.7.8\x64\lib\site-packages\mdanalysis-2.0.0.dev0-py3.7-win-amd64.egg\MDAnalysis\core\universe.py:578: in load_new
    self.trajectory = reader(filename, format=format, **kwargs)
C:\hostedtoolcache\windows\Python\3.7.8\x64\lib\site-packages\mdanalysis-2.0.0.dev0-py3.7-win-amd64.egg\MDAnalysis\coordinates\XDR.py:149: in __init__
    self._load_offsets()
C:\hostedtoolcache\windows\Python\3.7.8\x64\lib\site-packages\mdanalysis-2.0.0.dev0-py3.7-win-amd64.egg\MDAnalysis\coordinates\XDR.py:194: in _load_offsets
    data = read_numpy_offsets(fname)
C:\hostedtoolcache\windows\Python\3.7.8\x64\lib\site-packages\mdanalysis-2.0.0.dev0-py3.7-win-amd64.egg\MDAnalysis\coordinates\XDR.py:84: in read_numpy_offsets
    return {k: v for k, v in np.load(filename).items()}
C:\hostedtoolcache\windows\Python\3.7.8\x64\lib\site-packages\numpy\lib\npyio.py:444: in load
    raise ValueError("Cannot load file containing pickled data "
E   ValueError: Cannot load file containing pickled data when allow_pickle=False

After I changed it to DCD, it's fine now.

@orbeckst
Copy link
Member

Uh, oh, ... I think @IAlibay and @lilyminium have been chasing an elusive segfault that seems to be connected to numpy and xtc. Just pinging so that they can comment.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had time to look at the docs in more detail, see comments.

package/MDAnalysis/transformations/translate.py Outdated Show resolved Hide resolved
package/MDAnalysis/transformations/rotate.py Outdated Show resolved Hide resolved
package/MDAnalysis/transformations/fit.py Outdated Show resolved Hide resolved
package/MDAnalysis/transformations/__init__.py Outdated Show resolved Hide resolved
package/MDAnalysis/transformations/__init__.py Outdated Show resolved Hide resolved
package/MDAnalysis/transformations/__init__.py Outdated Show resolved Hide resolved
package/MDAnalysis/transformations/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update CHANGELOG and there's a small doc typo still in there, otherwise all good

@orbeckst
Copy link
Member

ping @richardjgowers for re-review

@yuxuanzhuang
Copy link
Contributor Author

yuxuanzhuang commented Aug 27, 2020

Met a problem when I was wrapping things up. After we change the behaviour of AtomGroup pickling, one problem shows up:

fit_trans = trans.fit_rot_trans(u.atoms, u.atoms)
u.trajectory.add_transformations(fit_trans)
u_dumped =  pickle.dumps(u)
u_p = pickle.loads(u_dumped)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-e66d2dd7c314> in <module>
----> 1 u_p = pickle.loads(pickle.dumps(u))

~/mdanalysis/package/MDAnalysis/core/groups.py in _unpickle(u, ix)
    113 def _unpickle(u, ix):
    114     print(u.__dict__)
--> 115     return u.atoms[ix]
    116 
    117 

AttributeError: 'Universe' object has no attribute 'atoms'

Not totally sure the exact reason. Seems that in the new process, transformations (in which have AtomGroup) are unpickled when Universe is still empty. Changing back to __reduce__ can fix this pickle ordering(?) problem.

EDIT:
There's nothing wrong with how we pickle AtomGroup.

My hunch is, for pickling Universe:

  • It will first initialize an empty Universe.
  • For Universe, before __setstate__, it will unpickle _topology and _trajectory.
  • For _trajectory, it will unpickle transformations
  • For transformations, it will unpickle AtomGroup
  • For AtomGroup, it will find a suitable Universe, which is empty. Ergo the error.

while __reduce__ works differently (somehow).

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments, please.

u_p = pickle.loads(pickle.dumps(u))
u.trajectory[0]
for u_ts, u_p_ts in zip(u.trajectory[:5], u_p.trajectory[:5]):
assert_equal(u_ts.positions, u_p_ts.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use equality comparison for floats

Suggested change
assert_equal(u_ts.positions, u_p_ts.positions)
assert_almost_equal(u_ts.positions, u_p_ts.positions)

u_p = pickle.loads(pickle.dumps(u))
u.trajectory[0]
for u_ts, u_p_ts in zip(u.trajectory[:5], u_p.trajectory[:5]):
assert_equal(u_ts.positions, u_p_ts.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use equality comparison for floats

Suggested change
assert_equal(u_ts.positions, u_p_ts.positions)
assert_almost_equal(u_ts.positions, u_p_ts.positions)

u_p = pickle.loads(pickle.dumps(u))
u.trajectory[0]
for u_ts, u_p_ts in zip(u.trajectory[:5], u_p.trajectory[:5]):
assert_equal(u_ts.positions, u_p_ts.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use equality comparison for floats

Suggested change
assert_equal(u_ts.positions, u_p_ts.positions)
assert_almost_equal(u_ts.positions, u_p_ts.positions)

u_p = pickle.loads(pickle.dumps(u))
u.trajectory[0]
for u_ts, u_p_ts in zip(u.trajectory[:5], u_p.trajectory[:5]):
assert_equal(u_ts.positions, u_p_ts.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use equality comparison for floats

Suggested change
assert_equal(u_ts.positions, u_p_ts.positions)
assert_almost_equal(u_ts.positions, u_p_ts.positions)

u_p = pickle.loads(pickle.dumps(u))
u.trajectory[0]
for u_ts, u_p_ts in zip(u.trajectory[:5], u_p.trajectory[:5]):
assert_equal(u_ts.positions, u_p_ts.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use equality comparison for floats

Suggested change
assert_equal(u_ts.positions, u_p_ts.positions)
assert_almost_equal(u_ts.positions, u_p_ts.positions)

u_p = pickle.loads(pickle.dumps(u))
u.trajectory[0]
for u_ts, u_p_ts in zip(u.trajectory[:5], u_p.trajectory[:5]):
assert_equal(u_ts.positions, u_p_ts.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use equality comparison for floats

Suggested change
assert_equal(u_ts.positions, u_p_ts.positions)
assert_almost_equal(u_ts.positions, u_p_ts.positions)

#
import pytest
import pickle
from numpy.testing import assert_equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this for float comparisons:

Suggested change
from numpy.testing import assert_equal
from numpy.testing import assert_almost_equal

# apply changes to the Timestep object
return ts

See `MDAnalysis.transformations.translate` for a simple example.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is not a valid sphinx link.

Suggested change
See `MDAnalysis.transformations.translate` for a simple example.
See :mod:`MDAnalysis.transformations.translate` for a simple example.

are often required for some analyses and visualization, and the functions in
this module allow transformations to be applied on-the-fly.

A typical transformation class looks like this:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A typical transformation class looks like this:
A typical transformation class looks like this (note that we keep its name
lowercase because we will treat it as a function, thanks to the ``__call__``
method):

# or modify an AtomGroup and return Timestep

return ts

Copy link
Member

@orbeckst orbeckst Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You showed the abstract class. Now show a concrete example to address @richardjgowers comment. For instance

Suggested change
As a concrete example we will write a transformation that rotates a group of atoms around
the z-axis through the center of geometry by a fixed increment for every time step. We will use
:meth:`MDAnalysis.core.groups.AtomGroup.rotateby` and simply increase the rotation angle
every time the transformation is called::
class spin_atoms(object):
def __init__(self, atoms, dphi):
"""Rotate atoms by dphi degrees for every time step (around the z axis)"""
self.atoms = atoms
self.dphi = dphi
self.axis = np.array([0, 0, 1])
def __call__(self, ts):
phi = self.dphi * ts.frame
self.atoms.rotateby(phi, self.axis)
return ts
This transformation can be used as ::
u = mda.Universe(PSF, DCD)
u.trajectory.add_transformations(spin_atoms(u.select_atoms("protein"), 1.0))

I currently can't get nglview to work in my notebook so you'll need to check that this actually works... or come up with another example.

EDIT: forgot to increment phi...

EDIT 2: yes, it's pretty dumb that the phi angle just keeps incrementing, no matter what you do with the trajectory. Perhaps better to do something like phi = ts.frame * self.dphi

EDIT 3: changed it to phi = ts.frame * self.dphi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! it works. And I think it makes sense to just rotate by a fixed angle. (for visualization perhaps)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the snippet.

@orbeckst
Copy link
Member

And yeah, I don't know why we need to use __reduce__... are there downsides to doing so instead of __get/setstate__?

@orbeckst
Copy link
Member

Does it still work with dask?

@yuxuanzhuang
Copy link
Contributor Author

And yeah, I don't know why we need to use __reduce__... are there downsides to doing so instead of __get/setstate__?

No, I think it is just low-level func for __get/setstate__. And it still works will dask.

On the other hand, for performance, it seems that the Universe is pickled twice in the presence of transformation.

>>> u = mda.Universe(PSF, DCD)
>>> fit_trans = trans.fit_rot_trans(u.atoms, u.atoms)
>>> u.trajectory.add_transformations(fit_trans)

>>> dmp = pickle.dumps(u)
run `__reduce__` (Universe)
run `__reduce__` (AtomGroup)
run `__reduce__` (Universe)
>>> u_p = pickle.loads(dmp)
run `_unpickle_U`
run `unpickle AtomGroup`
run `_unpickle_U`

I have to admit I am a bit confused here...when pickling intertwined stuff, e.g when we pickle them apart, it's not an issue:

>>> u = mda.Universe(PSF, DCD)
>>> fit_trans = trans.fit_rot_trans(u.atoms, u.atoms)

>>> dmp = pickle.dumps([u, fit_trans])  #  Note they are not linked together here.
run `__reduce__` (AtomGroup)
run `__reduce__` (Universe)
>>> u_p = pickle.loads(dmp)
run `_unpickle_U` 
run `unpickle AtomGroup`

@orbeckst
Copy link
Member

Would be good to understand the path the code takes when it is pickling Universe twice. Even if it were not a performance issue, it is a potential bug somewhere down the line. Btw, is the Universe stored twice in the pickle?

@yuxuanzhuang
Copy link
Contributor Author

yuxuanzhuang commented Aug 30, 2020

I think it's a normal behavior of python handling circular reference pickling.

class Universe(object):
    def __init__(self, top):
        self.top = top
        
    def add_trajectory(self,traj):
        self.traj = traj
        
    def __reduce__(self):
        print('reduce')
        return self._unpickle_u, (self.top, self.traj)
    
    @classmethod
    def _unpickle_u(cls, top, traj):
        print('unpickle')
        u = cls(top)
        u.traj = traj
        return u

class Topology(object):
    def __init__(self):
        pass
    
class Trajectory(object):
    def __init__(self):
        pass
        
    def add_transformation(self, transformation):
        self.transformation = transformation
        
class Transformation(object):
    def __init__(self, atomgroup):
        self.atomgroup = atomgroup
        
class Atomgroup(object):
    def __init__(self, universe):
        self.universe = universe

top = Topology()
traj = Trajectory()
u = Universe(top)
u.add_trajectory(traj)
ag = Atomgroup(u)
trans = Transformation(ag)
traj.add_transformation(trans)

>>> import pickle
>>> u_dmp = pickle.dumps(u)
>>> u_p = pickle.loads(u_dmp)
reduce
reduce
unpickle
unpickle

EDIT:
Universe is not stored twice...if you mean the actual pickling size.

f = pickle.dumps(obj)
print(len(f))
81439 (with transformation)
69086 (without transformation)
63713 (u._topology)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at the pickling order. If we only store the data once then I think that's not a problem if it gets pickled twice.

I found a minor typo in the docs (see suggestion). Just correct it. I am already approving the PR.

package/MDAnalysis/transformations/__init__.py Outdated Show resolved Hide resolved
@orbeckst
Copy link
Member

orbeckst commented Sep 3, 2020

@richardjgowers , were your comments addressed?

yuxuanzhuang and others added 2 commits September 4, 2020 09:24
Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
@orbeckst orbeckst dismissed richardjgowers’s stale review September 8, 2020 00:50

From what I can see, the reviewer's comments were addressed. To move the PR towards a merge, I am dismissing the stale review.

@orbeckst orbeckst merged commit 2c5e385 into MDAnalysis:develop Sep 8, 2020
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* Fixes MDAnalysis#2860
* refactor transformations from closure into class (necessary so that a Universe
   with transformations can be pickled).
* change universe pickling to `__reduce__` (instead of `__setstate__`/`__getstate__`, which
   did not work with transformations)
* add test for pickling a Universe with transformations
* update docs for transformations
   - examples (written as class)
   - deprecate transformations as closures (still works but cannot be pickled due
      to limitations in Python's pickle module)
* update changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor all Transformations into class
5 participants