Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryReader facelift #2080

Merged
merged 1 commit into from
Oct 9, 2018
Merged

Conversation

richardjgowers
Copy link
Member

@richardjgowers richardjgowers commented Sep 19, 2018

Fixes #2076 #2077 #1041 #2081
Replaces #1537

Changes made in this Pull Request:

  • MemoryReader now has velocities and forces (which hopefully work identically to positions)
  • MemoryReader now has variable dimensions (ie NPT conditions)
  • Universe.transfer_to_memory takes advantage of all of the above
  • Universe.empty now uses a MemoryReader (now it does the above)

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@richardjgowers
Copy link
Member Author

I'm a little confused what the expectations are for the MemoryReader wrt modifying attributes of Timesteps. From what I can tell, it tries to give views not copies of the underlying arrays. This means that if you edit the positions, change frame, your changes are saved (ie go back to that frame and you've still got the modified positions rather than the original). Is this correct?

Therefore, does it make sense to make .dimensions always an array of dimensions so that it can always follow this pattern?

@orbeckst
Copy link
Member

Yes, I'd say the expectation is that everything is permanent. In this regard, MemoryReader is a very different beast from all the other Readers. But I think it's required so that MemoryReader can become a swiss-army knife for trajectory manipulation.

I think that the new transformations are also permanent for MemoryReader (@jbarnoud @davidercruz ?)

@jbarnoud
Copy link
Contributor

The transformations are indeed permanent on the memory reader. In practice, the transformation is applied on all frames in one go, and then purged from the list of transformation to not apply the same transformation again when changing frame.

The expectation with the memory reader is that changes done to the coordinates are kept, on the contrary to the other readers. Having the box following the same pattern is what I would expect.

from .base import SingleFrameReaderBase


class DummyReader(SingleFrameReaderBase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be deprecated first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was only used internally, and I think it's so buggy (it doesn't iterate?) I doubt anyone found any good use for it.

dt=self.ts.dt,
filename=self.filename,
)
new[self.ts.frame]
new.ts = self.ts.copy()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line was problematic as base.Timestep.copy() isn't respectful of MemoryReader behaviour, see #2081. I don't think removing it causes any problems, as new[self.ts.frame] should populate the Timestep with everything needed

@richardjgowers richardjgowers changed the title Issue 2076 bettermemoryreader MemoryReader facelift Sep 24, 2018
@richardjgowers
Copy link
Member Author

@MDAnalysis/coredevs ok this could do with some eyes, it's got some nice MR fixes and improvements

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor comments/questions; someone else should probably also have a look.

@@ -315,8 +359,18 @@ def __init__(self, coordinate_array, order='fac',

self.ts = self._Timestep(self.n_atoms, **kwargs)
self.ts.dt = dt
if dimensions is not None:
self.ts.dimensions = dimensions
if dimensions is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could do

if dimensions is None:
  pass
elif dimensions.shape == (6,):
  dimensions = np.tile(dimensions, (self.n_frames, 1))
elif dimensions.shape != (self.n_frames, 6):
  raise Hell
self.dimensions_array = dimensions

n_atoms=n_atoms,
velocities=velocities, forces=forces)
coords = np.zeros((n_atoms, 3), dtype=np.float32)
if velocities:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ternary?

vels = np.zeros((n_atoms, 3), dtype=np.float32) if velocities else None
forces = np.zeros((n_atoms, 3), dtype=np.float32) if forces else None

for compactness?

forces = np.zeros((n_atoms, 3), dtype=np.float32)
else:
forces = None
u.trajectory = get_reader_for(coords)(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does an numpy array trigger MemoryReader selection? (I think so... can't remember the discussion a while back.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that this creates a MemoryReader.

# This is significantly faster, but only implemented for certain
# trajectory file formats
try:
coordinates = self.trajectory.timeseries(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that we do not have dedicated C-level timeseries anymore and thus the self.trajectory.timeseries was essentially just doing what happened in the except block? Or are there still cases where the timeseries approach would be beneficial?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only DCDReader ever had a fast timeseries (and still does). It didn't return dimensions information though, which is needed. We could change DCD.timeseries to have DCD.timeseries(...., dimensions=True) to then return both coordinates and dimensions.

Then maybe the base.Reader could implement the entire else: branch below (I started this in #1400 once...)

So yeah maybe I should try and use the fast DCD routine here still since it still exists, then manually grab the dimensions..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that you improve performance by cycling through the trajectory twice. Maybe leave as you have, add a note, and raise an issue so that it is not forgotten. A before/after benchmark (or Benchmark) would be good, just so that we have an idea how much slower this gets. analysis.encore uses DCD timeseries reading quite a bit, IIRC.

pm = ProgressMeter(n_frames, interval=1,
verbose=verbose, format=pm_format)
coordinates = np.zeros((n_frames, n_atoms, 3), dtype=np.float32)
# TODO: This assumes constant...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment.

has_fors = ts.has_forces
has_dims = ts.dimensions is not None

if has_vels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ternary operator for readability?

velocities = np.zeros_like(coordinates) if has_vels or None

Does Python short-circuit, i.e., is the creation of np.zeros_like(coordinates) skipped if not needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python (at least 3.5.3) properly short-circuits and does not execute the function in the skipped part:

In [1]: def dome():
   ...:     print("did it")
   ...:     return 10
   ...:

In [2]: x = dome() if True else 0
did it

In [3]: x = dome() if False else 0

mr2 = mr_reader.copy()

ts = mr2.ts
setattr(ts, attr, 7)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me how this works: Are you setting e.g. ts.positions to 7? All positions in the array or are you replacing the array with the integer "7"?

If it is the latter, shouldn't we doing the former?

ts = mr2[1]
ts = mr2[0]

assert_equal(getattr(ts, attr), 7)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array comparison? But the arrays should be float32 so you really ought to use assert_almost_equal.

@codecov
Copy link

codecov bot commented Sep 25, 2018

Codecov Report

Merging #2080 into develop will decrease coverage by <.01%.
The diff coverage is 96.15%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2080      +/-   ##
===========================================
- Coverage    89.41%   89.41%   -0.01%     
===========================================
  Files          159      157       -2     
  Lines        18731    18759      +28     
  Branches      2696     2709      +13     
===========================================
+ Hits         16749    16774      +25     
- Misses        1381     1382       +1     
- Partials       601      603       +2
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/__init__.py 100% <ø> (ø) ⬆️
package/MDAnalysis/coordinates/memory.py 96.02% <100%> (+1.74%) ⬆️
package/MDAnalysis/core/universe.py 94.39% <88.46%> (-0.65%) ⬇️
coordinates/__init__.py 100% <0%> (ø) ⬆️
dummy.py

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edb1643...d86dc42. Read the comment docs.

# single box, tile this to trajectory length
# allows modifying the box of some frames
dimensions = np.tile(dimensions, (self.n_frames, 1))
elif dimensions.shape != (self.n_frames, 6):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be protected against AttributeError if anything is passed through that is not a numpy array?

mapping of residues to segments
trajectory : bool, optional
if True, attaches a dummy reader to the Universe, therefore
if True, attaches a MemoryReader to the Universe, therefore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this (and others) a link

:class:`~MDAnalysis.coordinates.memory.MemoryReader`

"This must be a array of shape (6,) or "
"(n_frames, 6)".format(dimensions.shape))
self.dimensions_array = dimensions
dimensions = np.zeros((self.n_frames, 6), dtype=np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you decide to always have dimensions as an array, even if it contains zeros? (I assume this is more consistent with how ts.dimensions works, and it allows setting the box if required – more flexibility!) 👍

Copy link
Member

@zemanj zemanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of type checking, I'd prefer calling numpy.asarray(attr) if attr is supposed to be a numpy array (eventually). That allows greater type flexibility and will hopefully throw an appropriate error message if converting to a numpy array fails.

try:
if velocities.ndim == 2:
velocities = velocities[np.newaxis, :, :]
except AttributeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just checks whether the velocities object has an attribute ndim. For type checking, better use isinstance(). But since that's not really "pythonic" anyway, I'd prefer calling numpy.asarray(velocities) before doing attribute checks.

try:
if forces.ndim == 2:
forces = forces[np.newaxis, :, :]
except AttributeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above,I'd prefer using numpy.asarray(forces).

raise ValueError("Provided dimensions array has shape {}. "
"This must be a array of shape (6,) or "
"(n_frames, 6)".format(dimensions.shape))
except AttributeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, better use numpy.asarray(dimensions)

Universe.empty now creates a MemoryReader trajectory

Removed DummyReader (it was buggy)

added varying box size to MemoryReader

Universe.transfer_to_memory now also transfer velocities and forces

MemoryReader now always has array of dimensions

fixes Issue #2081 #2076 #2077 #1041
@richardjgowers
Copy link
Member Author

Ok changed to use np.asarray, should be good to go hopefully

@richardjgowers richardjgowers added this to the 0.19.0 milestone Oct 3, 2018
@richardjgowers
Copy link
Member Author

@MDAnalysis/coredevs if this gets reviewed I can finish up 0.19.0

@orbeckst
Copy link
Member

orbeckst commented Oct 4, 2018

@zemanj could you please have a look when you can spare a moment?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me – but I am not an expert on the MemoryReader. If anyone else (@zemanj @jbarnoud @wouterboomsma ) might want to have a quick look then that would be great.

(We would like to merge this soon in order to get the 0.19.0 release out...)

@richardjgowers perhaps open an issue noting that we might want to revisit the MemoryReader once we have timeseries support with dimensions for all readers (so that fast timeseries-slurpers such as the one for DCD and possibly XTC) can be used.

@richardjgowers richardjgowers merged commit 97db6d1 into develop Oct 9, 2018
@richardjgowers richardjgowers deleted the issue-2076-bettermemoryreader branch October 9, 2018 00:13
@wouterboomsma
Copy link
Contributor

Thanks for notifying me, @orbeckst. I haven't followed recent development in MDAnalysis very closely, and therefore don't have anything intelligent to add to what has already been discussed. Good to see that MemoryReader is still being used. Thanks for putting in the effort in bringing this up to date @richardjgowers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants