Default numpy array layout for MemoryReader #1063

wouterboomsma · 2016-11-02T20:37:18Z

MemoryReader layout

Change the MemoryReader default order ("format") to fac (to be compatible with AnalysisFromFunction() and friends.
Change the keyword argument "format" to "order" (and check other parts of the code (eg ENCORE) as well as the docs to follow this change)

changed MemoryReader
changed docs (see Issue 1047 encore docs #1056)
tests

Background

There are currently two different default layouts for numpy arrays in MDAnalysis. DCDReader.timeseries() uses an 'afc' format by default, which has now also been adopted as the default in MemoryReader, both when constructing a new MemoryReader, and when extracting the raw array using MemoryReader.timeseries().
On the other hand, the natural order when using AnalysisFromFunction is 'fac', which also intuitively (at least to me) is a more natural order.

These different default layouts leads to some unnecessary confusion. Note, for instance, the swapaxes that is necessary when extracting coordinates using AnalysisFromFunction to use to construct a new Universe with MemoryReader:

  coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
                                     u.atoms).run().results.swapaxes(0, 1)
  u2 = mda.Universe(PDB, coordinates, format=MemoryReader)

Since MemoryReader is introduced in this release, it is still relatively easy to change the default layout of MemoryReader, without having to worry about breaking code. So, the question is: should we change the default layout of MemoryReader so that it upon construction uses 'fac' instead?
And if so, do we then keep the default of MemoryReader.timeseries() to be 'afc', to maintain consistency with DCDReader.timeseries?

Currently version of MDAnalysis:

0.16.0-dev0

History

2016-11-03 edited by @orbeckst (added agreed upon course of action and check list)
2016-11-06 edited by @orbeckst (change "format" to "order")

The text was updated successfully, but these errors were encountered:

richardjgowers · 2016-11-02T20:52:04Z

For those of us who haven't touched Timeseries and friends yet. what is afc and fac?

wouterboomsma · 2016-11-02T20:56:22Z

Hehe...sorry about that: a=atoms, f=frames, c=coordinates. So fac means that the first dimension in the numpy array corresponds to the frames, while the second corresponds to the atoms, and the third to the three coordinates.

kain88-de · 2016-11-02T21:02:56Z

I'm for the fac order. This is also the default in other md-python packages.

richardjgowers · 2016-11-03T11:28:54Z

thanks @wouterboomsma

Yeah it makes more sense for MemoryReader to use fac, a Reader is meant to serve up frame by frame. Also a good idea to "fix" this before it hits a release.

I can see why afc is useful sometimes though, so can we keep the option in timeseries the same? As long as you can always specify upon creation and avoid swapping it should be the same? Also, MR.timeseries and DCD.timeseries should behave the same to avoid headaches!

orbeckst · 2016-11-04T00:23:18Z

Also in favor of fac order for MemoryReader.

MR.timeseries and DCD.timeseries should behave the same to avoid headaches!

Yes.

At some point we need to look at the performance of timeseries and friends. I think DCD.timeseries outperforms frame-by-frame ingestion and it might be useful to write C-level timeseries implementations for some of the other major formats. But until that day I would rather not change the defaults but just document clearly what is going on.

orbeckst · 2016-11-04T00:39:29Z

While we're at it: should we change the name of the MemoryReader's format kwarg to something like order? That would allow us to pass it through a Universe

u = Universe(PSF, array, order="afc")

At the moment, Universe claims any format kwarg for itself and does not pass it on to the reader.

I know that DCD.timeseries() uses format but I think I could live with that inconsistency between MemoryReader and timeseries until the day we work on timeseries and change things in timeseries anyway...

EDIT: We could keep a not-documented format kwarg in MemoryReader that is overridden by order — if so desired.

kain88-de · 2016-11-04T08:33:28Z

I'm ok with using order the inconsistency doesn't bother me that much since timeseries aren't generally available yet.

kain88-de · 2016-11-04T08:44:50Z

At some point we need to look at the performance of timeseries and friends. I think DCD.timeseries outperforms frame-by-frame ingestion and it might be useful to write C-level timeseries implementations for some of the other major formats.

Is there another issue for that? And no the frame-by-frame ingestion is just as fast as a timeseries for DCD files (at least reading the coordinates).

# use workshop equilibrium trajectories.
from MDAnalysis.analysis.base import AnalysisFromFunction

u = mda.Universe('adk4AKE.psf', '1ake_009-nowater-core-dt240ps.dcd')

%timeit u.trajectory.timeseries(u.atoms, format='fac')
# 600 ms
%timeit AnalysisFromFunction(lambda ag: ag.positions.copy(), u.trajectory, u.atoms).run().results
# 500 ms

orbeckst · 2016-11-04T09:04:34Z

That's good news then that frame-by-frame is competitive with timeseries. (I am actually a bit surprised, given that timeseries fills the arrays at the C-level without ever going to Python but you never know until you actually benchmark.) So it seems that there is really no great urgency to look at timeseries because with
AnalysisFromFunction() we have pretty much an equivalent. (One could even add base.Reader.timeseries(), just for convenience and consistency.)

orbeckst · 2016-11-07T04:56:03Z

I updated the TODO to also include changing the "format" kwarg to "order".

orbeckst · 2016-11-14T18:54:45Z

@wouterboomsma I assigned this to you – feel free to delegate.

wouterboomsma · 2016-11-19T12:42:06Z

@orbeckst Yeah, it would be most natural for me to implement this. I'm a bit pressed for time this week, but should be able to find some time next weekend.

orbeckst · 2016-11-19T16:49:53Z

Would be great of this could get done over the next two weeks. Add a note here when you get started.

If anyone else feels like having a go at it earlier then they should simply add you as a code reviewer, code review should be less work than writing it yourself.

Oliver Beckstein
email: orbeckst@gmail.com

Am Nov 19, 2016 um 5:42 schrieb Wouter Boomsma notifications@github.com:

@orbeckst Yeah, it would be most natural for me to implement this. I'm a bit pressed for time this week, but should be able to find some time next weekend.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

wouterboomsma · 2016-11-28T00:27:18Z

@orbeckst I'm very sorry, but I simply haven't been able to find the time this weekend, and looking at my calendar it is unrealistic that I will be able to do much before Dec 12. I can definitely do it in the week of the 12th though.

orbeckst · 2016-11-28T23:14:07Z

Ok, thanks for the update, @wouterboomsma . Could you assign #1065 to yourself to keep track of what needs doing? (I cannot assign you as you haven't accepted the invitation for collaborator status). If someone else (@mtiberti ?) wants to take it on they can assign themselves, too – there can be more than one. Just communicate through the issue what you're doing.

wouterboomsma · 2016-11-30T09:22:03Z

@orbeckst, I don't seem to be able to assign issues (could be my ignorance of how github works). I've now accepted the invitation (sorry about the delay). If you could assign #1065 to me that would be great.

kain88-de · 2016-11-30T09:27:46Z

@wouterboomsma I assigned the issue to you. It might be that you are not allowed to do that as a reading member of this repository only.

orbeckst mentioned this issue Nov 4, 2016

Issue 1047 encore docs #1056

Merged

2 tasks

orbeckst added API Component-Readers labels Nov 4, 2016

orbeckst added this to the 0.16.0 milestone Nov 4, 2016

orbeckst added the Format-MemoryReader label Nov 7, 2016

orbeckst assigned wouterboomsma Nov 14, 2016

orbeckst mentioned this issue Nov 22, 2016

MemoryReader and MDAnalysis.analysis.distances.distance_array #1065

Closed

wouterboomsma mentioned this issue Dec 20, 2016

Changed default memory layout of MemoryReader to 'fac' #1132

Merged

4 tasks

kain88-de closed this as completed in #1132 Jan 15, 2017

orbeckst mentioned this issue Jul 4, 2017

Adds Reader.timeseries #1400

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default numpy array layout for MemoryReader #1063

Default numpy array layout for MemoryReader #1063

wouterboomsma commented Nov 2, 2016 •

edited by orbeckst

Loading

richardjgowers commented Nov 2, 2016

wouterboomsma commented Nov 2, 2016

kain88-de commented Nov 2, 2016

richardjgowers commented Nov 3, 2016

orbeckst commented Nov 4, 2016

orbeckst commented Nov 4, 2016 •

edited

Loading

kain88-de commented Nov 4, 2016

kain88-de commented Nov 4, 2016

orbeckst commented Nov 4, 2016

orbeckst commented Nov 7, 2016

orbeckst commented Nov 14, 2016

wouterboomsma commented Nov 19, 2016

orbeckst commented Nov 19, 2016

wouterboomsma commented Nov 28, 2016

orbeckst commented Nov 28, 2016

wouterboomsma commented Nov 30, 2016

kain88-de commented Nov 30, 2016

Default numpy array layout for MemoryReader #1063

Default numpy array layout for MemoryReader #1063

Comments

wouterboomsma commented Nov 2, 2016 • edited by orbeckst Loading

MemoryReader layout

Background

Currently version of MDAnalysis:

History

richardjgowers commented Nov 2, 2016

wouterboomsma commented Nov 2, 2016

kain88-de commented Nov 2, 2016

richardjgowers commented Nov 3, 2016

orbeckst commented Nov 4, 2016

orbeckst commented Nov 4, 2016 • edited Loading

kain88-de commented Nov 4, 2016

kain88-de commented Nov 4, 2016

orbeckst commented Nov 4, 2016

orbeckst commented Nov 7, 2016

orbeckst commented Nov 14, 2016

wouterboomsma commented Nov 19, 2016

orbeckst commented Nov 19, 2016

wouterboomsma commented Nov 28, 2016

orbeckst commented Nov 28, 2016

wouterboomsma commented Nov 30, 2016

kain88-de commented Nov 30, 2016

wouterboomsma commented Nov 2, 2016 •

edited by orbeckst

Loading

orbeckst commented Nov 4, 2016 •

edited

Loading