Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default numpy array layout for MemoryReader #1063

Closed
3 tasks
wouterboomsma opened this issue Nov 2, 2016 · 17 comments
Closed
3 tasks

Default numpy array layout for MemoryReader #1063

wouterboomsma opened this issue Nov 2, 2016 · 17 comments

Comments

@wouterboomsma
Copy link
Contributor

wouterboomsma commented Nov 2, 2016

MemoryReader layout

  1. Change the MemoryReader default order ("format") to fac (to be compatible with AnalysisFromFunction() and friends.
  2. Change the keyword argument "format" to "order" (and check other parts of the code (eg ENCORE) as well as the docs to follow this change)

Background

There are currently two different default layouts for numpy arrays in MDAnalysis. DCDReader.timeseries() uses an 'afc' format by default, which has now also been adopted as the default in MemoryReader, both when constructing a new MemoryReader, and when extracting the raw array using MemoryReader.timeseries().
On the other hand, the natural order when using AnalysisFromFunction is 'fac', which also intuitively (at least to me) is a more natural order.

These different default layouts leads to some unnecessary confusion. Note, for instance, the swapaxes that is necessary when extracting coordinates using AnalysisFromFunction to use to construct a new Universe with MemoryReader:

  coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
                                     u.atoms).run().results.swapaxes(0, 1)
  u2 = mda.Universe(PDB, coordinates, format=MemoryReader)

Since MemoryReader is introduced in this release, it is still relatively easy to change the default layout of MemoryReader, without having to worry about breaking code. So, the question is: should we change the default layout of MemoryReader so that it upon construction uses 'fac' instead?
And if so, do we then keep the default of MemoryReader.timeseries() to be 'afc', to maintain consistency with DCDReader.timeseries?

Currently version of MDAnalysis:

0.16.0-dev0

History

  • 2016-11-03 edited by @orbeckst (added agreed upon course of action and check list)
  • 2016-11-06 edited by @orbeckst (change "format" to "order")
@richardjgowers
Copy link
Member

For those of us who haven't touched Timeseries and friends yet. what is afc and fac?

@wouterboomsma
Copy link
Contributor Author

Hehe...sorry about that: a=atoms, f=frames, c=coordinates. So fac means that the first dimension in the numpy array corresponds to the frames, while the second corresponds to the atoms, and the third to the three coordinates.

@kain88-de
Copy link
Member

I'm for the fac order. This is also the default in other md-python packages.

@richardjgowers
Copy link
Member

thanks @wouterboomsma

Yeah it makes more sense for MemoryReader to use fac, a Reader is meant to serve up frame by frame. Also a good idea to "fix" this before it hits a release.

I can see why afc is useful sometimes though, so can we keep the option in timeseries the same? As long as you can always specify upon creation and avoid swapping it should be the same? Also, MR.timeseries and DCD.timeseries should behave the same to avoid headaches!

@orbeckst orbeckst mentioned this issue Nov 4, 2016
2 tasks
@orbeckst orbeckst added this to the 0.16.0 milestone Nov 4, 2016
@orbeckst
Copy link
Member

orbeckst commented Nov 4, 2016

Also in favor of fac order for MemoryReader.

MR.timeseries and DCD.timeseries should behave the same to avoid headaches!

Yes.

At some point we need to look at the performance of timeseries and friends. I think DCD.timeseries outperforms frame-by-frame ingestion and it might be useful to write C-level timeseries implementations for some of the other major formats. But until that day I would rather not change the defaults but just document clearly what is going on.

@orbeckst
Copy link
Member

orbeckst commented Nov 4, 2016

While we're at it: should we change the name of the MemoryReader's format kwarg to something like order? That would allow us to pass it through a Universe

u = Universe(PSF, array, order="afc")

At the moment, Universe claims any format kwarg for itself and does not pass it on to the reader.

I know that DCD.timeseries() uses format but I think I could live with that inconsistency between MemoryReader and timeseries until the day we work on timeseries and change things in timeseries anyway...

EDIT: We could keep a not-documented format kwarg in MemoryReader that is overridden by order — if so desired.

@kain88-de
Copy link
Member

I'm ok with using order the inconsistency doesn't bother me that much since timeseries aren't generally available yet.

@kain88-de
Copy link
Member

At some point we need to look at the performance of timeseries and friends. I think DCD.timeseries outperforms frame-by-frame ingestion and it might be useful to write C-level timeseries implementations for some of the other major formats.

Is there another issue for that? And no the frame-by-frame ingestion is just as fast as a timeseries for DCD files (at least reading the coordinates).

# use workshop equilibrium trajectories.
from MDAnalysis.analysis.base import AnalysisFromFunction

u = mda.Universe('adk4AKE.psf', '1ake_009-nowater-core-dt240ps.dcd')

%timeit u.trajectory.timeseries(u.atoms, format='fac')
# 600 ms
%timeit AnalysisFromFunction(lambda ag: ag.positions.copy(), u.trajectory, u.atoms).run().results
# 500 ms

@orbeckst
Copy link
Member

orbeckst commented Nov 4, 2016

That's good news then that frame-by-frame is competitive with timeseries. (I am actually a bit surprised, given that timeseries fills the arrays at the C-level without ever going to Python but you never know until you actually benchmark.) So it seems that there is really no great urgency to look at timeseries because with
AnalysisFromFunction() we have pretty much an equivalent. (One could even add base.Reader.timeseries(), just for convenience and consistency.)

@orbeckst
Copy link
Member

orbeckst commented Nov 7, 2016

I updated the TODO to also include changing the "format" kwarg to "order".

@orbeckst
Copy link
Member

@wouterboomsma I assigned this to you – feel free to delegate.

@wouterboomsma
Copy link
Contributor Author

@orbeckst Yeah, it would be most natural for me to implement this. I'm a bit pressed for time this week, but should be able to find some time next weekend.

@orbeckst
Copy link
Member

Would be great of this could get done over the next two weeks. Add a note here when you get started.

If anyone else feels like having a go at it earlier then they should simply add you as a code reviewer, code review should be less work than writing it yourself.

Oliver Beckstein
email: orbeckst@gmail.com

Am Nov 19, 2016 um 5:42 schrieb Wouter Boomsma notifications@github.com:

@orbeckst Yeah, it would be most natural for me to implement this. I'm a bit pressed for time this week, but should be able to find some time next weekend.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@wouterboomsma
Copy link
Contributor Author

@orbeckst I'm very sorry, but I simply haven't been able to find the time this weekend, and looking at my calendar it is unrealistic that I will be able to do much before Dec 12. I can definitely do it in the week of the 12th though.

@orbeckst
Copy link
Member

Ok, thanks for the update, @wouterboomsma . Could you assign #1065 to yourself to keep track of what needs doing? (I cannot assign you as you haven't accepted the invitation for collaborator status). If someone else (@mtiberti ?) wants to take it on they can assign themselves, too – there can be more than one. Just communicate through the issue what you're doing.

@wouterboomsma
Copy link
Contributor Author

@orbeckst, I don't seem to be able to assign issues (could be my ignorance of how github works). I've now accepted the invitation (sorry about the delay). If you could assign #1065 to me that would be great.

@kain88-de
Copy link
Member

@wouterboomsma I assigned the issue to you. It might be that you are not allowed to do that as a reading member of this repository only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants