-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default numpy array layout for MemoryReader #1063
Comments
For those of us who haven't touched Timeseries and friends yet. what is afc and fac? |
Hehe...sorry about that: a=atoms, f=frames, c=coordinates. So fac means that the first dimension in the numpy array corresponds to the frames, while the second corresponds to the atoms, and the third to the three coordinates. |
I'm for the fac order. This is also the default in other md-python packages. |
thanks @wouterboomsma Yeah it makes more sense for MemoryReader to use fac, a Reader is meant to serve up frame by frame. Also a good idea to "fix" this before it hits a release. I can see why afc is useful sometimes though, so can we keep the option in timeseries the same? As long as you can always specify upon creation and avoid swapping it should be the same? Also, MR.timeseries and DCD.timeseries should behave the same to avoid headaches! |
Also in favor of fac order for MemoryReader.
Yes. At some point we need to look at the performance of |
While we're at it: should we change the name of the MemoryReader's u = Universe(PSF, array, order="afc") At the moment, I know that EDIT: We could keep a not-documented |
I'm ok with using |
Is there another issue for that? And no the frame-by-frame ingestion is just as fast as a timeseries for DCD files (at least reading the coordinates). # use workshop equilibrium trajectories.
from MDAnalysis.analysis.base import AnalysisFromFunction
u = mda.Universe('adk4AKE.psf', '1ake_009-nowater-core-dt240ps.dcd')
%timeit u.trajectory.timeseries(u.atoms, format='fac')
# 600 ms
%timeit AnalysisFromFunction(lambda ag: ag.positions.copy(), u.trajectory, u.atoms).run().results
# 500 ms |
That's good news then that frame-by-frame is competitive with timeseries. (I am actually a bit surprised, given that timeseries fills the arrays at the C-level without ever going to Python but you never know until you actually benchmark.) So it seems that there is really no great urgency to look at timeseries because with |
I updated the TODO to also include changing the "format" kwarg to "order". |
@wouterboomsma I assigned this to you – feel free to delegate. |
@orbeckst Yeah, it would be most natural for me to implement this. I'm a bit pressed for time this week, but should be able to find some time next weekend. |
Would be great of this could get done over the next two weeks. Add a note here when you get started. If anyone else feels like having a go at it earlier then they should simply add you as a code reviewer, code review should be less work than writing it yourself. Oliver Beckstein
|
@orbeckst I'm very sorry, but I simply haven't been able to find the time this weekend, and looking at my calendar it is unrealistic that I will be able to do much before Dec 12. I can definitely do it in the week of the 12th though. |
Ok, thanks for the update, @wouterboomsma . Could you assign #1065 to yourself to keep track of what needs doing? (I cannot assign you as you haven't accepted the invitation for collaborator status). If someone else (@mtiberti ?) wants to take it on they can assign themselves, too – there can be more than one. Just communicate through the issue what you're doing. |
@wouterboomsma I assigned the issue to you. It might be that you are not allowed to do that as a reading member of this repository only. |
MemoryReader layout
AnalysisFromFunction()
and friends.Background
There are currently two different default layouts for numpy arrays in MDAnalysis.
DCDReader.timeseries()
uses an 'afc' format by default, which has now also been adopted as the default inMemoryReader
, both when constructing a newMemoryReader
, and when extracting the raw array usingMemoryReader.timeseries()
.On the other hand, the natural order when using
AnalysisFromFunction
is 'fac', which also intuitively (at least to me) is a more natural order.These different default layouts leads to some unnecessary confusion. Note, for instance, the
swapaxes
that is necessary when extracting coordinates usingAnalysisFromFunction
to use to construct a new Universe with MemoryReader:Since
MemoryReader
is introduced in this release, it is still relatively easy to change the default layout of MemoryReader, without having to worry about breaking code. So, the question is: should we change the default layout of MemoryReader so that it upon construction uses 'fac' instead?And if so, do we then keep the default of
MemoryReader.timeseries()
to be 'afc', to maintain consistency withDCDReader.timeseries
?Currently version of MDAnalysis:
0.16.0-dev0
History
The text was updated successfully, but these errors were encountered: