combine trajectory iterators with zip #747

kain88-de · 2016-02-29T12:57:09Z

I recently had to iterate over pairs of frames in a trajectory. My first approach was this.

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PDB, XTC

u = mda.Universe(PDB, XTC)

for ts_a, ts_b in zip(u.trajectory[:-1], u.trajectory[1:]):
    print(ts_a.frame, ts_b.frame)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)

As you can see this doesn't work. In the mean time a fix for me is

pos_prev = u.atoms.positions.copy()
for ts in u.trajectory[1:]:
    # do stuff
    pos_prev = u.atoms.positions.copy()

That works but is not very nice. It would already help if I could copy an AtomGroup instead of just the coordinates.

jbarnoud · 2016-02-29T18:08:49Z

Do you have the same issue with izip? Here, zip creates a list and unroll your trajectory to the end. When you get to first loop iteration you are already looking at the last TimeStep.

Somebody mentioned that it would be feasible to have a full TimeStep for each frame rather than updating the same TimeStep. I don't know what it would cost performance wise, but I like the idea.

richardjgowers · 2016-02-29T18:22:59Z

I think @kain88-de did that on the XDR rework (temporarily, it went back for uniformity) and apparently there wasn't a big performance hit. I'd want to check with a large (1M+) system to see what the memory usage is like for a quick iteration (just getting [ts.frame for ts in u.trajectory]). You'd be allocating and deallocating via garbage collection n times for n frames, so it'd be down to how good Python is at that.

But I'd like it if it could work, it would make things like this work nicely.

The other thing is I don't know if anywhere in the code we're using a pointer to .ts somewhere that will get broken once another ts is created.

kain88-de · 2016-02-29T18:52:24Z

Do you have the same issue with izip?

Yes using izip fixes the problem. Ok once we fully suppport Python3 this shouldn't be a problem anymore.

richardjgowers · 2016-02-29T18:58:19Z

I get this with the DCD Reader?

In [6]: for ts1, ts2 in izip(u.trajectory[:-2], u.trajectory[2:]):
    print ts1.frame, ts2.frame
   ...:     
2 2
3 3
4 4
5 5
6 6
7 7
8 8

Is xdr making a new ts every step?

kain88-de · 2016-02-29T19:09:43Z

Is xdr making a new ts every step?

As far as I can tell no.

kain88-de · 2016-02-29T21:37:51Z

For me this works as expected

from six.moves import zip
u = mda.Universe(PSF, DCD)

for ts_a, ts_b in zip(u.trajectory[:-2], u.trajectory[2:]):
    print(ts_a.frame, ts_b.frame)

orbeckst · 2016-03-01T01:10:50Z

Using old zip will give you a reference to the ts and then when you look at the list you are at the last one. You must use izip to get generators, that's expected behavior. I thought that was well-known but maybe we should have a "tips and tricks" page. Anyway, will close because apart from documenting it I don't really think anything needs to be changed.

(Reopen if you disagree and make a suggestion what to do.)

kain88-de · 2016-03-01T08:23:22Z

I get this with the DCD Reader?

I might have been a little bit tired yesterday. This is the same I get with XTC using iterators. So this is broken without returning new copies of the timesteps.

richardjgowers · 2016-03-01T11:38:54Z

Yeah I get the same with xdr. So what's happening is when izip does the [2:] generator it changes the result of the [:-2] generator. Like a miniature version of the zip problem inside the izip iteration.

I don't mind moving to a new copy of ts for each frame so long as it doesn't kill performance.

orbeckst · 2016-03-01T17:18:49Z

All of this used work, didn't it? If so what broke it?

Oliver Beckstein
email: orbeckst@gmail.com

Am Mar 1, 2016 um 4:38 schrieb Richard Gowers notifications@github.com:

Yeah I get the same with xdr. So what's happening is when izip does the [2:] generator it changes the result of the [:-2] generator. Like a miniature version of the zip problem inside the izip iteration.

I don't mind moving to a new copy of ts for each frame so long as it doesn't kill performance.

—
Reply to this email directly or view it on GitHub.

kain88-de · 2016-03-01T19:17:59Z

https://youtu.be/WjJUPxKB164?t=1117

maybe we have a problem with the generators we returned?

richardjgowers · 2016-03-01T19:32:37Z

I think it's just how our Reader model works currently?

# at Reader init
Reader.ts = Timestep()

u.trajectory[10]
# Reader goes to frame 10
# Reader fills .ts with frame 10 data
# Reader returns a **view** onto its .ts object

So then when we do the izip

# still only one ts object
Reader.ts = Timestep()

for ts1, ts2 in izip(u.trajectory[], u.trajectory[]):
    # izip asks the reader for the first frame from the first generator, changing the state of .ts
    # izip asks the reader for the first frame from the second generator, changing the state of .ts
    # izip returns these two (identical) objects, which are at the state of ts2 as that happened afterwards

So with the reference to the video, yeah we could make each iteration over the trajectory have its own .ts object, (this would also be parallel to #239 which wants a unique file handle for each iterator). We'd then have to think about what Reader.ts is at the end of an iteration

orbeckst · 2016-03-01T23:47:27Z

I'd like this to work again.

richardjgowers · 2016-03-02T13:00:22Z

@orbeckst I'm not sure this will have ever worked. The izip defect is just a smaller version of the zip case - it uses the same object to generate ts, so it changes the state of ts.

I think I like @kain88-de 's idea above to have one Timestep per generator (if that's what he was hinting at?). Then we're not reallocating memory for every frame we read.

kain88-de · 2016-03-02T13:07:45Z

think I like @kain88-de 's idea above to have one Timestep per generator (if that's what he was hinting at?). Then we're not reallocating memory for every frame we read.

Yeah that would work. Then we would only need to update the TimeStep of the generator. Does it take much to do this?

richardjgowers · 2016-03-02T13:14:27Z

Errr, we'd have to put in creating a Timestep to every generator, so for __iter__ it would roughly look like

    def __iter__(self):
        self._reopen()
        ts = self._Timestep()  # need to remember args for this in __init__
        ts = self.ts.copy()  # or maybe this is better?
        while True:
            try:
                yield self._read_next_timestep(ts)  # need to pass in local version of ts to this method now
            except (EOFError, IOError):
                self.rewind()
                raise StopIteration

So the signature of _read_next_timestep now has to accept a ts to fill rather than use the one attached to the class.

I think that all Readers use most of the methods from the base class to do iteration, so it might not be too bad, just hacking their _read_next slightly and then the iterators on base.Reader

kain88-de · 2016-03-02T15:58:38Z

So for my XTC reader this would already work. I'm not sure about the others.

richardjgowers · 2016-03-03T11:11:45Z

Ok, so I just realised this doesn't really work for us

ag = u.select_atoms('something')

for ts1, ts2 in izip(u.traj[10:20], u.traj[20:30]):
    ag.positions = ???

So which ts does AtomGroup refer to?

orbeckst · 2016-03-03T16:46:20Z

I have no idea how you would make this work but in the context if RMSD we always said that you need two universes.

Oliver Beckstein
email: orbeckst@gmail.com

Am Mar 3, 2016 um 4:11 schrieb Richard Gowers notifications@github.com:

Ok, so I just realised this doesn't really work for us

ag = u.select_atoms('something')

for ts1, ts2 in izip(u.traj[10:20], u.traj[20:30]):
ag.positions = ???
So which ts does AtomGroup refer to?

—
Reply to this email directly or view it on GitHub.

hainm · 2016-03-03T18:30:43Z

alternatively, you can try itertools.tee to create independent iterators.

https://docs.python.org/3/library/itertools.html#itertools.tee

jbarnoud · 2016-03-03T18:36:40Z

@hainm This will not solve the problem. When one of the iterator will move forward, it will change the Timestep for all the other iterators as they all share a view to the same object.

I am not sure the issue can be solved as the dynamic aspect of the AtomGroups is based on the fact that ts is a view. I would be tempted to close the issue with a won't fix tag. I someone want to use zip on trajectories, he should use izip on different universes. Hopefully, with #363, Universes will become light enough so it won't be a problem to have a bunch of them.

hainm · 2016-03-03T18:38:40Z

@hainm This will not solve the problem. When one of the iterator will move forward, it will change the Timestep for all the other iterators as they all share a view to the same object.

ah, yeah.

richardjgowers · 2016-03-03T18:50:14Z

@kain88-de I'm thinking this is wont-fix, with the solution being use 2 Universes

@jbarnoud moving forward I'd like to have 2 universes share a topology, so very light indeed :)

kain88-de · 2016-03-03T20:10:17Z

I'm ok with wont-fix

orbeckst · 2016-03-03T20:23:30Z

Yes, wont fix.

(Btw, I am sorry, I didn't realize that the zip was over the same universe... of course, that never worked.)

kain88-de added enhancement usability Component-Readers labels Feb 29, 2016

orbeckst closed this as completed Mar 1, 2016

kain88-de reopened this Mar 1, 2016

orbeckst added defect Component-Core labels Mar 1, 2016

orbeckst added this to the 0.14.1 - Bugfixes milestone Mar 1, 2016

orbeckst removed the enhancement label Mar 1, 2016

kain88-de added wontfix and removed defect labels Mar 3, 2016

kain88-de closed this as completed Mar 3, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

combine trajectory iterators with zip #747

combine trajectory iterators with zip #747

kain88-de commented Feb 29, 2016

jbarnoud commented Feb 29, 2016

richardjgowers commented Feb 29, 2016

kain88-de commented Feb 29, 2016

richardjgowers commented Feb 29, 2016

kain88-de commented Feb 29, 2016

kain88-de commented Feb 29, 2016

orbeckst commented Mar 1, 2016

kain88-de commented Mar 1, 2016

richardjgowers commented Mar 1, 2016

orbeckst commented Mar 1, 2016

kain88-de commented Mar 1, 2016

richardjgowers commented Mar 1, 2016

orbeckst commented Mar 1, 2016

richardjgowers commented Mar 2, 2016

kain88-de commented Mar 2, 2016

richardjgowers commented Mar 2, 2016

kain88-de commented Mar 2, 2016

richardjgowers commented Mar 3, 2016

orbeckst commented Mar 3, 2016

hainm commented Mar 3, 2016

jbarnoud commented Mar 3, 2016

hainm commented Mar 3, 2016

richardjgowers commented Mar 3, 2016

kain88-de commented Mar 3, 2016

orbeckst commented Mar 3, 2016

combine trajectory iterators with zip #747

combine trajectory iterators with zip #747

Comments

kain88-de commented Feb 29, 2016

jbarnoud commented Feb 29, 2016

richardjgowers commented Feb 29, 2016

kain88-de commented Feb 29, 2016

richardjgowers commented Feb 29, 2016

kain88-de commented Feb 29, 2016

kain88-de commented Feb 29, 2016

orbeckst commented Mar 1, 2016

kain88-de commented Mar 1, 2016

richardjgowers commented Mar 1, 2016

orbeckst commented Mar 1, 2016

kain88-de commented Mar 1, 2016

richardjgowers commented Mar 1, 2016

orbeckst commented Mar 1, 2016

richardjgowers commented Mar 2, 2016

kain88-de commented Mar 2, 2016

richardjgowers commented Mar 2, 2016

kain88-de commented Mar 2, 2016

richardjgowers commented Mar 3, 2016

orbeckst commented Mar 3, 2016

hainm commented Mar 3, 2016

jbarnoud commented Mar 3, 2016

hainm commented Mar 3, 2016

richardjgowers commented Mar 3, 2016

kain88-de commented Mar 3, 2016

orbeckst commented Mar 3, 2016