Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine trajectory iterators with zip #747

Closed
kain88-de opened this issue Feb 29, 2016 · 25 comments
Closed

combine trajectory iterators with zip #747

kain88-de opened this issue Feb 29, 2016 · 25 comments

Comments

@kain88-de
Copy link
Member

I recently had to iterate over pairs of frames in a trajectory. My first approach was this.

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PDB, XTC

u = mda.Universe(PDB, XTC)

for ts_a, ts_b in zip(u.trajectory[:-1], u.trajectory[1:]):
    print(ts_a.frame, ts_b.frame)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)
>>> (9, 9)

As you can see this doesn't work. In the mean time a fix for me is

pos_prev = u.atoms.positions.copy()
for ts in u.trajectory[1:]:
    # do stuff
    pos_prev = u.atoms.positions.copy()

That works but is not very nice. It would already help if I could copy an AtomGroup instead of just the coordinates.

@jbarnoud
Copy link
Contributor

Do you have the same issue with izip? Here, zip creates a list and unroll your trajectory to the end. When you get to first loop iteration you are already looking at the last TimeStep.

Somebody mentioned that it would be feasible to have a full TimeStep for each frame rather than updating the same TimeStep. I don't know what it would cost performance wise, but I like the idea.

@richardjgowers
Copy link
Member

I think @kain88-de did that on the XDR rework (temporarily, it went back for uniformity) and apparently there wasn't a big performance hit. I'd want to check with a large (1M+) system to see what the memory usage is like for a quick iteration (just getting [ts.frame for ts in u.trajectory]). You'd be allocating and deallocating via garbage collection n times for n frames, so it'd be down to how good Python is at that.

But I'd like it if it could work, it would make things like this work nicely.

The other thing is I don't know if anywhere in the code we're using a pointer to .ts somewhere that will get broken once another ts is created.

@kain88-de
Copy link
Member Author

Do you have the same issue with izip?

Yes using izip fixes the problem. Ok once we fully suppport Python3 this shouldn't be a problem anymore.

@richardjgowers
Copy link
Member

I get this with the DCD Reader?

In [6]: for ts1, ts2 in izip(u.trajectory[:-2], u.trajectory[2:]):
    print ts1.frame, ts2.frame
   ...:     
2 2
3 3
4 4
5 5
6 6
7 7
8 8

Is xdr making a new ts every step?

@kain88-de
Copy link
Member Author

Is xdr making a new ts every step?

As far as I can tell no.

@kain88-de
Copy link
Member Author

For me this works as expected

from six.moves import zip
u = mda.Universe(PSF, DCD)

for ts_a, ts_b in zip(u.trajectory[:-2], u.trajectory[2:]):
    print(ts_a.frame, ts_b.frame)

@orbeckst
Copy link
Member

orbeckst commented Mar 1, 2016

Using old zip will give you a reference to the ts and then when you look at the list you are at the last one. You must use izip to get generators, that's expected behavior. I thought that was well-known but maybe we should have a "tips and tricks" page. Anyway, will close because apart from documenting it I don't really think anything needs to be changed.

(Reopen if you disagree and make a suggestion what to do.)

@orbeckst orbeckst closed this as completed Mar 1, 2016
@kain88-de kain88-de reopened this Mar 1, 2016
@kain88-de
Copy link
Member Author

I get this with the DCD Reader?

I might have been a little bit tired yesterday. This is the same I get with XTC using iterators. So this is broken without returning new copies of the timesteps.

@richardjgowers
Copy link
Member

Yeah I get the same with xdr. So what's happening is when izip does the [2:] generator it changes the result of the [:-2] generator. Like a miniature version of the zip problem inside the izip iteration.

I don't mind moving to a new copy of ts for each frame so long as it doesn't kill performance.

@orbeckst
Copy link
Member

orbeckst commented Mar 1, 2016

All of this used work, didn't it? If so what broke it?

Oliver Beckstein
email: orbeckst@gmail.com

Am Mar 1, 2016 um 4:38 schrieb Richard Gowers notifications@github.com:

Yeah I get the same with xdr. So what's happening is when izip does the [2:] generator it changes the result of the [:-2] generator. Like a miniature version of the zip problem inside the izip iteration.

I don't mind moving to a new copy of ts for each frame so long as it doesn't kill performance.


Reply to this email directly or view it on GitHub.

@kain88-de
Copy link
Member Author

https://youtu.be/WjJUPxKB164?t=1117

maybe we have a problem with the generators we returned?

@richardjgowers
Copy link
Member

I think it's just how our Reader model works currently?

# at Reader init
Reader.ts = Timestep()

u.trajectory[10]
# Reader goes to frame 10
# Reader fills .ts with frame 10 data
# Reader returns a **view** onto its .ts object

So then when we do the izip

# still only one ts object
Reader.ts = Timestep()

for ts1, ts2 in izip(u.trajectory[], u.trajectory[]):
    # izip asks the reader for the first frame from the first generator, changing the state of .ts
    # izip asks the reader for the first frame from the second generator, changing the state of .ts
    # izip returns these two (identical) objects, which are at the state of ts2 as that happened afterwards

So with the reference to the video, yeah we could make each iteration over the trajectory have its own .ts object, (this would also be parallel to #239 which wants a unique file handle for each iterator). We'd then have to think about what Reader.ts is at the end of an iteration

@orbeckst
Copy link
Member

orbeckst commented Mar 1, 2016

I'd like this to work again.

@orbeckst orbeckst added this to the 0.14.1 - Bugfixes milestone Mar 1, 2016
@richardjgowers
Copy link
Member

@orbeckst I'm not sure this will have ever worked. The izip defect is just a smaller version of the zip case - it uses the same object to generate ts, so it changes the state of ts.

I think I like @kain88-de 's idea above to have one Timestep per generator (if that's what he was hinting at?). Then we're not reallocating memory for every frame we read.

@kain88-de
Copy link
Member Author

think I like @kain88-de 's idea above to have one Timestep per generator (if that's what he was hinting at?). Then we're not reallocating memory for every frame we read.

Yeah that would work. Then we would only need to update the TimeStep of the generator. Does it take much to do this?

@richardjgowers
Copy link
Member

Errr, we'd have to put in creating a Timestep to every generator, so for __iter__ it would roughly look like

    def __iter__(self):
        self._reopen()
        ts = self._Timestep()  # need to remember args for this in __init__
        ts = self.ts.copy()  # or maybe this is better?
        while True:
            try:
                yield self._read_next_timestep(ts)  # need to pass in local version of ts to this method now
            except (EOFError, IOError):
                self.rewind()
                raise StopIteration

So the signature of _read_next_timestep now has to accept a ts to fill rather than use the one attached to the class.

I think that all Readers use most of the methods from the base class to do iteration, so it might not be too bad, just hacking their _read_next slightly and then the iterators on base.Reader

@kain88-de
Copy link
Member Author

So for my XTC reader this would already work. I'm not sure about the others.

@richardjgowers
Copy link
Member

Ok, so I just realised this doesn't really work for us

ag = u.select_atoms('something')

for ts1, ts2 in izip(u.traj[10:20], u.traj[20:30]):
    ag.positions = ???

So which ts does AtomGroup refer to?

@orbeckst
Copy link
Member

orbeckst commented Mar 3, 2016

I have no idea how you would make this work but in the context if RMSD we always said that you need two universes.

Oliver Beckstein
email: orbeckst@gmail.com

Am Mar 3, 2016 um 4:11 schrieb Richard Gowers notifications@github.com:

Ok, so I just realised this doesn't really work for us

ag = u.select_atoms('something')

for ts1, ts2 in izip(u.traj[10:20], u.traj[20:30]):
ag.positions = ???
So which ts does AtomGroup refer to?


Reply to this email directly or view it on GitHub.

@hainm
Copy link
Contributor

hainm commented Mar 3, 2016

alternatively, you can try itertools.tee to create independent iterators.

https://docs.python.org/3/library/itertools.html#itertools.tee

@jbarnoud
Copy link
Contributor

jbarnoud commented Mar 3, 2016

@hainm This will not solve the problem. When one of the iterator will move forward, it will change the Timestep for all the other iterators as they all share a view to the same object.

I am not sure the issue can be solved as the dynamic aspect of the AtomGroups is based on the fact that ts is a view. I would be tempted to close the issue with a won't fix tag. I someone want to use zip on trajectories, he should use izip on different universes. Hopefully, with #363, Universes will become light enough so it won't be a problem to have a bunch of them.

@hainm
Copy link
Contributor

hainm commented Mar 3, 2016

@hainm This will not solve the problem. When one of the iterator will move forward, it will change the Timestep for all the other iterators as they all share a view to the same object.

ah, yeah.

@richardjgowers
Copy link
Member

@kain88-de I'm thinking this is wont-fix, with the solution being use 2 Universes

@jbarnoud moving forward I'd like to have 2 universes share a topology, so very light indeed :)

@kain88-de kain88-de added wontfix and removed defect labels Mar 3, 2016
@kain88-de
Copy link
Member Author

I'm ok with wont-fix

@orbeckst
Copy link
Member

orbeckst commented Mar 3, 2016

Yes, wont fix.

(Btw, I am sorry, I didn't realize that the zip was over the same universe... of course, that never worked.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants