-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport fix for #2878 (pickle DCD and XDR readers at correct frame) #2908
Conversation
- fix issue #2878 - add tests to the base tests (but xfail for any format that does not implement pickling yet)
@yuxuanzhuang can you please have a look? The test test_pickle_next_ts_reader fails for DCD and XTC/TRR and I don't understand why. Am I missing another change somewhere?
|
I think the def __setstate__(self, state):
self.__dict__ = state
self[self.ts.frame] |
So if I understand correctly then I should not include the tests that check that a Reader can be pickled because that's really the new feature in 2.0.0. (For instance, I also didn't backport any |
I think so, maybe something similar to what was tested here ad48254#diff-7674f3848f653d737235c4f81a267c11R86 Besides, maybe it's a bit tricky to get to the last frame with only DCD/XDR file because you cannot do dcd.seek(len(dcd) - 1)
_ = dcd.read() # read next frame
dump = pickle.dumps(dcd)
new_dcd = pickle.loads(dump)
assert dcd.fname == new_dcd.fname
assert dcd.tell() == new_dcd.tell() |
These tests all FAIL at the moment.
all pickle tests FAIL at the moment
Hello @orbeckst! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-08-13 19:22:53 UTC |
@yuxuanzhuang I added additional testing for DCDFile pickling and XTC/TRRFile pickling and I find that even though Run
to see the failing tests. Furthermore, if I pickle a DCDFile before seeking then I cannot unpickle it (test test_pickle_immediately). Did you test the low-level pickling of the file classes or only the Readers? Any insights? (Also ping @kain88-de just in case he wants to take a look.) |
Sorry I am a bit confused, isn't the only error here is that Edit: Now I get the errors after I rerun build_ext. I will have a look. |
oops, the prec is my error – but I never saw it because my tests failed earlier. fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments on code for additional rationale/explanations.
@@ -265,7 +265,8 @@ cdef class DCDFile: | |||
return | |||
|
|||
current_frame = state[1] | |||
self.seek(current_frame) | |||
self.seek(current_frame - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these lines make the low-level tests fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah...it messed up with the position of the file.
The fix is ditching current changes, and changing line 410 to:
if frame >= self.n_frames + 1:
Besides, I don't think current tests cover for the "true" last frame i.e.:
traj = DCDReader(DCD)
traj[-1] # move to last frame
dcd = traj._file
dcd.tell() == len(dcd) # instead of len(dcd) - 1
self.seek(current_frame - 1) | ||
self.current_frame = current_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these lines make the low-level tests fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current tests can pass if these changes are ditched, but still the last frame cannot be easily pickled as seek
here is sort of different from that in DCD files.
assert_almost_equal(frame.xyz, new_frame.xyz) | ||
assert_almost_equal(frame.unitcell, new_frame.unitcell) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests to check the contents of the current frame
|
||
assert dcd.fname == new_dcd.fname | ||
assert dcd.tell() == new_dcd.tell() | ||
def test_pickle(dcd): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the generic pickle test start somewhere in the middle of the trajectory
|
||
def test_pickle_last(dcd): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specific test for last frame (this was the original test_pickle
test)
def test_pickle_immediately(dcd): | ||
# do not seek before pickling: this seems to leave the DCDFile | ||
# object in weird state: is this supposed to work? | ||
new_dcd = pickle.loads(pickle.dumps(dcd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pickle always fails if we have never done a seek()
before attempting to pickle.
______________________________________ test_pickle_immediately _______________________________________
dcd = <MDAnalysis.lib.formats.libdcd.DCDFile object at 0x12a1e5890>
def test_pickle_immediately(dcd):
# do not seek before pickling: this seems to leave the DCDFile
# object in weird state: is this supposed to work?
> new_dcd = pickle.loads(pickle.dumps(dcd))
testsuite/MDAnalysisTests/formats/test_libdcd.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
MDAnalysis/lib/formats/libdcd.pyx:268: in MDAnalysis.lib.formats.libdcd.DCDFile.__setstate__
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E OSError: DCD seek failed with DCD error=Normal EOF
MDAnalysis/lib/formats/libdcd.pyx:422: OSError
I find this behavior surprising (as a user) but maybe I don't understand if this is expected. But in that case we should raise an appropriate exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks, then that's a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that it is only a bug in this PR... because it works with 1.0.0 and current develop
In [1]: import MDAnalysis as mda
In [2]: mda.__version__
Out[2]: '1.0.0'
In [3]: import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data
For dcd
In [13]: dcd = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)
In [14]: pickle.dumps(dcd)
Out[14]: b'\x80\x03cMDAnalysis.lib.formats.libdcd\nDCDFile\nq\x00Xg\x00\x00\x00~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysisTests/data/coordinates/test.dcdq\x01X\x01\x00\x00\x00rq\x02\x86q\x03Rq\x04K\x01K\x00\x86q\x05b.'
In [15]: pickle.loads(pickle.dumps(dcd))
Out[15]: <MDAnalysis.lib.formats.libdcd.DCDFile at 0x118d75678>
and also works for trr
In [8]: trr = mda.lib.formats.libmdaxdr.TRRFile(data.COORDINATES_TRR)
In [9]: pickle.dumps(trr)
Out[9]: b'\x80\x03cMDAnalysis.lib.formats.libmdaxdr\nTRRFile\nq\x00Xg\x00\x00\x00~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysisTests/data/coordinates/test.trrq\x01X\x01\x00\x00\x00rq\x02\x86q\x03Rq\x04(K\x01K\x00NK\x00tq\x05b.'
In [10]: pickle.loads(pickle.dumps(trr))
Out[10]: <MDAnalysis.lib.formats.libmdaxdr.TRRFile at 0x118c0d438>
EDIT: added loading and fixed DCD example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach I took here is flawed (so it also shouldn't work in current develop--when pickle.loading
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I need to check what commit I used for this:
In [13]: dcd = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)
In [14]: pickle.dumps(dcd)
Out[14]: b'\x80\x03cMDAnalysis.lib.formats.libdcd\nDCDFile\nq\x00Xj\x00\x00\x00~/anaconda3/envs/mda3dev/lib/python3.7/site-packages/MDAnalysisTests/data/coordinates/test.dcdq\x01X\x01\x00\x00\x00rq\x02\x86q\x03Rq\x04K\x01K\x00\x86q\x05b.'
In [15]: mda.__version__
Out[15]: '2.0.0-dev0'
In [16]: pickle.loads(pickle.dumps(dcd))
Out[16]: <MDAnalysis.lib.formats.libdcd.DCDFile at 0x1268ffb90>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oddly enough, data.DCD
(which is used in the testsuite) fails at pickling while data.COORDINATES_DCD
fails at reading:
>>> dcd = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD)
>>> read_p = pickle.loads(pickle.dumps(dcd))
>>> read_p.read()
OSError: Reading DCD header failed: format of DCD file is wrong
assert_equal(old_reader.offsets, new_reader.offsets) | ||
assert_almost_equal(frame.x, new_frame.x) | ||
assert_almost_equal(frame.box, new_frame.box) | ||
assert frame.step == new_frame.step | ||
assert_almost_equal(frame.time, new_frame.time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests for contents of the frame
frame = old_reader.read() | ||
new_frame = new_reader.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read()
or next(reader)
is the only way to get the current frame
Some representative errors from
are pasted here below. The full output is attached as pytest.txt DCD
xdr (XTC/TRR)
|
@@ -265,7 +265,8 @@ cdef class DCDFile: | |||
return | |||
|
|||
current_frame = state[1] | |||
self.seek(current_frame) | |||
self.seek(current_frame - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.seek(current_frame - 1) | |
self.seek(current_frame) |
@@ -265,7 +265,8 @@ cdef class DCDFile: | |||
return | |||
|
|||
current_frame = state[1] | |||
self.seek(current_frame) | |||
self.seek(current_frame - 1) | |||
self.current_frame = current_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.current_frame = current_frame |
@@ -265,7 +265,8 @@ cdef class DCDFile: | |||
return | |||
|
|||
current_frame = state[1] | |||
self.seek(current_frame) | |||
self.seek(current_frame - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah...it messed up with the position of the file.
The fix is ditching current changes, and changing line 410 to:
if frame >= self.n_frames + 1:
Besides, I don't think current tests cover for the "true" last frame i.e.:
traj = DCDReader(DCD)
traj[-1] # move to last frame
dcd = traj._file
dcd.tell() == len(dcd) # instead of len(dcd) - 1
self.seek(current_frame - 1) | ||
self.current_frame = current_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current tests can pass if these changes are ditched, but still the last frame cannot be easily pickled as seek
here is sort of different from that in DCD files.
I am trying to understand how we use frames here. The basic problem seems to be that One can get to the last frame in a trajectory with the DCDFile and xdrFile readers. DCD>>> import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data
>>> reader = mda.lib.formats.libdcd.DCDFile(data.COORDINATES_DCD) Behavior of tellAre we using 0- or 1 based frame numbers? The docs for DCDFile.seek say 0 based and trying to seek to >>> reader.seek(len(reader))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "MDAnalysis/lib/formats/libdcd.pyx", line 411, in MDAnalysis.lib.formats.libdcd.DCDFile.seek
EOFError: Trying to seek over max number of frames However, DCDFile.tell "current frame (0-based)" will always give the frame + 1: >>> reader.seek(0)
>>> frames = [reader.tell() for frame in reader]
>>> frames
[1, 2, 3, 4, 5] Maybe this is because at the time >>> centers = [(reader.tell(), frame.xyz.mean(axis=0)) for frame in reader]
>>> centers
[(1, array([6., 7., 8.], dtype=float32)), (2, array([12., 14., 16.], dtype=float32)), (3, array([24., 28., 32.], dtype=float32)), (4, array([48., 56., 64.], dtype=float32)), (5, array([ 96., 112., 128.], dtype=float32))] where we are now associating the wrong frame index with each observable. If we want to go back to a specific frame (here, frame 2) and we naively use tell after we read the frame (assuming that tell gives us the "current frame", i.e., the one we have been working with) then we will seek to the next frame instead: >>> reader.seek(2)
>>> frame_2 = reader.read()
>>> t_2 = reader.tell() # this is NOT frame 2 but frame 3 already
>>> t_2
3
>>> reader.seek(t_2)
>>> reader.tell()
3
>>> reader.seek(t_2 - 1) # need to remember to go one back...
>>> reader.tell()
2 last frameOne can get to the last frame >>> reader.seek(len(reader)-1)
>>> reader.read()
DCDFrame(xyz=array([[ 0., 16., 32.],
[ 48., 64., 80.],
[ 96., 112., 128.],
[144., 160., 176.],
[192., 208., 224.]], dtype=float32), unitcell=array([85.09999847, 85.40000153, 86.19999695, 80.40000153, 75.40000153,
87.30000305]))
>>> The above is really the last frame, as seen here:
XDR>>> import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data
>>> reader = mda.lib.formats.libmdaxdr.TRRFile(data.COORDINATES_TRR) tellSame as above: tell provides the next frame >>> reader.seek(0)
>>> frames = [reader.tell() for frame in reader]
>>> frames
[1, 2, 3, 4, 5] and >>> reader.seek(2)
>>> frame_2 = reader.read()
>>> reader.seek(2)
>>> frame_2 = reader.read()
>>> t_2 = reader.tell() # this is NOT frame 2 but frame 3 already
>>> t_2
3
>>> reader.seek(t_2)
>>> reader.tell()
3
>>> frame_new = reader.read()
>>> frame_new.step
3
>>> frame.step
2
>>> reader.seek(t_2 - 1)
>>> frame_correct = reader.read()
>>> frame_correct.step
2 last frameYou can seek to the last frame in the trajectory by using >>> reader.seek(len(reader)-1)
>>> reader.read()
TRRFrame(x=array([[ 0. , 1.6 , 3.2 ],
[ 4.8 , 6.4 , 8. ],
[ 9.6 , 11.2 , 12.8 ],
[14.400001, 16. , 17.6 ],
[19.2 , 20.800001, 22.4 ]], dtype=float32), v=array([[0. , 0.16000001, 0.32000002],
[0.48000002, 0.64000005, 0.8 ],
[0.96000004, 1.12 , 1.2800001 ],
[1.4399999 , 1.6 , 1.7600001 ],
[1.9200001 , 2.08 , 2.24 ]], dtype=float32), f=array([[ 0. , 1.5999999, 3.1999998],
[ 4.7999997, 6.3999996, 8. ],
[ 9.599999 , 11.2 , 12.799999 ],
[14.400001 , 16. , 17.6 ],
[19.199999 , 20.8 , 22.4 ]], dtype=float32), box=array([[8.51 , 0. , 0. ],
[0.69131476, 8.592234 , 0. ],
[1.4558911 , 2.0905383 , 8.350026 ]], dtype=float32), step=4, time=4.0, lmbda=1.0, hasx=True, hasv=True, hasf=True) as demonstrated by the fact that we just got the last frame. The TRR has the steps explicitly numbered so I know that we got the last frame: >>> steps = [frame.step for frame in reader]
>>> steps
[0, 1, 2, 3, 4] |
By "last frame", I mean the state of the file when trajectory is in its last frame. i.e. >>> reader.seek(len(reader)-1)
>>> reader.read()
reader.tell() == len(reader) which cannot be pickled before. I think it makes sense that Or does it make more sense to make it init to -1, then we always get the 0-based current frame with >>> reader.seek(0)
>>> frames = [reader.tell() for frame in reader]
>>> frames
[0, 1, 2, 3, 4] |
Maybe it's worth adding these inclusive tests to develop branch with a new PR? Also maybe we could explore a bit more on how to actually fix the "last frame" pickling issue. |
- Fixes #2878 - Changes made in this Pull Request: - add more comprehensive tests for pickling low-level dcd and xdr files - fix a bug that was introduced in #2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR #2908) - update CHANGELOG - maybe backport in #2908
Closing; would need to port the changes/fixes from PR #2911. |
- Fixes MDAnalysis#2878 - Changes made in this Pull Request: - add more comprehensive tests for pickling low-level dcd and xdr files - fix a bug that was introduced in MDAnalysis#2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR MDAnalysis#2908) - update CHANGELOG - maybe backport in MDAnalysis#2908
Fixes #2878 (backport)
Changes made in this Pull Request:
__setstate__
for DCDReader and XTC/TRRReader according to Fail to pickle last frame of DCD, XDR files #2878 (comment)PR Checklist