-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating Serialization Functionality from PR #2140 #2704
Conversation
and broke everything
update to HEAD
Codecov Report
@@ Coverage Diff @@
## develop #2704 +/- ##
===========================================
+ Coverage 91.22% 91.26% +0.03%
===========================================
Files 176 176
Lines 24115 24208 +93
Branches 3160 3160
===========================================
+ Hits 22000 22093 +93
- Misses 1492 1493 +1
+ Partials 623 622 -1
Continue to review full report at Codecov.
|
The linter is not happy with you in #2346.7
Please fix – generally, we'd like to be CI green :-). |
The other failing test Job #2346.8 ran too long. That's a general problem #2671 , not specific to you , but keep in mind keeping tests short. I tried restarting it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay in reviewing here, I'm just providing some small initial comments to begin with as there's a few things I still haven't fully wrapped my head around yet.
It would be good to add some detailed docstring about the behaviour of _[B/Ex]AsciiPickle and maybe some extra details in somewhere (maybe ReaderBase?) on what we expect Readers to have in order to be pickleable (e.g. self._f, etc...).
@@ -131,3 +135,4 @@ def _read_frame(self, frame): | |||
def _read_next_timestep(self) : | |||
"""read next frame in trajectory""" | |||
return self._read_frame(self._frame + 1) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 don't need a blank line on the last line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IAlibay do we have a PEP8 checker running? If not, we should add one to the linter.
@lilyminium does the user guide talk about PEP8-checking?
I'm just asking I find myself writing a lot of "PEP8" comments on PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we do - I think there's already an ongoing discussion in #2450
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the user guide talk about PEP8-checking?
Yes, the user guide highlights important points from PEP8, mentions tools like flake8 for linting, and autopep8/yapf for autoformatting. It doesn't mention stuff like spaces after commas and around comparison operators, or blank lines, though.
@@ -2069,6 +2078,54 @@ def _apply_transformations(self, ts): | |||
return ts | |||
|
|||
|
|||
class _AsciiPickle(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring explaining what these three classes do / when they should be useful would be very helpful here (especially when it comes to the implementation of future readers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document everything, even if you make it a private class. You're working on the core code, for years to come, developers will have to work with what you're writing here, so be super-clear about
- what your intent is (how it should behave)
- how it interacts with other parts of the code
- what the caveats, requirements, and implicit assumptions are
- any TODOS/known issues
@@ -272,10 +272,14 @@ def test_load_multiple_args(self): | |||
assert_equal(len(u.atoms), 3341, "Loading universe failed somehow") | |||
assert_equal(u.trajectory.n_frames, 2 * ref.trajectory.n_frames) | |||
|
|||
def test_pickle_raises_NotImplementedError(self): | |||
def test_pickle(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth testing more than PSF/DCD here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test here is just to make sure Universe pickling works. Since all the readers will be tested in test_multiprocessing.py, I am not sure it is worth testing more here again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding here is that some of the readers are going to behave a little bit differently under the hood for pickling (e.g. GSD). My comment here was more "is it worth doing a parametrize and looping over various file types to make sure none of them fail to pickle".
else: | ||
top, trj = request.param | ||
return mda.Universe(top, trj) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8, two blank lines between functions
|
||
# Define target functions here | ||
# inside test functions doesn't work | ||
def cog(u, ag, frame_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this and getnames be fixtures?
p.close() | ||
|
||
assert_equal(ref, res) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 two blank lines here
finally: | ||
# make sure file handle is closed afterwards | ||
r.close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, PEP8 - two blank lines
self._f = anyopen(self._pickle_fn) | ||
|
||
del self._pickle_fn | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 here and below, two blank lines between classes.
For these big PRs it's a good idea to occasionally rebase against the latest develop as not to fall behind too far. There are also some speed-ups that run tests faster, run on other CI etc. |
For right now, don't worry about Python 2.7 and focus on making it work in 3. Make your tests XFAIL if Python 2. The current plan is to get MDAnalysis 1.0 #2443 out soon. This is the last release to officially support Python 2. Your code really only has to work with >1.x. We can add it as "experimental" in 1.0 if we make it fail in obvious ways but it doesn't have to be full featured. |
|
||
# AMBER NetCDF files should always have a convention |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this comment was removed accidentally
if n_atoms is not None and n_atoms != self.n_atoms: | ||
errmsg = ("Supplied n_atoms ({0}) != natom from ncdf ({1}). " | ||
"Note: n_atoms can be None and then the ncdf value " | ||
"is used!".format(n_atoms, self.n_atoms)) | ||
raise ValueError(errmsg) | ||
except KeyError: | ||
errmsg = ("NCDF trajectory {0} does not contain atom " | ||
"information".format(self.filename)) | ||
"information".format( | ||
self._f.ConventionVersion, self.version)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this has been changed - it looks like maybe it was copied from elsewhere by accident?
self._f = scipy.io.netcdf.netcdf_file(self.filename, | ||
mmap=self._mmap) | ||
|
||
# @property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it was code that was added in @richardjgowers's original PR in conjunction with removing the code for setting self.n_frames
in __init__
, which has since been altered and so still in this PR up here . I'm not sure what the original intention for the move was, though, so don't know if it's better to remove this or move the updated n_frames
bit here
The inheritance approach used here could be a problem in the future if we need to customize for a reader how pickling works. Every time we want a different way of pickling we need to write a new base class and inherits from it. It might be we need to read two files for some formats. Another approach is to "teach" the objects used in the readers how to pickle themselves. This software development pattern is called composition. Wikipedia has a good comparison article Our problem is that the default file object in python cannot be pickled. So we can write our own file object that copies the interface of the default file. Interestingly in python, we use inheritance to copy the interface of another class. So you can write a new file class like this. class FilePickable(io.FileIO)
def init(self, name, *args, **kwargs):
super().__init__(*args, **kwargs)
self._args = args
self._kwargs = lwargs
def getstate(self):
return self.tell(), self._args, self._kwargs
def setstate(self, args):
super().__init__(*args[1], **args[2])
self.seek(args[0]) The This change would automatically make all trajectory readers using normal files pickable. The xdr file objects can already be pickled with a similar approach. The only thing we have to look out for is to use the Of course, this is only a rough sketch. There might be some edge-cases to figure out. |
@yuxuanzhuang I think @kain88-de is right here. Can you look into if this is possible? |
@richardjgowers Sounds like something I should definitely look into. @kain88-de Thanks for your advice! |
When I tried to rebase this PR, other PRs shows up in this thread, not sure what mistake I am making here by |
Can you post the output of `git reflog | head -n 20`. It will show your
last 10 git commands. That helps with finding out what happened.
…On Mon 8. Jun 2020 at 16:13, Yuxuan Zhuang ***@***.***> wrote:
For these big PRs it's a good idea to occasionally rebase against the
latest develop as not to fall behind too far. There are also some speed-ups
that run tests faster, run on other CI etc.
When I tried to rebase this PR, other PRs shows up in this thread, not
sure what mistake I am making here by git rebase mda_origin/develop
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2704 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABA2OVUCUKPI2AETHN774XTRVTWY7ANCNFSM4NOIGZDQ>
.
|
|
So I added such modification to another branch, not to mess up with current one. It works fine with most cases when the file was read by In terms of guaranteeing thread-safe. I guess we should make sure the files open with writing permissions should not return with this new Class. The core codes read below: class FileIOPickable(io.FileIO):
def __init__(self, name, *args, **kwargs):
super().__init__(name, *args, **kwargs)
self._args = args
self._kwargs = kwargs
def __getstate__(self):
return self.tell(), self.__dict__
def __setstate__(self, args):
state = args[1]
super().__init__(state['name'], *state['_args'], **state['_kwargs'])
self.seek(args[0])
class TextIOPickable(io.TextIOWrapper):
def __init__(self, buffer, *args, **kwargs):
super().__init__(buffer, *args, **kwargs)
self._buffer_args = buffer.__dict__
self._args = args
self._kwargs = kwargs
def __getstate__(self):
return self.tell(), self.__dict__
def __setstate__(self, args):
state = args[1]
buffer = FileIOPickable(state['_buffer_args']['name'],
*state['_buffer_args']['_args'],
**state['_buffer_args']['_kwargs'])
super().__init__(buffer, *state['_args'], **state['_kwargs'])
self.seek(args[0])
def pickle_open(name, mode):
buffer = FileIOPickable(name, mode='rb')
if mode == 'rb':
return buffer
elif mode == 'rt' or mode == 'r':
return TextIOPickable(buffer)
...
handlers = {'bz2': bz2_open, 'gz': gzip.open, '': pickle_open }
... |
Thanks. Unfortunately, the rebase took more than 20 steps. In the output you can see every step that was applied by git, also every step of the rebase. The first cryptic name in each line is the SHA of the git commit. You can use this to go a specific commit with This is a long running PR that touches many places in the code. It is difficult to do a rebase every once in a while on such a PR. If you want to catch up with Develop a merge would be better suited, more towards the end once you settled on a design and work towards finalizing the PR. You can still reset this branch. Using git reflog you can look at what the first commit SHA before you started the rebase. To safely reset this branch you can then run the following git commands. git branch backup # just in case reference to start over.
git reset --hard <SHA> and merge with develop. git remote update # fetch remote changes without local updates
git merge mda_origin/develop About the changes regarding pickling. If you can show us the other branch with your changes that would help. Glad it seems to work. |
To update this branch on your remote you will have to force the push |
That should throw an exception when trying to pickle it. It is a truly exceptional case and should signal this to the caller. We can give the exception a short explanatory text. Something like "Writable file handler cannot be pickled". This is a better signal that an error occurred instead of silently returning nothing. |
Sorry, I suggested the rebase because we had a number of changes that would improve running tests. I didn't anticipate that the rebase would mess too much with history (normally I find that the rebases to develop are pretty clean, serial history). |
This is a pretty preliminary branch for IO serialization. It works for most case (and even slight faster I believe), while I attached current Reader failures in the comments. If we are opt for this, I guess we can merge those changes into this PR. I do love how neet it becomes with composition instead of inheritance. |
@yuxuanzhuang ok the branch looks good. I think a way forward here is to create a branch that implements only the |
* Fixes #2878 * basic approach: composition instead of inheritance for pickling Universe (which was tested in PR #2704 (which was derived from PR #2140)) * Changes made in this Pull Request: - add new classes and pickle_open function to picklable_file_io.py - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes. - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader` - fix DCD, XDR pickle issue - modified gsd and ncdf to be picklabel - modified ChainReader to be picklabel - ensure chemfiles is picklable * add tests (MultiFrameReader will test for serializability) * update CHANGELOG * update docs Note: This merge squashed 120 commits. See PR #2723 for the full history with 420 comments.
I think we can close this PR, now that PR #2723 is merged? |
Closing because it has been superseded by "composition over inheritance" PR #2723 . |
* Fixes MDAnalysis#2878 * basic approach: composition instead of inheritance for pickling Universe (which was tested in PR MDAnalysis#2704 (which was derived from PR MDAnalysis#2140)) * Changes made in this Pull Request: - add new classes and pickle_open function to picklable_file_io.py - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes. - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader` - fix DCD, XDR pickle issue - modified gsd and ncdf to be picklabel - modified ChainReader to be picklabel - ensure chemfiles is picklable * add tests (MultiFrameReader will test for serializability) * update CHANGELOG * update docs Note: This merge squashed 120 commits. See PR MDAnalysis#2723 for the full history with 420 comments.
Merging from and updating #2140
Changes made in this Pull Request:
PR Checklist