-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialize FileIO and TextIOWrapper and Universe #2723
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2723 +/- ##
===========================================
+ Coverage 92.80% 92.86% +0.06%
===========================================
Files 185 186 +1
Lines 24205 24383 +178
Branches 3133 3149 +16
===========================================
+ Hits 22463 22643 +180
+ Misses 1696 1694 -2
Partials 46 46
Continue to review full report at Codecov.
|
@yuxuanzhuang ignore the failing Python 2.7 tests, once we release 1.0 we're not supporting Python 2 any more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general comments
package/MDAnalysis/lib/util.py
Outdated
return buffer | ||
elif mode == 'rt' or mode == 'r': | ||
return TextIOPickable(buffer) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should fail for anything else (ValueError
for unsupported mode
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment here saying that we should never get here; it's not obvious until you read all code up to this point.
Or better
assert False, "mode = {} argument was never processed".format(mode)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean replacing the ValueError
for unsupported mode?
package/MDAnalysis/lib/util.py
Outdated
|
||
# not as comprehensive as built-in open func--no need for other args | ||
# only should be used for 'reading' modes | ||
def pickle_open(name, mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs
def test_offset(f): | ||
f.readline() | ||
f_pickled = pickle.loads(pickle.dumps(f)) | ||
assert_equal(f.tell(),f_pickled.tell()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 - space after comma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can recommend running a formatting tool with every save. I use black for python nowadays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a couple of formatting/test changes
I guess these func and classes should be something for internal use, i.e. being called by |
This might be the case right now. But sometimes they find other uses, too. You could start out with them being "private" (but still fully documented) and we can later make them public with a defined API.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks very good and promising. I have a bunch of comments and suggestions (see inline), mainly
- docs
- suggestion to internally change the order of the pickle (purely for aesthetics and code readability)
- spelling: should be "picklable" instead of "pickable" everywhere; I made suggestions but didn't correct everything (such as all the tests are still "pickable")
If pickle_open()
is used elsewhere then we should also provide a corresponding context manager. I don't know how urgent that is and it does not need to happen in this PR.
As I also said in another comment, it might be worthwhile splitting lib.utils
and moving all file/IO related functionality into its own module.
package/MDAnalysis/lib/util.py
Outdated
return buffer | ||
elif mode == 'rt' or mode == 'r': | ||
return TextIOPickable(buffer) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment here saying that we should never get here; it's not obvious until you read all code up to this point.
Or better
assert False, "mode = {} argument was never processed".format(mode)
Yeah I'd put this in a new file then maybe import it into the |
@richardjgowers do you mean a module |
@orbeckst the file can be called anything, so either of those are fine. I don't think a user needs to understand the file layout of things, instead in from picklable_file_io import FileIOPicklable, ... so |
('CHAIN', [PDB, PDB, PDB], dict()), | ||
('CHAIN', [XTC, XTC, XTC], dict()), | ||
]) | ||
def ref_reader(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to put that test in BaseReaderTest
(and maybe _SingleFrameReader
) so any new reader get tested? While it looks cleaner to have the test here, I am worried we will miss readers, especially when new readers get added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean cover all the _SingleFrameReader
as how class TestPDBReader(_SingleFrameReader)
is implemented? As for pickle test in BaseReaderTest
and MutliframeReaderTest
, they were already added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is what I meant, indeed. My concern when I see this fixture is that we will forget to add the new readers to the list. Ideally I would like something that fails if a reader is not tested for pickling and multiprocessing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now, only tests in NAMDBIN
, PDB
, PQR
utilize _SingleFrameReader
. Makes sense to create a new PR for other single frame readers e.g. MMTF, to be inherited from that? (which it along can be backported.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favor of getting the PR merged and open a new one where we think hard about how to do consistent testing of all our readers and parsers.
We'll also need to look at converters... but that's a different issue. Literally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think any further questions I have re: documentation can/should be documented in the userguide or similar. I'm good with this, thanks for all the work @yuxuanzhuang !
ping reviewers @IAlibay @fiona-naughton – can you please check if your comments were addressed? If yes, please approve the PR. If not, please point out specifically what needs to be done, in the form of actionable requests. As discussed on the call, we'd like to get this merged ASAP so that @yuxuanzhuang can move forward. |
Oops... sorry @IAlibay – you approved while I was writing my message. Thanks! |
looks good to me - nice work @yuxuanzhuang ! |
@yuxuanzhuang congratulations – it is done! 🎉 |
Thanks for all the help! |
- Fixes #2878 - Changes made in this Pull Request: - add more comprehensive tests for pickling low-level dcd and xdr files - fix a bug that was introduced in #2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR #2908) - update CHANGELOG - maybe backport in #2908
* Fixes MDAnalysis#2878 * basic approach: composition instead of inheritance for pickling Universe (which was tested in PR MDAnalysis#2704 (which was derived from PR MDAnalysis#2140)) * Changes made in this Pull Request: - add new classes and pickle_open function to picklable_file_io.py - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes. - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader` - fix DCD, XDR pickle issue - modified gsd and ncdf to be picklabel - modified ChainReader to be picklabel - ensure chemfiles is picklable * add tests (MultiFrameReader will test for serializability) * update CHANGELOG * update docs Note: This merge squashed 120 commits. See PR MDAnalysis#2723 for the full history with 420 comments.
- Fixes MDAnalysis#2878 - Changes made in this Pull Request: - add more comprehensive tests for pickling low-level dcd and xdr files - fix a bug that was introduced in MDAnalysis#2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR MDAnalysis#2908) - update CHANGELOG - maybe backport in MDAnalysis#2908
Fixes #2878
Universe
in Updating Serialization Functionality from PR #2140 #2704Changes made in this Pull Request:
pickle_open
function topicklable_file_io.py
__getstate__
,__setstate__
toUniverse
andBaseReader
PR Checklist