Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize FileIO and TextIOWrapper and Universe #2723

Merged
merged 120 commits into from
Aug 8, 2020

Conversation

yuxuanzhuang
Copy link
Contributor

@yuxuanzhuang yuxuanzhuang commented Jun 9, 2020

Fixes #2878

Changes made in this Pull Request:

  • add new picklable BufferedReader, FileIO, and TextIOWrapper classes.
  • add new classes and pickle_open function to picklable_file_io.py
  • implement __getstate__, __setstate__ to Universe and BaseReader
  • fix DCD, XDR pickle issue

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Jun 9, 2020

Codecov Report

Merging #2723 into develop will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2723      +/-   ##
===========================================
+ Coverage    92.80%   92.86%   +0.06%     
===========================================
  Files          185      186       +1     
  Lines        24205    24383     +178     
  Branches      3133     3149      +16     
===========================================
+ Hits         22463    22643     +180     
+ Misses        1696     1694       -2     
  Partials        46       46              
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/DLPoly.py 98.80% <100.00%> (+<0.01%) ⬆️
package/MDAnalysis/coordinates/GSD.py 91.66% <100.00%> (+2.77%) ⬆️
package/MDAnalysis/coordinates/TRJ.py 97.69% <100.00%> (+0.02%) ⬆️
package/MDAnalysis/coordinates/base.py 94.88% <100.00%> (+0.06%) ⬆️
package/MDAnalysis/coordinates/chain.py 92.60% <100.00%> (+0.33%) ⬆️
package/MDAnalysis/coordinates/chemfiles.py 94.88% <100.00%> (+0.40%) ⬆️
package/MDAnalysis/core/universe.py 97.19% <100.00%> (+0.24%) ⬆️
package/MDAnalysis/lib/__init__.py 100.00% <100.00%> (ø)
package/MDAnalysis/lib/formats/libdcd.pyx 89.33% <100.00%> (+0.03%) ⬆️
package/MDAnalysis/lib/formats/libmdaxdr.pyx 89.96% <100.00%> (+0.03%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2550d06...5ace1e0. Read the comment docs.

@richardjgowers
Copy link
Member

@yuxuanzhuang ignore the failing Python 2.7 tests, once we release 1.0 we're not supporting Python 2 any more.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general comments

return buffer
elif mode == 'rt' or mode == 'r':
return TextIOPickable(buffer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should fail for anything else (ValueError for unsupported mode).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here saying that we should never get here; it's not obvious until you read all code up to this point.

Or better

assert False, "mode = {} argument was never processed".format(mode)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean replacing the ValueError for unsupported mode?


# not as comprehensive as built-in open func--no need for other args
# only should be used for 'reading' modes
def pickle_open(name, mode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs

package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
def test_offset(f):
f.readline()
f_pickled = pickle.loads(pickle.dumps(f))
assert_equal(f.tell(),f_pickled.tell())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 - space after comma

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can recommend running a formatting tool with every save. I use black for python nowadays.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple of formatting/test changes

package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/utils/test_pickleio.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
@yuxuanzhuang
Copy link
Contributor Author

I guess these func and classes should be something for internal use, i.e. being called by anyopen, instead of being called directly?

@orbeckst
Copy link
Member

I guess these func and classes should be something for internal use, i.e. being called by anyopen, instead of being called directly?

This might be the case right now. But sometimes they find other uses, too. You could start out with them being "private" (but still fully documented) and we can later make them public with a defined API.

lib.util is getting really big and unwieldy. Perhaps it makes sense to split off a fileutil module (or something with a better name, io is used everywhere but is nice and short)? – Opinions?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks very good and promising. I have a bunch of comments and suggestions (see inline), mainly

  • docs
  • suggestion to internally change the order of the pickle (purely for aesthetics and code readability)
  • spelling: should be "picklable" instead of "pickable" everywhere; I made suggestions but didn't correct everything (such as all the tests are still "pickable")

If pickle_open() is used elsewhere then we should also provide a corresponding context manager. I don't know how urgent that is and it does not need to happen in this PR.

As I also said in another comment, it might be worthwhile splitting lib.utils and moving all file/IO related functionality into its own module.

package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/util.py Outdated Show resolved Hide resolved
return buffer
elif mode == 'rt' or mode == 'r':
return TextIOPickable(buffer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here saying that we should never get here; it's not obvious until you read all code up to this point.

Or better

assert False, "mode = {} argument was never processed".format(mode)

@richardjgowers
Copy link
Member

Yeah I'd put this in a new file then maybe import it into the .lib namespace. So it's available as MDAnalysis.lib.PicklableFileIO

@orbeckst
Copy link
Member

@richardjgowers do you mean a module picklable_file_io.py (modules are traditionally lower case in Python and in MDA)? Maybe call it pio.py for "picklable I/O"?

@richardjgowers
Copy link
Member

@orbeckst the file can be called anything, so either of those are fine. I don't think a user needs to understand the file layout of things, instead in lib/__init__.py:

from picklable_file_io import FileIOPicklable, ...

so lib.FileIOPicklable is the class not the module

('CHAIN', [PDB, PDB, PDB], dict()),
('CHAIN', [XTC, XTC, XTC], dict()),
])
def ref_reader(request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to put that test in BaseReaderTest (and maybe _SingleFrameReader) so any new reader get tested? While it looks cleaner to have the test here, I am worried we will miss readers, especially when new readers get added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean cover all the _SingleFrameReader as how class TestPDBReader(_SingleFrameReader) is implemented? As for pickle test in BaseReaderTest and MutliframeReaderTest, they were already added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is what I meant, indeed. My concern when I see this fixture is that we will forget to add the new readers to the list. Ideally I would like something that fails if a reader is not tested for pickling and multiprocessing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now, only tests in NAMDBIN, PDB, PQR utilize _SingleFrameReader. Makes sense to create a new PR for other single frame readers e.g. MMTF, to be inherited from that? (which it along can be backported.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor of getting the PR merged and open a new one where we think hard about how to do consistent testing of all our readers and parsers.

We'll also need to look at converters... but that's a different issue. Literally.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think any further questions I have re: documentation can/should be documented in the userguide or similar. I'm good with this, thanks for all the work @yuxuanzhuang !

@orbeckst
Copy link
Member

orbeckst commented Aug 7, 2020

ping reviewers @IAlibay @fiona-naughton – can you please check if your comments were addressed? If yes, please approve the PR. If not, please point out specifically what needs to be done, in the form of actionable requests. As discussed on the call, we'd like to get this merged ASAP so that @yuxuanzhuang can move forward.

@orbeckst
Copy link
Member

orbeckst commented Aug 7, 2020

Oops... sorry @IAlibay – you approved while I was writing my message. Thanks!

@orbeckst orbeckst added the GSoC GSoC project label Aug 7, 2020
@fiona-naughton
Copy link
Contributor

looks good to me - nice work @yuxuanzhuang !

@orbeckst orbeckst merged commit 563995c into MDAnalysis:develop Aug 8, 2020
@orbeckst
Copy link
Member

orbeckst commented Aug 8, 2020

@yuxuanzhuang congratulations – it is done! 🎉

@yuxuanzhuang
Copy link
Contributor Author

Thanks for all the help!

@yuxuanzhuang yuxuanzhuang mentioned this pull request Aug 15, 2020
4 tasks
orbeckst pushed a commit that referenced this pull request Aug 17, 2020
- Fixes #2878
- Changes made in this Pull Request:
    - add more comprehensive tests for pickling low-level dcd and xdr files
    - fix a bug that was introduced in #2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR #2908)
- update CHANGELOG
- maybe backport in #2908
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* Fixes MDAnalysis#2878
* basic approach: composition instead of inheritance for pickling Universe (which was
   tested in PR MDAnalysis#2704 (which was derived from PR MDAnalysis#2140))
* Changes made in this Pull Request:
    - add new classes and pickle_open function to picklable_file_io.py
    - add new picklable `BufferedReader`, `FileIO`, and `TextIOWrapper` classes.
    - implement `__getstate__`, `__setstate__` to `Universe` and `BaseReader`
    - fix DCD, XDR pickle issue
    - modified gsd and ncdf to be picklabel
    - modified ChainReader to be picklabel
    - ensure chemfiles is picklable
* add tests (MultiFrameReader will test for serializability)
* update CHANGELOG
* update docs

Note: This merge squashed 120 commits. See PR MDAnalysis#2723 for the full history with 420 comments.
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
- Fixes MDAnalysis#2878
- Changes made in this Pull Request:
    - add more comprehensive tests for pickling low-level dcd and xdr files
    - fix a bug that was introduced in MDAnalysis#2723 (wrong seeking in DCD and XDR, note: this bug was only in develop, never in released code, see discussion in PR MDAnalysis#2908)
- update CHANGELOG
- maybe backport in MDAnalysis#2908
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail to pickle last frame of DCD, XDR files