Serialize FileIO and TextIOWrapper and Universe #2723

Merged
merged 120 commits into from
Aug 8, 2020
Changes from 11 commits
931d9b5
Merge pull request #1 from MDAnalysis/develop
yuxuanzhuang May 20, 2020
d69ce98
Merge remote-tracking branch 'mda_origin/develop' into develop
yuxuanzhuang Jun 9, 2020
432edf3
add pickle function to fileio, textio
yuxuanzhuang Jun 9, 2020
bf55bf3
add basic test for pickle io
yuxuanzhuang Jun 9, 2020
cd9a485
add comments
yuxuanzhuang Jun 9, 2020
f33629f
xfail on python2
yuxuanzhuang Jun 10, 2020
aca4496
add doc and exception for pickle_open
yuxuanzhuang Jun 11, 2020
cd4ffe3
add doc for textio fileio class
yuxuanzhuang Jun 11, 2020
ec5bd3c
add parallel test for textio
yuxuanzhuang Jun 11, 2020
f6515ee
pep8
yuxuanzhuang Jun 11, 2020
cf29764
add an extra bufferlayer for FileIO for fast access
yuxuanzhuang Jun 11, 2020
6b09e20
Merge remote-tracking branch 'mda_origin/develop' into serialize_io
yuxuanzhuang Jun 11, 2020
c29c316
ditch py2
yuxuanzhuang Jun 11, 2020
db47e27
pep8 and doc
yuxuanzhuang Jun 11, 2020
7e9d6d3
add test for unsupported mode
yuxuanzhuang Jun 11, 2020
eb83a7c
pep8
yuxuanzhuang Jun 11, 2020
0baa868
typo
yuxuanzhuang Jun 12, 2020
c8e63b2
pickle reorder
yuxuanzhuang Jun 12, 2020
658b446
pickle_open as context manager
yuxuanzhuang Jun 12, 2020
1aa6003
format
yuxuanzhuang Jun 12, 2020
f2738bc
move pickle-io to a separate file
yuxuanzhuang Jun 12, 2020
001f3b8
doc
yuxuanzhuang Jun 12, 2020
d0374f5
FileIOPicklable class now only supports name as input, (preventing us…
yuxuanzhuang Jun 12, 2020
8c62df8
pickle open doc and add fspath for the filename
yuxuanzhuang Jun 12, 2020
94f1f8d
absolute import
yuxuanzhuang Jun 12, 2020
acbadec
more doc
yuxuanzhuang Jun 13, 2020
7043d2d
more pep8 and format
yuxuanzhuang Jun 15, 2020
546b05d
Merge remote-tracking branch 'mda_origin/develop' into serialize_io
yuxuanzhuang Jun 15, 2020
c259143
sphinx mark up
yuxuanzhuang Jun 15, 2020
a016a65
add pickle_open example
yuxuanzhuang Jun 15, 2020
401e6ae
changelog
yuxuanzhuang Jun 15, 2020
9225c71
sphinx more
yuxuanzhuang Jun 15, 2020
46c43af
add context manager approach text
yuxuanzhuang Jun 15, 2020
21fe5aa
add match for test valueerror
yuxuanzhuang Jun 15, 2020
821c822
typo
yuxuanzhuang Jun 15, 2020
5e25380
Merge branch 'develop' into serialize_io
yuxuanzhuang Jun 15, 2020
2541a3e
tell error and fileio cov
yuxuanzhuang Jun 19, 2020
1003cd3
Merge branch 'serialize_io' of https://github.com/yuxuanzhuang/mdanal…
yuxuanzhuang Jun 19, 2020
1b7a798
Merge branch 'develop' into serialize_io
yuxuanzhuang Jun 19, 2020
b79d282
remove future import
yuxuanzhuang Jun 19, 2020
d909d63
merge to develop
yuxuanzhuang Jun 19, 2020
cafc596
sphinx block code
yuxuanzhuang Jun 19, 2020
2db1ef2
typo
yuxuanzhuang Jun 20, 2020
33ef68a
Merge branch 'develop' into serialize_io
yuxuanzhuang Jun 21, 2020
24f2a34
Merge branch 'develop' into serialize_io
yuxuanzhuang Jun 22, 2020
84baca9
pickle open pdb and xyz
yuxuanzhuang Jun 23, 2020
108ebde
Merge branch 'serialize_io' of https://github.com/yuxuanzhuang/mdanal…
yuxuanzhuang Jun 23, 2020
7cb40ad
add pickle support to universe, add test, add chainreader
yuxuanzhuang Jun 24, 2020
352ab96
fix misc issues
yuxuanzhuang Jun 24, 2020
356986f
remove python2 legacy bz2
yuxuanzhuang Jun 24, 2020
e5ef732
remove fail test for offset
yuxuanzhuang Jun 24, 2020
aa6e40d
issue raised in changelog
yuxuanzhuang Jun 24, 2020
43a62d5
pep8
yuxuanzhuang Jun 24, 2020
2559625
add pickle func to ReaderBase and set offset
yuxuanzhuang Jun 26, 2020
507f8f5
add test for bz2 gzip and class check
yuxuanzhuang Jun 26, 2020
26fcfe9
add test for gsd, ncdf
yuxuanzhuang Jun 26, 2020
405a6dc
add test for trajectory.next after pickling
yuxuanzhuang Jun 26, 2020
2380a47
older gsd file
yuxuanzhuang Jun 26, 2020
dab38c1
move gsd, ncdf to coord
yuxuanzhuang Jun 29, 2020
5c07901
add chainreader state
yuxuanzhuang Jun 29, 2020
b324791
test timestep
yuxuanzhuang Jun 29, 2020
49f959d
doc
yuxuanzhuang Jun 29, 2020
773524d
add doc version change
yuxuanzhuang Jun 30, 2020
b7e4ef0
chainreader fix
yuxuanzhuang Jun 30, 2020
9d376b7
docstring error
yuxuanzhuang Jun 30, 2020
11cceb4
check dt before pickle
yuxuanzhuang Jul 1, 2020
a3130f5
add pickle test to base
yuxuanzhuang Jul 3, 2020
faf1e01
Merge branch 'develop' into serialize_io
yuxuanzhuang Jul 3, 2020
df7eb86
add chemfiles pickle
yuxuanzhuang Jul 4, 2020
72ba276
doc
yuxuanzhuang Jul 4, 2020
aa62ff0
doc add note
yuxuanzhuang Jul 5, 2020
04be63d
merge to develop
yuxuanzhuang Jul 5, 2020
f01769f
merge to develop
yuxuanzhuang Jul 5, 2020
5a2b28d
change chain getstate
yuxuanzhuang Jul 5, 2020
b5f5270
add in-line comments
yuxuanzhuang Jul 5, 2020
e1facfb
pep8
yuxuanzhuang Jul 5, 2020
cba4456
add chemfile test
yuxuanzhuang Jul 6, 2020
5622b51
pep8
yuxuanzhuang Jul 6, 2020
46cda48
raise error with mode
yuxuanzhuang Jul 7, 2020
5e2ee79
change to read_step
yuxuanzhuang Jul 8, 2020
b23b2fb
change to almost_equal
yuxuanzhuang Jul 8, 2020
cd03058
save frame
yuxuanzhuang Jul 8, 2020
3ce8ba7
save frame pep
yuxuanzhuang Jul 8, 2020
a5da2f7
add doc for pickle
yuxuanzhuang Jul 8, 2020
5a9ad4d
timestep pickle doc
yuxuanzhuang Jul 8, 2020
bc60aa7
doc serialize
yuxuanzhuang Jul 9, 2020
01fc644
doc sphinx
yuxuanzhuang Jul 10, 2020
84eb61f
pickle u with getsetstate
yuxuanzhuang Jul 10, 2020
9f18ccd
pep
yuxuanzhuang Jul 10, 2020
8d07004
Merge branch 'develop' into serialize_io
yuxuanzhuang Jul 10, 2020
e37c84a
warning on cfg
yuxuanzhuang Jul 10, 2020
2d3de99
sep files
yuxuanzhuang Jul 13, 2020
67b65d1
merge to develop
yuxuanzhuang Jul 13, 2020
18d146b
sep to two files
yuxuanzhuang Jul 13, 2020
8679e50
fixed failed merge in CHANGELOG
orbeckst Jul 14, 2020
0ceffe5
removed superfluous blank lines from CHANGELOG
orbeckst Jul 14, 2020
688041c
xdr dcd seek error
yuxuanzhuang Jul 16, 2020
204545b
Merge branch 'serialize_io' of https://github.com/yuxuanzhuang/mdanal…
yuxuanzhuang Jul 16, 2020
3c71f8a
Merge remote-tracking branch 'mda_origin/develop' into serialize_io
yuxuanzhuang Jul 16, 2020
f2239bb
current frame xdr/dcd
yuxuanzhuang Jul 16, 2020
78c93a0
Merge branch 'develop' into serialize_io
orbeckst Jul 17, 2020
c0d241e
remove tests not needed
yuxuanzhuang Jul 19, 2020
68b1c2a
pep
yuxuanzhuang Jul 19, 2020
4061434
Merge branch 'develop' into serialize_io
yuxuanzhuang Jul 20, 2020
d457491
test title more accurate
yuxuanzhuang Jul 20, 2020
4c70dcb
Merge remote-tracking branch 'mda_origin/develop' into serialize_io
yuxuanzhuang Jul 27, 2020
0496ca1
misc
yuxuanzhuang Jul 27, 2020
df061fc
gsd dim
yuxuanzhuang Jul 28, 2020
abe92da
add test for runtimee pickle
yuxuanzhuang Jul 29, 2020
b3469fe
add test for runtimee pickle
yuxuanzhuang Jul 29, 2020
b12eb0d
pep
yuxuanzhuang Jul 29, 2020
fae4797
doc pickle_reader
yuxuanzhuang Jul 29, 2020
52a981e
mock chemfiles
yuxuanzhuang Jul 30, 2020
c4ec287
chemfiles mock when not found
yuxuanzhuang Aug 1, 2020
8804e5b
doc revised
yuxuanzhuang Aug 3, 2020
c99867f
add pickle test to single_framereader
yuxuanzhuang Aug 3, 2020
a70bc8b
add pickle test to fhiams
yuxuanzhuang Aug 3, 2020
bc487a5
test doc
yuxuanzhuang Aug 6, 2020
a1bb47e
test doc title
yuxuanzhuang Aug 6, 2020
5ace1e0
test doc title 2
yuxuanzhuang Aug 6, 2020
93 changes: 92 additions & 1 deletion package/MDAnalysis/lib/util.py
@@ -231,7 +231,6 @@ class PathLike(object):
    pass



def filename(name, ext=None, keep=False):
    """Return a new name that has suffix attached; replaces other extensions.

@@ -265,6 +264,98 @@ def filename(name, ext=None, keep=False):
    return name if isstream(name) else str(name)


class FileIOPickable(io.FileIO):
    """Stream for reading a file.

    Picklable FileIO class that only supports read mode.

    """
    def __getstate__(self):
        return self.tell(), self.name

    def __setstate__(self, args):
        name = args[1]
        super().__init__(name)
        self.seek(args[0])
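
A minimal, standard-library-only sketch of the idea behind FileIOPickable: record the file name and the current offset, then reopen the file and seek back when unpickling. The names PicklableFileIO and roundtrip_offset are hypothetical and not part of this PR. The sketch also uses `__reduce_ex__` instead of the `__getstate__`/`__setstate__` pair shown in the diff, on the assumption that this avoids relying on how a given Python version blocks default pickling of io classes.

```python
import io
import os
import pickle
import tempfile


class PicklableFileIO(io.FileIO):
    """Hypothetical read-only FileIO that can be pickled by name and offset."""

    def __init__(self, name):
        super().__init__(name, mode='r')

    def __reduce_ex__(self, protocol):
        # Rebuild by reopening `name`; the offset travels as the state dict.
        return (self.__class__, (self.name,), {"offset": self.tell()})

    def __setstate__(self, state):
        # Called on the freshly reopened file during unpickling.
        self.seek(state["offset"])


def roundtrip_offset(data=b"line one\nline two\n", nread=9):
    """Write `data` to a temp file, read `nread` bytes, pickle, and compare."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        original = PicklableFileIO(path)
        original.read(nread)
        clone = pickle.loads(pickle.dumps(original))
        result = (original.tell(), clone.tell(), clone.read())
        original.close()
        clone.close()
        return result
    finally:
        os.remove(path)
```

With the default arguments, both handles report the same offset and the clone continues reading from the second line.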


class BufferIOPickable(io.BufferedReader):
    """A picklable buffer for a readable FileIO object.

    Wraps a raw FileIOPickable inside.

    """
    def __init__(self, raw):
        super().__init__(raw)
        self.raw_class = raw.__class__

    def __getstate__(self):
        return self.tell(), self.name, self.raw_class

    def __setstate__(self, args):
        name = args[1]
        raw_class = args[2]
        raw = raw_class(name)
        super().__init__(raw)
        self.seek(args[0])


class TextIOPickable(io.TextIOWrapper):
    """Character and line based layer over a picklable FileIO based object.

    Example
    -------
    file = FileIOPickable('filename')
    text_wrapped = TextIOPickable(file)
    """
    def __init__(self, raw):
        super().__init__(raw)
        self.raw_class = raw.__class__

    def __getstate__(self):
        return self.tell(), self.name, self.raw_class
Member

Suggested change
-        return self.tell(), self.name, self.raw_class
+        return self.raw_class, self.name, self.tell()


    def __setstate__(self, args):
        name = args[1]
        raw_class = args[2]
        # raw_class is kept so that this functionality can later be extended
        # to GZip files, which also require a text wrapper.
        raw = raw_class(name)
        super().__init__(raw)
        self.seek(args[0])
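
For the text layer, the value returned by `tell()` is an opaque cookie rather than a plain byte count, but it can still be handed to `seek()` on a freshly opened wrapper around the same file, which is what the `__setstate__` above relies on. The following is a standard-library-only sketch of that round trip; the helper name reopen_at_same_line is hypothetical, and a BufferedReader is placed between the raw file and the text wrapper as an assumption about the usual layering.

```python
import io
import os
import tempfile


def reopen_at_same_line(lines=("first\n", "second\n", "third\n")):
    """Read one line, remember the position, reopen, and read the next line."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as out:
            out.writelines(lines)

        first = io.TextIOWrapper(io.BufferedReader(io.FileIO(path)),
                                 encoding="utf-8")
        first.readline()
        cookie = first.tell()  # opaque position, only meaningful to seek()
        first.close()

        # A new wrapper over the same file accepts the saved cookie.
        second = io.TextIOWrapper(io.BufferedReader(io.FileIO(path)),
                                  encoding="utf-8")
        second.seek(cookie)
        line = second.readline()
        second.close()
        return line
    finally:
        os.remove(path)
```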

# not as comprehensive as the built-in open function; no need for other args.
# should only be used for 'reading' modes
def pickle_open(name, mode='rt'):
    """Open file and return a stream with pickle function implemented.

    Parameters
    ----------
    name : str
        a filename given as a text or byte string.
    mode : {'r', 'rt', 'rb'}, optional
        'r':  open for reading in text mode;
        'rt': read in text mode (default);
        'rb': read in binary mode;
        raise ValueError with other modes.

    Returns
    -------
    stream-like object

    See Also
    --------
    :func:`anyopen`
    """
    if mode not in {'r', 'rt', 'rb'}:
        raise ValueError("Only read mode ('r', 'rt', 'rb') files can be pickled.")
    raw = FileIOPickable(name)
    if mode == 'rb':
        return BufferIOPickable(raw)
    elif mode == 'rt' or mode == 'r':
        return TextIOPickable(raw)

Member

I think this should fail for anything else (ValueError for unsupported mode).

Member

Maybe add a comment here saying that we should never get here; it's not obvious until you read all code up to this point.

Or better

assert False, "mode = {} argument was never processed".format(mode)

Contributor Author

Do you mean replacing the ValueError for unsupported mode?
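
One way to read the two suggestions together is to keep the up-front ValueError for unsupported modes and add an explicit guard for the branch that should never be reached. The sketch below illustrates only that control flow; choose_layer is a hypothetical helper that returns labels instead of opening files, and it is not the code that was merged.

```python
def choose_layer(mode='rt'):
    """Hypothetical helper: map a read mode to the name of a wrapper layer."""
    # Reject unsupported modes before doing any work.
    if mode not in {'r', 'rt', 'rb'}:
        raise ValueError("Only read mode ('r', 'rt', 'rb') files can be pickled.")
    if mode == 'rb':
        return 'buffered'
    if mode in {'r', 'rt'}:
        return 'text'
    # Only reachable if the guard above and the branches drift out of sync.
    raise AssertionError("mode = {} argument was never processed".format(mode))
```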


@contextmanager
def openany(datasource, mode='rt', reset=True):
    """Context manager for :func:`anyopen`.
53 changes: 53 additions & 0 deletions testsuite/MDAnalysisTests/parallelism/test_multiprocessing.py
@@ -0,0 +1,53 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 fileencoding=utf-8
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the GNU Public Licence, v2 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#

from __future__ import absolute_import
import sys
import multiprocessing
import numpy as np
import pytest

from MDAnalysis.lib.util import pickle_open
from MDAnalysisTests.datafiles import (
    PDB
)

from numpy.testing import assert_equal


def textio_line(file, i):
    return file.readlines()[i]


@pytest.mark.xfail(sys.version_info < (3, 0), reason="pickle function not \
working in python 2")
def test_multiprocess_fileio():
Member

You already test that files can be pickled; that is enough, since multiprocessing uses pickle internally.

    p = multiprocessing.Pool(2)
    PDB_file = pickle_open(PDB)
    ref = PDB_file.readlines()[:4]
    PDB_file.close()
    PDB_file = pickle_open(PDB)
    res = np.array([p.apply(textio_line, args=(PDB_file, i))
                    for i in range(4)])
    p.close()
    assert_equal(res, ref)
58 changes: 58 additions & 0 deletions testsuite/MDAnalysisTests/utils/test_pickleio.py
@@ -0,0 +1,58 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 fileencoding=utf-8
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the GNU Public Licence, v2 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
from __future__ import absolute_import

import pytest
from numpy.testing import assert_equal

import MDAnalysis.lib.util as util
from MDAnalysis.tests.datafiles import PDB

import pickle
import sys


@pytest.fixture(params=[
    # filename mode
    (PDB, 'r'),
    (PDB, 'rt'),
    (PDB, 'rb'),
])
def f(request):
    filename, mode = request.param
    return util.pickle_open(filename, mode)


@pytest.mark.xfail(sys.version_info < (3, 0), reason="pickle function not \
working in python 2")
def test_iopickle(f):
    f_pickled = pickle.loads(pickle.dumps(f))
    assert_equal(f.readline(), f_pickled.readline())


@pytest.mark.xfail(sys.version_info < (3, 0), reason="pickle function not \
working in python 2")
def test_offset(f):
    f.readline()
    f_pickled = pickle.loads(pickle.dumps(f))
    assert_equal(f.tell(), f_pickled.tell())
Member

So I'm not sure how relevant this is to how we use file objects, but I know that you can get f.tell() to be "disabled" if you happened to call f.next() (I think it only happens if you only partially iterate through the file). Might be worth keeping in mind as a potential failure point.

Contributor Author

That's true! I already noticed this failure for gms, mdcdr formats. I think in the context of pickling trajectories, there's not much use (?).

Member

If we know of a failure point then we should test for it – if nothing else test that it raises an exception as we expect. At least then we know how to reproduce the problem when we need to look closer into it.

Contributor Author

For now I added a test to raise this error. Maybe in the future, we can either:

  • remove this functionality if it proves there's little use
  • return 0 if f.tell() fails.
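
The failure point discussed in this thread can be reproduced with the standard library alone: in CPython, calling `tell()` on a text stream after a partial iteration with `next()` raises OSError. The sketch below shows that behaviour together with the "return 0" fallback from the second bullet. Both safe_tell and demo_tell_after_next are hypothetical names used for illustration, not functions from this PR.

```python
import os
import tempfile


def safe_tell(handle, default=0):
    """Return handle.tell(), or `default` if telling is currently disabled."""
    try:
        return handle.tell()
    except OSError:
        return default


def demo_tell_after_next():
    """Return (tell_raised, fallback_value) after a partial iteration."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as out:
            out.write("a\nb\nc\n")
        with open(path, "rt") as handle:
            next(handle)  # partial iteration; CPython disables tell() here
            try:
                handle.tell()
                raised = False
            except OSError:
                raised = True
            return raised, safe_tell(handle)
    finally:
        os.remove(path)
```

A fallback of 0 silently rewinds the unpickled copy to the start of the file, so raising, as the added test expects, may be the safer default.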