Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

H5MD fails to serialize with pickle #2890

Closed
orbeckst opened this issue Aug 8, 2020 · 6 comments · Fixed by #2894
Closed

H5MD fails to serialize with pickle #2890

orbeckst opened this issue Aug 8, 2020 · 6 comments · Fixed by #2894
Assignees
Labels
defect Format-H5MD hdf5-based H5MD trajectory format persistence

Comments

@orbeckst
Copy link
Member

orbeckst commented Aug 8, 2020

Expected behavior

All readers can be pickled and pass the tests for MultiframeReaderTest

Actual behavior

TestH5MDReader.test_pickle_reader fails, see https://travis-ci.com/github/MDAnalysis/mdanalysis/jobs/369892996

(Note that this test failure occured becaused the H5MD PR #2787 was merged before the serialization PR and the latter did not have the H5MD PR #2723 merged into it.)

Code to reproduce the behavior

Run TestH5MDReader.test_pickle_reader

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") current develop at 563995c
  • Which version of Python (python -V)? any
  • Which operating system? all on CI
@orbeckst orbeckst added Format-H5MD hdf5-based H5MD trajectory format persistence labels Aug 8, 2020
@orbeckst
Copy link
Member Author

orbeckst commented Aug 8, 2020

@edisj please have a look at this issue as soon as possible as it will currently lead to all CI failing. See https://docs.mdanalysis.org/2.0.0-dev0/documentation_pages/coordinates/pickle_readers.html for notes on what needs to be done.

@yuxuanzhuang please give advice where possible.

@edisj
Copy link
Contributor

edisj commented Aug 8, 2020

@orbeckst looking into it now

@edisj
Copy link
Contributor

edisj commented Aug 8, 2020

@orbeckst @yuxuanzhuang I've been playing around and trying to implement an H5PYPicklable class similar to the ways you've done in the other modules, but it seems like h5py.File objects are harder to pickle. There's a package that might solve this - https://github.com/DaanVanVugt/h5pickle, that I'm trying to use now

Edit: reading Andrew Collette's comment here h5py/h5py#531 (comment) makes sense as to why its difficult - the information needed to pickle/unpickle is not in the h5py.File object itself, but rather "resides with HDF5"

@yuxuanzhuang
Copy link
Contributor

Have a quick look, and it seems that the issue is both _file and _particle_group fail to be pickled. So aside from creating a picklable file object:

class H5MDPicklable(h5py.File):
    def __getstate__(self):
        try:
            driver = self._driver
        except AttributeError:
            driver = None

        return {
                'name': self.filename,
                'mode': self.mode,
                'driver': driver
               }

    def __setstate__(self, state):
        self.__init__(name=state['name'],
                      mode=state['mode'],
                      driver=state['driver'])

    def __getnewargs__(self):
        """Override the h5py getnewargs to skip its error message"""
        return ()

We also need to deal with _particle_group, by removing and recreating it during serialization

class H5MDReader(base.ReaderBase):
    
    ...

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['_particle_group']
        return state

    def __setstate__(self, state):
        self.__dict__ = state
        self._particle_group = self._file['particles'][
            list(self._file['particles'])[0]]

With that the test will pass, but I am not sure it works in real life (I need to read more about H5MD).

@orbeckst
Copy link
Member Author

@edisj @yuxuanzhuang did you have a look at https://github.com/DaanVanVugt/h5pickle if they are doing something special?

@edisj
Copy link
Contributor

edisj commented Aug 11, 2020

The problem I'm trying to figure out is how to actually get the MPI.Comm object from an h5py.File. There is a way where you can do something like:

In      f = h5py.File('filename.h5md', 'r', driver='mpio', comm=MPI.COMM_WORLD)
        f.id.get_access_plist().get_fapl_mpio()[0]

Out     <mpi4py.MPI.Comm at 0x7fb1059c1890>

but this is not the same thing as MPI.COMM_WORLD. If I do:

In    isinstance(MPI.COMM_WORLD, MPI.Intracomm)

Out   True

In    comm2 = f.id.get_access_plist().get_fapl_mpio()[0]
In    isinstance(comm2, MPI.Intracomm)

Out   False

So I don't think the <mpi4py.MPI.Comm at 0x7fb1059c1890> is the object I'm looking for.

@yuxuanzhuang if I understand correctly, H5PYPicklable has to be able to "save" the MPI.Comm object used to open the file with __getstate__. Can __getstate__ save things from arguments passed to H5PYPicklable? Or does it have to be an attribute of the file like self.filename?

orbeckst pushed a commit that referenced this issue Aug 12, 2020
* fix #2890 
* added H5PYPicklable class (works in serial but not with driver="mpio" and MPI comm)
* mock h5py so that the docs build even in the absence of h5py
* tests (anything related to mpi is excluded from coverage)
* update CHANGELOG
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this issue Mar 30, 2021
* fix MDAnalysis#2890 
* added H5PYPicklable class (works in serial but not with driver="mpio" and MPI comm)
* mock h5py so that the docs build even in the absence of h5py
* tests (anything related to mpi is excluded from coverage)
* update CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Format-H5MD hdf5-based H5MD trajectory format persistence
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants