use information in H5MD file for topology #4320

Closed
supernova4869 opened this issue Oct 15, 2023 · 12 comments
Labels
enhancement, Format-H5MD (hdf5-based H5MD trajectory format), topology-building

Comments

@supernova4869

supernova4869 commented Oct 15, 2023

I generated h5md trajectories with the MD simulation program Mindspore SPONGE, but MDAnalysis cannot load them directly. I hope the H5MD reader can be better supported in MDAnalysis so that we can analyze these h5md trajectories in the same way as trajectories generated by other MD programs.

@orbeckst orbeckst added the "Format-H5MD" label Oct 15, 2023
@orbeckst
Member

@supernovaZhangJiaXing I haven't heard of Mindspore SPONGE before. MDAnalysis implements the official h5md specifications so I'd like to understand how MDAnalysis fails.

  1. Can you please link to a description that states what this program outputs?
  2. Does SPONGE follow the official h5md format description when writing h5md?
  3. How does MDAnalysis fail? Copy and paste ALL input and output please.
  4. Can you read your trajectory with the reference h5md implementation https://github.com/pdebuyl/pyh5md ?
  5. Can you make a small example trajectory available?
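
A quick way to gather the information requested above is to dump the file's group hierarchy. The sketch below is one possible way to do that. `list_paths` is a hypothetical helper (it is not part of MDAnalysis, h5py, or pyh5md), and it only assumes that groups behave like mappings, as `h5py.File` and `h5py.Group` objects do. A nested dict stands in for the file so that the example runs without h5py; its layout is made up for illustration.

```python
def list_paths(node, prefix=""):
    """Return every path below `node`, depth first."""
    paths = []
    for name in node.keys():
        path = f"{prefix}/{name}"
        paths.append(path)
        child = node[name]
        # Groups (and dicts) expose keys(); datasets (and lists) do not.
        if hasattr(child, "keys"):
            paths.extend(list_paths(child, path))
    return paths


# Stand-in for an opened file (hypothetical layout). With h5py installed,
# one would pass h5py.File("traj.h5md", "r") instead of this dict.
example = {
    "h5md": {},
    "particles": {
        "trajectory": {
            "box": {},
            "position": {
                "step": [0, 1],
                "time": [0.0, 0.1],
                "value": [[[0.0, 0.0, 0.0]], [[0.1, 0.0, 0.0]]],
            },
        }
    },
}

for path in list_paths(example):
    print(path)
```

Pasting this listing into the issue shows at a glance whether groups such as `particles` and `observables` are laid out the way the reader expects.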

@orbeckst orbeckst added the "more information needed" label Oct 15, 2023
@orbeckst
Member

Unless we get more information there's not much we can do and we will close the issue in about a week.

@orbeckst orbeckst added the "close?" label Oct 26, 2023
@supernova4869
Author

Thank you very much! I have checked the files.

First, the trajectory file I analyzed previously actually contained no useful data because of a mistake on my side. I have generated a new, simple trajectory with 12 water molecules. I can now clearly see the molecules in VMD and inspect the data with the VSCode extension H5Web. I then face two problems:

  1. MDAnalysis needs two arguments, a topology and a trajectory, but the topology (tpr, gro, pdb, etc.) has to be regenerated from the h5md file. This is inconvenient because I need to extract the structure from the h5md file again. Also, MDAnalysis does not support GROMACS 2023 (tpx version 129?), so sometimes I need to regenerate the tpr file with another version of GROMACS. I hope that MDAnalysis can read the h5md file directly with u=mda.Universe("traj.h5md") instead of u=mda.Universe("system.gro", "traj.h5md").
  2. The trajectory-loading API raises an error:
...omitted...
File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/core/universe.py:365, in Universe.__init__(self, topology, all_coordinates, format, topology_format, transformations, guess_bonds, vdwradii, in_memory, in_memory_step, *coordinates, **kwargs)
    360 coordinates = _resolve_coordinates(self.filename, *coordinates,
    361                                    format=format,
    362                                    all_coordinates=all_coordinates)
    364 if coordinates:
--> 365     self.load_new(coordinates, format=format, in_memory=in_memory,
    366                 in_memory_step=in_memory_step, **kwargs)
    368 if transformations:
    369     if callable(transformations):

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/core/universe.py:565, in Universe.load_new(self, filename, format, in_memory, in_memory_step, **kwargs)
    562 # supply number of atoms for readers that cannot do it for themselves
    563 kwargs['n_atoms'] = self.atoms.n_atoms
--> 565 self.trajectory = reader(filename, format=format, **kwargs)
    566 if self.trajectory.n_atoms != len(self.atoms):
    567     raise ValueError("The topology and {form} trajectory files don't"
    568                      " have the same number of atoms!\n"
    569                      "Topology number of atoms {top_n_atoms}\n"
...
           dict_keys(['CHAIN', 'CHEMFILES', 'CRD', 'DCD', 'CONFIG', 'HISTORY', 'DMS', 'GMS', 'GRO', 'INPCRD', 'RESTRT', 'LAMMPS', 'DATA', 'LAMMPSDUMP', 'MOL2', 'PDB', 'ENT', 'XPDB', 'PDBQT', 'PQR', 'TRJ', 'MDCRD', 'CRDBOX', 'NCDF', 'NC', 'TRR', 'H5MD', 'TRZ', 'XTC', 'XYZ', 'TXYZ', 'ARC', 'MEMORY', 'MMTF', 'GSD', 'COOR', 'NAMDBIN', 'IN', 'FHIAIMS', 'PARMED', 'RDKIT', 'OPENMMSIMULATION', 'OPENMMAPP'])
           are implemented in MDAnalysis.
           See https://docs.mdanalysis.org/documentation_pages/coordinates/init.html#id1
           Use the format keyword to explicitly set the format: 'Universe(...,format=FORMAT)'
           For missing formats, raise an issue at https://github.com/MDAnalysis/mdanalysis/issues
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/run/media/Programs/test/h5md/test.ipynb cell 2 line 4
      2 import numpy as np
      3 # f = h5py.File("tutorial_b03.h5md")
----> 4 u = mda.Universe("system.gro", "tutorial_b03.h5md")
      5 # np.array(f["/particles/trajectory/position/value"])[0, :, :]

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/core/universe.py:365, in Universe.__init__(self, topology, all_coordinates, format, topology_format, transformations, guess_bonds, vdwradii, in_memory, in_memory_step, *coordinates, **kwargs)
    360 coordinates = _resolve_coordinates(self.filename, *coordinates,
    361                                    format=format,
    362                                    all_coordinates=all_coordinates)
    364 if coordinates:
--> 365     self.load_new(coordinates, format=format, in_memory=in_memory,
    366                 in_memory_step=in_memory_step, **kwargs)
    368 if transformations:
    369     if callable(transformations):

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/core/universe.py:565, in Universe.load_new(self, filename, format, in_memory, in_memory_step, **kwargs)
    562 # supply number of atoms for readers that cannot do it for themselves
    563 kwargs['n_atoms'] = self.atoms.n_atoms
--> 565 self.trajectory = reader(filename, format=format, **kwargs)
    566 if self.trajectory.n_atoms != len(self.atoms):
    567     raise ValueError("The topology and {form} trajectory files don't"
    568                      " have the same number of atoms!\n"
    569                      "Topology number of atoms {top_n_atoms}\n"
   (...)
    573                          fname=filename,
    574                          trj_n_atoms=self.trajectory.n_atoms))

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/lib/util.py:2458, in store_init_arguments.<locals>.wrapper(self, *args, **kwargs)
   2456             else:
   2457                 self._kwargs[key] = arg
-> 2458 return func(self, *args, **kwargs)

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/coordinates/H5MD.py:480, in H5MDReader.__init__(self, filename, convert_units, driver, comm, **kwargs)
    475 self.units = {'time': None,
    476               'length': None,
    477               'velocity': None,
    478               'force': None}
    479 self._set_translated_units()  # fills units dictionary
--> 480 self._read_next_timestep()

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/coordinates/H5MD.py:724, in H5MDReader._read_next_timestep(self)
    722 def _read_next_timestep(self):
    723     """read next frame in trajectory"""
--> 724     return self._read_frame(self._frame + 1)

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/coordinates/H5MD.py:639, in H5MDReader._read_frame(self, frame)
    634 ts.frame = frame
    636 # fills data dictionary from 'observables' group
    637 # Note: dt is not read into data as it is not decided whether
    638 # Timestep should have a dt attribute (see Issue #2825)
--> 639 self._copy_to_data()
    641 # Sets frame box dimensions
    642 # Note: H5MD files must contain 'box' group in each 'particles' group
    643 if 'edges' in particle_group['box']:

File /opt/anaconda3/envs/mindspore/lib/python3.9/site-packages/MDAnalysis/coordinates/H5MD.py:667, in H5MDReader._copy_to_data(self)
    665 if 'observables' in self._file:
    666     for key in self._file['observables'].keys():
--> 667         self.ts.data[key] = self._file['observables'][key][
    668             'value'][self._frame]
    670 # pulls 'time' and 'step' out of first available parent group
    671 for name, value in self._has.items():

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/.local/lib/python3.9/site-packages/h5py/_hl/group.py:357, in Group.__getitem__(self, name)
    355         raise ValueError("Invalid HDF5 object reference")
    356 elif isinstance(name, (bytes, str)):
--> 357     oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    358 else:
    359     raise TypeError("Accessing a group is done with bytes or str, "
    360                     "not {}".format(type(name)))

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5o.pyx:190, in h5py.h5o.open()

KeyError: "Unable to open object (object 'value' doesn't exist)"

I found that at MDAnalysis/coordinates/H5MD.py:667 the reader appends the string 'value' after each key to get the observable values, but I think the keys should be "energies", "potential_energy" and "total_energy" (as shown in the hierarchy figure). This may be what causes the loading error.

[Figure: hierarchy of the h5md file]
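
To make the suspected failure concrete, here is a minimal sketch that mimics the lookup pattern from the traceback (`observables[key]['value'][frame]`) on a nested dict. The layout of `observables` below is an assumption made for illustration (observables nested one level deeper than that lookup expects); it is not read from the attached file. `find_observables` is a hypothetical, more tolerant alternative and not how MDAnalysis implements its reader.

```python
def find_observables(group, frame, prefix=""):
    """Collect frame `frame` of every observable below `group`.

    An observable is taken to be any group with a direct 'value' child.
    Groups without one are searched recursively instead of raising KeyError.
    """
    found = {}
    for name in group.keys():
        child = group[name]
        if not hasattr(child, "keys"):
            continue  # a bare dataset; skipped in this sketch
        path = f"{prefix}{name}"
        if "value" in child:
            found[path] = child["value"][frame]
        else:
            found.update(find_observables(child, frame, prefix=f"{path}/"))
    return found


# Hypothetical layout: the energies sit one level below the top-level key.
observables = {
    "trajectory": {
        "potential_energy": {"step": [0, 1], "time": [0.0, 0.1], "value": [-10.5, -10.7]},
        "total_energy": {"step": [0, 1], "time": [0.0, 0.1], "value": [-8.0, -8.1]},
    }
}

# The direct lookup assumes every top-level key has a 'value' child.
try:
    observables["trajectory"]["value"][0]
except KeyError as err:
    print("direct lookup failed:", err)

print(find_observables(observables, frame=1))
```

The same function should work on an `h5py.Group`, since groups support `keys()`, `in`, and item access like a dict, but that has not been checked against the file in this thread.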

I have uploaded the related files as h5md.zip. I hope the problem I found can be discussed.

Thank you!

@orbeckst orbeckst removed more information needed Please reply to requests for information or the issue will be closed. close? Evaluate if issue/PR is stale and can be closed. labels Nov 8, 2023
@orbeckst
Member

orbeckst commented Nov 8, 2023

MDAnalysis needs two arguments, a topology and a trajectory, but the topology (tpr, gro, pdb, etc.) has to be regenerated from the h5md file. This is inconvenient because I need to extract the structure from the h5md file again.

It depends on the information in the h5md: if there is enough to build a minimal topology, e.g., from the Particles, then one could write an H5MDParser (topology reader). We just don't have that yet. You can raise a separate issue. Perhaps someone is keen to work on it. The more information (including links to the format) you provide there, the more likely it is that someone might take it up.

Also, MDAnalysis does not support GROMACS 2023 (tpx version 129?), so sometimes I need to regenerate the tpr file with another version of GROMACS. I hope that MDAnalysis can read the h5md file directly with u=mda.Universe("traj.h5md") instead of u=mda.Universe("system.gro", "traj.h5md").

According to #4047, GROMACS 2023 TPR is supported since MDAnalysis 2.5.0. Which version are you using? (You didn't supply this information as part of the issue — next time please follow the issue template.)
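
A version check along these lines can be scripted. The following is only a sketch: the 2.5.0 threshold is taken from the comment above, and `version_tuple` and `supports_gromacs_2023_tpr` are hypothetical helper names, not MDAnalysis functions.

```python
import re


def version_tuple(version):
    """Leading numeric components of a version string, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad so that "2.5" compares equal to "2.5.0".
    return parts + (0,) * max(0, 3 - len(parts))


# Threshold as stated in the comment above (with reference to #4047).
TPR_2023_MIN_VERSION = (2, 5, 0)


def supports_gromacs_2023_tpr(version):
    """True if `version` is at least the release said to read GROMACS 2023 TPR."""
    return version_tuple(version) >= TPR_2023_MIN_VERSION


print(supports_gromacs_2023_tpr("2.6.1"))
```

In a real session one would pass `MDAnalysis.__version__` instead of a literal string. Suffixes such as `-dev0` are ignored here, so a development build of 2.5.0 would also be reported as supported.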

@orbeckst
Member

orbeckst commented Nov 8, 2023

@edisj are you able to look at the H5MD problem here, in particular the issue reading the attached file above with error

KeyError: "Unable to open object (object 'value' doesn't exist)"

@supernova4869
Author

MDAnalysis needs two arguments, a topology and a trajectory, but the topology (tpr, gro, pdb, etc.) has to be regenerated from the h5md file. This is inconvenient because I need to extract the structure from the h5md file again.

It depends on the information in the h5md: if there is enough to build a minimal topology, e.g., from the Particles, then one could write an H5MDParser (topology reader). We just don't have that yet. You can raise a separate issue. Perhaps someone is keen to work on it. The more information (including links to the format) you provide there, the more likely it is that someone might take it up.

Also, MDAnalysis does not support GROMACS 2023 (tpx version 129?), so sometimes I need to regenerate the tpr file with another version of GROMACS. I hope that MDAnalysis can read the h5md file directly with u=mda.Universe("traj.h5md") instead of u=mda.Universe("system.gro", "traj.h5md").

According to #4047, GROMACS 2023 TPR is supported since MDAnalysis 2.5.0. Which version are you using? (You didn't supply this information as part of the issue — next time please follow the issue template.)

Thank you, and sorry for not providing the version. I installed MDAnalysis with conda from conda-forge. The version is 2.6.1, as the following shows:

>>> import MDAnalysis
/opt/anaconda3/lib/python3.11/site-packages/MDAnalysis/topology/TPRParser.py:161: DeprecationWarning: 'xdrlib' is deprecated and slated for removal in Python 3.13
  import xdrlib
>>> MDAnalysis.__version__
'2.6.1'

@orbeckst
Member

orbeckst commented Nov 8, 2023

If MDA 2.6.1 can't read a GROMACS 2023 TPX 129 file then please open a new issue; it's really important for us to keep different problems separate so that they can be worked on separately. Different developers are also interested in different problems so having it as a separate issue will allow us to involve developers that might not care a lot about H5MD. Thanks!

@orbeckst
Member

orbeckst commented Nov 8, 2023

Regarding your h5md:

  1. Can you read your trajectory with the reference h5md implementation https://github.com/pdebuyl/pyh5md

Could you please install pyh5md and report on the results of reading your file with it?

@supernova4869
Author

If MDA 2.6.1 can't read a GROMACS 2023 TPX 129 file then please open a new issue; it's really important for us to keep different problems separate so that they can be worked on separately. Different developers are also interested in different problems so having it as a separate issue will allow us to involve developers that might not care a lot about H5MD. Thanks!

Thank you! I'll open a new issue about this problem.

@orbeckst
Member

@supernovaZhangJiaXing did you check as requested in #4320 (comment) ? If there's no new information then I may close the issue as stale.

@orbeckst orbeckst added the "more information needed" and "close?" labels Mar 29, 2024
@orbeckst
Member

orbeckst commented Jun 1, 2024

The problem with H5MD and the KeyError: "Unable to open object (object 'value' doesn't exist)" error described here is being discussed in #4598.

I am leaving this issue open for the feature request: being able to use an h5md file to supply (minimal) topology information. I am changing the title to make this clearer.
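
Until such a topology reader exists, one possible workaround is to take only the atom count from the h5md file and attach the trajectory to an otherwise empty Universe. The sketch below assumes the positions live at `/particles/<group>/position/value` with shape `(n_frames, n_atoms, 3)`, as in the path used in the notebook cell earlier in this thread. `count_atoms` and `universe_from_h5md` are hypothetical helpers, not MDAnalysis API. The 36 atoms in the stand-in data assume 12 three-site water molecules.

```python
from collections import namedtuple


def count_atoms(h5, particles_group):
    """Return n_atoms from the (n_frames, n_atoms, 3) shape of position/value."""
    value = h5["particles"][particles_group]["position"]["value"]
    n_frames, n_atoms, n_dims = value.shape
    return n_atoms


def universe_from_h5md(path, particles_group):
    """Build a Universe without a topology file and attach the trajectory."""
    # Imported lazily so that count_atoms() stays usable without these packages.
    import h5py
    import MDAnalysis as mda

    with h5py.File(path, "r") as h5:
        n_atoms = count_atoms(h5, particles_group)
    u = mda.Universe.empty(n_atoms)  # atoms only: no names, types, or bonds
    u.load_new(path, format="H5MD")  # still goes through the H5MD reader
    return u


# Stand-in objects to exercise count_atoms() without an HDF5 file.
FakeDataset = namedtuple("FakeDataset", "shape")
fake_file = {
    "particles": {"trajectory": {"position": {"value": FakeDataset((5, 36, 3))}}}
}
print(count_atoms(fake_file, "trajectory"))
```

A call such as `universe_from_h5md("traj.h5md", "trajectory")` would give a Universe with coordinates but no atom names, residues, or bonds, so selections by name will not work. It also still relies on the H5MD reader, so a file that triggers the `KeyError` discussed above would fail in the same way.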

@orbeckst orbeckst added the "enhancement" and "topology-building" labels and removed the "more information needed" and "close?" labels Jun 1, 2024
@orbeckst orbeckst changed the title from "Can MDAnalysis support reading h5md trajectory?" to "use information in H5MD file for topology" Jun 1, 2024
@hmacdope
Member

hmacdope commented Jun 2, 2024

GROMACS indicated they have some plans for this in MDDB work but no solid standard yet.
