Supporting links and references #389

Open
jakirkham opened this issue Jan 12, 2019 · 8 comments
Labels
enhancement New features or improvements

Comments

@jakirkham
Member

HDF5 has a few different mechanisms for referring to data in a file from a location other than where that data is stored.

For example, HDF5 supports different types of links like hard links, soft links, and external links. These are analogous to links on the filesystem with the exception of external links, which constitute a soft link to a different HDF5 file. These can refer to groups or datasets in HDF5.
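For concreteness, here is a minimal sketch of the three link types using h5py. The file and object names (`main.h5`, `other.h5`, `data`, `remote`) are made up for illustration, and it assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()
main_path = os.path.join(tmpdir, "main.h5")    # illustrative file names
other_path = os.path.join(tmpdir, "other.h5")

# A second file for the external link to point into.
with h5py.File(other_path, "w") as other:
    other.create_dataset("remote", data=np.arange(3))

with h5py.File(main_path, "w") as f:
    data = f.create_dataset("data", data=np.arange(5))

    # Hard link: a second name for the same object.
    f["hard"] = data
    # Soft link: stores a path that is resolved on access (and may dangle).
    f["soft"] = h5py.SoftLink("/data")
    # External link: a path to an object inside a different HDF5 file.
    f["ext"] = h5py.ExternalLink(other_path, "/remote")

with h5py.File(main_path, "r") as f:
    hard_values = f["hard"][...].tolist()
    soft_values = f["soft"][...].tolist()
    ext_values = f["ext"][...].tolist()
    # Inspect the link itself rather than the object it resolves to.
    soft_link = f.get("soft", getlink=True)
```

All three names read like ordinary datasets; only `getlink=True` reveals that `soft` is a stored path rather than the object itself.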

HDF5 also supports two kinds of references: object references and region references. An object reference points to an object such as a dataset or group, while a region reference points to some subselection of a dataset.
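A short h5py sketch of both reference kinds, in case it helps the discussion. The names `session`, `data`, and `refs` are illustrative only, and `h5py.ref_dtype` assumes a reasonably recent h5py (2.10 or later).

```python
import h5py
import numpy as np

# In-memory file: the "core" driver with backing_store=False writes nothing to disk.
with h5py.File("refs_demo.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("session")
    data = f.create_dataset("data", data=np.arange(10))

    # Object references, here to a dataset and to a group.
    data_ref = data.ref
    group_ref = grp.ref

    # Region reference: a selection within one dataset.
    region = data.regionref[2:5]

    # References can themselves be stored as dataset elements.
    refs = f.create_dataset("refs", (2,), dtype=h5py.ref_dtype)
    refs[0] = data_ref
    refs[1] = group_ref

    # Dereference by indexing the file with the stored reference.
    target_name = f[refs[0]].name
    group_name = f[refs[1]].name

    # Indexing the dataset with a region reference reads only that selection;
    # indexing the file with it returns the dataset the region belongs to.
    region_values = data[region].tolist()
    region_target = f[region].name
```

Unlike links, references are values, so they can sit inside datasets or attributes and be passed around like any other data.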

It would be useful to support these in Zarr, as there appear to be some use cases for them (#297, #298, https://github.com/zarr-developers/zarr/issues/333). Downstream libraries like pynwb would also like to use them (NeurodataWithoutBorders/pynwb#230).

Am raising here to discuss how we might support these features in Zarr across various stores.

@amkigit

amkigit commented Oct 1, 2019

I like and use HDF5's object references. Instead of using symbolic links, would it be possible to create a "virtual" group (a directory or data store) that contains references (paths) to the actual group members? Instead of relying on the file system, the references would live in, let's say, a .vgroup file. This may be an easy and flexible approach.
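A rough sketch of how such a file could work over a plain key-value store, purely as an illustration. The `.vgroup` key name comes from the suggestion above, but its JSON layout and the `write_vgroup` and `resolve` helpers are hypothetical; none of this is part of Zarr or its spec.

```python
import json

VGROUP_KEY = ".vgroup"  # hypothetical metadata key, not part of any Zarr spec


def write_vgroup(store, group_path, members):
    """Record a virtual group as a JSON mapping of member name -> target path."""
    key = f"{group_path}/{VGROUP_KEY}"
    store[key] = json.dumps({"members": members}).encode("utf-8")


def resolve(store, path):
    """Translate a path that passes through a virtual group into the real path."""
    parts = path.strip("/").split("/")
    for i in range(1, len(parts)):
        key = "/".join(parts[:i]) + f"/{VGROUP_KEY}"
        if key in store:
            members = json.loads(store[key])["members"]
            target = members[parts[i]]  # raises KeyError for unknown members
            return "/".join([target.strip("/")] + parts[i + 1:])
    return "/".join(parts)  # no virtual group on the way: path is already real


# A dict stands in for any MutableMapping-style store.
store = {"raw/stim/.zarray": b"{}", "raw/stim/0": b"chunk"}
write_vgroup(store, "views", {"stimulus": "raw/stim"})

real_key = resolve(store, "views/stimulus/0")
untouched = resolve(store, "raw/stim/0")
```

Because the mapping is just another key in the store, it would not depend on filesystem symlinks and could work the same way on object stores. This sketch resolves a single level only; nested virtual groups and dangling targets would need extra handling.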

@NumesSanguis

Someone recently made an issue over at ASDF (Advanced Scientific Data Format) to integrate with Zarr: asdf-format/asdf#718
ASDF supports:

However, ASDF currently doesn't support chunking, which would make Zarr a good addition to ASDF. The original issue points out that integrating with Zarr would benefit parallel computation with ASDF (e.g. with Dask).

@alimanfoo
Member

Thanks @NumesSanguis for making the connection, very interesting.

@perrygreenfield

Thanks @NumesSanguis from the ASDF side as well. This is something we want to take a serious look at.

@Cadair

Cadair commented Dec 12, 2019

Thanks for making the link, @NumesSanguis. Once we have wrapped our heads around things a little more, I might open some issues here 😀

@NumesSanguis

Is there any update on this issue? It would be great to be able to refer to parts of a larger array as a new dataset, without having to copy the data or having to create your own parsing code.
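To make the request concrete, here is a minimal sketch of the kind of reference meant. `RegionRef` and its fields are hypothetical names for illustration, not an existing Zarr API, and the sketch is limited to a contiguous 1-D region.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegionRef:
    """Hypothetical reference to a contiguous 1-D region of a named array."""

    path: str
    start: int
    stop: int

    def resolve(self, arrays):
        # Look the array up by path and read only the referenced region.
        return arrays[self.path][self.start:self.stop]


# Any mapping of path -> sliceable array works; a list stands in for a real array.
arrays = {"recording/time": list(range(10))}

ref = RegionRef("recording/time", 2, 5)
window = ref.resolve(arrays)
as_metadata = asdict(ref)  # small enough to store as JSON metadata
```

Only the path and bounds are stored, so the "new dataset" costs a few bytes of metadata and the underlying data is never duplicated in storage.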

@perrygreenfield

From our view we would like to start working on this for ASDF in a couple months (after a new staff member starts). If someone is willing to help, even better.

@marcel-goldschen-ohm

Any news on this? I'm currently trying to plan out how to use zarr for electrophysiology datasets. These data can have repeated stimulus patterns and identical time arrays for which shared arrays (see #690) and/or groups seem like they could be really useful.


8 participants