Interest in a zarr.sparse module? #424

Open
daletovar opened this issue Apr 1, 2019 · 5 comments
Labels
enhancement New features or improvements

Comments

@daletovar

Hey there,
For a project I've been working on, I wanted a zarr-based sparse matrix class, so I recently made one: https://github.com/daletovar/zsparse

I've added a notebook with a few examples. Once it gets to a more stable and faster place, I was planning on making it a standalone package. However, I've been thinking it might make sense to just add it to zarr. I won't be offended if you guys aren't interested. At the very least, I thought you guys might like to know about it, especially because it solves #152.

Right now there's support for CSR and CSC matrices and for saving and loading pydata/sparse arrays. A potential problem with making a COO class for pydata/sparse is that doing a large number of binary searches on zarr arrays takes much longer than it does on NumPy arrays. The code would also need to be written in Cython instead of Numba, because Numba doesn't support zarr. I'd like to see how Cython does on the CSR and CSC classes, as it's all currently written in pure Python. All of this is to say, if you were wondering why there isn't a COO class, these are some of the concerns I've had.

Thanks for listening. I'm curious what you guys think about all of this.
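
To make the CSR versus COO concern above concrete, here is a minimal sketch, not taken from zsparse. Plain Python lists stand in for 1-D zarr arrays (an assumption, so the snippet runs without zarr installed), and the names `csr_row` and `coo_get` are hypothetical. Reading a CSR row needs only two contiguous slice reads, while a COO element lookup needs a binary search that probes scattered positions, which on a chunked store can mean touching a different chunk at each step.

```python
from bisect import bisect_left

# Plain Python lists stand in for 1-D zarr arrays (assumption for this sketch).
# Dense matrix being represented:
# [[0, 5, 0, 0],
#  [0, 0, 0, 0],
#  [7, 0, 0, 9]]
data = [5, 7, 9]        # nonzero values, stored row by row
indices = [1, 0, 3]     # column index of each stored value
indptr = [0, 1, 1, 3]   # row i owns data[indptr[i]:indptr[i + 1]]
n_cols = 4


def csr_row(i):
    """Return row i as a dense list using two contiguous slice reads."""
    start, stop = indptr[i], indptr[i + 1]
    row = [0] * n_cols
    for col, val in zip(indices[start:stop], data[start:stop]):
        row[col] = val
    return row


# COO layout of the same matrix, with coordinates sorted by (row, col).
coo_coords = [(0, 1), (2, 0), (2, 3)]
coo_data = [5, 7, 9]


def coo_get(i, j):
    """Look up one element with a binary search over the sorted coordinates."""
    pos = bisect_left(coo_coords, (i, j))
    if pos < len(coo_coords) and coo_coords[pos] == (i, j):
        return coo_data[pos]
    return 0
```

With real zarr arrays, each probe inside `bisect_left` would be a separate indexed read, so many lookups could mean many chunk decompressions unless the coordinates are cached in memory first.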

@alimanfoo
Member

alimanfoo commented Apr 1, 2019 via email

@daletovar
Author

Thanks, I appreciate that.

@hammer

hammer commented Oct 24, 2020

Since this is the only open issue I could find about storing sparse arrays in Zarr, I thought I'd comment here that the AnnData project's .h5ad file documentation claims that "Sparse arrays don’t have a native representations in HDF5 or Zarr, so we've defined our own". It may be worth extending the Zarr spec to formalize how sparse arrays are stored in Zarr. My apologies if that's already been done.

@rabernat
Contributor

rabernat commented Oct 7, 2022

Now that we have the meta_array option (see #934), which allows data to be loaded into different array types, it should be more straightforward to implement some sort of sparse support (i.e. meta_array=sparse.SparseArray). We would just need to pick an on-disk storage format, which could potentially be implemented as a numcodecs codec.

Perhaps now is the time to revisit this feature.

cc @alxmrs
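
As a rough illustration of what a per-chunk on-disk format for the codec idea above might look like, here is a sketch under stated assumptions. It is not an existing numcodecs codec and not a proposed spec: the byte layout and the function names `encode_sparse_chunk` and `decode_sparse_chunk` are hypothetical, and only the standard library is used. A real codec would presumably subclass `numcodecs.abc.Codec` and put this logic in its `encode` and `decode` methods.

```python
import struct


def encode_sparse_chunk(values):
    """Pack a flat chunk of floats as: length, nnz, indices, nonzero values.

    Hypothetical layout (little-endian): two uint64 header fields, then
    nnz uint64 positions, then nnz float64 values.
    """
    nonzero = [(i, v) for i, v in enumerate(values) if v != 0.0]
    n = len(nonzero)
    header = struct.pack("<QQ", len(values), n)
    idx = struct.pack(f"<{n}Q", *(i for i, _ in nonzero))
    vals = struct.pack(f"<{n}d", *(v for _, v in nonzero))
    return header + idx + vals


def decode_sparse_chunk(buf):
    """Rebuild the dense chunk from bytes written by encode_sparse_chunk."""
    length, n = struct.unpack_from("<QQ", buf, 0)
    idx = struct.unpack_from(f"<{n}Q", buf, 16)
    vals = struct.unpack_from(f"<{n}d", buf, 16 + 8 * n)
    dense = [0.0] * length
    for i, v in zip(idx, vals):
        dense[i] = v
    return dense


chunk = [0.0, 0.0, 3.5, 0.0, -1.0, 0.0, 0.0, 0.0]
encoded = encode_sparse_chunk(chunk)
```

This sketch decodes back to a dense list for simplicity. To pair with meta_array as suggested, the decode step would more likely hand the positions and values to a sparse array type rather than densify them.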

@jakirkham
Member

cc @ivirshup (who has also expressed interest in some form of sparse support in Zarr)
