Interest in a zarr.sparse module? #424

daletovar · 2019-04-01T06:21:38Z

Hey there,
For a project I've been working on I wanted a zarr-based sparse matrix class so I recently made one: https://github.com/daletovar/zsparse

I've added a notebook with a few examples. Once it is more stable and faster, I was planning on making it a standalone package. However, I've been thinking it might make sense to just add it to zarr. I won't be offended if you guys aren't interested. At the very least, I thought you guys might like to know about it, especially because it solves #152.

Right now there's support for CSR and CSC matrices and for saving and loading pydata/sparse arrays. A potential problem with making a COO class for pydata/sparse is that doing a large number of binary searches on zarr arrays takes much longer than it does on numpy arrays. The code would also need to be written in Cython instead of Numba, because Numba doesn't support zarr. I'd like to see how Cython does on the CSR and CSC classes, as it's all currently written in pure Python. All of this is to say, if you were wondering why there isn't a COO class, these are some of the concerns I've had.
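For readers unfamiliar with the CSR layout mentioned above, here is a minimal sketch of why row access is cheap in that format. It is not the zsparse implementation: the helper names `dense_to_csr` and `csr_row` are made up for illustration, and plain numpy arrays stand in for the three zarr arrays (`data`, `indices`, `indptr`) that a zarr-backed CSR matrix would presumably hold.

```python
import numpy as np

def dense_to_csr(dense):
    """Return (data, indices, indptr) for a 2-D array, following the CSR layout."""
    # np.nonzero scans in row-major order, so entries come out grouped by row.
    rows, cols = np.nonzero(dense)
    data = dense[rows, cols]
    # indptr[i]:indptr[i + 1] is the slice of data/indices that belongs to row i.
    counts = np.bincount(rows, minlength=dense.shape[0])
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return data, cols, indptr

def csr_row(data, indices, indptr, i, ncols):
    """Rebuild dense row i from two indptr entries plus one contiguous slice."""
    start, stop = indptr[i], indptr[i + 1]
    row = np.zeros(ncols, dtype=data.dtype)
    row[indices[start:stop]] = data[start:stop]
    return row

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 0, 0],
                  [0, 5, 6]])
data, indices, indptr = dense_to_csr(dense)
last_row = csr_row(data, indices, indptr, 3, dense.shape[1])
```

Because a row maps to one contiguous slice, a chunked store only needs to touch the chunks covering that slice. Looking up a single element in a COO layout typically needs a search over the coordinate arrays instead, which is where the binary-search cost mentioned above comes from.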

Thanks for listening. I'm curious what you guys think about all of this.

alimanfoo · 2019-04-01T13:27:42Z

Hi Dale, demo notebook is very cool, thanks a lot for posting. I'm on leave for a couple of weeks but look forward to digging a bit deeper.


daletovar · 2019-04-03T02:15:43Z

Thanks, I appreciate that.

hammer · 2020-10-24T12:42:04Z

Since this is the only open issue I could find about storing sparse arrays in Zarr, I thought I'd comment here that the AnnData project's .h5ad file documentation claims that "Sparse arrays don’t have a native representations in HDF5 or Zarr, so we've defined our own". It may be worth extending the Zarr spec to formalize how sparse arrays are stored in Zarr. My apologies if that's already been done.

rabernat · 2022-10-07T13:45:52Z

Now that we have the meta_array option (see #934), which allows loading of data to different array types, it should be more straightforward to implement some sort of sparse support (i.e. meta_array=sparse.SparseArray). We would just need to pick an on-disk storage format, which could potentially be implemented as a numcodecs codec.
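To make the "on-disk storage format" idea a bit more concrete, here is a rough sketch of one possible per-chunk encoding: a COO-style byte layout holding the number of nonzeros, their flat indices, and their values. Everything here is an assumption for illustration. The functions `encode_chunk` and `decode_chunk` and the byte layout are hypothetical, not an existing numcodecs codec or anything agreed in this thread, and the sketch assumes a fill value of zero. A real codec would presumably wrap this logic in a `numcodecs.abc.Codec` subclass.

```python
import numpy as np

def encode_chunk(chunk):
    """Pack a dense chunk as: nnz (int64), flat indices (int64), nonzero values."""
    flat = np.ascontiguousarray(chunk).ravel()
    idx = np.flatnonzero(flat).astype(np.int64)
    header = np.array([idx.size], dtype=np.int64)
    return header.tobytes() + idx.tobytes() + flat[idx].tobytes()

def decode_chunk(buf, shape, dtype):
    """Inverse of encode_chunk; shape and dtype come from the array metadata."""
    nnz = int(np.frombuffer(buf, dtype=np.int64, count=1)[0])
    idx = np.frombuffer(buf, dtype=np.int64, count=nnz, offset=8)
    vals = np.frombuffer(buf, dtype=dtype, count=nnz, offset=8 + 8 * nnz)
    out = np.zeros(int(np.prod(shape)), dtype=dtype)  # assumes fill value 0
    out[idx] = vals
    return out.reshape(shape)

chunk = np.zeros((4, 5), dtype=np.float32)
chunk[1, 2] = 1.5
chunk[3, 0] = -2.0
buf = encode_chunk(chunk)
restored = decode_chunk(buf, chunk.shape, chunk.dtype)
```

This version decodes back to a dense numpy array. Decoding straight into a sparse container, as the meta_array idea suggests, would need the decode step to build that container instead.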

Perhaps now is the time to revisit this feature.

cc @alxmrs

jakirkham · 2022-10-07T15:59:09Z

cc @ivirshup (who has also expressed interest in some form of sparse support in Zarr)

daletovar mentioned this issue Apr 3, 2019

Storing Sparse arrays to Zarr pydata/sparse#222

Open

alimanfoo mentioned this issue May 29, 2019

Database sources where each array element is a separate database row #438

Open

benbovy mentioned this issue Feb 8, 2021

Sparse arrays xarray-contrib/xarray-simlab#165

Open

joshmoore mentioned this issue Sep 23, 2021

Outreachy project proposals (Oct. 2021) zarr-developers/community#39

Closed

elyall mentioned this issue May 25, 2023

Adding sparse array support zarr-developers/zarr-specs#245

Open

gareth-j mentioned this issue Oct 17, 2023

Have a look into sparse storage of the footprint data openghg/openghg#778

Open

dstansby added the enhancement New features or improvements label Dec 28, 2024
