Multi-scale datasets and custom indexes #5376
Comments
I don't think I am familiar enough to really judge between the suggestions, @benbovy, but I'm intrigued. I think there's certainly something to be won just by having a data structure which says these arrays/datasets represent a multiscale series. The real benefit, though, will come when access to that structure simplifies the client code needed to interactively load the data, e.g. with prefetching.
I agree, but I'm wondering whether the multiscale series couldn't also be viewed as something that can be abstracted away, i.e., the original dataset (level 0) is the "real" dataset while all other levels are derived datasets that are convenient for some specific applications (e.g., visualization) but not very useful for general use. Having a single
Some related questions (out of curiosity):
I'm not sure when dynamic downsampling would be preferred over loading previously downsampled images from disk. In my usage, the application consuming the multiresolution images is an interactive data visualization tool and the goal is to minimize latency / maximize responsiveness of the visualization, and this would be difficult if the multiresolution images were generated dynamically from the full image -- under a dynamic scheme the lowest resolution image, i.e. the one that should be fastest to load, would instead require the most I/O and compute to generate....
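To make the pre-computed option concrete, here is a minimal sketch, assuming a 2-D image with dimensions named `y` and `x` and a fixed downsampling factor of 2. The helper name `build_pyramid` is hypothetical and not part of Xarray; only `DataArray.coarsen(...).mean()` is an existing Xarray call.

```python
import numpy as np
import xarray as xr


def build_pyramid(image, n_levels, factor=2):
    """Return [level 0, level 1, ...], each level block-averaged by `factor`.

    Hypothetical helper for illustration; assumes dims named "y" and "x".
    """
    levels = [image]
    for _ in range(n_levels - 1):
        # Average over factor x factor windows; "trim" drops ragged edges.
        coarser = levels[-1].coarsen(y=factor, x=factor, boundary="trim").mean()
        levels.append(coarser)
    return levels


image = xr.DataArray(
    np.arange(64 * 64, dtype="float64").reshape(64, 64),
    dims=("y", "x"),
    coords={"y": np.arange(64), "x": np.arange(64)},
)
pyramid = build_pyramid(image, n_levels=3)
shapes = [level.shape for level in pyramid]
sizes = [level.size for level in pyramid]
```

Once each level has been written to disk, reading the coarsest one touches `sizes[-1]` values instead of `sizes[0]`, which is the latency argument made above. A dynamic scheme would have to read all of level 0 to produce the same array.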
Although I do not do this today, I can think of a lot of uses for this functionality -- a data processing pipeline could expose intermediate data over http via xpublish, but this would require a good caching layer to prevent re-computing the same region of the data repeatedly.
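A caching layer of that kind could be sketched as follows, assuming regions are requested by level and integer pixel bounds so that the request itself can serve as the cache key. Both function names are hypothetical, `expensive_region` is a stand-in for a real pipeline step, and nothing here uses the xpublish API.

```python
from functools import lru_cache

compute_calls = 0  # counts how often the costly step actually runs


def expensive_region(level, y0, y1, x0, x1):
    """Stand-in for a costly pipeline step (hypothetical)."""
    global compute_calls
    compute_calls += 1
    step = 2 ** level
    return tuple(
        tuple(y * step + x * step for x in range(x0, x1))
        for y in range(y0, y1)
    )


@lru_cache(maxsize=128)
def cached_region(level, y0, y1, x0, x1):
    # The (level, bounds) tuple is hashable, so repeated requests hit the cache.
    return expensive_region(level, y0, y1, x0, x1)


first = cached_region(1, 0, 2, 0, 3)
second = cached_region(1, 0, 2, 0, 3)  # served from the cache
```

An in-process LRU cache only helps a single server process. A deployment with several workers would need a shared cache instead, which is outside the scope of this sketch.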
@benbovy I also agree that a data structure that encapsulates a scale into a nice API would be useful: you set the scale currently desired, the same Xarray Dataset/DataArray API is available, and that scale can optionally be lazily loaded. Maybe an Index as proposed could be a good API, but I do not have a good enough understanding of how the interface is used in general. What would be other examples like
Regarding dynamic multi-scale, etc., one use case of interest is where you are interactively processing a larger-than-memory dataset and want to visualize the result over a limited domain on an intermediate scale.
I do think multi-scale datasets are common enough across different scientific fields (remote sensing, bio-imaging, simulation output, etc.) that this could be worth considering.
There can be many examples like spatial indexes, complex grid indexes (select cell centers/faces of a staggered grid), distributed indexes, etc. Some of them are illustrated in a presentation I gave a couple of weeks ago (slides here). Although all those examples actually do data indexing. In the multi-scale context, I admit that the name "index" may sound confusing since an Such The goal with Xarray custom indexes is to allow (many) kinds of objects with a scope possibly much more narrow than, e.g.,
I've been wondering if:

`Dataset` and/or `DataArray` method(s)

I'm thinking of an API that would look like this:

where `ImagePyramidIndex` is not a "common" index, i.e., it cannot be used directly with Xarray's `.sel()` nor for data alignment. Using an index here might still make sense for such data extraction and resampling operation IMHO. We could extend the `xarray.Index` API to handle multi-scale datasets, so that `ImagePyramidIndex` could either do the scaling dynamically (maybe using a cache) or just lazily load pre-computed data, e.g., from a NGFF / OME-Zarr dataset... Both the implementation and functionality can be pretty flexible. Custom options may be passed through the Xarray API either when creating the index or when extracting a data slice.

A hierarchical structure of `xarray.Dataset` objects is already discussed in #4118 for multi-scale datasets, but I'm wondering if using indexes could be an alternative approach (it could also be complementary, i.e., `ImagePyramidIndex` could rely on such hierarchical structure under the hood).

I'd see some advantages of the index approach, although this is the perspective from a naive user who is not working with multi-scale datasets:

`xarray.Dataset` + a "black-box" index in which we abstract away all the implementation details. The API example shown above seems more intuitive to me than having to deal directly with Dataset groups.

`ImagePyramidIndex` variants. Xarray already provides an extension mechanism (accessors) for methods like `sel_and_rescale` in the example above...

That said, I'd also see the benefits of exposing Dataset groups more transparently to users (in case those are loaded from a store that supports it).
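
For readers who want something concrete, here is a rough sketch of what a `sel_and_rescale`-style operation might do when the levels are pre-computed. It is not the `ImagePyramidIndex` proposal itself and does not use the `xarray.Index` API: the function signature, the `max_pixels` parameter and the fine-to-coarse list of levels are all assumptions made for illustration. Only `DataArray.coarsen` and `DataArray.sel` are existing Xarray calls.

```python
import numpy as np
import xarray as xr


def sel_and_rescale(levels, max_pixels, **region):
    """Select `region` from the finest level that fits within `max_pixels`.

    Hypothetical helper: `levels` is a non-empty list ordered fine to coarse,
    and `region` maps dimension names to label slices as accepted by .sel().
    """
    for level in levels:
        subset = level.sel(**region)
        if subset.size <= max_pixels:
            return subset
    return subset  # fall back to the coarsest level, even if still too large


# Build three pre-computed levels (assumed dims "y" and "x", factor 2).
base = xr.DataArray(
    np.ones((64, 64)),
    dims=("y", "x"),
    coords={"y": np.arange(64), "x": np.arange(64)},
)
levels = [base]
for _ in range(2):
    levels.append(levels[-1].coarsen(y=2, x=2, boundary="trim").mean())

window = {"y": slice(0, 31), "x": slice(0, 31)}
fine = sel_and_rescale(levels, max_pixels=2000, **window)
medium = sel_and_rescale(levels, max_pixels=300, **window)
coarse = sel_and_rescale(levels, max_pixels=10, **window)
```

The caller keeps working with ordinary `DataArray` objects and label-based slices, while the choice of level stays hidden behind one call. That is the "black-box" property described above, whichever mechanism (index, accessor, or Dataset groups) ends up providing it.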
cc @thewtex @joshmoore @d-v-b