Allow chunk spec per variable #4623

Open
ravwojdyla opened this issue Nov 30, 2020 · 3 comments

Comments

@ravwojdyla

Say I have a zarr dataset with multiple variables Foo, Bar and Baz (and potentially many more) and two dimensions, x and y (potentially more). Foo and Bar are large 2D arrays with dims (x, y); Baz is a relatively small 1D array with dim y. I would like to read that dataset with xarray and increase the chunk size over the native zarr chunk size for x and y, but only for Foo and Bar; Baz should keep its native chunking. AFAIU, I would currently do that with the chunks parameter to open_dataset/open_zarr, but passing, say, dict(x=N, y=M) changes the chunking for all variables that use those dimensions, which isn't what I need: I need it changed only for Foo and Bar. Is there a way to do that? Should that be part of the "harmonisation"? One could imagine xarray accepting a dict of dicts akin to {var: {dim: chunk_spec}} to specify chunking for specific variables.

Note that rechunking after reading is not what I want; I would like to specify the chunking at read time.
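
To make the request concrete, here is a minimal pure-Python sketch of how a {var: {dim: chunk_spec}} mapping might be resolved against a dataset's variables. The helper name resolve_per_variable_chunks and the var_dims mapping are hypothetical and exist only for illustration; this is not xarray API. An empty result for a variable stands in for "keep the native zarr chunks".

```python
from typing import Dict, Mapping, Tuple


def resolve_per_variable_chunks(
    var_dims: Mapping[str, Tuple[str, ...]],
    chunks: Mapping[str, Mapping[str, int]],
) -> Dict[str, Dict[str, int]]:
    """Resolve a hypothetical {var: {dim: chunk_spec}} request.

    Variables that are not listed get an empty spec, which this sketch
    treats as "leave the native on-disk chunking alone".
    """
    unknown_vars = set(chunks) - set(var_dims)
    if unknown_vars:
        raise ValueError(f"unknown variables: {sorted(unknown_vars)}")

    resolved = {}
    for var, dims in var_dims.items():
        spec = chunks.get(var, {})
        unknown_dims = set(spec) - set(dims)
        if unknown_dims:
            raise ValueError(f"{var!r} has no dimensions {sorted(unknown_dims)}")
        resolved[var] = {dim: spec[dim] for dim in dims if dim in spec}
    return resolved


# The layout described above: Foo and Bar are 2D, Baz is 1D.
var_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}
requested = {"Foo": {"x": 1000, "y": 500}, "Bar": {"x": 1000, "y": 500}}

resolved = resolve_per_variable_chunks(var_dims, requested)
print(resolved)
```

Baz does not appear in the request, so it resolves to an empty spec, whereas a flat dict(x=N, y=M) would also rechunk Baz along y.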

Originally posted by @ravwojdyla in #4496 (comment)

@shoyer
Member

shoyer commented Dec 17, 2020

This seems like a totally reasonable feature to add. The main tricky part would be figuring out the syntax, since we already use dictionaries like {dim: chunk_spec}. It's not obvious to me if a nested dict would mean {var: {dim: chunk_spec}} or {dim: {var: chunk_spec}}. Perhaps we should try to come up with another, more explicit option?
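
One way to see the ambiguity: in xarray, a dimension coordinate is a variable that shares its dimension's name, so the outer keys of a nested dict cannot always be classified by name alone. The sketch below is plain Python with a hypothetical helper, classify_outer_keys, and it assumes the example dataset has coordinate variables named x and y (the thread does not say so).

```python
def classify_outer_keys(nested, var_names, dim_names):
    """Guess whether the outer keys of a nested chunk dict are variables or dimensions."""
    keys = set(nested)
    could_be_vars = keys <= set(var_names)
    could_be_dims = keys <= set(dim_names)
    if could_be_vars and could_be_dims:
        return "ambiguous"
    if could_be_vars:
        return "var-first"
    if could_be_dims:
        return "dim-first"
    return "invalid"


# Assumption: "x" and "y" are also coordinate variables in the dataset.
var_names = {"Foo", "Bar", "Baz", "x", "y"}
dim_names = {"x", "y"}

var_first = classify_outer_keys({"Foo": {"x": 1000}}, var_names, dim_names)
dim_first = classify_outer_keys({"x": {"Foo": 1000}}, var_names, dim_names)
print(var_first, dim_first)
```

Under that assumption the second call cannot be told apart from a request to chunk the coordinate variable x, which is one argument for a more explicit syntax.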

@ravwojdyla
Author

ravwojdyla commented Dec 18, 2020

Thought through a couple of options, including simple value classes, but in the end they did not fit the current API. If we try to stick with the current style, it makes a bit more sense to go in the direction of {dim: {var: chunk_spec}}, since there is already {dim: x}. A user who wants variable-specific chunking would then adjust it to {dim: {var: y, ...: x}}, with .../Ellipsis standing for "all other variables" with that dim. wdyt @shoyer?
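
A rough pure-Python sketch of how the {dim: {var: y, ...: x}} form could be interpreted. The helper resolve_dim_first is hypothetical and not part of xarray. The sketch also assumes that a variable matched by neither its name nor ... keeps its native chunking along that dim.

```python
from collections.abc import Mapping


def resolve_dim_first(var_dims, chunks):
    """Resolve a hypothetical {dim: size} / {dim: {var: size, ...: size}} request."""
    resolved = {var: {} for var in var_dims}
    for dim, spec in chunks.items():
        for var, dims in var_dims.items():
            if dim not in dims:
                continue
            if isinstance(spec, Mapping):
                if var in spec:
                    # Variable-specific chunk size wins.
                    resolved[var][dim] = spec[var]
                elif Ellipsis in spec:
                    # `...` stands for "all other variables" with this dim.
                    resolved[var][dim] = spec[Ellipsis]
                # Otherwise: no entry, i.e. keep native chunking (assumption).
            else:
                # Plain {dim: size} keeps today's meaning: applies to every variable.
                resolved[var][dim] = spec
    return resolved


var_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}
chunks = {"x": 1000, "y": {"Baz": 100, ...: 500}}

resolved = resolve_dim_first(var_dims, chunks)
print(resolved)
```

Because ... is a hashable Python literal, it can be used directly as a dict key, so the existing flat {dim: x} form and the nested form can coexist in one mapping.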

@keewis
Collaborator

keewis commented Dec 19, 2020

We could also allow special cases: {dim: x, (dim, var): y}, where dim: x has the same effect as (dim,): x.
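
For comparison, a minimal sketch of how the tuple-key form might be resolved, again in plain Python. The helper resolve_tuple_keys is hypothetical, and the rule that (dim, var) entries override dimension-wide entries is an assumption about the intended precedence.

```python
from typing import Dict, Hashable, Mapping, Tuple


def resolve_tuple_keys(
    var_dims: Mapping[str, Tuple[str, ...]],
    chunks: Mapping[Hashable, int],
) -> Dict[str, Dict[str, int]]:
    """Resolve a hypothetical {dim: x, (dim, var): y} request."""
    defaults = {}   # dimension-wide sizes, from `dim` or `(dim,)` keys
    overrides = {}  # variable-specific sizes, from `(dim, var)` keys
    for key, size in chunks.items():
        if not isinstance(key, tuple):
            defaults[key] = size
        elif len(key) == 1:
            defaults[key[0]] = size
        elif len(key) == 2:
            overrides[key] = size
        else:
            raise ValueError(f"unsupported chunk key: {key!r}")

    resolved = {
        var: {dim: defaults[dim] for dim in dims if dim in defaults}
        for var, dims in var_dims.items()
    }
    for (dim, var), size in overrides.items():
        if dim not in var_dims.get(var, ()):
            raise ValueError(f"{var!r} has no dimension {dim!r}")
        resolved[var][dim] = size  # assumed: the specific key wins
    return resolved


var_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}

resolved = resolve_tuple_keys(var_dims, {"x": 1000, "y": 500, ("y", "Baz"): 100})
same = resolve_tuple_keys(var_dims, {("x",): 1000, ("y",): 500, ("y", "Baz"): 100})
print(resolved)
```

This form keeps the mapping flat, so it avoids the question of which nesting order a dict of dicts should use.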
