Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow engine='zarr' and passing args for new xarray API #89

Closed
wants to merge 2 commits into from

Conversation

martindurant
Copy link
Member

No description provided.

Comment on lines +28 to +30
self._ds = xr.open_dataset(self.urlpath, engine='zarr',
backend_kwargs={"storage_options": self.storage_options},
**self.kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add in xr.open_mfdataset in here, or do it in a separate PR?

Suggested change
self._ds = xr.open_dataset(self.urlpath, engine='zarr',
backend_kwargs={"storage_options": self.storage_options},
**self.kwargs)
if "*" in url or isinstance(url, list):
_open_dataset = xr.open_mfdataset
else:
_open_dataset = xr.open_dataset
self._ds = _open_dataset(self.urlpath, engine='zarr',
backend_kwargs={"storage_options": self.storage_options},
**self.kwargs)

Probably need to include more code following the NetCDF driver at

if "*" in url or isinstance(url, list):
_open_dataset = xr.open_mfdataset
if self.pattern:
kwargs.update(preprocess=self._add_path_to_ds)
if self.combine is not None:
if 'combine' in kwargs:
raise Exception("Setting 'combine' argument twice in the catalog is invalid")
kwargs.update(combine=self.combine)
if self.concat_dim is not None:
if 'concat_dim' in kwargs:
raise Exception("Setting 'concat_dim' argument twice in the catalog is invalid")
kwargs.update(concat_dim=self.concat_dim)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but let's make it a separate follow-on PR (since the zarr backend didn't support multiple datasets before anyway). That code looks like it doesn't have anything specific to zarr, so it should be reusable.

Copy link
Collaborator

@weiji14 weiji14 Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good 👍 Edit: Actually, I just remembered that @Mikejmnez had a PR for multiple zarr at #80 already!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do have something kind of working. @martindurant was working on doing some changes at the xarray level that would have changed the way multiple zarr files are passed through intake. I see that that pull request hasn't passed... I guess I am just waiting on the new implemented changes and to see how tests are going to be changing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be close, that PR at pydata/xarray#4461 is approved, just needs to get merged

@@ -45,6 +47,7 @@ def test_list(data_server):


def test_open_rasterio(data_server):
pytest.importorskip("resterio")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be rasterio?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem likely... It would not have made a difference on CI, since rasterio is included in the env.

@martindurant
Copy link
Member Author

(reminder: this PR is no good until pydata/xarray#4461 is merged/reworked/whatever and released)

@scottyhq
Copy link
Collaborator

Catching up on intake and xarray development this morning ;) If I'm following correctly, this PR now depends on pydata/xarray#4823, and it would close #70 possibly #94 as well

@martindurant
Copy link
Member Author

Yes, that is the new PR. There is no hurry to get this in.

@martindurant
Copy link
Member Author

I mean, no hurry on the intake-xarray PR (this one), but I really do want to get the xarray one done!

@weiji14
Copy link
Collaborator

weiji14 commented Mar 15, 2021

Catching up on intake and xarray development this morning ;) If I'm following correctly, this PR now depends on pydata/xarray#4823, and it would close #70 possibly #94 as well

Looks like pydata/xarray#4823 got merged and xarray v0.17.0 has been released. Anyone want to work on this? If not I'll try and get my head around what happened and move things forward.

@martindurant
Copy link
Member Author

@weiji14 , you are welcome to.

However, there i also a lot of conversation around how to pass URLs (or file objects, mappers) to xarray which depends on the backend eventually selected. intake-xarray is in a good position to make decisions and add functionality such as path-name interpreting, but in many cases (like zarr), it makes sense to defer to xrray's logic when it exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants