Unconfigured coordinate variables for SHOC Standard Datasets #132
Experiment 1, trying to find a workaround for the time dimension index:
This does make the
But it also uses the dimension name as the coordinate name (they should be different) and removes
Experiment 2:
This also sets up an indexed coordinate variable that uses the dimension name:
BUT, unlike the previous experiment, it doesn't remove the
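The code for both experiments is missing here. As a hedged illustration of the trade-off they describe, here is one possible approach. It is an assumption, not necessarily what either experiment did. It uses `assign_coords` to copy the times into a dimension coordinate named `record`. xarray indexes that coordinate automatically and leaves `t` in place, but selection then has to use the dimension name. The dataset is a hypothetical synthetic stand-in: `eta` and the dates are made up, and only `t` and `record` come from this issue.

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical stand-in for a SHOC standard file: times live in a
# coordinate 't' on the 'record' dimension ('eta' is a made-up variable).
times = pd.date_range("2023-01-01", periods=5, freq="D")
dataset = xarray.Dataset(
    data_vars={"eta": ("record", np.arange(5.0))},
    coords={"t": ("record", times)},
)

# Copy the time values into a coordinate named after the dimension.
# xarray indexes it automatically because the names match, and the
# original 't' coordinate is kept (but stays unindexed).
indexed = dataset.assign_coords(record=dataset["t"].values)

# Time slicing now works, but only through the dimension name.
subset = indexed.sel(record=slice("2023-01-02", "2023-01-04"))
```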
Possibly related? pydata/xarray#4417
xarray handles indexing in a way that is not useful for SHOC standard datasets. Specifically, it only creates indexes for coordinates whose variable name exactly matches the single dimension they are defined on. Fixing this is (I think) part of https://github.com/pydata/xarray/projects/1; also see the discussion in the flexible indexing design document.

As it stands, emsarray does not modify any datasets when they are opened. Adding an index on the time coordinate is useful, but doing so automatically poses some difficulties. Because emsarray is exposed as an xarray accessor, it isn't always involved in opening datasets. When a dataset is opened through other means, emsarray is invoked lazily. This means there is no reliable way for emsarray to modify a dataset automatically in a way that works consistently.

A new function such as this one can add the index explicitly after the dataset is opened:

from collections.abc import Hashable

import emsarray
import xarray


def make_indexable(dataset: xarray.Dataset, name: Hashable) -> xarray.Dataset:
    """
    Ensure that `name` is an indexable coordinate in `dataset`.
    This is useful for adding indexes for coordinates whose name does not match the dimension name,
    e.g. a 'time' coordinate defined on the 'record' dimension.
    """
    # Promote the variable to a coordinate if it is currently a data variable.
    if name not in dataset.coords:
        dataset = dataset.set_coords([name])
    # Attach a default index so label-based selection (e.g. .sel) works on it.
    if name not in dataset.indexes:
        dataset = dataset.set_xindex([name])
    return dataset


dataset = emsarray.open_dataset(...)
dataset = make_indexable(dataset, dataset.ems.get_time_name())
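Here is a self-contained sketch of `make_indexable` in action. It runs on a hypothetical synthetic dataset instead of a real SHOC file (`eta` and the dates are assumptions), so it needs no emsarray. It assumes an xarray version that provides `Dataset.set_xindex`. Once `t` has an index, time slicing works directly on `t`, and the `record` dimension keeps its name.

```python
from collections.abc import Hashable

import numpy as np
import pandas as pd
import xarray


def make_indexable(dataset: xarray.Dataset, name: Hashable) -> xarray.Dataset:
    # Same logic as the helper above: promote `name` to a coordinate,
    # then attach a default index to it.
    if name not in dataset.coords:
        dataset = dataset.set_coords([name])
    if name not in dataset.indexes:
        dataset = dataset.set_xindex([name])
    return dataset


# Hypothetical stand-in for a SHOC standard file: the times are stored in
# a plain variable 't' on the 'record' dimension.
times = pd.date_range("2023-01-01", periods=5, freq="D")
dataset = xarray.Dataset(
    data_vars={
        "t": ("record", times),
        "eta": ("record", np.arange(5.0)),
    },
)

dataset = make_indexable(dataset, "t")

# 't' now has an index, so label-based time slicing works on it.
subset = dataset.sel(t=slice("2023-01-02", "2023-01-04"))
```

Unlike renaming or swapping dimensions, this leaves both the dimension name (`record`) and the coordinate name (`t`) unchanged.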
I am closing this issue, as it has little to do with emsarray itself and more to do with how xarray chooses which variables to index automatically. For the cases where xarray does not add an index to a variable, the above comment has an applicable code example. The function uses no emsarray-specific functionality, and it applies to all dataset conventions where xarray may not add the desired indexes.
If you open a SHOC standard NetCDF file with emsarray, the xarray.Dataset has properties like:

Note that this includes quite a few dimensions which are not linked to coordinate variables, including the t variable, which is actually the time coordinate variable for the record dimension. Because that coordinate/dim combination is not configured, it is not possible to use xarray time coordinate slicing on a SHOC standard dataset.

It would be great if more (or all!) of the SHOC standard coordinate variables could be configured at emsarray/src/emsarray/conventions/shoc.py, line 44 in 7c1baf8.
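A minimal sketch of the problem uses a hypothetical synthetic dataset in place of a real SHOC file. Only `t` and `record` come from this issue; `eta` and the dates are made up. xarray only indexes a coordinate whose name matches its dimension, so `t` stays unindexed and label-based time slicing on it fails. Depending on the xarray version, that failure is either a `KeyError` or a `ValueError`.

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical stand-in for a SHOC standard file: the times live in a
# coordinate 't' defined on the 'record' dimension.
times = pd.date_range("2023-01-01", periods=5, freq="D")
dataset = xarray.Dataset(
    data_vars={"eta": ("record", np.arange(5.0))},
    coords={"t": ("record", times)},
)

# 't' is a coordinate, but because its name differs from its dimension
# ('record'), xarray does not create an index for it.
has_index = "t" in dataset.indexes

# Label-based selection needs an index, so slicing by time fails.
try:
    dataset.sel(t=slice("2023-01-02", "2023-01-04"))
    slicing_works = True
except (KeyError, ValueError):
    slicing_works = False
```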