Fix multiindex level serialization after reset_index #8672

Merged (4 commits) on Jan 31, 2024

benbovy (Member) commented Jan 26, 2024

benbovy requested a review from dcherian on January 26, 2024, 10:59

dcherian (Contributor) left a comment:

Don't fully understand but happy to defer to you.

dcherian enabled auto-merge (squash) on January 26, 2024, 16:58
dcherian merged commit f9f4c73 into pydata:main on Jan 31, 2024
27 of 29 checks passed
benbovy deleted the fix-multiindex-level-serialization branch on February 5, 2024, 12:31
slevang (Contributor) commented Feb 6, 2024

This does not seem to have fixed xarray-contrib/xeofs#148, which worked prior to 2024.1.0

benbovy (Member, Author) commented Feb 6, 2024

@slevang could you provide an example (MCVE), please?

slevang (Contributor) commented Feb 6, 2024

An MCVE with xeofs is:

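import xarray as xr
import xeofs

# Load the tutorial air temperature data as a DataArray
data = xr.tutorial.open_dataset('air_temperature').air
model = xeofs.models.EOF(n_modes=10)
model.fit(data, "time")
# Saving the fitted model fails with the traceback below
model.save("test.zarr")
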
File ~/xarray/xarray/conventions.py:92, in ensure_not_multiindex(var, name)
     90     name = var.name
     91 if var.dims == (name,):
---> 92     raise NotImplementedError(
     93         f"variable {name!r} is a MultiIndex, which cannot yet be "
     94         "serialized. Instead, either use reset_index() "
     95         "to convert MultiIndex levels into coordinate variables instead "
     96         "or use https://cf-xarray.readthedocs.io/en/latest/coding.html."
     97     )

NotImplementedError: variable 'feature' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html.

I'm trying to figure out a pure xarray version though. We should just be doing the same stack(feature=[...]) and reset_index("feature") ops as in #8628 but there must be something else in there.
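
For reference, the stack and reset_index pattern mentioned above can be sketched in pure xarray as follows. This is only a minimal illustration on a small synthetic array (the data, names, and sizes are made up for the example, not taken from xeofs), and it does not by itself reproduce the error reported here.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the tutorial data (hypothetical values).
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [1.0, 2.0, 3.0]},
    name="air",
)

# Stacking creates a pandas MultiIndex along the new "feature" dimension.
stacked = da.stack(feature=["lat", "lon"])

# Resetting removes that index; the levels stay as coordinates on "feature".
reset = stacked.reset_index("feature")

print(reset.coords)
```

After the reset, "lat" and "lon" are ordinary coordinates along "feature", which is the state that is expected to be serializable.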

slevang (Contributor) commented Feb 6, 2024

Looks like the key difference is that isinstance(var, IndexVariable) evaluates to False in the previous issue but True in my case, so name gets set and trips the error. Not sure how we get there.
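
One way to check this on any dataset is to list, for each variable, whether it is an IndexVariable and what type wraps its data. The helper name describe_variables is hypothetical, and it reads the private _data attribute (as the error check does), so treat it as a debugging sketch rather than supported API.

```python
import numpy as np
import xarray as xr


def describe_variables(ds):
    """Hypothetical debugging helper.

    Maps each variable name to a tuple of
    (is it an IndexVariable?, type name of the wrapped data).
    """
    return {
        name: (isinstance(var, xr.IndexVariable), type(var._data).__name__)
        for name, var in ds.variables.items()
    }


# Toy dataset: "x" is an indexed dimension coordinate, "v" a data variable.
ds = xr.Dataset({"v": ("x", np.zeros(3))}, coords={"x": [0, 1, 2]})

for name, info in describe_variables(ds).items():
    print(name, info)
```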

benbovy (Member, Author) commented Feb 6, 2024

Ok, we are still raising that error (and we shouldn't do so) in the case where a reset multi-index level coordinate has been renamed to its dimension:

import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"]], names=["foo"])
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")

ds = xr.Dataset(coords=coords)
ds_reset = ds.reset_index("x").rename_vars(foo="x")

type(ds_reset.x.variable._data)
# xarray.core.indexing.PandasMultiIndexingAdapter

ds_reset.to_netcdf("test.nc")
# error

Instead of adding more checks like in this PR, I think that a proper fix would be to unwrap the multi-index level coordinate variable data from xarray.core.indexing.PandasMultiIndexingAdapter (e.g., convert it to numpy arrays) after resetting or dropping the multi-index.
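
Until such a fix lands in xarray itself, the same idea can be approximated on the user side. The sketch below is an assumption-laden workaround, not the proposed internal change: unwrap_unindexed_coords is a hypothetical helper that rebuilds every coordinate without an index from a plain NumPy array, applied right after reset_index and before the rename. It copies dims and attrs only, so any encoding on the original coordinates is not carried over.

```python
import numpy as np
import pandas as pd
import xarray as xr


def unwrap_unindexed_coords(ds):
    """Hypothetical helper: replace every coordinate that has no index
    with a copy backed by a plain NumPy array."""
    new_coords = {
        name: (coord.dims, np.asarray(coord.values), dict(coord.attrs))
        for name, coord in ds.coords.items()
        if name not in ds.xindexes
    }
    # Drop the old variables first so their wrapped data is discarded,
    # then assign the NumPy-backed replacements.
    return ds.drop_vars(list(new_coords)).assign_coords(new_coords)


midx = pd.MultiIndex.from_product([["a", "b"]], names=["foo"])
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds = xr.Dataset(coords=coords)

# Unwrap right after the reset, before renaming the level to its dimension.
ds_fixed = unwrap_unindexed_coords(ds.reset_index("x")).rename_vars(foo="x")

print(type(ds_fixed.x.variable._data).__name__)
```

The unwrapping is done before the rename on purpose: assigning a coordinate whose name equals its dimension would create a new default index, which changes the dataset more than intended.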

slevang (Contributor) commented Feb 23, 2024

@benbovy any more guidance on how to go about fixing this edge case? I can try a PR but am pretty unfamiliar with the new indexers.

Development

Successfully merging this pull request may close these issues.

objects remain unserializable after reset_index
3 participants