reset_index not resetting levels of MultiIndex #6946

Closed
4 tasks done
aulemahal opened this issue Aug 22, 2022 · 3 comments · Fixed by #6992
@aulemahal
Contributor

What happened?

I'm not sure my use case is the simplest way to demonstrate the issue, but let's try anyway.

I have a DataArray with two coordinates and I stack them into a new multi-index. I want to pass the levels of that new multi-index into a function, but as dask arrays. It turns out that chunking these variables is not straightforward: they behave like IndexVariable objects and refuse to be chunked.

So I reset the multi-index and drop it, but the variables still refuse to be chunked!

What did you expect to happen?

I expected the levels to be chunkable after calling stack and then reset_index.

Minimal Complete Verifiable Example

import xarray as xr
ds = xr.tutorial.open_dataset('air_temperature')

ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)  # I don't think the drop is important here.
lon_chunked = ds.lon.chunk()  # oops, this has no effect!

# I assumed either stack or reset_index would have turned this into a normal variable.
type(ds.lon.variable)  # xarray.core.variable.IndexVariable
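
A possible workaround, sketched here under assumptions rather than taken from the thread: rebuild the level as a plain Variable from its values, which sidesteps whatever type reset_index leaves behind. The small synthetic dataset stands in for the tutorial data so that no download is needed, and the names reset, lon_var and plain_lon are only illustrative. The sketch calls reset_index without drop=True, on the assumption that this keeps the levels as coordinates.

import numpy as np
import xarray as xr

# Synthetic stand-in for the tutorial dataset (assumption: same dimension names).
ds = xr.Dataset(
    {"air": (("lat", "lon"), np.arange(6.0).reshape(2, 3))},
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
)

stacked = ds.stack(spatial=["lon", "lat"])
reset = stacked.reset_index("spatial")

lon_var = reset["lon"].variable
# Affected versions report IndexVariable here; versions with the fix should report Variable.
print(type(lon_var).__name__)

# Rebuild the level from its in-memory values as a plain Variable.
plain_lon = xr.Variable(lon_var.dims, lon_var.values, attrs=lon_var.attrs)
print(type(plain_lon).__name__)

Provided dask is installed, plain_lon.chunk() should then return a dask-backed variable. Note that this copies the level values into memory first, which is fine for coordinates of this size but may not suit very large ones.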

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This seems related to the other issues around reset_index. I think it is related to (but not a duplicate of) #4366.

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.8.0
distributed: 2022.8.0
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 63.4.2
pip: 22.2.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 5.1.1

@aulemahal aulemahal added bug needs triage Issue that has not been reviewed by xarray team member labels Aug 22, 2022
@benbovy benbovy self-assigned this Aug 31, 2022
@benbovy benbovy added regression topic-indexing and removed needs triage Issue that has not been reviewed by xarray team member labels Sep 5, 2022
@benbovy
Member

benbovy commented Sep 5, 2022

Thanks for the report @aulemahal!

I've found another regression with reset_index and MultiIndex -> #6989.

@benbovy benbovy mentioned this issue Sep 5, 2022
10 tasks
@aulemahal
Contributor Author

aulemahal commented Sep 8, 2022

I found another problem caused by this issue: the dataset can't be written to netCDF, and the error message suggests doing exactly what has already been done:

import xarray as xr
ds = xr.tutorial.open_dataset('air_temperature')

ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)

ds.to_netcdf('test.nc')

raises:

NotImplementedError: variable 'lat' is a MultiIndex, which cannot yet be serialized to netCDF files (https://github.com/pydata/xarray/issues/1077). Use reset_index() to convert MultiIndex levels into coordinate variables instead.

@benbovy I'm guessing your PR will fix this, but I'm raising the issue to be sure.
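
For anyone hitting the same error, here is a rough diagnostic, offered as a sketch and not as xarray's own serialization check: list the non-dimension coordinates that are still stored as IndexVariable. The helper name leftover_index_coords is hypothetical, and the synthetic dataset again replaces the tutorial data.

import numpy as np
import xarray as xr


def leftover_index_coords(ds):
    """Return names of non-dimension coordinates still stored as IndexVariable."""
    return [
        name
        for name, coord in ds.coords.items()
        if name not in ds.dims and isinstance(coord.variable, xr.IndexVariable)
    ]


# Synthetic stand-in for the tutorial dataset (assumption: same dimension names).
ds = xr.Dataset(
    {"air": (("lat", "lon"), np.zeros((2, 3)))},
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
)

# Plain dimension coordinates are not reported.
print(leftover_index_coords(ds))

# After stack + reset_index, any name listed here is a candidate for the error above.
reset = ds.stack(spatial=["lon", "lat"]).reset_index("spatial")
print(leftover_index_coords(reset))

An empty list for the second call would suggest the levels were converted to ordinary coordinate variables, which is what the error message asks for.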

@benbovy
Member

benbovy commented Sep 8, 2022

Thanks for raising it @aulemahal, I confirm this is fixed in #6992.
