
to_zarr() failing after concatenating netcdfs with different time indexes #4347

playertr opened this issue Aug 18, 2020 · 5 comments

What happened:
After concatenating two NetCDF Datasets with different cftime.DatetimeNoLeap time coordinates, attempting to write to a Zarr store with ds.to_zarr() fails with an OutOfBoundsDatetime exception.

What you expected to happen:
I expect to_zarr() to execute successfully.

Minimal Complete Verifiable Example:

import xarray as xr
import cftime
import pandas as pd

# open a generic CESM dataset containing a time_bnds variable
url = 'http://adss.apcc21.org/opendap/CMIP5DB/cmip5_daily_BT/pr_day_CESM1-BGC_rcp85_r1i1p1_20760101-21001231.nc'
ds = xr.open_dataset(url)

# create two new Datasets with different, overlapping time indexes
ds2 = ds.sel(time=slice(None, cftime.DatetimeNoLeap(2076, 3, 1, 1, 0, 0, 0)))
ds3 = ds.sel(time=slice(None, cftime.DatetimeNoLeap(2076, 2, 1, 1, 0, 0, 0)))

# concatenate the two Datasets, using the default fill value
ds4 = xr.concat([ds2, ds3], dim=pd.Index(['ds2','ds3'], name='ds'))

# fails with OutOfBoundsDatetime exception
zs = ds4.to_zarr('/tmp/my_zarr.zarr')

Anything else we need to know?:
I believe the problem is related to the implicit NaN fill value used when concatenating the time_bnds variable. I arrived at this code while trying to produce a minimal example of a similar error, in which concatenating multiple Datasets containing time_bnds variables failed with a SerializationError. Both in the example above and in my earlier troubleshooting in production code, removing time_bnds with ds = ds.drop('time_bnds') made the to_zarr() call succeed.

This is a common use case, since time_bnds variables are a standard part of CESM climate model output.
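The workaround described above can be sketched on a small synthetic dataset (the variable names mirror the CESM example; the data itself is made up, and this uses `drop_vars`, the non-deprecated spelling of `drop` for variables):

```python
import numpy as np
import xarray as xr

# Build a small synthetic dataset with a time_bnds variable,
# mimicking the structure of CESM output.
time = np.arange(3)
time_bnds = np.stack([time - 0.5, time + 0.5], axis=1)
ds = xr.Dataset(
    {
        "pr": ("time", np.random.rand(3)),
        "time_bnds": (("time", "bnds"), time_bnds),
    },
    coords={"time": time},
)

# Workaround: drop time_bnds before writing to Zarr, so the implicit
# NaN fill introduced by concat never has to be encoded.
ds_clean = ds.drop_vars("time_bnds")
assert "time_bnds" not in ds_clean
```

With real CESM files the drop would happen right before the `to_zarr()` call, at the cost of losing the bounds information in the store.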

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.7 (default, Mar 23 2020, 22:36:06)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.20.0
distributed: 2.20.0
matplotlib: 3.2.2
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.2.0.post20200714
pip: 20.1.1
conda: None
pytest: None
IPython: 7.16.1
sphinx: None

@spencerkclark
Member

Thanks for raising this issue. Indeed, there is currently not a well-defined way of handling missing cftime values, so I'm not surprised that encoding failed with an obscure error. I think a good path forward here would be to start with addressing Unidata/cftime#145, which we could build off of in xarray.
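The gap can be illustrated by comparing numpy's datetime64, which has a first-class missing value (NaT), with object arrays of cftime dates, where alignment falls back to a float NaN. A minimal illustration (using a generic Python object as a stand-in for a cftime date, so cftime itself is not required):

```python
import numpy as np

# datetime64 has a native missing value, NaT, which survives encoding.
dt = np.array(["2076-01-01", "NaT"], dtype="datetime64[D]")
assert np.isnat(dt[1])

# cftime dates live in object arrays; when concat/alignment needs a
# fill value it inserts float NaN, which has no datetime meaning and
# cannot be round-tripped through CF time encoding.
obj = np.array([object(), np.nan], dtype=object)  # stand-in for cftime objects
assert isinstance(obj[1], float) and np.isnan(obj[1])
```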

@dcherian
Contributor

Maybe we can raise a more useful error linking back to this issue?
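A hypothetical sketch of what such an error path could look like (the `encode_with_hint` wrapper and its message are illustrative only, not xarray's actual internals):

```python
ISSUE_URL = "https://github.com/pydata/xarray/issues/4347"

def encode_with_hint(encode, values):
    """Wrap a datetime-encoding step so failures caused by missing
    cftime values point users at the known limitation."""
    try:
        return encode(values)
    except (ValueError, OverflowError) as err:
        raise ValueError(
            "Failed to encode datetime values, possibly because they "
            "contain missing values introduced by concat/alignment; "
            f"see {ISSUE_URL} for details and a workaround."
        ) from err
```

The point is simply to intercept the low-level OutOfBoundsDatetime-style failure and re-raise with a message that names the likely cause and links back here.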

@spencerkclark
Member

For sure, that sounds like a good idea in the meantime.

@playertr
Author

Thank you! I second the idea for a more useful error, Deepak. Hopefully this GitHub issue also helps people whose concatenations mysteriously fail find a workaround in the meantime.

@stale

stale bot commented Apr 28, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Apr 28, 2022
@dcherian dcherian removed the stale label Apr 28, 2022