[Bug]: regrid2 with open_mfdataset #615

lee1043 · 2024-03-15T17:01:51Z

What happened?

My actual use case is opening multiple netcdf files using open_mfdataset and do regrid using regrid2, and there were about half of CMIP6 models that regrid2 was failing. I tracked down to where the issue was coming from.

I found that for those models, if I open a single netcdf file via open_dataset regrid2 works well, but if I open the same file via open_mfdataset regrid2 fails.

What did you expect to happen? Are there are possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

import os
import xcdat as xc

target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0)

input_data = os.path.join(
    "/p/css03/esgf_publish/",
    "CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")

ds = xc.open_dataset(input_data)
ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")  # WORKS FINE

ds = xc.open_mfdataset(input_data)
ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")  # ERROR

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[54], line 1
----> 1 ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/accessor.py:324, in RegridderAccessor.horizontal(self, data_var, output_grid, tool, **options)
    322 input_grid = _get_input_grid(self._ds, data_var, ["X", "Y"])
    323 regridder = regrid_tool(input_grid, output_grid, **options)
--> 324 output_ds = regridder.horizontal(data_var, self._ds)
    326 return output_ds

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:81, in Regrid2Regridder.horizontal(self, data_var, ds)
     78     # Xarray defaults to masking with np.nan, CDAT masked with _FillValue or missing_value which defaults to 1e20
     79     input_data_var = input_data_var.where(src_mask != 0.0, masked_value)
---> 81 output_data = _regrid(
     82     input_data_var, src_lat_bnds, src_lon_bnds, dst_lat_bnds, dst_lon_bnds
     83 )
     85 output_ds = _build_dataset(
     86     ds,
     87     data_var,
   (...)
     92     self._output_grid,
     93 )
     95 return output_ds

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:105, in _regrid(input_data_var, src_lat_bnds, src_lon_bnds, dst_lat_bnds, dst_lon_bnds)
     98 def _regrid(
     99     input_data_var: xr.DataArray,
    100     src_lat_bnds: np.ndarray,
   (...)
    103     dst_lon_bnds: np.ndarray,
    104 ) -> np.ndarray:
--> 105     lat_mapping, lat_weights = _map_latitude(src_lat_bnds, dst_lat_bnds)
    106     lon_mapping, lon_weights = _map_longitude(src_lon_bnds, dst_lon_bnds)
    108     # convert to pure numpy

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:258, in _map_latitude(src, dst)
    252 bounds = [
    253     (np.minimum(dst_north[x], src_north[y]), np.maximum(dst_south[x], src_south[y]))
    254     for x, y in enumerate(mapping)
    255 ]
    257 # convert latitude to cell weight (difference of height above/below equator)
--> 258 weights = [
    259     (np.sin(np.deg2rad(x)) - np.sin(np.deg2rad(y))).reshape((-1, 1))
    260     for x, y in bounds
    261 ]
    263 return mapping, weights

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:259, in <listcomp>(.0)
    252 bounds = [
    253     (np.minimum(dst_north[x], src_north[y]), np.maximum(dst_south[x], src_south[y]))
    254     for x, y in enumerate(mapping)
    255 ]
    257 # convert latitude to cell weight (difference of height above/below equator)
    258 weights = [
--> 259     (np.sin(np.deg2rad(x)) - np.sin(np.deg2rad(y))).reshape((-1, 1))
    260     for x, y in bounds
    261 ]
    263 return mapping, weights

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape)
   2217 if len(shape) == 1 and not isinstance(shape[0], Number):
   2218     shape = shape[0]
-> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/reshape.py:218, in reshape(x, shape, merge_chunks, limit)
    216     if len(shape) == 1 and x.ndim == 1:
    217         return x
--> 218     missing_size = sanitize_index(x.size / reduce(mul, known_sizes, 1))
    219     shape = tuple(missing_size if s == -1 else s for s in shape)
    221 if np.isnan(sum(x.shape)):

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/slicing.py:72, in sanitize_index(ind)
     66     return slice(
     67         _sanitize_index_element(ind.start),
     68         _sanitize_index_element(ind.stop),
     69         _sanitize_index_element(ind.step),
     70     )
     71 elif isinstance(ind, Number):
---> 72     return _sanitize_index_element(ind)
     73 elif is_dask_collection(ind):
     74     return ind

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/slicing.py:26, in _sanitize_index_element(ind)
     24 """Sanitize a one-element index."""
     25 if isinstance(ind, Number):
---> 26     ind2 = int(ind)
     27     if ind2 != ind:
     28         raise IndexError("Bad index.  Must be integer-like: %s" % ind)

ValueError: cannot convert float NaN to integer

Anything else we need to know?

I am using the latest xCDAT main branch version.

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.108.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.2.0
pandas: 2.2.0
numpy: 1.23.5
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2024.2.0
distributed: 2024.2.0
matplotlib: 3.7.1
cartopy: 0.22.0
seaborn: 0.12.2
numbagg: None
fsspec: 2024.2.0
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.3.1
mypy: None
IPython: 8.11.0
sphinx: 5.3.0

The text was updated successfully, but these errors were encountered:

tomvothecoder · 2024-03-15T20:45:50Z

@jasonb5 Maybe this can be addressed in #613?

jasonb5 · 2024-03-19T22:30:11Z

I'll followup this week and check with #613.

tomvothecoder · 2024-03-20T16:26:42Z

@jasonb5 Sounds good, thanks.

tomvothecoder · 2024-03-28T21:24:38Z

I reproduced this issue on the current latest version of xcdat (v0.6.1) and am trying to debug it.

tomvothecoder · 2024-03-28T22:13:37Z

This issue will be fixed by PR #613. Here is the PR review comment with more info on the root cause and solution.

lee1043 · 2024-04-01T20:47:15Z

@tomvothecoder @jasonb5 I confirmed that the above minimal code works well with #613.

github-project-automation bot added this to xCDAT Development Mar 15, 2024

github-project-automation bot moved this to Todo in xCDAT Development Mar 15, 2024

tomvothecoder assigned jasonb5 Mar 15, 2024

tomvothecoder added this to the FY24Q2 (01/01/24 - 03/31/24) milestone Mar 15, 2024

tomvothecoder moved this from Todo to In Progress in xCDAT Development Mar 20, 2024

tomvothecoder mentioned this issue Mar 28, 2024

[PR]: Update Regrid2 missing and fill value behaviors to align with CDAT and add unmapped_to_nan arg for output data #613

Merged

9 tasks

lee1043 mentioned this issue Apr 2, 2024

[PR]: Regridding nan update (dask part only) #634

Merged

9 tasks

tomvothecoder closed this as completed in #634 Apr 2, 2024

github-project-automation bot moved this from In Progress to Done in xCDAT Development Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: regrid2 with open_mfdataset #615

[Bug]: regrid2 with open_mfdataset #615

lee1043 commented Mar 15, 2024

tomvothecoder commented Mar 15, 2024

jasonb5 commented Mar 19, 2024

tomvothecoder commented Mar 20, 2024

tomvothecoder commented Mar 28, 2024

tomvothecoder commented Mar 28, 2024 •

edited

Loading

lee1043 commented Apr 1, 2024

[Bug]: regrid2 with open_mfdataset #615

[Bug]: regrid2 with open_mfdataset #615

Comments

lee1043 commented Mar 15, 2024

What happened?

What did you expect to happen? Are there are possible answers you came across?

Minimal Complete Verifiable Example (MVCE)

Relevant log output

Anything else we need to know?

Environment

INSTALLED VERSIONS

tomvothecoder commented Mar 15, 2024

jasonb5 commented Mar 19, 2024

tomvothecoder commented Mar 20, 2024

tomvothecoder commented Mar 28, 2024

tomvothecoder commented Mar 28, 2024 • edited Loading

lee1043 commented Apr 1, 2024

tomvothecoder commented Mar 28, 2024 •

edited

Loading