Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: regrid2 with open_mfdataset #615

Closed
lee1043 opened this issue Mar 15, 2024 · 6 comments · Fixed by #613 or #634
Closed

[Bug]: regrid2 with open_mfdataset #615

lee1043 opened this issue Mar 15, 2024 · 6 comments · Fixed by #613 or #634
Assignees

Comments

@lee1043
Copy link
Collaborator

lee1043 commented Mar 15, 2024

What happened?

My actual use case is opening multiple netcdf files using open_mfdataset and do regrid using regrid2, and there were about half of CMIP6 models that regrid2 was failing. I tracked down to where the issue was coming from.

I found that for those models, if I open a single netcdf file via open_dataset regrid2 works well, but if I open the same file via open_mfdataset regrid2 fails.

What did you expect to happen? Are there are possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

import os
import xcdat as xc

target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0)

input_data = os.path.join(
    "/p/css03/esgf_publish/",
    "CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")

ds = xc.open_dataset(input_data)
ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")  # WORKS FINE

ds = xc.open_mfdataset(input_data)
ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")  # ERROR

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[54], line 1
----> 1 ds_regrid = ds.regridder.horizontal("psl", target_grid, tool="regrid2")

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/accessor.py:324, in RegridderAccessor.horizontal(self, data_var, output_grid, tool, **options)
    322 input_grid = _get_input_grid(self._ds, data_var, ["X", "Y"])
    323 regridder = regrid_tool(input_grid, output_grid, **options)
--> 324 output_ds = regridder.horizontal(data_var, self._ds)
    326 return output_ds

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:81, in Regrid2Regridder.horizontal(self, data_var, ds)
     78     # Xarray defaults to masking with np.nan, CDAT masked with _FillValue or missing_value which defaults to 1e20
     79     input_data_var = input_data_var.where(src_mask != 0.0, masked_value)
---> 81 output_data = _regrid(
     82     input_data_var, src_lat_bnds, src_lon_bnds, dst_lat_bnds, dst_lon_bnds
     83 )
     85 output_ds = _build_dataset(
     86     ds,
     87     data_var,
   (...)
     92     self._output_grid,
     93 )
     95 return output_ds

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:105, in _regrid(input_data_var, src_lat_bnds, src_lon_bnds, dst_lat_bnds, dst_lon_bnds)
     98 def _regrid(
     99     input_data_var: xr.DataArray,
    100     src_lat_bnds: np.ndarray,
   (...)
    103     dst_lon_bnds: np.ndarray,
    104 ) -> np.ndarray:
--> 105     lat_mapping, lat_weights = _map_latitude(src_lat_bnds, dst_lat_bnds)
    106     lon_mapping, lon_weights = _map_longitude(src_lon_bnds, dst_lon_bnds)
    108     # convert to pure numpy

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:258, in _map_latitude(src, dst)
    252 bounds = [
    253     (np.minimum(dst_north[x], src_north[y]), np.maximum(dst_south[x], src_south[y]))
    254     for x, y in enumerate(mapping)
    255 ]
    257 # convert latitude to cell weight (difference of height above/below equator)
--> 258 weights = [
    259     (np.sin(np.deg2rad(x)) - np.sin(np.deg2rad(y))).reshape((-1, 1))
    260     for x, y in bounds
    261 ]
    263 return mapping, weights

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/xcdat/regridder/regrid2.py:259, in <listcomp>(.0)
    252 bounds = [
    253     (np.minimum(dst_north[x], src_north[y]), np.maximum(dst_south[x], src_south[y]))
    254     for x, y in enumerate(mapping)
    255 ]
    257 # convert latitude to cell weight (difference of height above/below equator)
    258 weights = [
--> 259     (np.sin(np.deg2rad(x)) - np.sin(np.deg2rad(y))).reshape((-1, 1))
    260     for x, y in bounds
    261 ]
    263 return mapping, weights

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape)
   2217 if len(shape) == 1 and not isinstance(shape[0], Number):
   2218     shape = shape[0]
-> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/reshape.py:218, in reshape(x, shape, merge_chunks, limit)
    216     if len(shape) == 1 and x.ndim == 1:
    217         return x
--> 218     missing_size = sanitize_index(x.size / reduce(mul, known_sizes, 1))
    219     shape = tuple(missing_size if s == -1 else s for s in shape)
    221 if np.isnan(sum(x.shape)):

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/slicing.py:72, in sanitize_index(ind)
     66     return slice(
     67         _sanitize_index_element(ind.start),
     68         _sanitize_index_element(ind.stop),
     69         _sanitize_index_element(ind.step),
     70     )
     71 elif isinstance(ind, Number):
---> 72     return _sanitize_index_element(ind)
     73 elif is_dask_collection(ind):
     74     return ind

File ~/.conda/envs/pcmdi_metrics_dev_20240220/lib/python3.10/site-packages/dask/array/slicing.py:26, in _sanitize_index_element(ind)
     24 """Sanitize a one-element index."""
     25 if isinstance(ind, Number):
---> 26     ind2 = int(ind)
     27     if ind2 != ind:
     28         raise IndexError("Bad index.  Must be integer-like: %s" % ind)

ValueError: cannot convert float NaN to integer

Anything else we need to know?

I am using the latest xCDAT main branch version.

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.108.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.2.0
pandas: 2.2.0
numpy: 1.23.5
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2024.2.0
distributed: 2024.2.0
matplotlib: 3.7.1
cartopy: 0.22.0
seaborn: 0.12.2
numbagg: None
fsspec: 2024.2.0
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.3.1
mypy: None
IPython: 8.11.0
sphinx: 5.3.0

@tomvothecoder
Copy link
Collaborator

@jasonb5 Maybe this can be addressed in #613?

@jasonb5
Copy link
Collaborator

jasonb5 commented Mar 19, 2024

I'll followup this week and check with #613.

@tomvothecoder
Copy link
Collaborator

@jasonb5 Sounds good, thanks.

@tomvothecoder
Copy link
Collaborator

I reproduced this issue on the current latest version of xcdat (v0.6.1) and am trying to debug it.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Mar 28, 2024

This issue will be fixed by PR #613. Here is the PR review comment with more info on the root cause and solution.

@lee1043
Copy link
Collaborator Author

lee1043 commented Apr 1, 2024

@tomvothecoder @jasonb5 I confirmed that the above minimal code works well with #613.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment