Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for unmeshed lat/lon w/ different names and in Datasets w/ multiple data_vars #5

Closed
spencerahill opened this issue Oct 18, 2017 · 6 comments

Comments

@spencerahill
Copy link

spencerahill commented Oct 18, 2017

Thanks for xESMF! It's been great for my recent re-gridding needs.

The data I need to regrid generally has lat and lon each stored as 1D arrays, rather than as meshes, and with names 'lat' and 'lon', respectively, rather than 'x', and 'y'. Also, I am regridding Datasets with multiple data_vars.

Each of these issues required a little bit of extra code before it would work with xESMF. Since I think these are common use cases, I thought I'd paste it here. Hopefully they're pretty self explanatory...I didn't take the time to write full docstrings. The logic could likely be improved also, especially the last one.

You're welcome to incorporate whatever you'd like of them into xESMF if you'd like. But no problem if not.

import numpy as np
import xarray as xr
import xesmf as xe

LON_STR = 'lon'
LAT_STR = 'lat'

def _replace_dim(da, olddim, newdim):
    """From https://github.com/pydata/xarray/issues/1273."""
    renamed = da.rename({olddim: newdim.name})

    # note that alignment along a dimension is skipped when you are overriding
    # the relevant coordinate values
    renamed.coords[newdim.name] = newdim
    return renamed


def meshed_latlon_arr(arr, lon_str=LON_STR, lat_str=LAT_STR):
    """Create new array with meshed lat and lon coordinates."""
    lonmesh, latmesh = np.meshgrid(arr[lon_str], arr[lat_str])
    renamed = arr.rename({lat_str: 'y', lon_str: 'x'})
    renamed.coords[lat_str] = (('y', 'x'), latmesh)
    renamed.coords[lon_str] = (('y', 'x'), lonmesh)
    return renamed
       

def meshed_latlon_ds(ds, lon_str=LON_STR, lat_str=LAT_STR):
    """Create new dataset with meshed lat and lon coordinates."""
    meshed_arrs = {}
    for name, arr in ds.data_vars.items():
        meshed_arrs.update({name: meshed_latlon_arr(arr)})
    return xr.Dataset(data_vars=meshed_arrs, attrs=ds.attrs)
    

def regrid_unmeshed_arr(arr, target_grid, method='bilinear'):
    """Regrid the DataArray that has 1D lat and lon coordinates."""
    meshed = meshed_latlon_ds(arr)
    return xe.regrid(meshed, target_grid, 
                     meshed, method=method, verbose=True)

    
def regrid_unmeshed_ds(ds, target_grid, method='bilinear'):
    """Regrid the Dataset that has 1D lat and lon coordinates."""
    meshed = meshed_latlon_ds(ds)
    
    data_vars = {}
    for name, data_var in meshed.data_vars.items():
        regridded = xe.regrid(meshed, target_grid, 
                              data_var, method=method)
        data_vars.update({name: regridded})
    
    coords = {}
    xy_set = False
    for name, coord in meshed.coords.items():
        if ('y' in coord or 'x' in coord) and name not in ('y', 'x', 'lat', 'lon'):
            regridded = xe.regrid(meshed, target_grid, 
                                  coord, method=method)
            coords.update({name: regridded})
            if not xy_set:
                coords.update({'y': regridded['y']})
                coords.update({'x': regridded['x']})
                coords.update({'lat': regridded['lat']})
                coords.update({'lon': regridded['lon']})
                xy_set = True
        elif name not in ('y', 'x', 'lat', 'lon'):
            coords.update({name: coord})

    return xr.Dataset(data_vars=data_vars,
                      coords=coords, attrs=ds.attrs) 
@JiaweiZhuang
Copy link
Owner

Thanks for being interested in xESMF and for sharing the codes!

Both are good suggestions. I am planning to re-design xESMF internals in the next version (solve #2 ), after that it should have much more native support for xarray datatypes 😄

@spencerahill
Copy link
Author

spencerahill commented Oct 19, 2017

Cool, looking forward to it!

One other feature I was hoping to eventually implement was auto-generating meshed lon and lat bound arrays (when they are missing or 1D), in order to support conservative interpolation.

@spencerahill
Copy link
Author

One other feature I was hoping to eventually implement was auto-generating meshed lon and lat bound arrays (when they are missing or 1D), in order to support conservative interpolation.

I have now implemented this:

LAT_STR = 'lat'
LON_STR = 'lon'
LAT_BOUNDS_STR = 'lat_b'
LON_BOUNDS_STR = 'lon_b'
X_STR = 'x'
Y_STR = 'y'
X_BOUNDS_STR = 'x_b'
Y_BOUNDS_STR = 'y_b'


def add_lat_lon_bounds(arr, lat_str=LAT_STR, lon_str=LON_STR,
                       lat_bounds_str=LAT_BOUNDS_STR,
                       lon_bounds_str=LON_BOUNDS_STR,
                       lat_min=-90., lat_max=90.,
                       lon_min=0., lon_max=360.):
    """Add bounding arrays to lat and lon arrays."""
    lon_vals = arr[lon_str].values
    lon_bounds_values = 0.5*(lon_vals[:-1] + lon_vals[1:])
    lon_bounds = np.concatenate([[lon_min], lon_bounds_values, [lon_max]])
    
    lat_vals = arr[lat_str].values
    lat_bounds_values = 0.5*(lat_vals[:-1] + lat_vals[1:])
    lat_bounds = np.concatenate([[lat_min], lat_bounds_values, [lat_max]])
    
    ds = arr.to_dataset()
    ds.coords[lon_bounds_str] = xr.DataArray(lon_bounds, dims=[X_BOUNDS_STR])
    ds.coords[lat_bounds_str] = xr.DataArray(lat_bounds, dims=[Y_BOUNDS_STR])
    
    return ds


def mesh_latlon_arrs_and_bounds(arr, lat_str=LAT_STR, lon_str=LON_STR,
                                lat_bounds_str=LAT_BOUNDS_STR,
                                lon_bounds_str=LON_BOUNDS_STR,
                                x_str=X_STR, y_str=Y_STR,
                                x_bounds_str=X_BOUNDS_STR,
                                y_bounds_str=Y_BOUNDS_STR, 
                                lat_min=-90., lat_max=90., 
                                lon_min=0., lon_max=360.):
    """Add lat/lon bounds and mesh lat, lon, and bounds.
    
    Resulting xarray.Dataset can be used by xESMF.regrid using the 
    'conservative' method.
    
    """
    arr_with_bounds = add_lat_lon_bounds(
        arr, lat_str=lat_str, lon_str=lon_str, 
        lat_bounds_str=lat_bounds_str,
        lon_bounds_str=lon_bounds_str, lat_min=lat_min, lat_max=lat_max,
        lon_min=lon_min, lon_max=lon_max)
    
    meshed_with_bounds = meshed_latlon_arr(arr_with_bounds,
                                           lon_str=lon_str, lat_str=lat_str,
                                           x_str=x_str, y_str=y_str)
    
    return meshed_latlon_arr(meshed_with_bounds, 
                             lat_str=lat_bounds_str, 
                             lon_str=lon_bounds_str,
                             x_str=x_bounds_str, y_str=y_bounds_str)

meshed_latlon_arr is as defined in my previous comment. The dataset outputted by the mesh_latlon_arrs_and_bounds function is then ready to be used for conservative interpolation:

meshed = mesh_latlon_arrs_and_bounds(arr)
conservative = xe.regrid(meshed, target_grid180, meshed['precip'], 
                         method='conservative')

FYI, as you suggested in #10 (comment), I can confirm that using conservative interpolation does prevent the missing wraparound point.

@JiaweiZhuang
Copy link
Owner

There are 3 issues in this thread (better break down?):

v0.1.1 now accepts 1D arrays for rectilinear grids (demo).

The ability to regrid xarray.DataSet is planned for the next version.

The center-bound switching is pretty tricky. For rectilinear grids, I can add a convenience function to xesmf.util, but inferring bounds for curvilinear grids can be hard especially when the grid spans over the pole (related issue: pydata/xarray#781).

@rabernat
Copy link

I think it should be possible to resolve some of these issues via integration with xgcm. I opened #13 to discuss how this might work.

@JiaweiZhuang
Copy link
Owner

v0.2 now takes multi-variable dataset as input.

Use #13 for the center-bound switching problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants