Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ESMF.LocStream capabilities to xESMF #81

Merged
merged 13 commits into from
Mar 6, 2020

Conversation

raphaeldussin
Copy link
Contributor

@JiaweiZhuang this PR allows xESMF to use LocStream objects (sequence of geographical points).
This is particularly useful to interpolate onto ocean sections, creating open boundary conditions,...

I have made minimal inserts, that do not change any default behavior. Also added some testing.
Let me know if there's more to be done for the PR.

Best,
Raf

@codecov-io
Copy link

codecov-io commented Feb 19, 2020

Codecov Report

Merging #81 into master will increase coverage by 0.78%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
+ Coverage   95.76%   96.55%   +0.78%     
==========================================
  Files           6        6              
  Lines         260      319      +59     
==========================================
+ Hits          249      308      +59     
  Misses         11       11
Impacted Files Coverage Δ
xesmf/frontend.py 95.58% <100%> (+1.59%) ⬆️
xesmf/backend.py 96.51% <100%> (+0.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7479110...b84f2d7. Read the comment docs.

class Regridder(object):
def __init__(self, ds_in, ds_out, method, periodic=False,
filename=None, reuse_weights=False, ignore_degenerate=None):
filename=None, reuse_weights=False, ignore_degenerate=None,
locstream_out=False):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locstream might also be input I think? (regridding locstream -> grid)

Comment on lines 93 to 101
if len(lon.shape) > 2:
raise ValueError("lon can only be 1d or 2d")
if len(lat.shape) > 2:
raise ValueError("lat can only be 1d or 2d")

if len(lon.shape) == 2:
lon = lon.flatten()
if len(lat.shape) == 2:
lat = lat.flatten()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better require strict 1-D shape which is the definition of locstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it! maybe the flexibility is not a good thing here actually.

@JiaweiZhuang
Copy link
Owner

Thanks very much for the PR! This is something that I've been wanting to add but never got time to actually do it. Also thanks for the unit test. What would be even more useful is an example notebook under doc/notebooks to show a use case.

I am mostly thinking about how to have a consistent API to handle various cases:

  • grid -> grid (current only option)
  • grid -> locstream (your PR)
  • locstream -> grid (this may have more demand, as the above one can also be achieved by xarray Advanced Interpolation)
  • locstream -> locstream (also possible but rarely used)

Options are:

  1. Add locstream_out and locstream_in kwarg to Regridder, following your PR
  2. Have a new subclass LocStreamRegridder to handle locstream input/output, and avoid touching the original Regridder API. It might feel like different classes under sklearn linear regressors
  3. Mark the locstream attribute in the grid object ds_in / ds_out (detect coordinate name like locations, but it won't work for dict), and avoid additional kwargs or classes.

I was leaning towards no. 2, but your approach also makes sense.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Feb 19, 2020

Another complication from LocStream is that this class can only be used for:

  • Destination for bilinear and patch (cannot be source)
  • Source and destination for nearest neighbor

It cannot be used for conservative method, for either source or destination.

Ref: ESMF Regridding Update

def test_build_regridder(method):
regridder = xe.Regridder(ds_in, ds_out, method)
@pytest.mark.parametrize('locstream', [True, False])
def test_build_regridder(method, locstream):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think locstream can be used with conservative method. Does it actually work here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I hacked it on line 57 :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use more explicit expression and avoid overwriting method internally:

@pytest.mark.parametrize(
    "locstream,method", [
     (False, method_list), 
     (True, ['bilinear', 'nearest_s2d', 'nearest_d2s'])   # conservative method does not work for LocStream
])

@raphaeldussin
Copy link
Contributor Author

About the more general comments:

I went for option 1 because that's in my opinion, the one that added the less code. Once the ESMF Fields are created, ESMF Regridder works the same (not allowing all types of regridding of course).
I think adding a locstream_in a way that's symetric to locstream_out should not be a big issues if you
think that might be useful (see previous point). Even with LocStream in, I think 95+% of users will be coming for the grid-> grid interpolation. What do you think?

I'm gonna put some more tests, the codecov for my PR is not very good. An example notebook with follow once we ironed out the API details.

@JiaweiZhuang
Copy link
Owner

I think adding a locstream_in a way that's symetric to locstream_out should not be a big issues if you think that might be useful

Yeah I think it is good to have both locstream_out and locstream_in as options. The latter is useful for aligning observation data to model grid (similar to supersampling).

For the example notebook, it can be with synthetic data replicating advanced interpolation or with real data from xr.tutorial. Can revert the resulting locstream back to model grid via locstream_in.

@raphaeldussin
Copy link
Contributor Author

@JiaweiZhuang sorry for the repeated commits, it's ready to be reviewed/merged.

@raphaeldussin
Copy link
Contributor Author

@JiaweiZhuang I realized that I have missed one of your comments. Fixed now and I also expanded the test_build_regridder to include locstream_in.

Let me know if there's more that needs to be done. I'd really like to move fwd with it.

@JiaweiZhuang
Copy link
Owner

@raphaeldussin Sorry I had a long backlog. A few more comments on the test code, and it should be good to go.

Comment on lines 355 to 370

# -----------------------------------------
# testing locstream out
regridder = xe.Regridder(ds_in, ds_locs, 'bilinear', locstream_out=True)
ds_result = regridder(ds_in)
# clean-up
regridder.clean_weight_file()

# -----------------------------------------
# testing locstream in
regridder = xe.Regridder(ds_locs, ds_in, 'nearest_s2d', locstream_in=True)

outdata = regridder(ds_locs)

# clean-up
regridder.clean_weight_file()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate out the tests for locstreams into different functions


outdata = regridder(ds_locs['lat'])

# clean-up
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate out the tests for locstreams into different functions

Comment on lines 264 to 281
# -----------------------------------------
# testing locstream out
regridder = xe.Regridder(ds_in, ds_locs, 'bilinear', locstream_out=True)

indata = ds_in_chunked['data4D'].data
outdata = regridder(indata)

# clean-up
regridder.clean_weight_file()

# -----------------------------------------
# testing locstream in
regridder = xe.Regridder(ds_locs, ds_in, 'nearest_s2d', locstream_in=True)

outdata = regridder(ds_locs['lat'].data)

# clean-up
regridder.clean_weight_file()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate out the tests for locstreams into different functions

Comment on lines 225 to 236
# -----------------------------------------
# testing locstream in
regridder = xe.Regridder(ds_locs, ds_in, 'nearest_s2d', locstream_in=True)

outdata = regridder(ds_locs['lat'].values) # pure numpy array
dr_out = regridder(ds_locs['lat']) # xarray DataArray

# DataArray and numpy array should lead to the same result
assert_equal(outdata, dr_out.values)

# clean-up
regridder.clean_weight_file()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate out the tests for locstreams into different functions

@raphaeldussin
Copy link
Contributor Author

move the locstream tests into their own functions.

@JiaweiZhuang JiaweiZhuang merged commit 60cd934 into JiaweiZhuang:master Mar 6, 2020
@JiaweiZhuang
Copy link
Owner

Thanks @raphaeldussin! Added to 0.3.0 release.

@raphaeldussin raphaeldussin deleted the locstream branch July 25, 2020 16:08
aulemahal pushed a commit to Ouranosinc/xESMF that referenced this pull request May 18, 2021
locstream dim name consistent with ds_out
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants