support for additional scipy nd interpolants #9599

hollymandel · 2024-10-09T18:15:24Z

Closes follow upstream scipy interpolation improvements #7704
Adds support for n dimensional tensor product interpolants slinear, cubic, quintic, and pchip.
Updates documentation of DataArray/Dataset.interp and DataArray/Dataset.interp_like to reflect updates to scipy interpolants called.
Adds argument reduce to allow disabling of reduction of interpolation along independent dimensions (see DataArray.interp() : poor performance #2223). Prior to this the behavior was always reduce=True. However for certain n-dimensional interpolants this will change the mathematical behavior of the interpolation (I think), so I have added an option for users to turn this off.

Outstanding uncertainty: dask/chunking behavior? Nothing about this PR touches the chunking of interpolation but I am a bit worried about this. Many interpolants, including those previously supported in one dimension, require nearby/all points from the original coordinate, and also don't parallelize well over the new coordinate. I'm not quite sure how this is currently being handled.

headtr1ck · 2024-10-13T17:32:57Z

Thanks for this contribution.

Some remarks:

What is the minimal scipy version that is required for these new interpolators? Is it aligned with the minimum version for xarray or do we need some checks for that?

You have added several formatting changes to totally unrelated parts that make it more difficult to review then necessary. Please try to avoid that.

Doc build failed, have to check if random or related to your docstrings.

hollymandel · 2024-10-14T15:20:07Z

Thanks for the comments!

What is the minimal scipy version that is required for these new interpolators? Is it aligned with the minimum version for xarray or do we need some checks for that?

I'm seeing 1.11 in ci/requirements/min-all-deps.yml and these interpolants were added in 1.10 so I think we're good.

You have added several formatting changes to totally unrelated parts that make it more difficult to review then necessary. Please try to avoid that.

Oops, reverted, sorry!

headtr1ck · 2024-10-14T16:08:39Z

Also, I think the name "reduce" is not very intuitive, but I fail to come up with an alternative. "decompose" seems weird out of context. Does anyone have a suggestion?

hollymandel · 2024-10-17T20:45:25Z

Also, I think the name "reduce" is not very intuitive, but I fail to come up with an alternative. "decompose" seems weird out of context. Does anyone have a suggestion?

Perhaps the _interp makes decompose_interp sound less weird? Separate or split also sound OK to me.

xarray/core/missing.py

xarray/core/dataarray.py

xarray/core/missing.py

dcherian · 2024-10-21T21:11:25Z

Many interpolants, including those previously supported in one dimension, require nearby/all points from the original coordinate, and also don't parallelize well over the new coordinate. I'm not quite sure how this is currently being handled.

I think we end up rechunking to a single in a very roundabout way. I don't think you need to be too concerned about it for this PR. It does need an overhaul though

…ot be 1d decomposed

dcherian · 2024-10-23T17:09:32Z

xarray/tests/test_interp.py

+    # grid -> grid
+    coords["xdestnp"] = np.linspace(0.1, 1.0, 11)
+    coords["ydestnp"] = np.linspace(0.0, 0.2, 10)
+    coords["zdestnp"] = da.get_index("z")  # type: ignore[assignment]


Suggested change

coords["zdestnp"] = da.get_index("z") # type: ignore[assignment]

coords["zdestnp"] = da.z.data

Is this equivalent?

dcherian · 2024-10-23T17:11:46Z

xarray/tests/test_interp.py

-    # linear interpolation is separateable
-    expected = da.interp(x=xdestnp, method="linear")
-    expected = expected.interp(y=ydestnp, method="linear")
+    # `method` is separable


is pchip really separable too? We could gain some confidence here by constructing expected using scipy directly with "xdestnp" and "ydestnp"

My experiments suggest only cubic and quintic (or other splines with order > 1) are "truly multidimensional." I'm interpreting errors on the order of 1e-16 relative to the SD of the expected function as machine error, but errors on the order of 1e-5 as true errors.

To me it feels uncomfortable to operate "empirically" like this, but digging through the formalism behind these interpolants would be a big lift and I'm not able to take it on right now. I sort of wish the separation optimization didn't exist and we could just provide a transparent interface to SciPy, but I guess this is insufficiently considerate to the user.

import scipy import xarray as xr import numpy as np import matplotlib.pyplot as plt import seaborn as sb orig_x = np.linspace(0,2,10) orig_y = np.linspace(0,1,10) orig_f = np.random.randn(10,10) fine_x = np.linspace(0,2,200) fine_y = np.linspace(0,1,100) mesh_x, mesh_y = np.meshgrid(fine_x, fine_y) all_points = np.asarray([x for x in zip(mesh_x.ravel(), mesh_y.ravel())]) new_x_da = xr.DataArray(data = np.transpose(mesh_x), dims = ["x","y"], coords = {"x": fine_x, "y": fine_y}) new_y_da = xr.DataArray(data = np.transpose(mesh_y), dims = ["x","y"], coords = {"x": fine_x, "y": fine_y}) mtds = ["linear", "nearest", "slinear", "cubic", "quintic", "pchip"] for mtd in mtds: result_interpn = scipy.interpolate.interpn(points = (orig_x, orig_y), values = orig_f, xi = all_points, method = mtd).reshape(200,100, order="F") result_rgi = scipy.interpolate.RegularGridInterpolator(points = (orig_x, orig_y), values = orig_f, method= mtd)(all_points).reshape(200,100, order="F") result_xr = xr.DataArray(data = orig_f, dims = ["x","y"], coords = {"x": orig_x, "y": orig_y}).interp(method= mtd, x = new_x_da, y = new_y_da) if mtd == "pchip": result_sequential = scipy.interpolate.PchipInterpolator(orig_y, scipy.interpolate.PchipInterpolator(orig_x, orig_f, axis = 0)(fine_x), axis = 1)(fine_y) result_sequential_2 = scipy.interpolate.PchipInterpolator(orig_x, scipy.interpolate.PchipInterpolator(orig_y, orig_f, axis = 1)(fine_y), axis = 0)(fine_x) else: if mtd == "quintic": mtd = 5 result_sequential = scipy.interpolate.interp1d(orig_y, scipy.interpolate.interp1d(orig_x, orig_f, axis = 0, kind = mtd)(fine_x), axis = 1, kind = mtd)(fine_y) result_sequential_2 = scipy.interpolate.interp1d(orig_x, scipy.interpolate.interp1d(orig_y, orig_f, axis = 1, kind = mtd)(fine_y), axis = 0, kind = mtd)(fine_x) fig, axs = plt.subplots(nrows = 2, ncols = 2, figsize = [15,12]) fig.suptitle(f"method = {mtd}") sb.heatmap(result_sequential_2-result_interpn, ax = axs[0,0]) axs[0,0].set_title("sequential versus scipy.interpn") sb.heatmap(result_sequential_2-result_rgi, ax = axs[1,0]) axs[1,0].set_title("sequential versus RegularGridInterpolator") sb.heatmap(result_sequential_2-result_xr, ax = axs[0,1]) axs[0,1].set_title("sequential versus xarray.DataArray.interp()") sb.heatmap(result_sequential - result_sequential_2, ax = axs[1,1]) axs[1,1].set_title("sequential versus reverse-order sequential")

dcherian · 2024-10-23T17:12:14Z

xarray/core/missing.py

-    # decompose the interpolation into a succession of independent interpolation
-    for indep_indexes_coords in decompose_interp(indexes_coords):
+
+    if method in ["linear", "nearest"]:


The test seems to suggest that "pchip" and "slinear" are acceptable too

dcherian · 2024-10-23T17:16:19Z

xarray/tests/test_interp.py

    assert_allclose(actual.transpose("y", "z"), expected)


 @requires_scipy
-def test_interpolate_nd_nd() -> None:
+@pytest.mark.parametrize("method", ("cubic", "quintic"))


Is it possible to test all methods supported by interpn here? “linear”, “nearest”, “slinear”, “cubic”, “quintic”, “pchip”

Basically the "separation" is an implementation detail. We just want to be sure we get correct answers regardless of that detail.

xarray/tests/test_interp.py

dcherian · 2024-10-23T17:18:21Z

xarray/tests/test_interp.py

-@pytest.mark.parametrize(
-    "method", ["linear", "nearest", "zero", "slinear", "quadratic", "cubic"]
-)
+@pytest.mark.parametrize("method", ["linear", "nearest"])


It's not clear how you're choosing which methods to paramterize over?

Perhaps we should add some global constants that can be used repeatedly

INTERP_METHODS_SEPARABLE=["linear", "nearest", "slinear"] INTERP_METHODS_INSEPARABLE = ... INTERP_METHODS_SUBSET = ["linear", "cubic"]

xarray/core/dataarray.py

dcherian · 2024-10-23T17:19:16Z

xarray/core/dataarray.py


        See Also
        --------
-        scipy.interpolate.interp1d


would be good to preserve these scipy references too.

dcherian · 2024-10-23T17:19:49Z

Thanks @hollymandel this is looking quite close. I only have a few small refinement requests.

hollymandel force-pushed the issue_7704 branch 8 times, most recently from b8e4c50 to 34fb497 Compare October 11, 2024 19:09

hollymandel marked this pull request as ready for review October 11, 2024 19:14

hollymandel force-pushed the issue_7704 branch from 8568ca4 to 1def6d0 Compare October 12, 2024 02:14

hollymandel force-pushed the issue_7704 branch 2 times, most recently from 4a26a7c to 34fb497 Compare October 14, 2024 15:18

hollymandel force-pushed the issue_7704 branch 2 times, most recently from 33f52ec to 211984d Compare October 14, 2024 15:49

hollymandel force-pushed the issue_7704 branch 2 times, most recently from cf8fa9f to 721695d Compare October 17, 2024 20:38

hollymandel force-pushed the issue_7704 branch from c7c71ad to d1ac5be Compare October 17, 2024 21:00