Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for additional scipy nd interpolants #9599

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

hollymandel
Copy link
Contributor

@hollymandel hollymandel commented Oct 9, 2024

  • Closes follow upstream scipy interpolation improvements #7704
  • Adds support for n dimensional tensor product interpolants slinear, cubic, quintic, and pchip.
  • Updates documentation of DataArray/Dataset.interp and DataArray/Dataset.interp_like to reflect updates to scipy interpolants called.
  • Adds argument reduce to allow disabling of reduction of interpolation along independent dimensions (see DataArray.interp() : poor performance #2223). Prior to this the behavior was always reduce=True. However for certain n-dimensional interpolants this will change the mathematical behavior of the interpolation (I think), so I have added an option for users to turn this off.

Outstanding uncertainty: dask/chunking behavior? Nothing about this PR touches the chunking of interpolation but I am a bit worried about this. Many interpolants, including those previously supported in one dimension, require nearby/all points from the original coordinate, and also don't parallelize well over the new coordinate. I'm not quite sure how this is currently being handled.

@hollymandel hollymandel force-pushed the issue_7704 branch 8 times, most recently from b8e4c50 to 34fb497 Compare October 11, 2024 19:09
@hollymandel hollymandel marked this pull request as ready for review October 11, 2024 19:14
@headtr1ck
Copy link
Collaborator

Thanks for this contribution.

Some remarks:

What is the minimal scipy version that is required for these new interpolators? Is it aligned with the minimum version for xarray or do we need some checks for that?

You have added several formatting changes to totally unrelated parts that make it more difficult to review then necessary. Please try to avoid that.

Doc build failed, have to check if random or related to your docstrings.

@hollymandel hollymandel force-pushed the issue_7704 branch 2 times, most recently from 4a26a7c to 34fb497 Compare October 14, 2024 15:18
@hollymandel
Copy link
Contributor Author

Thanks for the comments!

What is the minimal scipy version that is required for these new interpolators? Is it aligned with the minimum version for xarray or do we need some checks for that?

I'm seeing 1.11 in ci/requirements/min-all-deps.yml and these interpolants were added in 1.10 so I think we're good.

You have added several formatting changes to totally unrelated parts that make it more difficult to review then necessary. Please try to avoid that.

Oops, reverted, sorry!

@hollymandel hollymandel force-pushed the issue_7704 branch 2 times, most recently from 33f52ec to 211984d Compare October 14, 2024 15:49
@headtr1ck
Copy link
Collaborator

Also, I think the name "reduce" is not very intuitive, but I fail to come up with an alternative. "decompose" seems weird out of context. Does anyone have a suggestion?

@hollymandel hollymandel force-pushed the issue_7704 branch 2 times, most recently from cf8fa9f to 721695d Compare October 17, 2024 20:38
@hollymandel
Copy link
Contributor Author

Also, I think the name "reduce" is not very intuitive, but I fail to come up with an alternative. "decompose" seems weird out of context. Does anyone have a suggestion?

Perhaps the _interp makes decompose_interp sound less weird? Separate or split also sound OK to me.

xarray/core/missing.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
@dcherian
Copy link
Contributor

Many interpolants, including those previously supported in one dimension, require nearby/all points from the original coordinate, and also don't parallelize well over the new coordinate. I'm not quite sure how this is currently being handled.

I think we end up rechunking to a single in a very roundabout way. I don't think you need to be too concerned about it for this PR. It does need an overhaul though

# grid -> grid
coords["xdestnp"] = np.linspace(0.1, 1.0, 11)
coords["ydestnp"] = np.linspace(0.0, 0.2, 10)
coords["zdestnp"] = da.get_index("z") # type: ignore[assignment]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
coords["zdestnp"] = da.get_index("z") # type: ignore[assignment]
coords["zdestnp"] = da.z.data

Is this equivalent?

# linear interpolation is separateable
expected = da.interp(x=xdestnp, method="linear")
expected = expected.interp(y=ydestnp, method="linear")
# `method` is separable
Copy link
Contributor

@dcherian dcherian Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is pchip really separable too? We could gain some confidence here by constructing expected using scipy directly with "xdestnp" and "ydestnp"

Copy link
Contributor Author

@hollymandel hollymandel Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experiments suggest only cubic and quintic (or other splines with order > 1) are "truly multidimensional." I'm interpreting errors on the order of 1e-16 relative to the SD of the expected function as machine error, but errors on the order of 1e-5 as true errors.

To me it feels uncomfortable to operate "empirically" like this, but digging through the formalism behind these interpolants would be a big lift and I'm not able to take it on right now. I sort of wish the separation optimization didn't exist and we could just provide a transparent interface to SciPy, but I guess this is insufficiently considerate to the user.

import scipy
import xarray as xr
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sb

orig_x = np.linspace(0,2,10)
orig_y = np.linspace(0,1,10)
orig_f = np.random.randn(10,10)

fine_x = np.linspace(0,2,200)
fine_y = np.linspace(0,1,100)
mesh_x, mesh_y = np.meshgrid(fine_x, fine_y)
all_points = np.asarray([x for x in zip(mesh_x.ravel(), mesh_y.ravel())])
new_x_da = xr.DataArray(data = np.transpose(mesh_x), dims = ["x","y"], coords = {"x": fine_x, "y": fine_y})
new_y_da = xr.DataArray(data = np.transpose(mesh_y), dims = ["x","y"], coords = {"x": fine_x, "y": fine_y})

mtds = ["linear", "nearest", "slinear", "cubic", "quintic", "pchip"]
for mtd in mtds:
    result_interpn = scipy.interpolate.interpn(points = (orig_x, orig_y), values = orig_f, xi = all_points, method = mtd).reshape(200,100, order="F")
    result_rgi = scipy.interpolate.RegularGridInterpolator(points = (orig_x, orig_y), values = orig_f, method= mtd)(all_points).reshape(200,100, order="F")
    result_xr = xr.DataArray(data = orig_f, dims = ["x","y"], coords = {"x": orig_x, "y": orig_y}).interp(method= mtd, x = new_x_da, y = new_y_da)
    
    if mtd == "pchip":
        result_sequential = scipy.interpolate.PchipInterpolator(orig_y, scipy.interpolate.PchipInterpolator(orig_x, orig_f, axis = 0)(fine_x), axis = 1)(fine_y)
        result_sequential_2 = scipy.interpolate.PchipInterpolator(orig_x, scipy.interpolate.PchipInterpolator(orig_y, orig_f, axis = 1)(fine_y), axis = 0)(fine_x)
    else:
        if mtd == "quintic":
            mtd = 5
        result_sequential = scipy.interpolate.interp1d(orig_y, scipy.interpolate.interp1d(orig_x, orig_f, axis = 0, kind = mtd)(fine_x), axis = 1, kind = mtd)(fine_y)
        result_sequential_2 = scipy.interpolate.interp1d(orig_x, scipy.interpolate.interp1d(orig_y, orig_f, axis = 1, kind = mtd)(fine_y), axis = 0, kind = mtd)(fine_x)


    fig, axs = plt.subplots(nrows = 2, ncols = 2, figsize = [15,12])

    fig.suptitle(f"method = {mtd}")
    sb.heatmap(result_sequential_2-result_interpn, ax = axs[0,0])
    axs[0,0].set_title("sequential versus scipy.interpn")
    
    sb.heatmap(result_sequential_2-result_rgi, ax = axs[1,0])
    axs[1,0].set_title("sequential versus RegularGridInterpolator")
    
    sb.heatmap(result_sequential_2-result_xr, ax = axs[0,1])
    axs[0,1].set_title("sequential versus xarray.DataArray.interp()")
    
    sb.heatmap(result_sequential - result_sequential_2, ax = axs[1,1])
    axs[1,1].set_title("sequential versus reverse-order sequential")

# decompose the interpolation into a succession of independent interpolation
for indep_indexes_coords in decompose_interp(indexes_coords):

if method in ["linear", "nearest"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test seems to suggest that "pchip" and "slinear" are acceptable too

assert_allclose(actual.transpose("y", "z"), expected)


@requires_scipy
def test_interpolate_nd_nd() -> None:
@pytest.mark.parametrize("method", ("cubic", "quintic"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to test all methods supported by interpn here? “linear”, “nearest”, “slinear”, “cubic”, “quintic”, “pchip”

Basically the "separation" is an implementation detail. We just want to be sure we get correct answers regardless of that detail.

@pytest.mark.parametrize(
"method", ["linear", "nearest", "zero", "slinear", "quadratic", "cubic"]
)
@pytest.mark.parametrize("method", ["linear", "nearest"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear how you're choosing which methods to paramterize over?

Perhaps we should add some global constants that can be used repeatedly

INTERP_METHODS_SEPARABLE=["linear", "nearest", "slinear"]
INTERP_METHODS_INSEPARABLE = ...
INTERP_METHODS_SUBSET = ["linear", "cubic"]

xarray/core/dataarray.py Outdated Show resolved Hide resolved

See Also
--------
scipy.interpolate.interp1d
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to preserve these scipy references too.

@dcherian
Copy link
Contributor

Thanks @hollymandel this is looking quite close. I only have a few small refinement requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

follow upstream scipy interpolation improvements
3 participants