support for additional scipy nd interpolants #9599
Conversation
Thanks for this contribution. Some remarks:

- What is the minimal scipy version required for these new interpolators? Is it aligned with the minimum version for xarray, or do we need some checks for that?
- You have added several formatting changes to totally unrelated parts, which makes this more difficult to review than necessary. Please try to avoid that.
- The doc build failed; I have to check whether that is random or related to your docstrings.
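On the minimum-version question, a guard along these lines could gate the new methods. This is only a sketch: the `1.10` threshold and the constant names are assumptions that would need checking against the scipy changelog, not values confirmed in this thread.

```python
from packaging.version import Version

import scipy

# Hypothetical threshold: the exact scipy release that added "pchip"
# and "quintic" to RegularGridInterpolator must be verified.
MIN_SCIPY_FOR_ND_METHODS = Version("1.10")

HAS_ND_INTERP_METHODS = Version(scipy.__version__) >= MIN_SCIPY_FOR_ND_METHODS
print(HAS_ND_INTERP_METHODS)
```

A flag like this could back a `@requires_scipy`-style skip marker in the test suite.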
Thanks for the comments!
I'm seeing
Oops, reverted, sorry!
Also, I think the name "reduce" is not very intuitive, but I fail to come up with an alternative. "decompose" seems weird out of context. Does anyone have a suggestion?
Perhaps the
I think we end up rechunking to a single chunk in a very roundabout way. I don't think you need to be too concerned about it for this PR. It does need an overhaul though.
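For reference, collapsing to a single chunk along the interpolated axis is a one-liner in dask, so the roundabout path could eventually be replaced by something like this sketch (my illustration, not the code path this PR touches):

```python
import dask.array as dask_array

a = dask_array.ones((100, 100), chunks=(10, 10))
# -1 means "one chunk spanning the whole axis"; axis 0 is merged because
# most interpolants need all source points along the interpolated axis.
b = a.rechunk({0: -1})
print(b.chunks)
```

Axis 1 keeps its original chunking, so parallelism over the untouched dimension is preserved.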
# grid -> grid
coords["xdestnp"] = np.linspace(0.1, 1.0, 11)
coords["ydestnp"] = np.linspace(0.0, 0.2, 10)
coords["zdestnp"] = da.get_index("z")  # type: ignore[assignment]
Suggested change:
- coords["zdestnp"] = da.get_index("z")  # type: ignore[assignment]
+ coords["zdestnp"] = da.z.data
Is this equivalent?
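The two agree in values but not in type: `get_index` returns a `pandas.Index` while `.data` is the underlying numpy array, which is what would make the `type: ignore` unnecessary. A quick check on toy data (my sketch, not the test in this PR):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=["y", "z"],
    coords={"z": [0.0, 0.5, 1.0]},
)
idx = da.get_index("z")  # a pandas.Index
arr = da.z.data          # the underlying numpy.ndarray
print(type(idx).__name__, type(arr).__name__,
      np.array_equal(np.asarray(idx), arr))
```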
- # linear interpolation is separateable
+ # `method` is separable
expected = da.interp(x=xdestnp, method="linear")
expected = expected.interp(y=ydestnp, method="linear")
Is `pchip` really separable too? We could gain some confidence here by constructing `expected` using scipy directly with `"xdestnp"` and `"ydestnp"`.
My experiments suggest only cubic and quintic (or other splines with order > 1) are "truly multidimensional." I'm interpreting errors on the order of 1e-16 relative to the SD of the expected function as machine error, but errors on the order of 1e-5 as true errors.
To me it feels uncomfortable to operate "empirically" like this, but digging through the formalism behind these interpolants would be a big lift and I'm not able to take it on right now. I sort of wish the separation optimization didn't exist and we could just provide a transparent interface to SciPy, but I guess this is insufficiently considerate to the user.
import scipy.interpolate
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

orig_x = np.linspace(0, 2, 10)
orig_y = np.linspace(0, 1, 10)
orig_f = np.random.randn(10, 10)
fine_x = np.linspace(0, 2, 200)
fine_y = np.linspace(0, 1, 100)

mesh_x, mesh_y = np.meshgrid(fine_x, fine_y)
all_points = np.asarray(list(zip(mesh_x.ravel(), mesh_y.ravel())))
new_x_da = xr.DataArray(data=np.transpose(mesh_x), dims=["x", "y"], coords={"x": fine_x, "y": fine_y})
new_y_da = xr.DataArray(data=np.transpose(mesh_y), dims=["x", "y"], coords={"x": fine_x, "y": fine_y})

mtds = ["linear", "nearest", "slinear", "cubic", "quintic", "pchip"]
for mtd in mtds:
    result_interpn = scipy.interpolate.interpn(
        points=(orig_x, orig_y), values=orig_f, xi=all_points, method=mtd
    ).reshape(200, 100, order="F")
    result_rgi = scipy.interpolate.RegularGridInterpolator(
        points=(orig_x, orig_y), values=orig_f, method=mtd
    )(all_points).reshape(200, 100, order="F")
    result_xr = xr.DataArray(
        data=orig_f, dims=["x", "y"], coords={"x": orig_x, "y": orig_y}
    ).interp(method=mtd, x=new_x_da, y=new_y_da)
    if mtd == "pchip":
        result_sequential = scipy.interpolate.PchipInterpolator(
            orig_y, scipy.interpolate.PchipInterpolator(orig_x, orig_f, axis=0)(fine_x), axis=1
        )(fine_y)
        result_sequential_2 = scipy.interpolate.PchipInterpolator(
            orig_x, scipy.interpolate.PchipInterpolator(orig_y, orig_f, axis=1)(fine_y), axis=0
        )(fine_x)
    else:
        if mtd == "quintic":
            mtd = 5  # interp1d spells quintic as spline order 5
        result_sequential = scipy.interpolate.interp1d(
            orig_y, scipy.interpolate.interp1d(orig_x, orig_f, axis=0, kind=mtd)(fine_x), axis=1, kind=mtd
        )(fine_y)
        result_sequential_2 = scipy.interpolate.interp1d(
            orig_x, scipy.interpolate.interp1d(orig_y, orig_f, axis=1, kind=mtd)(fine_y), axis=0, kind=mtd
        )(fine_x)
    fig, axs = plt.subplots(nrows=2, ncols=2, figsize=[15, 12])
    fig.suptitle(f"method = {mtd}")
    sb.heatmap(result_sequential_2 - result_interpn, ax=axs[0, 0])
    axs[0, 0].set_title("sequential versus scipy.interpn")
    sb.heatmap(result_sequential_2 - result_rgi, ax=axs[1, 0])
    axs[1, 0].set_title("sequential versus RegularGridInterpolator")
    sb.heatmap(result_sequential_2 - result_xr, ax=axs[0, 1])
    axs[0, 1].set_title("sequential versus xarray.DataArray.interp()")
    sb.heatmap(result_sequential - result_sequential_2, ax=axs[1, 1])
    axs[1, 1].set_title("sequential versus reverse-order sequential")
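The same experiment can be summarized numerically rather than with heatmaps (my own sketch, not part of the PR): compare two sequential 1-D passes against scipy's joint `interpn` and report the maximum absolute difference per method. Under the reasoning above, `linear` should agree to machine precision, while the higher-order splines need not.

```python
import numpy as np
from scipy.interpolate import interp1d, interpn

rng = np.random.default_rng(0)
orig_x = np.linspace(0, 2, 10)
orig_y = np.linspace(0, 1, 10)
orig_f = rng.standard_normal((10, 10))
fine_x = np.linspace(0, 2, 50)
fine_y = np.linspace(0, 1, 40)

def sequential(method):
    # two 1-D passes: interpolate along x, then along y
    fx = interp1d(orig_x, orig_f, axis=0, kind=method)(fine_x)
    return interp1d(orig_y, fx, axis=1, kind=method)(fine_y)

def joint(method):
    # one call over the full tensor-product target grid
    mx, my = np.meshgrid(fine_x, fine_y, indexing="ij")
    pts = np.column_stack([mx.ravel(), my.ravel()])
    return interpn((orig_x, orig_y), orig_f, pts, method=method).reshape(50, 40)

for method in ["linear", "slinear", "cubic"]:
    print(method, np.abs(sequential(method) - joint(method)).max())
```

Note `slinear` and `cubic` in `interpn` require a recent scipy (1.9+); boundary-condition differences between `interp1d` and the grid interpolator can also contribute to any cubic discrepancy.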
# decompose the interpolation into a succession of independent interpolations
for indep_indexes_coords in decompose_interp(indexes_coords):
    ...
    if method in ["linear", "nearest"]:
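The decomposition referenced here can be illustrated with a toy stand-in (my sketch, not xarray's actual `decompose_interp`): target coordinates whose dimension sets overlap must be interpolated jointly, while disjoint groups can be handled one after another.

```python
def decompose(coord_dims):
    """Group target coordinates into independent interpolation steps.

    coord_dims maps each new coordinate name to the set of source
    dimensions it depends on; groups with disjoint dimension sets can
    be interpolated sequentially (toy sketch, not xarray's code).
    """
    groups = []  # list of (names, dims) pairs
    for name, dims in coord_dims.items():
        dims = set(dims)
        names = {name}
        keep = []
        for g_names, g_dims in groups:
            if g_dims & dims:  # shared dims -> must merge into one step
                names |= g_names
                dims |= g_dims
            else:
                keep.append((g_names, g_dims))
        keep.append((names, dims))
        groups = keep
    return groups

# x and y share a source dimension, so they form one joint step;
# z is independent and becomes its own step.
groups = decompose({"x": {"x"}, "y": {"x", "y"}, "z": {"z"}})
print(groups)
```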
The test seems to suggest that `"pchip"` and `"slinear"` are acceptable too.
assert_allclose(actual.transpose("y", "z"), expected)


@requires_scipy
@pytest.mark.parametrize("method", ("cubic", "quintic"))
def test_interpolate_nd_nd() -> None:
Is it possible to test all methods supported by interpn here? “linear”, “nearest”, “slinear”, “cubic”, “quintic”, “pchip”
Basically the "separation" is an implementation detail. We just want to be sure we get correct answers regardless of that detail.
@pytest.mark.parametrize(
    "method", ["linear", "nearest", "zero", "slinear", "quadratic", "cubic"]
)
@pytest.mark.parametrize("method", ["linear", "nearest"])
It's not clear how you're choosing which methods to parametrize over. Perhaps we should add some global constants that can be used repeatedly:

INTERP_METHODS_SEPARABLE = ["linear", "nearest", "slinear"]
INTERP_METHODS_INSEPARABLE = ...
INTERP_METHODS_SUBSET = ["linear", "cubic"]
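Concretely, the suggested constants could live at module top level and feed `parametrize`. The `INSEPARABLE` grouping below is my guess based on the empirical discussion earlier in this thread, not a settled classification:

```python
# Shared method groups for the interpolation tests (names from the
# review suggestion; the separable/inseparable split would need
# confirming against the experiments above).
INTERP_METHODS_SEPARABLE = ["linear", "nearest", "slinear"]
INTERP_METHODS_INSEPARABLE = ["cubic", "quintic", "pchip"]
INTERP_METHODS_ALL = INTERP_METHODS_SEPARABLE + INTERP_METHODS_INSEPARABLE

# usage in a test module:
# @pytest.mark.parametrize("method", INTERP_METHODS_ALL)
# def test_interpolate_nd(method) -> None: ...
print(INTERP_METHODS_ALL)
```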
See Also
--------
scipy.interpolate.interp1d
Would be good to preserve these scipy references too.

Thanks @hollymandel, this is looking quite close. I only have a few small refinement requests.
Adds support for `slinear`, `cubic`, `quintic`, and `pchip`.

Adds a `reduce` keyword to allow disabling the reduction of interpolation along independent dimensions (see DataArray.interp(): poor performance #2223). Prior to this the behavior was always `reduce=True`. However, for certain n-dimensional interpolants this will change the mathematical behavior of the interpolation (I think), so I have added an option for users to turn this off.

Outstanding uncertainty: dask/chunking behavior? Nothing about this PR touches the chunking of interpolation, but I am a bit worried about this. Many interpolants, including those previously supported in one dimension, require nearby/all points from the original coordinate, and also don't parallelize well over the new coordinate. I'm not quite sure how this is currently being handled.