Interpolation of geo-referenced data #629
@fbriol this is great! Thanks for sharing! There has been a lot of discussion in our community about how best to provide this important functionality (see e.g. #356, where we discuss all the different packages related to "gridding"). Your experiments could be an important part of the solution. It would be awesome to explore using this prototype to build a custom indexer for Xarray (see pydata/xarray#475).
Finding the nearest points is only part of the problem if you want to compute an interpolation rather than an extrapolation. For each point, once the nearest neighbours have been selected, you must check that the query point lies within the polygon formed by those neighbours. I implemented this in my prototype, and this check costs very little with Boost. I don't think this test can be done in a reasonable time with a KD-tree, but maybe I'm wrong.

```python
%timeit mesh.idv(x.flatten(), y.flatten(), around=False, radius=18000, k=16, num_threads=12)
# 1.09 s ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit mesh.idv(x.flatten(), y.flatten(), around=True, radius=18000, k=16, num_threads=12)
# 1.11 s ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I will try to write an indexer for xarray using my prototype.
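For a convex neighbourhood, the inside-the-polygon check mentioned above is cheap even in pure NumPy: the point is inside when it lies on the same side of every edge. This is only a minimal sketch of the idea, not the prototype's actual implementation, and the function name is illustrative:

```python
import numpy as np

def point_in_convex_polygon(vertices, point):
    """Return True if `point` lies inside the convex polygon `vertices`.

    `vertices` is an (n, 2) array ordered counter-clockwise. The point is
    inside when every cross product of (edge vector, vertex-to-point
    vector) has the same sign.
    """
    v = np.asarray(vertices, dtype=float)
    p = np.asarray(point, dtype=float)
    edges = np.roll(v, -1, axis=0) - v  # edge vectors v[i] -> v[i+1]
    to_point = p - v                    # vectors v[i] -> p
    cross = edges[:, 0] * to_point[:, 1] - edges[:, 1] * to_point[:, 0]
    return bool(np.all(cross >= 0))

# Unit square, counter-clockwise
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_convex_polygon(square, (0.5, 0.5)))  # inside -> True
print(point_in_convex_polygon(square, (1.5, 0.5)))  # outside -> False
```

Each test is a handful of vectorised multiplications per query, which is consistent with the observation that the check adds almost nothing to the timings above.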
Hello, I didn't fully understand how the indexes work in xarray, but if you give me a complete example I could take a look at it. I have reworked a large part of my interpolation software to extract the different indexers used: RTree (Cartesian & geodetic) and a 1-D axis. I have created a Conda package, for the moment only for Linux, but I will make Windows and OSX versions tonight. You can try it. To install it: There is no need to search for the source code; the core library is a dynamic shared module written in C++. For the moment the code is not public, but I could publish it if the library proves useful. You will find an example of using the different objects in the GIST.
@fbriol, have you looked into pyresample (https://pyresample.readthedocs.io/en/latest/#)? There might be some overlap with what you've presented here.
Yes, I know this library, but like all the Python libraries I have used, it is not very efficient and does not handle longitude discontinuities well enough. The geographical trees used are KD-trees, a structure that is not very efficient when the number of points becomes large. The R-tree structure is much more efficient and is not implemented in other libraries. Today I use a cross-platform library written entirely in C++ that is very fast, and I am looking for ways to share it. A version is available on my Anaconda channel if you want to try it:
@jhamman thanks for the shout out. If @fbriol wants performance, then Verde is probably not the best solution yet. We're working on performance, but it will never be as fast as a nearest-neighbor algorithm. The longitude problem is something we're trying to address as well (see fatiando/verde#173 and fatiando/verde#181). I'm curious to see what you come up with, and having native support in xarray would be a boost to everyone who relies on it (Verde included).
@fbriol, would it be too early for you to talk about this (even briefly) at tomorrow's meeting in Paris?
Hi @fbriol, thanks for this great library! It would be very useful for Parcels. However, a difference between Parcels and what you propose is that you interpolate from a cloud of points, while we interpolate using a mesh that includes a topology. In this small notebook, I added the part that we would need in Parcels (we use structured curvilinear grids). Should we implement this extension in our own code, or is it already present in your library?
Can you send me or make available one of your files so that I can do a unit test? I will look at what can be done.

The mesh file is available here. Thanks!
Hello, I just looked at your code. I think what you're doing is too specific to introduce into the library. On the other hand, you can significantly speed up your code by changing your approach: run the search for all your points at once, process them all, and record the points that could not be handled; then restart the search only for those unresolved points. Example:

```python
def find_cell(tree, lon, lat, x, y, k=16):
    if k > 2000:
        raise ValueError(
            'find_cell should not request more than k=2000 nearest '
            'neighbours in the rtree query')
    n, m = lon.shape
    _, index = tree.query(np.asarray((x, y)).T, k=k, within=False)
    result = np.empty((index.shape[0], 2))
    # You can also accelerate this loop with numba by taking it out of
    # this function.
    for ix in range(index.shape[0]):
        yi = (index[ix] // m).astype(int)
        xi = (index[ix] % m).astype(int)
        # `loop` is the cell-search routine from your notebook.
        xi_final, yi_final, xsi, eta = loop(lon, lat, x[ix], y[ix], xi, yi, n, m)
        if xi_final == -1:
            result[ix, 0] = result[ix, 1] = np.nan
        else:
            result[ix, 0], result[ix, 1] = interpolate_lonlat(
                lon, lat, x[ix], y[ix], xi_final, yi_final, xsi, eta)
    return result


def find_particles(n):
    plon = np.random.uniform(-180.0, 180.0, n)
    plat = np.random.uniform(-70.0, 88.0, n)
    result = find_cell(tree, glon, glat, plon, plat)
    idx = np.where(np.isnan(result[:, 0]))
    # Restart the search on the undetermined points with k*2:
    # find_cell(tree, glon, glat, plon[idx], plat[idx], k=32)
```

This approach lets you remove your main loop, which will speed things up. On my tests, I need two to three iterations to process your test set (32 and 64 neighbours). Then you can accelerate your interpolation functions by compiling their loops with numba. For example:

```python
@numba.njit()
def interpolate_lonlat(lon, lat, x, y, xi, yi, xsi, eta):
    '''Bi-linear interpolation within cell [j:j+2, i:i+2] as a function of
    the relative coordinates. Here we simply interpolate the x, y
    coordinates, retrieving the original coordinates.
    '''
    phi = np.array(
        [(1 - xsi) * (1 - eta), xsi * (1 - eta), xsi * eta, (1 - xsi) * eta])
    px = np.array(
        [lon[yi, xi], lon[yi, xi + 1], lon[yi + 1, xi + 1], lon[yi + 1, xi]])
    px = np.where(px - x > 180, px - 360, px)
    px = np.where(px - x < -180, px + 360, px)
    py = np.array(
        [lat[yi, xi], lat[yi, xi + 1], lat[yi + 1, xi + 1], lat[yi + 1, xi]])
    # np.allclose([np.dot(phi, px), np.dot(phi, py)], [x, y])
    return np.dot(phi, px), np.dot(phi, py)


@numba.njit()
def get_relative_coordinates(lon, lat, x, y, xi, yi):
    '''Returns relative coordinates xsi, eta: the coordinates of the
    (x, y) point remapped into the unit square cell [0, 1] x [0, 1].
    '''
    invA = np.array([
        1.0, 0.0, 0.0, 0.0,
        -1.0, 1.0, 0.0, 0.0,
        -1.0, 0.0, 0.0, 1.0,
        1.0, -1.0, 1.0, -1.0
    ]).reshape(4, 4)
    px = np.array(
        [lon[yi, xi], lon[yi, xi + 1], lon[yi + 1, xi + 1], lon[yi + 1, xi]])
    px = np.where(px - x > 180, px - 360, px)
    px = np.where(px - x < -180, px + 360, px)
    py = np.array(
        [lat[yi, xi], lat[yi, xi + 1], lat[yi + 1, xi + 1], lat[yi + 1, xi]])
    a = np.dot(invA, px)
    b = np.dot(invA, py)
    aa = a[3] * b[2] - a[2] * b[3]
    bb = (a[3] * b[0] - a[0] * b[3] + a[1] * b[2] - a[2] * b[1]
          + x * b[3] - y * a[3])
    cc = a[1] * b[0] - a[0] * b[1] + x * b[1] - y * a[1]
    if abs(aa) < 1e-12:  # rectilinear cell, or nearly so
        eta = -cc / bb
    else:
        det2 = bb * bb - 4 * aa * cc
        if det2 > 0:  # if det is nan we keep xsi, eta from the previous iter
            det = np.sqrt(det2)
            eta = (-bb + det) / (2 * aa)
        else:  # should not happen, apart from singularities
            eta = 1e6
    if abs(a[1] + a[3] * eta) < 1e-12:
        # Happens when a rectilinear cell is rotated by 90 degrees.
        xsi = ((y - py[0]) / (py[1] - py[0])
               + (y - py[3]) / (py[2] - py[3])) * .5
    else:
        xsi = (x - a[0] - a[2] * eta) / (a[1] + a[3] * eta)
    return xsi, eta
```

Let me know if it helps you.
Hello, I have finalized a first test version of the interpolation library for geo-referenced data. If you want to try it, you can find information here. The goal is to provide a low-level library for interpolation that other, higher-level libraries can build on. This version is far from perfect, but it is a first draft so that we can continue the discussion around something concrete.
Thanks Frédéric, for both the library and the cleanup of my small script!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.
To my knowledge, there is no library that can interpolate geo-referenced data in Python simply and very quickly. This is even more true for a mesh (a cloud of points) representing a geophysical value (the dynamic height of the oceans, for example).
Cartesian Grid
For Cartesian grids, SciPy offers RegularGridInterpolator, which can be adapted to process geophysical data. The difficulty lies in correctly handling the discontinuity in longitude on a global numerical grid. One method I have tried to overcome this problem is to duplicate the first column of the matrix after the last one, in order to properly handle the transition around the Prime meridian. It works, but the performance is not very good. In the code below, we do not handle the discontinuity of longitudes.
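The column-duplication trick can be sketched as follows; this is a minimal illustrative example (the grid, field, and variable names are assumptions, not the benchmark code from this issue):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# A global 1-degree grid of a smooth field, longitude axis 0..359.
lon = np.arange(0.0, 360.0, 1.0)
lat = np.arange(-80.0, 81.0, 1.0)
grid = np.cos(np.radians(lon))[:, None] * np.cos(np.radians(lat))[None, :]

# Duplicate the first longitude column after the last one so that the
# interpolator can bridge the 359 -> 360 (= 0) gap.
lon_wrapped = np.append(lon, 360.0)
grid_wrapped = np.concatenate([grid, grid[:1, :]], axis=0)

interp = RegularGridInterpolator((lon_wrapped, lat), grid_wrapped)

# A query at longitude 359.5 now interpolates between columns 359 and 0
# instead of falling outside the grid.
pts = np.array([[359.5, 0.0], [0.5, 45.0]])
print(interp(pts))
```

Without the duplicated column, the first query point would raise an out-of-bounds error even though it lies on the globe.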
By comparison, a method developed in C++ with a Python binding can solve this problem in 46 ms in total on a single thread, transparently handling the discontinuity of longitudes. As this problem is easily parallelizable, the same computation runs in only 15 ms with 12 threads. Here, the C++ routine is 2 to 4.5 times faster.
Unstructured grids
For unstructured grids, I used the KDTree available in SciPy. This object can handle the problem, as long as the number of points is not too large. Here is the Python code which I used to interpolate my mesh.
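The original snippet is not reproduced above. A minimal sketch of the usual cKDTree approach is shown below, using inverse-distance weighting on 3-D unit-sphere coordinates so that distances stay continuous across the longitude seam; the function names and the test field are illustrative assumptions, not the author's exact code:

```python
import numpy as np
from scipy.spatial import cKDTree

def to_xyz(lon, lat):
    """Map lon/lat (degrees) to 3-D unit-sphere coordinates, so that
    distances are continuous across the +/-180 degree longitude seam."""
    lon, lat = np.radians(lon), np.radians(lat)
    return np.column_stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])

def idw_interpolate(src_lon, src_lat, values, dst_lon, dst_lat, k=8):
    """Inverse-distance-weighted interpolation from scattered points."""
    tree = cKDTree(to_xyz(src_lon, src_lat))
    dist, idx = tree.query(to_xyz(dst_lon, dst_lat), k=k)
    # Inverse-distance weights; guard against exact matches (dist == 0).
    weights = 1.0 / np.maximum(dist, 1e-12)
    return np.sum(weights * values[idx], axis=1) / np.sum(weights, axis=1)

# Scattered samples of a smooth field, queried on both sides of the seam.
rng = np.random.default_rng(0)
slon = rng.uniform(-180, 180, 5000)
slat = rng.uniform(-80, 80, 5000)
field = np.sin(np.radians(slat))
out = idw_interpolate(slon, slat, field, np.array([179.9, -179.9]),
                      np.array([10.0, 10.0]))
print(out)  # both values should be close to sin(10 deg) ~ 0.17
```

The cost of `cKDTree` construction is what dominates at large point counts, which is the bottleneck described below.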
This code works and manages the discontinuity in longitude. But when the problem size grows (I tried to interpolate a mesh of 74,649,600 points), the algorithm no longer finishes within a reasonable time (it takes a few hours).
I coded an interpolator using the R-tree implemented in the C++ Boost library. For the same problem size, the time needed to solve the problem with this new library is amazing: 11 seconds to build the tree, against more than 30 minutes with cKDTree, and 4.5 seconds to perform the interpolation onto a numerical grid of 2880 × 1440 points.
Could the small library I wrote be of any interest to your community? Have you been able to solve this same type of problem easily and efficiently using another method or library that I may not know of?