How to add a custom indexer. #2986

Closed
fbriol opened this issue May 24, 2019 · 4 comments

@fbriol
Contributor

fbriol commented May 24, 2019

Hello,

I have written a set of indexers for 1D, 2D and 3D geodetic and Cartesian data (up to 5 dimensions for Cartesian data).

I used the Boost C++ libraries to implement the multidimensional search algorithm. The resulting tree (R*Tree) performs impressively: it can be built from several million points in a few seconds, and it can answer several million queries in a few seconds as well.

import numpy as np
# Optional: install with conda (Python 3.7 only): conda install pyindex -c fbriol
import pyindex.core as core

lon = np.random.uniform(-180.0, 180.0, 2048*4096)
lat = np.random.uniform(-90.0, 90.0, 2048*4096)
# The altitude is optional; omit it if you do not need it.
alt = np.random.uniform(-10000, 100000, 2048*4096)
# Use the WGS geodetic system
system = core.geodetic.System()
# Create the R*Tree for that system
tree = core.geodetic.RTree(system)
%timeit tree.packing(np.asarray((lon, lat, alt)).T)
# 3.84 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
coordinates = np.asarray((
    np.random.uniform(-180.0, 180.0, 10000),
    np.random.uniform(-90.0, 90.0, 10000),
    np.random.uniform(-10000, 100000, 10000))).T
%timeit tree.query(coordinates)
# 18 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I'm trying to use these indexes with xarray, but I don't quite understand how to interface them with it.

Could anyone explain how to write my own indexer, so that I can test these indexes with xarray? Thank you in advance.

@benbovy
Member

benbovy commented May 24, 2019

Hi @fbriol,

It would indeed be really nice to add custom indexes like R*Tree to xarray! Unfortunately, this isn't supported right now, but it's one of the main points in xarray's development roadmap.

@shoyer has started working on this in #2195.

@benbovy
Member

benbovy commented May 24, 2019

See also #1603, where the general discussion on this topic takes place.

@fbriol
Contributor Author

fbriol commented May 24, 2019

Thank you for the information. I will try to interface my classes with xarray based on it.
I saw in the conversation that one of the problems is serializing the index in NetCDF files. My class allows you to serialize the tree:

In [5]: coordinates = np.asarray((
   ...:     np.random.uniform(-180.0, 180.0, 4),
   ...:     np.random.uniform(-90.0, 90.0, 4),
   ...:     np.random.uniform(-10000, 100000, 4))).T
In [6]: tree.packing(coordinates)
In [7]: tree.__getstate__()
Out[7]:
((6378137.0, 0.0033528106647474805),
array([[   22759.57572379,    79992.43969068, -6418971.88026264],
       [  170101.10528328,  -718657.28577825,  6402886.08678261],
       [-1385601.77565369,   1608787.7370298,  6095481.97179018],
       [-6272786.11583145,    12746.83764378, -1461257.51500618]]))

This state can be stored as structured data in a NetCDF file. Thanks again for the information.
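For readers wondering what the state above contains: the first tuple matches the WGS84 semi-major axis and flattening, and the array rows look like Earth-centred Cartesian (ECEF) coordinates in metres. The following is a minimal sketch of the standard geodetic-to-ECEF conversion, on the assumption (not confirmed in this thread) that pyindex stores points this way. The function name `geodetic_to_ecef` is hypothetical and is not part of pyindex.

```python
import math

# Ellipsoid parameters copied from the serialized state shown above.
WGS84_A = 6378137.0               # semi-major axis, metres
WGS84_F = 0.0033528106647474805   # flattening


def geodetic_to_ecef(lon_deg, lat_deg, alt_m, a=WGS84_A, f=WGS84_F):
    """Convert geodetic coordinates to ECEF (hypothetical helper, not pyindex API)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    e2 = f * (2.0 - f)  # first eccentricity squared
    # Prime vertical radius of curvature at this latitude.
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + alt_m) * math.sin(lat)
    return x, y, z


print(geodetic_to_ecef(0.0, 0.0, 0.0))  # (6378137.0, 0.0, 0.0)
```

Searching in Cartesian space this way avoids the longitude wrap-around and pole problems that a search in raw degrees would have.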

@benbovy
Member

benbovy commented Aug 23, 2023

@fbriol, much progress has been made on this front since this issue was submitted.

To create a custom Xarray index wrapping your R*Tree, you could have a look at the following resources:

FYI, I also plan to update https://github.com/xarray-contrib/xoak soon so that the tree-based indexes it implements reuse the recent xarray.indexes.Index feature. I think that your R*Tree example would fit nicely into that package! It would probably require your pyindex package to be available on conda-forge.
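As a rough illustration of the xarray.indexes.Index feature mentioned above, here is a minimal sketch of a custom index over scattered points. It is not the xoak or pyindex implementation: the class name `NearestPointIndex` is hypothetical, and a brute-force NumPy search in degree space stands in for a real tree query. It assumes a recent xarray release that provides `Dataset.set_xindex`.

```python
import numpy as np
import xarray as xr
from xarray.core.indexing import IndexSelResult
from xarray.indexes import Index


class NearestPointIndex(Index):
    """Hypothetical index over 1-D coordinates that share one dimension."""

    def __init__(self, points, coord_names, dim):
        self._points = points            # shape (n_points, n_coords)
        self._coord_names = coord_names  # e.g. ("lon", "lat")
        self._dim = dim                  # shared dimension name

    @classmethod
    def from_variables(cls, variables, *, options):
        # Called by Dataset.set_xindex with the chosen coordinate variables.
        names = tuple(variables)
        dims = {var.dims for var in variables.values()}
        if len(dims) != 1 or len(next(iter(dims))) != 1:
            raise ValueError("coordinates must be 1-D and share one dimension")
        (dim,) = next(iter(dims))
        points = np.column_stack([variables[n].values for n in names])
        return cls(points, names, dim)

    def sel(self, labels, method=None, tolerance=None):
        if method != "nearest":
            raise ValueError("only method='nearest' is supported")
        missing = set(self._coord_names) - set(labels)
        if missing:
            raise ValueError(f"missing labels for {sorted(missing)}")
        query = np.array([float(labels[n]) for n in self._coord_names])
        # Brute force; a real tree would answer this with a query call.
        distances = np.sum((self._points - query) ** 2, axis=1)
        return IndexSelResult({self._dim: int(np.argmin(distances))})


ds = xr.Dataset(
    {"temperature": ("points", [10.0, 20.0, 30.0])},
    coords={
        "lon": ("points", [0.0, 10.0, 20.0]),
        "lat": ("points", [0.0, 5.0, 10.0]),
    },
)
ds = ds.set_xindex(["lon", "lat"], NearestPointIndex)
nearest = ds.sel(lon=9.0, lat=4.0, method="nearest")
```

Wrapping an actual tree would mostly mean building it in `from_variables` and replacing the distance computation in `sel` with the tree's own query.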

Regarding the (de)serialization of the tree, I think that you could write some utility functions (accessible, e.g., via a DataArray / Dataset accessor) that:

  • (for serialization) create coordinate variables and/or attributes from the tree state
  • (for de-serialization) create a new xarray.Coordinates object (xarray coordinates + indexes) from the serialized tree state.
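The two steps above could be sketched roughly as follows. This assumes the tree state has the layout shown earlier in the thread (a pair of ellipsoid parameters plus an array of Cartesian points). The function names, the variable name `tree_points`, the dimension names and the attribute names are all hypothetical. The sketch stops at recovering the state tuple and does not build an xarray.Coordinates object or restore a tree.

```python
import numpy as np
import xarray as xr


def state_to_dataset(state):
    """Store a tree state as one variable plus two attributes (hypothetical layout)."""
    (semi_major_axis, flattening), points = state
    return xr.Dataset(
        {"tree_points": (("tree_point", "xyz"), np.asarray(points))},
        attrs={"semi_major_axis": semi_major_axis, "flattening": flattening},
    )


def dataset_to_state(ds):
    """Rebuild the state tuple from a dataset written by state_to_dataset."""
    system = (float(ds.attrs["semi_major_axis"]), float(ds.attrs["flattening"]))
    return system, ds["tree_points"].values


# Example state using two of the rows printed earlier in the thread.
example_state = (
    (6378137.0, 0.0033528106647474805),
    np.array([
        [22759.57572379, 79992.43969068, -6418971.88026264],
        [170101.10528328, -718657.28577825, 6402886.08678261],
    ]),
)
restored = dataset_to_state(state_to_dataset(example_state))
```

Because the state ends up as a plain array variable and scalar attributes, the dataset can be written with the usual xarray output methods.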

I'm going to close this issue since Xarray now supports adding such custom indexes. Feel free to open a new issue or discussion here if you find a bug or some missing information, or to open an issue in the xoak repository.
