Use quadtree-based point-in-polygon for `.contains` binops. #845

thomcom · 2022-12-06T17:24:43Z

Depends on #834
Depends on #839
Closes #844

Description

This PR implements quadtree-based point-in-polygon for geometric binary operations.

I added colinearity tests to the quadtree-based is_point_in_polygon function, and changed the brute-force pip to the quadtree approach.

The tests as written fail on one test_float_precision_limits test. I am assuming this is because the algorithm for is_point_in_polygon is slightly different than the version used in brute force. I haven't validated this yet.

The tests haven't been updated yet to use more than 31 features. This is for two reasons that I'm researching:

About 2% of the randomly generated polygon.contains(linestring) comparisons produce different results than GeoPandas. I am assuming this is due to the discrepancy between .contains and .contains_properly, but I haven't validated it yet. This is a serious issue and prevents me from integrating the quadtree approach.
The performance of the quadtree-based point-in-polygon on 10k polygons and 10k linestrings is notably slower than GeoPandas. This is an even more serious issue than the first.

Quadtree-based point-in-polygon is slower than GeoPandas on a 10k set and also produces inconsistent results.

Presently we have three possible implementations of point-in-polygon: brute force, quadtree, and the unfinished brute-force with unlimited polygons.

At the present time, brute-force is implemented in PR #839 and #834. While these implementations are valid, because of the 31 polygon limitation, all of the pip-based binops are toy unit tests that only compare 31 features against each other. This will not provide a meaningful motivation to use GPU acceleration with cuSpatial until we can support more than 31 features. We can't justify using the quadtree-based approach until more performance benchmarks are made. The question is at what point does it become more efficient to use cuSpatial than GeoPandas for binary operations that depend on point-in-polygon?

Once quadtree-based point-in-polygon performance has been characterized, we'll need to decide if we should continue with the quadtree approach, and I will dig into fixing the existing discrepancies, or if I should finish the unlimited polygons brute force PR #754. That PR also has limitations, as the existing implementation has a substantial feature limit as well. $\sqrt{2^{32}}$ is the row-limit for pairwise, polygon-unlimited point in polygon, since $\sqrt{2^{32}}$ polygons measured against $\sqrt{2^{32}}$ points produces a single column that is at the length limit for cudf.

Based on the observation that quadtree-based pip with only 10k features is slower than GeoPandas, it isn't likely that 65k features will be much faster. Brute-force pip on 65k features may be faster than GeoPandas, but given that with GeoPandas the runtime on 65k features is still only a few seconds or less, it isn't a compelling use-case. We need faster performance on larger datasets, and support for larger numbers of features.

This means that benchmarking quadtree-based point-in-polygon is currently the highest priority, to identify if there is a large enough dataset to make GPU backed computation compelling with the current algorithms.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

thomcom added 13 commits December 1, 2022 14:23

Start refactoring binops to handle various ops and inputs.

67a2a6a

Abstraction working with binops that depend on pip.

3038253

Clean it up in GeoSeries.

d7a842b

Correct abstraction for pip_only tests.

3301b95

Support the seven extra pip tests.

f6688d7

Support all supportable pip calls.

77b6858

Remove unused abstraction.

c69d2e6

Clean up binops.intersects docs.

de923c5

Pass all equals tests except one.

67b0df6

Merge

b4ea5e5

Finish abstracting pip based ops.

305f937

Pass all multi tests.

707217b

Change pip implementation to use quadtree.

d40b637

thomcom added 2 - In Progress Currenty a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 6, 2022

thomcom added this to the DE-9IM milestone Dec 6, 2022

thomcom self-assigned this Dec 6, 2022

github-actions bot added libcuspatial Relates to the cuSpatial C++ library Python Related to Python code labels Dec 6, 2022

thomcom closed this Jan 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use quadtree-based point-in-polygon for `.contains` binops. #845

Use quadtree-based point-in-polygon for `.contains` binops. #845

thomcom commented Dec 6, 2022 •

edited

Loading

Use quadtree-based point-in-polygon for .contains binops. #845

Use quadtree-based point-in-polygon for .contains binops. #845

Conversation

thomcom commented Dec 6, 2022 • edited Loading

Description

Checklist

Use quadtree-based point-in-polygon for `.contains` binops. #845

Use quadtree-based point-in-polygon for `.contains` binops. #845

thomcom commented Dec 6, 2022 •

edited

Loading