Use quadtree-based point-in-polygon for .contains binops.
#845
Depends on #834
Depends on #839
Closes #844
Description
This PR implements quadtree-based point-in-polygon for geometric binary operations.
I added colinearity tests to the quadtree-based is_point_in_polygon function, and changed the brute-force pip to the quadtree approach.
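To illustrate what a colinearity test adds to point-in-polygon, here is a minimal pure-Python sketch. It is not the cuSpatial `is_point_in_polygon` kernel; the function name `classify_point`, the tolerance `eps`, and the three-way return value are all hypothetical choices made for illustration. It combines ray casting with a cross-product check so that points lying on an edge are reported separately, which is the case where predicates such as `.contains` and `.contains_properly` may disagree.

```python
def classify_point(point, ring, eps=1e-12):
    """Classify a point against a closed ring as 'boundary', 'inside', or 'outside'.

    ring is a list of (x, y) vertices; the last vertex is implicitly
    joined back to the first. eps is an illustrative absolute tolerance.
    """
    px, py = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]

        # Colinearity test: the cross product of (p - a) and (b - a) is ~0
        # when p lies on the infinite line through the edge a -> b.
        cross = (px - x1) * (y2 - y1) - (py - y1) * (x2 - x1)
        if abs(cross) <= eps:
            # Colinear points count as boundary only within the edge's extent.
            within_x = min(x1, x2) - eps <= px <= max(x1, x2) + eps
            within_y = min(y1, y2) - eps <= py <= max(y1, y2) + eps
            if within_x and within_y:
                return "boundary"

        # Ray casting: toggle on each edge crossed by a ray going toward +x.
        if (y1 > py) != (y2 > py):
            x_at_y = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at_y > px:
                inside = not inside
    return "inside" if inside else "outside"
```

With a three-way result like this, a caller can decide per predicate whether boundary points count as contained, rather than baking that decision into the point-in-polygon routine itself.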
The tests as written fail on one `test_float_precision_limits` test. I am assuming this is because the algorithm for `is_point_in_polygon` is slightly different than the version used in brute force. I haven't validated this yet.
The tests haven't been updated yet to use more than 31 features. This is for two reasons that I'm researching:
- `polygon.contains(linestring)` comparisons produce different results than GeoPandas. I am assuming this is due to the discrepancy between `.contains` and `.contains_properly`, but I haven't validated it yet. This is a serious issue and prevents me from integrating the quadtree approach.
- Quadtree-based point-in-polygon is slower than GeoPandas on a 10k set and also produces inconsistent results.
Presently we have three possible implementations of point-in-polygon: brute force, quadtree, and the unfinished brute-force with unlimited polygons.
At present, brute force is implemented in PRs #834 and #839. These implementations are valid, but because of the 31-polygon limitation, all of the pip-based binops are covered only by toy unit tests that compare 31 features against each other. That gives no meaningful motivation to use GPU acceleration with cuSpatial until we can support more than 31 features. We can't justify using the quadtree-based approach until more performance benchmarks are made. The question is: at what point does it become more efficient to use cuSpatial than GeoPandas for binary operations that depend on point-in-polygon?
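One way to frame that question is as a crossover search over measured timings. The sketch below is a hypothetical harness, not part of this PR: `time_call` and `crossover_size` are illustrative names, and the callables being timed (for example a GeoPandas `.contains` call and its cuSpatial counterpart) are assumed to be supplied by whoever runs the benchmark.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def crossover_size(timings):
    """Find the feature count from which the candidate stays faster.

    timings maps feature count -> (baseline_seconds, candidate_seconds).
    Returns the smallest measured size at which the candidate is faster
    than the baseline at that size and at every larger measured size,
    or None if the candidate is not faster at the largest size.
    """
    result = None
    # Walk from the largest size down and stop at the first loss.
    for size in sorted(timings, reverse=True):
        baseline, candidate = timings[size]
        if candidate < baseline:
            result = size
        else:
            break
    return result
```

Collecting `time_call` results for each implementation at a range of feature counts and passing them to `crossover_size` would give a single number to discuss, or `None` if no measured size favors the GPU path.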
Once quadtree-based point-in-polygon performance has been characterized, we'll need to decide whether to continue with the quadtree approach, in which case I will dig into fixing the existing discrepancies, or whether I should finish the unlimited-polygons brute-force PR #754. That PR also has limitations, as the existing implementation has a substantial feature limit as well.
$\sqrt{2^{32}} = 65{,}536$ is the row limit for pairwise, polygon-unlimited point-in-polygon, since $\sqrt{2^{32}}$ polygons measured against $\sqrt{2^{32}}$ points produces a single column that is at the length limit for `cudf`.
Based on the observation that quadtree-based pip with only 10k features is slower than GeoPandas, it isn't likely that 65k features will be much faster. Brute-force pip on 65k features may be faster than GeoPandas, but given that the GeoPandas runtime on 65k features is still only a few seconds or less, it isn't a compelling use case. We need faster performance on larger datasets, and support for larger numbers of features.
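The feature limit above follows from simple arithmetic: comparing n polygons against n points pairwise yields n × n results in one column, so n can be at most the integer square root of the column length limit. The sketch below works that out. The helper name `max_pairwise_features` is hypothetical, and the second limit shown (a signed 32-bit maximum) is an assumption included for comparison, not a figure stated in this description.

```python
import math


def max_pairwise_features(row_limit):
    """Largest n such that an n x n pairwise result fits in row_limit rows."""
    return math.isqrt(row_limit)


# Limit as stated in the description: 2**32 rows.
print(max_pairwise_features(2**32))      # 65536

# Assumption for comparison only: a signed 32-bit row count.
print(max_pairwise_features(2**31 - 1))  # 46340
```

Either way, the ceiling is in the tens of thousands of features, which matches the "65k" figure used in the discussion.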
This means that benchmarking quadtree-based point-in-polygon is currently the highest priority, to determine whether any dataset is large enough to make GPU-backed computation compelling with the current algorithms.
Checklist