Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use quadtree-based point-in-polygon for .contains binops. #845

Closed

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented Dec 6, 2022

Depends on #834
Depends on #839
Closes #844

Description

This PR implements quadtree-based point-in-polygon for geometric binary operations.

I added colinearity tests to the quadtree-based is_point_in_polygon function, and changed the brute-force pip to the quadtree approach.

The tests as written fail on one test_float_precision_limits test. I am assuming this is because the algorithm for is_point_in_polygon is slightly different than the version used in brute force. I haven't validated this yet.

The tests haven't been updated yet to use more than 31 features. This is for two reasons that I'm researching:

  1. About 2% of the randomly generated polygon.contains(linestring) comparisons produce different results than GeoPandas. I am assuming this is due to the discrepancy between .contains and .contains_properly, but I haven't validated it yet. This is a serious issue and prevents me from integrating the quadtree approach.
  2. The performance of the quadtree-based point-in-polygon on 10k polygons and 10k linestrings is notably slower than GeoPandas. This is an even more serious issue than the first.

Quadtree-based point-in-polygon is slower than GeoPandas on a 10k set and also produces inconsistent results.

Presently we have three possible implementations of point-in-polygon: brute force, quadtree, and the unfinished brute-force with unlimited polygons.

At the present time, brute-force is implemented in PR #839 and #834. While these implementations are valid, because of the 31 polygon limitation, all of the pip-based binops are toy unit tests that only compare 31 features against each other. This will not provide a meaningful motivation to use GPU acceleration with cuSpatial until we can support more than 31 features. We can't justify using the quadtree-based approach until more performance benchmarks are made. The question is at what point does it become more efficient to use cuSpatial than GeoPandas for binary operations that depend on point-in-polygon?

Once quadtree-based point-in-polygon performance has been characterized, we'll need to decide if we should continue with the quadtree approach, and I will dig into fixing the existing discrepancies, or if I should finish the unlimited polygons brute force PR #754. That PR also has limitations, as the existing implementation has a substantial feature limit as well. $\sqrt{2^{32}}$ is the row-limit for pairwise, polygon-unlimited point in polygon, since $\sqrt{2^{32}}$ polygons measured against $\sqrt{2^{32}}$ points produces a single column that is at the length limit for cudf.

Based on the observation that quadtree-based pip with only 10k features is slower than GeoPandas, it isn't likely that 65k features will be much faster. Brute-force pip on 65k features may be faster than GeoPandas, but given that with GeoPandas the runtime on 65k features is still only a few seconds or less, it isn't a compelling use-case. We need faster performance on larger datasets, and support for larger numbers of features.

This means that benchmarking quadtree-based point-in-polygon is currently the highest priority, to identify if there is a large enough dataset to make GPU backed computation compelling with the current algorithms.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@thomcom thomcom added 2 - In Progress Currenty a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 6, 2022
@thomcom thomcom added this to the DE-9IM milestone Dec 6, 2022
@thomcom thomcom self-assigned this Dec 6, 2022
@github-actions github-actions bot added libcuspatial Relates to the cuSpatial C++ library Python Related to Python code labels Dec 6, 2022
@thomcom thomcom closed this Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currenty a work in progress improvement Improvement / enhancement to an existing function libcuspatial Relates to the cuSpatial C++ library non-breaking Non-breaking change Python Related to Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: Implement and benchmark quadtree point-in-polygon for binops.
1 participant