Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary Predicates Introduction and Benchmark Notebook #1156

Merged

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented May 26, 2023

Description

Closes #1138
Closes #1141 here
Depends on #1152
Depends on #1064

Please direct your attention to the notebook since the dependencies and delayed state of CI issues over this week have put a lot of files into this PR.

This notebook demonstrates cuSpatial's new binary predicates on large datasets, benchmarking and comparing against the host implementation on GeoPandas.

In order to support the large inputs for these comparisons I had to reactivate the pairwise_point_in_polygon functionality that I'd previously written off. This is because quadtree doesn't support large N for NxN operations, since it is many-to-many, and brute-force would require a huge number of iterations to support such large dataframes. There are some more optimizations that can be made to speed up pairwise_point_in_polygon, but the algorithm itself isn't sufficiently fast. It is detailed fairly closely in the notebook.

Please take a look and let's have some conversations about steps forward.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@thomcom thomcom added 3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change labels May 26, 2023
@thomcom thomcom added this to the DE-9IM milestone May 26, 2023
@thomcom thomcom requested review from harrism and isVoid May 26, 2023 20:59
@thomcom thomcom requested review from a team as code owners May 26, 2023 20:59
@thomcom thomcom self-assigned this May 26, 2023
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added cmake Related to CMake code or build configuration libcuspatial Relates to the cuSpatial C++ library Python Related to Python code labels May 26, 2023
@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #8.        [display(widgets.HBox(row)) for row in rows]

There's not need to use list comprehension here. Just

for row in rows:
display(widgets.HBox(row))

Reply via ReviewNB

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace "+" mark with "-", I think "Point-point relationship" is more typical in showing a binary relationship.


Reply via ReviewNB

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a broken output in this cell? Or it's just the reviewNB was not able to show it.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah it looks like ReviewNB can't display the HBoxed output areas that are defined by the import ipywidgets as widgets call. It shows a visualization of the geometries used in test.

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only two relationships that a point can have with another point: equality or inequality.

I understand what you mean here. Although for those who don't have much context in binpreds, point-point can also be disjoint, or intersects. But they can all be reduced to equal/inequality. Perhaps rephrase the word here for general audience?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it another pass.

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three relationships that a Point can have with a LineString: Disjoint, Point is equal to one of the LineString boundary points, and point falls between the LineString boundary points.

Is there a name for the latter 2 relationships?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'm specifically aware of. I assume that you're looking for one of the binary predicate names since I used disjoint for the first name?

@isVoid
Copy link
Contributor

isVoid commented May 30, 2023

Before I go further, may I ask who's the intended audience of this notebook? The first section is a deep dive into the situations between each possible combination of geometries, and this section makes me feel that it reveals too much of the underlying implementation detail to GIS users. If this is for documentation purposes, it also doesn't seem like it's clear enough that this is how the predicate is implemented.

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.        """Times the speed of producing the test data as well as the predicate execution.

Why should you time the data production? Benchmark should only time the kernel.


Reply via ReviewNB

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #9.            'GeoPandas predicate time',

I have forgotten the difference between "predicate time" and "time". You said at one point you want to remove one of them?


Reply via ReviewNB

@@ -0,0 +1,1157 @@
{
Copy link
Contributor

@isVoid isVoid May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems better to exist in code comments, not in notebook...


Reply via ReviewNB

@isVoid
Copy link
Contributor

isVoid commented May 30, 2023

The notebook is largely benchmark, not binpred introduction.

@github-actions github-actions bot removed the libcuspatial Relates to the cuSpatial C++ library label May 31, 2023
@jarmak-nv
Copy link
Contributor

Just providing a super high level review here - the notebook is incredibly valuable for us as a team, it can alert us to regressions, track improvements, and show where we should put our focus in the future assuming further acceleration is needed.

I think the other comments are correct in that this isn't the best "story" for highlighting the binpreds. Team facing, it's truly awesome, I think we could just also benefit from you making a more story-focused version that highlights some but not all of the binpreds more like the geocoding notebook.

Let's work as a team to figure it out a good story - you've done amazing work to get the binpreds into cuSpatial and I want as many people as possible to see/touch it and have the i get it 💡 moment.

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @thomcom offline and walked through the changes in this PR. In addition to the notebook, there are multiple performance optimization in the binary predicates and the GeoSeries infrastructure that the code base can all benefits.

isVoid

This comment was marked as duplicate.

@thomcom thomcom changed the base branch from branch-23.06 to branch-23.08 June 6, 2023 14:37
@@ -86,4 +86,6 @@
}
}
}

"forwardPorts": [8080]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, I use notebooks in the dev container often and didn't need this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was needed to use snakeviz with it, for some reason.

@jarmak-nv
Copy link
Contributor

I think CI might be failing on the notebooks step because it doesn't know to look in subdirs like this was moved to.

Does it make sense for this to be in https://github.com/rapidsai/cuspatial/tree/branch-23.08/python/cuspatial/benchmarks/api?

@thomcom
Copy link
Contributor Author

thomcom commented Aug 3, 2023

It might pass CI if I put it in that folder? :)

@thomcom thomcom requested a review from a team as a code owner August 3, 2023 03:34
@github-actions github-actions bot added the ci label Aug 3, 2023
Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this PR introduces a new benchmarking notebook.

Is this notebook intended to be run in CI?

I think historically we've skipped benchmark notebooks in CI. See cuml and cugraph for reference:

cuspatial has a similar variable to skip notebook below:

@harrism
Copy link
Member

harrism commented Aug 3, 2023

Oh, we should probably skip the cuproj_benchmark.ipynb too.

@thomcom thomcom force-pushed the fea/binpred-introduction-notebook branch from 6abf517 to ff386c0 Compare August 3, 2023 22:42
@thomcom
Copy link
Contributor Author

thomcom commented Aug 4, 2023

@ajschmidt8 your review might get this in for 23.08

@isVoid isVoid requested a review from ajschmidt8 August 4, 2023 01:37
Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ray beat me to it.

approving anyway for good measure 👍

@isVoid
Copy link
Contributor

isVoid commented Aug 4, 2023

/merge

@rapids-bot rapids-bot bot merged commit ae8eb90 into rapidsai:branch-23.08 Aug 4, 2023
@isVoid
Copy link
Contributor

isVoid commented Aug 4, 2023

Thanks @raydouglass @ajschmidt8 ! Congrats @thomcom !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team ci cmake Related to CMake code or build configuration feature request New feature or request non-breaking Non-breaking change Python Related to Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: New notebook/demo showcasing spatial relationship predicates
6 participants