Add python benchmarks. #600
Conversation
from_geopandas
The geopandas version specifier in the integration repository below will need to be updated as well before this PR can be merged. @thomcom, can you open a PR for that?
Approving ops-codeowner file changes.
I have some high-level thoughts on whether we need to incorporate the cudf_benchmark utilities into cuspatial. AFAIK, cudf_benchmark provides two benefits:
- It provides a uniform interface for creating and reusing fixtures. A cudf DataFrame can vary over nrows, ncols, dtypes, etc., so we want to avoid redundantly recreating similar fixtures and reuse them as needed. A geopandas DataFrame is built on top of a pandas DataFrame and adds a geometry series type, so the cuspatial benchmark framework should focus only on the geometry part and avoid overlapping cudf's coverage. Introducing the cudf_benchmark framework would make it easy to create overlapping benchmarks and harder to single out the parts cuspatial actually wants to benchmark.
- CUDF_BENCHMARKS_USE_PANDAS is useful when we need to compare speedups between cudf and pandas. We can (and want to) do this today because feature parity is a development milestone for cuDF; for cuSpatial, I don't think that's the goal at the moment.
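For illustration, a switch like CUDF_BENCHMARKS_USE_PANDAS typically works by having a fixture hand benchmarks either the GPU library or the host baseline. This is a hedged sketch, not cudf's actual implementation; the fixture name is made up:

```python
import os

import pytest

# Hypothetical environment-variable switch modeled on
# CUDF_BENCHMARKS_USE_PANDAS: "1" selects the pandas host baseline.
USE_PANDAS = os.environ.get("CUDF_BENCHMARKS_USE_PANDAS", "0") == "1"


@pytest.fixture
def frame_module():
    # Benchmarks request this fixture and receive whichever library the
    # environment selected, so one test body can measure either path.
    if USE_PANDAS:
        import pandas as module
    else:
        import cudf as module
    return module
```

The same benchmark function then runs against both libraries, which is exactly the feature-parity comparison described above.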
Most pytest_cases.fixtures introduced in this PR are simple pytest.fixtures, which don't require incorporating the cudf_benchmark infrastructure at all.
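To illustrate, a plain parameterized pytest.fixture already covers varying input sizes without any cudf_benchmark machinery. This is a hedged sketch assuming pytest-benchmark's `benchmark` fixture; the fixture and test names are hypothetical, not from the PR:

```python
import pytest


# Hypothetical parameterized fixture: one definition yields a small and a
# larger geometry input, reused by any benchmark that requests it.
@pytest.fixture(params=[100, 10_000])
def point_series(request):
    import geopandas
    from shapely.geometry import Point

    return geopandas.GeoSeries([Point(i, i) for i in range(request.param)])


def bench_from_geopandas(benchmark, point_series):
    import cuspatial

    # pytest-benchmark calls the target repeatedly and records timings.
    benchmark(cuspatial.from_geopandas, point_series)
```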
Right. I originally started out trying to support cudf's benchmarking framework, but after discussing with @vyasr it didn't seem necessary or even appropriate at this time.
For these reasons I think we should start out with a trimmer benchmark library for cuspatial.
c330b3c to ac1f525
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
One errant "cuDF" found.
Co-authored-by: Mark Harris <mharris@nvidia.com>
Some comments below.
Curious, how long does it take to run the full benchmark suite?
The full set of tests takes 33 seconds on small default input data.
@gpucibot merge
This PR adds benchmarks for the from_geopandas method and the rest of the Python API. It also includes a guide to benchmarking, closing #695.
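For context, a minimal sketch of the call these benchmarks measure. The sample data is made up, and the guard lets the snippet degrade gracefully when no GPU stack is installed:

```python
import importlib.util


def cuspatial_available():
    # True only when both geopandas and cuspatial can be imported.
    return (importlib.util.find_spec("geopandas") is not None
            and importlib.util.find_spec("cuspatial") is not None)


if cuspatial_available():
    import geopandas
    from shapely.geometry import Point

    import cuspatial

    gdf = geopandas.GeoDataFrame(geometry=[Point(i, i) for i in range(3)])
    # Copy the host GeoDataFrame into GPU memory; this is the
    # benchmarked operation.
    gpu_gdf = cuspatial.from_geopandas(gdf)
    print(type(gpu_gdf).__name__)
else:
    print("cuspatial not installed; skipping GPU example")
```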