
Add python benchmarks. #600

Merged 15 commits into rapidsai:branch-22.10 on Sep 30, 2022

Conversation
Conversation

@thomcom thomcom (Contributor) commented Jul 22, 2022

This PR adds benchmarks for the from_geopandas method and the rest of the Python API.

It also includes a guide to benchmarking, closing #695.

@github-actions github-actions bot added the "conda" and "Python" labels Jul 22, 2022
@thomcom thomcom marked this pull request as ready for review July 25, 2022 20:22
@thomcom thomcom requested review from a team as code owners July 25, 2022 20:22
@thomcom thomcom requested a review from isVoid July 25, 2022 20:22
@thomcom thomcom changed the title Add python benchmarks Add python benchmarks for from_geopandas Jul 25, 2022
@ajschmidt8 ajschmidt8 (Member) left a comment

The geopandas version specifier in the integration repository below will need to be updated as well before this PR can be merged. @thomcom, can you open a PR for that?

@thomcom

This comment was marked as outdated.

@ajschmidt8

This comment was marked as outdated.

@thomcom thomcom added the "5 - Ready to Merge", "improvement", and "non-breaking" labels Jul 26, 2022
@thomcom

This comment was marked as outdated.

@ajschmidt8 ajschmidt8 (Member) left a comment

Approving ops-codeowner file changes

@isVoid isVoid (Contributor) left a comment

I have some high-level thoughts on whether we need to incorporate the cudf_benchmark utilities into cuspatial. AFAIK, cudf_benchmark provides two benefits:

  1. It provides a uniform interface for defining and reusing fixtures. A cudf dataframe can vary over nrows, ncols, dtypes, etc., so we want to avoid recreating similar fixtures and instead reuse them as needed. A geopandas dataframe, by contrast, is built on top of a pandas dataframe and adds a geometry series type. The cuspatial benchmark framework should focus only on the geometry part and avoid overlapping cudf's coverage. Introducing the cudf_benchmark framework could therefore make it easy to create overlapping benchmark tests and hard to single out the parts that cuspatial actually wants to benchmark.

  2. CUDF_BENCHMARKS_USE_PANDAS is useful when we need to compare speedups between cudf and pandas. We can (and want to) do this today because feature parity is a development milestone for cuDF. For cuSpatial, I don't think that's the goal at the moment.

Most of the pytest_cases.fixture fixtures introduced in this PR are simple pytest.fixtures, which don't require incorporating the cudf_benchmark infrastructure at all.
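The fixture-plus-benchmark pattern under discussion can be illustrated with a minimal, self-contained sketch. In a real benchmark file the data helper would be a plain `pytest.fixture` and `benchmark` would be pytest-benchmark's fixture; here both are hand-rolled stand-ins (all names, including `host_points` and `bounding_box`, are hypothetical) so the snippet runs without pytest or cuspatial installed:

```python
import time
import random

def host_points(n=1000):
    """Fixture-style helper: synthetic (lon, lat) pairs, plain Python."""
    rng = random.Random(0)
    return [(rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(n)]

def benchmark(fn, *args, rounds=5):
    """Stand-in for pytest-benchmark's `benchmark` fixture:
    call `fn` repeatedly and keep the best wall-clock time."""
    best = float("inf")
    result = None
    for _ in range(rounds):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

def bounding_box(points):
    """Toy workload standing in for a cuspatial API call."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

result, seconds = benchmark(bounding_box, host_points())
```

The point of the pattern is that the data setup (`host_points`) stays outside the timed region, so the timer measures only the operation under test.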

@thomcom thomcom (Contributor, Author) commented Jul 27, 2022

Right. I originally started out trying to support cudf's benchmarking framework, but after discussing with @vyasr it didn't seem necessary, or even appropriate, at this time.

  1. cuspatial is more of a ListSeries library than a DataFrame library: everything that supports dataframes is at best redundant with cudf, and at worst will diverge from it.
  2. cuspatial doesn't really support varying dtypes at this time. Our floating-point columns usually accept float32 or float64, but otherwise every column has a fixed type for each API. A GeoSeries can hold a single geometry type or completely heterogeneous types. Type-specific tests will eventually apply to certain GeoSeries operations, but not yet.
  3. GeoSeries provides a fairly small API surface that parallels GeoPandas. Everything else in cuspatial has no language-specific analog; we don't need to switch easily between geopandas and cuspatial yet, for example.

For these reasons I think we should start out with a trimmer benchmark library for cuspatial.
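The "trimmer" approach amounts to parametrizing only over the dimension that matters for cuspatial (feature count), rather than cudf's full nrows/ncols/dtype grid, which is visible in the suite's bench_from_geoseries_{100,1000,10000} entries. A standalone sketch of that idea, with hypothetical helper names and a stdlib-only toy workload in place of a cuspatial conversion:

```python
import time
import random

def make_points(n):
    """Hypothetical fixture stand-in: n synthetic (x, y) pairs."""
    rng = random.Random(42)
    return [(rng.random(), rng.random()) for _ in range(n)]

def to_interleaved(points):
    """Toy workload: flatten (x, y) pairs into one interleaved list,
    roughly the buffer shape column-based geometry libraries use."""
    out = []
    for x, y in points:
        out.append(x)
        out.append(y)
    return out

timings = {}
for n in (100, 1000, 10000):  # the one axis worth parametrizing here
    pts = make_points(n)      # setup stays outside the timed region
    start = time.perf_counter()
    flat = to_interleaved(pts)
    timings[n] = time.perf_counter() - start
    assert len(flat) == 2 * n
```

Each size becomes its own benchmark case, and nothing else (dtype, column count) multiplies the matrix.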

@thomcom thomcom force-pushed the fea-benchmark-io branch from c330b3c to ac1f525 Compare July 29, 2022 17:23
@github-actions github-actions bot removed the "conda" label Jul 29, 2022
@harrism

This comment was marked as resolved.

@thomcom thomcom changed the base branch from branch-22.08 to branch-22.10 August 3, 2022 00:06
@thomcom thomcom requested a review from isVoid August 3, 2022 17:41
@thomcom thomcom added the "4 - Needs Reviewer" label Aug 3, 2022
@isVoid isVoid (Contributor) commented Aug 3, 2022

@thomcom do you mind writing up the python benchmark docs for #599 since this PR first introduced the benchmark suite?

Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
@isVoid

This comment was marked as outdated.

@harrism harrism (Member) left a comment

One errant "cuDF" found.

Review comment on docs/source/developer_guide/benchmarking.md (resolved).
@thomcom thomcom requested a review from isVoid September 29, 2022 20:58
@isVoid isVoid (Contributor) left a comment

Some comments below.
Curious, how long does it take to run the full benchmark suite?

Review comments on docs/source/developer_guide/benchmarking.md and python/cuspatial/benchmarks/pytest.ini (resolved).
@rapidsai rapidsai deleted a comment from github-actions bot Sep 30, 2022
@thomcom thomcom (Contributor, Author) commented Sep 30, 2022

> Some comments below. Curious, how long does it take to run the full benchmark suite?

The full set of tests takes 33 seconds on small default input data.

@thomcom thomcom (Contributor, Author) commented Sep 30, 2022

(rapids) rapids@compose:~/cuspatial/python/cuspatial/benchmarks$ time pytest
================================================================================================================================= test session starts =================================================================================================================================
platform linux -- Python 3.8.13, pytest-7.1.3, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/tcomer/mnt/NVIDIA/rapids-docker/cuspatial/python/cuspatial/benchmarks, configfile: pytest.ini
plugins: cov-3.0.0, cases-3.6.13, benchmark-3.4.1, forked-1.4.0, xdist-2.5.0, hypothesis-6.54.6
collected 21 items                                                                                                                                                                                                                                                                    

api/bench_api.py ..................                                                                                                                                                                                                                                             [ 85%]
io/bench_geoseries.py ...                                                                                                                                                                                                                                                       [100%]


---------------------------------------------------------------------------------------------------------------- benchmark: 21 tests -----------------------------------------------------------------------------------------------------------------
Name (time in us)                                       Min                       Max                      Mean                 StdDev                    Median                     IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_haversine_distance                           139.0160 (1.0)            318.3000 (1.0)            154.8732 (1.0)          32.1748 (1.0)            141.6490 (1.0)            4.5692 (1.0)       523;890  6,456.8943 (1.0)        4519           1
bench_lonlat_to_cartesian                          210.3299 (1.51)           478.5019 (1.50)           237.2452 (1.53)         48.6556 (1.51)           215.8792 (1.52)          13.6376 (2.98)      425;562  4,215.0479 (0.65)       3329           1
bench_points_in_spatial_window                     259.7428 (1.87)           529.4732 (1.66)           295.5240 (1.91)         59.0128 (1.83)           265.1850 (1.87)          24.5259 (5.37)      464;531  3,383.8204 (0.52)       2830           1
bench_trajectory_distances_and_speeds              362.3699 (2.61)           685.7242 (2.15)           391.5284 (2.53)         53.9242 (1.68)           368.6621 (2.60)          16.2490 (3.56)      217;343  2,554.0930 (0.40)       2046           1
bench_trajectory_bounding_boxes                    384.6721 (2.77)           732.2170 (2.30)           427.3871 (2.76)         79.3527 (2.47)           391.4400 (2.76)          25.4575 (5.57)      233;342  2,339.7992 (0.36)       2051           1
bench_polyline_bounding_boxes                      462.0778 (3.32)           858.6980 (2.70)           491.5333 (3.17)         63.6829 (1.98)           468.8120 (3.31)          13.5623 (2.97)      136;260  2,034.4501 (0.32)       1689           1
bench_polygon_bounding_boxes                       517.5611 (3.72)         1,042.5448 (3.28)           597.3495 (3.86)        127.7777 (3.97)           527.9840 (3.73)          81.9925 (17.94)     263;263  1,674.0619 (0.26)       1411           1
bench_pairwise_linestring_distance                 639.9602 (4.60)           964.9121 (3.03)           705.2090 (4.55)         68.0856 (2.12)           676.3469 (4.77)         107.4563 (23.52)       213;6  1,418.0193 (0.22)       1097           1
bench_quadtree_point_to_nearest_polyline           873.2041 (6.28)         1,515.8060 (4.76)           911.7086 (5.89)         66.9581 (2.08)           889.4689 (6.28)          29.3235 (6.42)        52;68  1,096.8416 (0.17)        688           1
bench_io_read_polygon_shapefile                  1,685.8699 (12.13)        2,333.1030 (7.33)         2,046.8916 (13.22)       299.6621 (9.31)         2,121.5100 (14.98)        560.6051 (122.69)        1;0    488.5457 (0.08)          5           1
bench_derive_trajectories                        2,237.0580 (16.09)        6,979.4860 (21.93)        2,746.7828 (17.74)       528.4972 (16.43)        2,634.2644 (18.60)        723.4351 (158.33)       36;3    364.0623 (0.06)        386           1
bench_io_geoseries_from_offsets                  6,929.0400 (49.84)        9,210.2559 (28.94)        7,498.8532 (48.42)       720.3332 (22.39)        7,215.1589 (50.94)        941.2048 (205.99)        1;0    133.3537 (0.02)         10           1
bench_quadtree_point_in_polygon                  8,008.3192 (57.61)       14,782.9410 (46.44)        9,987.2674 (64.49)     1,968.0318 (61.17)        8,548.2986 (60.35)      3,810.6833 (834.00)       32;0    100.1275 (0.02)        120           1
bench_quadtree_on_points                        12,491.7610 (89.86)       15,702.6912 (49.33)       12,866.9665 (83.08)       564.0410 (17.53)       12,640.5515 (89.24)        352.6580 (77.18)         8;8     77.7184 (0.01)         84           1
bench_from_geoseries_100                        17,417.7901 (125.29)      96,352.6999 (302.71)      20,873.9450 (134.78)   10,861.2324 (337.57)      19,178.4850 (135.39)     1,739.9152 (380.79)        1;2     47.9066 (0.01)         51           1
bench_io_from_geopandas                         21,291.4760 (153.16)      23,606.3749 (74.16)       22,073.7033 (142.53)      493.7965 (15.35)       22,011.4351 (155.39)       614.4474 (134.48)       11;1     45.3028 (0.01)         37           1
bench_io_to_geopandas                           32,513.8462 (233.89)      51,633.7841 (162.22)      35,600.8441 (229.87)    4,169.8471 (129.60)      34,209.4060 (241.51)     2,832.5964 (619.93)        4;3     28.0892 (0.00)         29           1
bench_directed_hausdorff_distance               53,695.8429 (386.26)     123,995.8352 (389.56)      59,650.9127 (385.16)   16,143.7641 (501.75)      56,051.5680 (395.71)     2,942.6329 (644.02)        1;1     16.7642 (0.00)         18           1
bench_from_geoseries_1000                      100,644.1731 (723.98)     123,424.3310 (387.76)     107,189.1096 (692.11)    8,004.7326 (248.79)     104,656.7055 (738.85)     8,843.0239 (>1000.0)       1;0      9.3293 (0.00)          8           1
bench_point_in_polygon                         203,288.3549 (>1000.0)    216,483.1341 (680.12)     206,608.6186 (>1000.0)   5,588.8981 (173.70)     203,943.8270 (>1000.0)    4,700.9572 (>1000.0)       1;1      4.8401 (0.00)          5           1
bench_from_geoseries_10000                   1,015,495.3292 (>1000.0)  1,156,930.4019 (>1000.0)  1,079,089.5166 (>1000.0)  64,659.0465 (>1000.0)  1,055,592.2610 (>1000.0)  117,687.0280 (>1000.0)       1;0      0.9267 (0.00)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================================================================================================================================= 21 passed in 28.91s =================================================================================================================================

real	0m32.592s
user	0m29.669s
sys	0m2.641s
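As the Legend states, the OPS column is computed as 1 / Mean. Checking that against the first row of the table above (bench_haversine_distance, mean 154.8732 µs, reported OPS 6,456.8943):

```python
# Verify OPS = 1 / Mean for the first benchmark-table row above.
mean_us = 154.8732            # mean time in microseconds, from the table
ops = 1.0 / (mean_us * 1e-6)  # operations per second = 1 / mean-in-seconds
print(round(ops, 2))          # ≈ 6456.89, matching the OPS column
```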

@thomcom thomcom changed the title Add python benchmarks for from_geopandas Add python benchmarks. Sep 30, 2022
@thomcom
Copy link
Contributor Author

thomcom commented Sep 30, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aec962c into rapidsai:branch-22.10 Sep 30, 2022
Labels
  4 - Needs Reviewer: Waiting for reviewer to review or respond
  5 - Ready to Merge: Testing and reviews complete, ready to merge
  improvement: Improvement / enhancement to an existing function
  non-breaking: Non-breaking change
  Python: Related to Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants