
Add python benchmarks. #600

Merged 15 commits into rapidsai:branch-22.10 on Sep 30, 2022

Conversation
Conversation

@thomcom thomcom (Contributor) commented Jul 22, 2022

This PR adds benchmarks for the from_geopandas method and the rest of the Python API.

It also includes a guide to benchmarking, closing #695.

@github-actions github-actions bot added the "conda" and "Python" labels Jul 22, 2022
@thomcom thomcom marked this pull request as ready for review July 25, 2022 20:22
@thomcom thomcom requested review from a team as code owners July 25, 2022 20:22
@thomcom thomcom requested a review from isVoid July 25, 2022 20:22
@thomcom thomcom changed the title Add python benchmarks Add python benchmarks for from_geopandas Jul 25, 2022
@ajschmidt8 ajschmidt8 (Member) left a comment

The geopandas version specifier in the integration repository below will need to be updated as well before this PR can be merged. @thomcom, can you open a PR for that?

@thomcom

This comment was marked as outdated.

@ajschmidt8

This comment was marked as outdated.

@thomcom thomcom added the "5 - Ready to Merge", "improvement", and "non-breaking" labels Jul 26, 2022
@thomcom

This comment was marked as outdated.

@ajschmidt8 ajschmidt8 (Member) left a comment

Approving ops-codeowner file changes

@isVoid isVoid (Contributor) left a comment

I have some high-level thoughts on whether we need to incorporate the cudf_benchmark utilities into cuspatial. AFAIK, cudf_benchmark provides two benefits:

  1. It provides a uniform interface for defining and reusing fixtures. A cudf dataframe can vary over nrows, ncols, dtypes, etc., so we want to avoid recreating similar fixtures and instead reuse them as needed. A geopandas dataframe, by contrast, is built on top of a pandas dataframe and adds a geometry series type. The cuspatial benchmark framework should focus only on the geometry part and avoid overlapping cudf's coverage. Introducing the cudf_benchmark framework could therefore make it easy to create overlapping benchmark tests and hard to single out the parts that cuspatial actually wants to benchmark.

  2. CUDF_BENCHMARKS_USE_PANDAS is useful when we need to compare speedups between cudf and pandas. We can (and want to) do this today because feature parity is a development milestone for cuDF. For cuSpatial, I don't think that's the goal at the moment.

Most of the pytest_cases.fixture fixtures introduced in this PR are simple pytest.fixtures, which don't require incorporating the cudf_benchmark infrastructure at all.
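The fixture-plus-benchmark pattern under discussion can be illustrated with a minimal, self-contained sketch. In a real benchmark file the data helper would be a plain `pytest.fixture` and `benchmark` would be pytest-benchmark's fixture; here both are hand-rolled stand-ins (all names, including `host_points` and `bounding_box`, are hypothetical) so the snippet runs without pytest or cuspatial installed:

```python
import time
import random

def host_points(n=1000):
    """Fixture-style helper: synthetic (lon, lat) pairs, plain Python."""
    rng = random.Random(0)
    return [(rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(n)]

def benchmark(fn, *args, rounds=5):
    """Stand-in for pytest-benchmark's `benchmark` fixture:
    call `fn` repeatedly and keep the best wall-clock time."""
    best = float("inf")
    result = None
    for _ in range(rounds):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

def bounding_box(points):
    """Toy workload standing in for a cuspatial API call."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

result, seconds = benchmark(bounding_box, host_points())
```

The point of the pattern is that the data setup (`host_points`) stays outside the timed region, so the timer measures only the operation under test.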

@thomcom thomcom (Contributor, Author) commented Jul 27, 2022

Right. I originally started out trying to support cudf's benchmarking framework, but after discussing with @vyasr it didn't seem necessary, or even appropriate, at this time.

  1. cuspatial is more of a ListSeries library than a DataFrame library: everything that supports dataframes is at best redundant with cudf, and at worst will diverge from it.
  2. cuspatial doesn't really support varying dtypes at this time. Our floating-point columns usually accept float32 or float64, but otherwise every column has a fixed type for each API. A GeoSeries can hold a single geometry type or completely heterogeneous types. Type-specific tests will eventually apply to certain GeoSeries operations, but not yet.
  3. GeoSeries provides a fairly small API surface that parallels GeoPandas. Everything else in cuspatial has no language-specific analog; we don't need to switch easily between geopandas and cuspatial yet, for example.

For these reasons I think we should start out with a trimmer benchmark library for cuspatial.
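The "trimmer" approach amounts to parametrizing only over the dimension that matters for cuspatial (feature count), rather than cudf's full nrows/ncols/dtype grid, which is visible in the suite's bench_from_geoseries_{100,1000,10000} entries. A standalone sketch of that idea, with hypothetical helper names and a stdlib-only toy workload in place of a cuspatial conversion:

```python
import time
import random

def make_points(n):
    """Hypothetical fixture stand-in: n synthetic (x, y) pairs."""
    rng = random.Random(42)
    return [(rng.random(), rng.random()) for _ in range(n)]

def to_interleaved(points):
    """Toy workload: flatten (x, y) pairs into one interleaved list,
    roughly the buffer shape column-based geometry libraries use."""
    out = []
    for x, y in points:
        out.append(x)
        out.append(y)
    return out

timings = {}
for n in (100, 1000, 10000):  # the one axis worth parametrizing here
    pts = make_points(n)      # setup stays outside the timed region
    start = time.perf_counter()
    flat = to_interleaved(pts)
    timings[n] = time.perf_counter() - start
    assert len(flat) == 2 * n
```

Each size becomes its own benchmark case, and nothing else (dtype, column count) multiplies the matrix.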

@thomcom thomcom force-pushed the fea-benchmark-io branch from c330b3c to ac1f525 Compare July 29, 2022 17:23
@github-actions github-actions bot removed the "conda" label Jul 29, 2022
@harrism

This comment was marked as resolved.

@thomcom thomcom changed the base branch from branch-22.08 to branch-22.10 August 3, 2022 00:06
@thomcom thomcom requested a review from isVoid August 3, 2022 17:41
@thomcom thomcom added the "4 - Needs Reviewer" label Aug 3, 2022
@isVoid isVoid (Contributor) commented Aug 3, 2022

@thomcom do you mind writing up the python benchmark docs for #599 since this PR first introduced the benchmark suite?

Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
@isVoid

This comment was marked as outdated.

@harrism harrism (Member) left a comment

One errant "cuDF" found.

Review comment on docs/source/developer_guide/benchmarking.md (resolved).
@thomcom thomcom requested a review from isVoid September 29, 2022 20:58
@isVoid isVoid (Contributor) left a comment

Some comments below.
Curious, how long does it take to run the full benchmark suite?

Review comments on docs/source/developer_guide/benchmarking.md and python/cuspatial/benchmarks/pytest.ini (resolved).
@rapidsai rapidsai deleted a comment from github-actions bot Sep 30, 2022
@thomcom thomcom (Contributor, Author) commented Sep 30, 2022

> Some comments below. Curious, how long does it take to run the full benchmark suite?

The full set of tests takes 33 seconds on small default input data.

@thomcom thomcom (Contributor, Author) commented Sep 30, 2022

(rapids) rapids@compose:~/cuspatial/python/cuspatial/benchmarks$ time pytest
================================================================================================================================= test session starts =================================================================================================================================
platform linux -- Python 3.8.13, pytest-7.1.3, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/tcomer/mnt/NVIDIA/rapids-docker/cuspatial/python/cuspatial/benchmarks, configfile: pytest.ini
plugins: cov-3.0.0, cases-3.6.13, benchmark-3.4.1, forked-1.4.0, xdist-2.5.0, hypothesis-6.54.6
collected 21 items                                                                                                                                                                                                                                                                    

api/bench_api.py ..................                                                                                                                                                                                                                                             [ 85%]
io/bench_geoseries.py ...                                                                                                                                                                                                                                                       [100%]


---------------------------------------------------------------------------------------------------------------- benchmark: 21 tests -----------------------------------------------------------------------------------------------------------------
Name (time in us)                                       Min                       Max                      Mean                 StdDev                    Median                     IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_haversine_distance                           139.0160 (1.0)            318.3000 (1.0)            154.8732 (1.0)          32.1748 (1.0)            141.6490 (1.0)            4.5692 (1.0)       523;890  6,456.8943 (1.0)        4519           1
bench_lonlat_to_cartesian                          210.3299 (1.51)           478.5019 (1.50)           237.2452 (1.53)         48.6556 (1.51)           215.8792 (1.52)          13.6376 (2.98)      425;562  4,215.0479 (0.65)       3329           1
bench_points_in_spatial_window                     259.7428 (1.87)           529.4732 (1.66)           295.5240 (1.91)         59.0128 (1.83)           265.1850 (1.87)          24.5259 (5.37)      464;531  3,383.8204 (0.52)       2830           1
bench_trajectory_distances_and_speeds              362.3699 (2.61)           685.7242 (2.15)           391.5284 (2.53)         53.9242 (1.68)           368.6621 (2.60)          16.2490 (3.56)      217;343  2,554.0930 (0.40)       2046           1
bench_trajectory_bounding_boxes                    384.6721 (2.77)           732.2170 (2.30)           427.3871 (2.76)         79.3527 (2.47)           391.4400 (2.76)          25.4575 (5.57)      233;342  2,339.7992 (0.36)       2051           1
bench_polyline_bounding_boxes                      462.0778 (3.32)           858.6980 (2.70)           491.5333 (3.17)         63.6829 (1.98)           468.8120 (3.31)          13.5623 (2.97)      136;260  2,034.4501 (0.32)       1689           1
bench_polygon_bounding_boxes                       517.5611 (3.72)         1,042.5448 (3.28)           597.3495 (3.86)        127.7777 (3.97)           527.9840 (3.73)          81.9925 (17.94)     263;263  1,674.0619 (0.26)       1411           1
bench_pairwise_linestring_distance                 639.9602 (4.60)           964.9121 (3.03)           705.2090 (4.55)         68.0856 (2.12)           676.3469 (4.77)         107.4563 (23.52)       213;6  1,418.0193 (0.22)       1097           1
bench_quadtree_point_to_nearest_polyline           873.2041 (6.28)         1,515.8060 (4.76)           911.7086 (5.89)         66.9581 (2.08)           889.4689 (6.28)          29.3235 (6.42)        52;68  1,096.8416 (0.17)        688           1
bench_io_read_polygon_shapefile                  1,685.8699 (12.13)        2,333.1030 (7.33)         2,046.8916 (13.22)       299.6621 (9.31)         2,121.5100 (14.98)        560.6051 (122.69)        1;0    488.5457 (0.08)          5           1
bench_derive_trajectories                        2,237.0580 (16.09)        6,979.4860 (21.93)        2,746.7828 (17.74)       528.4972 (16.43)        2,634.2644 (18.60)        723.4351 (158.33)       36;3    364.0623 (0.06)        386           1
bench_io_geoseries_from_offsets                  6,929.0400 (49.84)        9,210.2559 (28.94)        7,498.8532 (48.42)       720.3332 (22.39)        7,215.1589 (50.94)        941.2048 (205.99)        1;0    133.3537 (0.02)         10           1
bench_quadtree_point_in_polygon                  8,008.3192 (57.61)       14,782.9410 (46.44)        9,987.2674 (64.49)     1,968.0318 (61.17)        8,548.2986 (60.35)      3,810.6833 (834.00)       32;0    100.1275 (0.02)        120           1
bench_quadtree_on_points                        12,491.7610 (89.86)       15,702.6912 (49.33)       12,866.9665 (83.08)       564.0410 (17.53)       12,640.5515 (89.24)        352.6580 (77.18)         8;8     77.7184 (0.01)         84           1
bench_from_geoseries_100                        17,417.7901 (125.29)      96,352.6999 (302.71)      20,873.9450 (134.78)   10,861.2324 (337.57)      19,178.4850 (135.39)     1,739.9152 (380.79)        1;2     47.9066 (0.01)         51           1
bench_io_from_geopandas                         21,291.4760 (153.16)      23,606.3749 (74.16)       22,073.7033 (142.53)      493.7965 (15.35)       22,011.4351 (155.39)       614.4474 (134.48)       11;1     45.3028 (0.01)         37           1
bench_io_to_geopandas                           32,513.8462 (233.89)      51,633.7841 (162.22)      35,600.8441 (229.87)    4,169.8471 (129.60)      34,209.4060 (241.51)     2,832.5964 (619.93)        4;3     28.0892 (0.00)         29           1
bench_directed_hausdorff_distance               53,695.8429 (386.26)     123,995.8352 (389.56)      59,650.9127 (385.16)   16,143.7641 (501.75)      56,051.5680 (395.71)     2,942.6329 (644.02)        1;1     16.7642 (0.00)         18           1
bench_from_geoseries_1000                      100,644.1731 (723.98)     123,424.3310 (387.76)     107,189.1096 (692.11)    8,004.7326 (248.79)     104,656.7055 (738.85)     8,843.0239 (>1000.0)       1;0      9.3293 (0.00)          8           1
bench_point_in_polygon                         203,288.3549 (>1000.0)    216,483.1341 (680.12)     206,608.6186 (>1000.0)   5,588.8981 (173.70)     203,943.8270 (>1000.0)    4,700.9572 (>1000.0)       1;1      4.8401 (0.00)          5           1
bench_from_geoseries_10000                   1,015,495.3292 (>1000.0)  1,156,930.4019 (>1000.0)  1,079,089.5166 (>1000.0)  64,659.0465 (>1000.0)  1,055,592.2610 (>1000.0)  117,687.0280 (>1000.0)       1;0      0.9267 (0.00)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================================================================================================================================= 21 passed in 28.91s =================================================================================================================================

real	0m32.592s
user	0m29.669s
sys	0m2.641s
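As the Legend states, the OPS column is computed as 1 / Mean. Checking that against the first row of the table above (bench_haversine_distance, mean 154.8732 µs, reported OPS 6,456.8943):

```python
# Verify OPS = 1 / Mean for the first benchmark-table row above.
mean_us = 154.8732            # mean time in microseconds, from the table
ops = 1.0 / (mean_us * 1e-6)  # operations per second = 1 / mean-in-seconds
print(round(ops, 2))          # ≈ 6456.89, matching the OPS column
```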

@thomcom thomcom changed the title Add python benchmarks for from_geopandas Add python benchmarks. Sep 30, 2022
@thomcom
Copy link
Contributor Author

thomcom commented Sep 30, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aec962c into rapidsai:branch-22.10 Sep 30, 2022
Labels
  4 - Needs Reviewer: Waiting for reviewer to review or respond
  5 - Ready to Merge: Testing and reviews complete, ready to merge
  improvement: Improvement / enhancement to an existing function
  non-breaking: Non-breaking change
  Python: Related to Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants