
Commit 17ccadd

Improve Hausdorff perf and accept larger number of inputs. (#424)
Fixes #393.

We switched to the exclusive scan approach to Hausdorff because certain benchmarks indicated better performance. Apparently those benchmarks were inadequate, or just plain badly written (by me), and performance was in fact worse. This became apparent while fixing the OOM error reported in #393. I copied the 0.14 implementation into the 21.08 branch to re-benchmark. Here are the results:

cuspatial@0.14:

```
------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time          CPU  Iterations UserCounters...
------------------------------------------------------------------------------------------------------------
HausdorffBenchmark/hausdorff/100/64/manual_time        1.62 ms      1.78 ms         428 items_per_second=23.9898G/s
HausdorffBenchmark/hausdorff/512/64/manual_time        43.9 ms      44.1 ms          16 items_per_second=23.6053G/s
HausdorffBenchmark/hausdorff/4096/64/manual_time       2810 ms      2810 ms           1 items_per_second=23.6845G/s
HausdorffBenchmark/hausdorff/6000/64/manual_time       6148 ms      6148 ms           1 items_per_second=23.2318G/s
HausdorffBenchmark/hausdorff/100/100/manual_time       3.31 ms      3.47 ms         210 items_per_second=29.0333G/s
HausdorffBenchmark/hausdorff/512/100/manual_time       88.9 ms      89.1 ms           8 items_per_second=28.7737G/s
HausdorffBenchmark/hausdorff/4096/100/manual_time      5842 ms      5842 ms           1 items_per_second=28.132G/s
HausdorffBenchmark/hausdorff/6000/100/manual_time     12698 ms     12698 ms           1 items_per_second=27.7783G/s
```

cuspatial@21.08 (with the OOM fix from earlier commits in this PR):

```
------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time          CPU  Iterations UserCounters...
------------------------------------------------------------------------------------------------------------
HausdorffBenchmark/hausdorff/100/64/manual_time        17.4 ms      17.6 ms          38 items_per_second=2.2391G/s
HausdorffBenchmark/hausdorff/512/64/manual_time         489 ms       490 ms           2 items_per_second=2.11979G/s
HausdorffBenchmark/hausdorff/4096/64/manual_time      37120 ms     37119 ms           1 items_per_second=1.79299G/s
HausdorffBenchmark/hausdorff/6000/64/manual_time      82732 ms     82729 ms           1 items_per_second=1.7265G/s
HausdorffBenchmark/hausdorff/100/100/manual_time       43.4 ms      43.7 ms          16 items_per_second=2.21402G/s
HausdorffBenchmark/hausdorff/512/100/manual_time       1341 ms      1341 ms           1 items_per_second=1.90885G/s
HausdorffBenchmark/hausdorff/4096/100/manual_time     94898 ms     94894 ms           1 items_per_second=1.7319G/s
HausdorffBenchmark/hausdorff/6000/100/manual_time    199120 ms    199115 ms           1 items_per_second=1.77138G/s
```

The performance is bad, and this regression is my fault. Fortunately, I was able to quickly reverse it and improve on the 0.14 numbers (roughly 1.1x to 1.9x faster than 0.14 across these benchmarks, and roughly 13x to 27x faster than 21.08) while getting rid of a bunch of code (and learning a lot in the process). This PR re-implements Hausdorff as a straightforward custom kernel that requires no intermediate memory.

This PR:

```
------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time          CPU  Iterations UserCounters...
------------------------------------------------------------------------------------------------------------
HausdorffBenchmark/hausdorff/100/64/manual_time        1.31 ms      1.47 ms         526 items_per_second=29.6763G/s
HausdorffBenchmark/hausdorff/512/64/manual_time        23.2 ms      23.3 ms          30 items_per_second=44.7567G/s
HausdorffBenchmark/hausdorff/4096/64/manual_time       1589 ms      1590 ms           1 items_per_second=41.8747G/s
HausdorffBenchmark/hausdorff/6000/64/manual_time       3170 ms      3170 ms           1 items_per_second=45.0638G/s
HausdorffBenchmark/hausdorff/100/100/manual_time       2.92 ms      3.08 ms         239 items_per_second=32.8852G/s
HausdorffBenchmark/hausdorff/512/100/manual_time       55.8 ms      55.8 ms          12 items_per_second=45.8415G/s
HausdorffBenchmark/hausdorff/4096/100/manual_time      3547 ms      3547 ms           1 items_per_second=46.3317G/s
HausdorffBenchmark/hausdorff/6000/100/manual_time      7658 ms      7658 ms           1 items_per_second=46.0564G/s
```

Authors:
- Christopher Harris (https://github.com/cwharris)

Approvers:
- Mark Harris (https://github.com/harrism)
- Paul Taylor (https://github.com/trxcllnt)
- AJ Schmidt (https://github.com/ajschmidt8)

URL: #424
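The kernel itself is not part of this page, so the following is a minimal serial C++ sketch of why a direct approach needs no intermediate memory, assuming the usual definition of the directed Hausdorff distance: for each point in one space, find the distance to the nearest point in the other space, then take the maximum of those. The names `space_end`, `directed_hausdorff`, and `all_pairs_directed_hausdorff`, the structure-of-arrays inputs (`xs`, `ys`, `space_offsets`), and the row-major `out[lhs * n + rhs]` output layout are hypothetical and chosen for illustration; they are not cuSpatial's API or the PR's CUDA code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical serial sketch, not cuSpatial's CUDA kernel.
// All points live in one structure-of-arrays buffer (xs, ys);
// space_offsets[s] is the index of the first point of space s,
// and space s ends where space s + 1 begins (or at xs.size()).
// Assumes every space has at least one point.

// One past the last point index of space s.
std::size_t space_end(const std::vector<double>& xs,
                      const std::vector<std::size_t>& space_offsets,
                      std::size_t s)
{
  return s + 1 < space_offsets.size() ? space_offsets[s + 1] : xs.size();
}

// Directed Hausdorff distance from space lhs to space rhs:
// max over points a in lhs of (min over points b in rhs of |a - b|).
// Only two running scalars are kept; nothing is allocated.
double directed_hausdorff(const std::vector<double>& xs,
                          const std::vector<double>& ys,
                          const std::vector<std::size_t>& space_offsets,
                          std::size_t lhs,
                          std::size_t rhs)
{
  assert(lhs < space_offsets.size() && rhs < space_offsets.size());
  double max_of_mins = 0.0;
  for (std::size_t a = space_offsets[lhs]; a < space_end(xs, space_offsets, lhs); ++a) {
    double min_dist = std::numeric_limits<double>::infinity();
    for (std::size_t b = space_offsets[rhs]; b < space_end(xs, space_offsets, rhs); ++b) {
      min_dist = std::min(min_dist, std::hypot(xs[a] - xs[b], ys[a] - ys[b]));
    }
    max_of_mins = std::max(max_of_mins, min_dist);
  }
  return max_of_mins;
}

// All ordered pairs of spaces. The only allocation is the n * n output,
// laid out row-major as out[lhs * n + rhs] (an assumed layout).
std::vector<double> all_pairs_directed_hausdorff(const std::vector<double>& xs,
                                                 const std::vector<double>& ys,
                                                 const std::vector<std::size_t>& space_offsets)
{
  const std::size_t n = space_offsets.size();
  std::vector<double> out(n * n);
  for (std::size_t lhs = 0; lhs < n; ++lhs) {
    for (std::size_t rhs = 0; rhs < n; ++rhs) {
      out[lhs * n + rhs] = directed_hausdorff(xs, ys, space_offsets, lhs, rhs);
    }
  }
  return out;
}
```

For two spaces `{(0,0), (1,0)}` and `{(0,0), (4,0)}`, the distance from the first to the second is 1 and from the second to the first is 3, which is why the directed result is not symmetric; the symmetric Hausdorff distance is the larger of the two. On a GPU, one plausible mapping is to give each (lhs, rhs) pair, or each lhs point, to a thread and keep the min/max in registers; how the PR's kernel actually partitions the work is not described here.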
1 parent 934015b commit 17ccadd


7 files changed: +125 -588 lines changed


cpp/benchmarks/hausdorff_benchmark.cpp (+4 -5)

```diff
@@ -24,10 +24,9 @@
 
 static void BM_hausdorff(benchmark::State& state)
 {
-  int32_t num_points = state.range(1) - 1;
-  int32_t num_spaces_asked = state.range(0) - 1;
-  int32_t num_spaces = std::min(num_points, num_spaces_asked);
-  int32_t num_points_per_space = num_points / num_spaces;
+  int32_t num_spaces = state.range(0) - 1;
+  int32_t num_points_per_space = state.range(1) - 1;
+  int32_t num_points = num_points_per_space * num_spaces;
 
   auto zero_iter = thrust::make_constant_iterator(0);
 
@@ -62,7 +61,7 @@ class HausdorffBenchmark : public cuspatial::benchmark {
     BM_hausdorff(state); \
   } \
   BENCHMARK_REGISTER_F(HausdorffBenchmark, name) \
-    ->Ranges({{1 << 10, 1 << 14}, {1 << 10, 1 << 15}}) \
+    ->Ranges({{1 << 5, 1 << 13}, {1 << 2, 1 << 7}}) \
     ->UseManualTime() \
     ->Unit(benchmark::kMillisecond);
 
```
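The benchmark sizing change is easier to see side by side. This is a minimal C++ sketch using hypothetical helpers (`Sizing`, `old_sizing`, `new_sizing`, `point_pairs`, `giga_items_per_second`) that mirror the before/after arithmetic in `BM_hausdorff`; it is not code from the benchmark. It shows that the old integer division could leave `num_points` out of sync with `num_spaces * num_points_per_space`, and that the `items_per_second` figures above line up with counting `num_points * num_points` point pairs per iteration. The call that sets that counter is outside this diff, so the pair-count interpretation is an inference from the reported numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of BM_hausdorff's sizing arithmetic, before and after this commit.
struct Sizing {
  int32_t num_spaces;
  int32_t num_points_per_space;
  int32_t num_points;
};

// Before: range(1) was the total point count, and the space count was capped at it.
Sizing old_sizing(int64_t range0, int64_t range1)
{
  int32_t num_points       = static_cast<int32_t>(range1 - 1);
  int32_t num_spaces_asked = static_cast<int32_t>(range0 - 1);
  int32_t num_spaces       = std::min(num_points, num_spaces_asked);
  assert(num_spaces > 0);
  // Integer division: any remainder is silently dropped.
  int32_t num_points_per_space = num_points / num_spaces;
  return {num_spaces, num_points_per_space, num_points};
}

// After: range(1) is the per-space count, and the total is derived from it.
Sizing new_sizing(int64_t range0, int64_t range1)
{
  int32_t num_spaces           = static_cast<int32_t>(range0 - 1);
  int32_t num_points_per_space = static_cast<int32_t>(range1 - 1);
  return {num_spaces, num_points_per_space, num_points_per_space * num_spaces};
}

// Every point against every point: num_points * num_points pairs per iteration.
double point_pairs(const Sizing& s)
{
  return static_cast<double>(s.num_points) * static_cast<double>(s.num_points);
}

// Pairs per second, in billions, for a measured time in milliseconds.
double giga_items_per_second(const Sizing& s, double millis)
{
  return point_pairs(s) / (millis * 1e-3) / 1e9;
}
```

For example, `old_sizing(1 << 14, 1 << 15)` yields 16383 spaces of 2 points but 32767 total points, one more than the spaces actually hold. `new_sizing(100, 64)` yields 99 spaces of 63 points (6237 points, 38,900,169 pairs); dividing by the 1.31 ms reported for `hausdorff/100/64` in this PR's run gives about 29.7 G pairs/s, consistent with the reported `29.6763G/s`.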

cpp/src/spatial/detail/cartesian_product_group_index_iterator.cuh

-196
This file was deleted.

cpp/src/spatial/detail/hausdorff.cuh

-126
This file was deleted.
