Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize set-like operations #12769

Merged
merged 24 commits into from
Apr 10, 2023
Merged

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Feb 13, 2023

Set-like operations such as intersect_distinct and difference_distinct call purge_nonempty_nulls when the input is nullable. This PR optimizes these set APIs by checking the existence of non-empty nulls (using has_nonempty_nulls) before calling to purge_nonempty_nulls.

@ttnghia ttnghia added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue Spark Functionality that helps Spark RAPIDS improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 13, 2023
@ttnghia ttnghia self-assigned this Feb 13, 2023
@github-actions github-actions bot added the CMake CMake build issue label Feb 13, 2023
@ttnghia
Copy link
Contributor Author

ttnghia commented Feb 14, 2023

Benchmark comparing before vs after this PR:

# intersect_distinct

## [0] Quadro RTX 6000

|  num_rows  |  depth  |  null_frequency  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |  Status  |
|------------|---------|------------------|------------|-------------|------------|-------------|---------------|---------|----------|
|    2^10    |    1    |        0         | 236.523 us |      14.04% | 235.340 us |      11.01% |     -1.184 us |  -0.50% |   PASS   |
|    2^13    |    1    |        0         | 297.840 us |      14.22% | 294.913 us |       8.53% |     -2.927 us |  -0.98% |   PASS   |
|    2^16    |    1    |        0         | 927.063 us |       4.98% | 928.836 us |       4.78% |      1.774 us |   0.19% |   PASS   |
|    2^10    |    4    |        0         |  25.484 ms |       1.07% |  27.372 ms |       4.52% |      1.888 ms |   7.41% |   FAIL   |
|    2^13    |    4    |        0         | 136.627 ms |       0.81% | 146.370 ms |       0.49% |      9.743 ms |   7.13% |   FAIL   |
|    2^16    |    4    |        0         | 422.196 ms |       0.31% | 466.010 ms |       2.20% |     43.814 ms |  10.38% |   FAIL   |
|    2^10    |    1    |       0.2        | 365.748 us |       9.44% | 346.861 us |      60.68% |    -18.887 us |  -5.16% |   PASS   |
|    2^13    |    1    |       0.2        | 405.400 us |       9.24% | 396.616 us |      45.96% |     -8.784 us |  -2.17% |   PASS   |
|    2^16    |    1    |       0.2        | 833.861 us |       6.33% | 875.925 us |      31.60% |     42.064 us |   5.04% |   PASS   |
|    2^10    |    4    |       0.2        |  24.712 ms |       1.05% |  26.997 ms |       4.53% |      2.285 ms |   9.24% |   FAIL   |
|    2^13    |    4    |       0.2        |  13.221 ms |       1.32% |  11.813 ms |       8.87% |  -1407.219 us | -10.64% |   FAIL   |
|    2^16    |    4    |       0.2        |  95.765 ms |       0.50% |  80.123 ms |       4.85% | -15642.116 us | -16.33% |   FAIL   |
|    2^10    |    1    |       0.8        | 353.305 us |      17.76% | 331.605 us |      51.34% |    -21.700 us |  -6.14% |   PASS   |
|    2^13    |    1    |       0.8        | 368.859 us |       8.63% | 378.645 us |      51.42% |      9.786 us |   2.65% |   PASS   |
|    2^16    |    1    |       0.8        | 392.823 us |      10.70% | 390.992 us |      49.61% |     -1.831 us |  -0.47% |   PASS   |
|    2^10    |    4    |       0.8        |   1.621 ms |       4.38% |   1.504 ms |      27.91% |   -117.088 us |  -7.22% |   FAIL   |
|    2^13    |    4    |       0.8        |   1.748 ms |       3.64% |   1.521 ms |      21.72% |   -226.220 us | -12.94% |   FAIL   |
|    2^16    |    4    |       0.8        |   1.959 ms |       4.83% |   1.929 ms |      29.51% |    -30.942 us |  -1.58% |   PASS   **|**

@ttnghia ttnghia marked this pull request as ready for review February 14, 2023 00:13
@ttnghia ttnghia requested a review from a team as a code owner February 14, 2023 00:13
@PointKernel
Copy link
Member

Are the performance regression in several test cases expected?

@ttnghia
Copy link
Contributor Author

ttnghia commented Feb 15, 2023

Are the performance regression in several test cases expected?

Yes. Previously, only a trivial check null_count == 0 was used. Now has_nonempty_nulls() is called (if the input is nullable).

But sacrificing the performance regression in those cases can improve more performance in some other cases so this can be considered as an "improvement".

Edit: Such (small) performance regression may be due to noise, as you can see some regression in cases even the input doesn't have nulls.

@ttnghia ttnghia changed the base branch from branch-23.04 to branch-23.06 April 4, 2023 00:48
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one open request about getting rid of an unnecessary enum and just using a lambda, but since that got a 👍 I'm approving assuming that change will happen.

@ttnghia ttnghia requested a review from vyasr April 9, 2023 23:13
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One non-blocking suggestion.

LGTM

cpp/benchmarks/lists/set_operations.cpp Outdated Show resolved Hide resolved
@ttnghia
Copy link
Contributor Author

ttnghia commented Apr 10, 2023

/merge

@rapids-bot rapids-bot bot merged commit 30411b5 into rapidsai:branch-23.06 Apr 10, 2023
@ttnghia ttnghia deleted the optimize_set_ops branch April 11, 2023 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants