-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use warp per string for long strings in cudf::strings::contains() #10739
Use warp per string for long strings in cudf::strings::contains() #10739
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10739 +/- ##
================================================
+ Coverage 86.37% 86.39% +0.02%
================================================
Files 142 142
Lines 22306 22306
================================================
+ Hits 19266 19272 +6
+ Misses 3040 3034 -6
Continue to review full report at Codecov.
|
Improvements to benchmarks ran locally on my dev machine:
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice wins!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. The speedups look good and the implementation/tests are clean.
@gpucibot merge |
Improves the performance on
cudf::strings::contains()
for long strings. This executes a warp per string to match a target over sections of a single string in parallel. The benchmark showed this to be faster than the current implementation only for longer strings (greater than 64 bytes). It also proved somewhat faster and more consistent than a pure character-parallel approach.This change may also help improve the performance of the regex
contains_re()
function in the future.