Fix read out of bounds in string concatenate #13838

pentschev · 2023-08-09T11:39:30Z

Description

If the data is sufficiently large, fused_concatenate_string_chars_kernel will attempt to read out of bounds and ultimately cause CUDA to raise cudaErrorIllegalAddress. Details on how the issue was encountered are in #13771, although this was already a known problem.

Fixes #13771.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

pentschev · 2023-08-09T11:41:06Z

Cross-referencing #13771 (comment), as I was unable to write a test. For completeness, here is what I wrote there about the missing test:

In any case, I've now put up a fix for the cudaErrorIllegalAddress in #13838. I confirmed it resolves that problem, but I spent some time (far too much) on it and failed to come up with an appropriate test. If I'm understanding the code flow correctly, to reach that kernel in a way that makes it fail, we need data that makes use_fused_kernel_heuristic evaluate to true, and total_bytes must be large enough to require a 64-bit integer for addressing. In my mind, the appropriate test would have a large enough number of columns (the failing data has 5000) and a large total_bytes (the failing data is 1431647652 bytes long). I would appreciate it if someone could help me create some synthetic test data that we could use to test that the kernel actually succeeds.
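The size requirement above can be sketched on the host. This is a minimal model, not the kernel itself: it assumes a grid-stride loop launched with enough blocks to cover every byte and a block size of 256, neither of which is stated in this thread. The helper names `index_exceeds_int32` and `grid_stride` are hypothetical. Under those assumptions, the last thread's index after one stride no longer fits in a signed 32-bit integer, even though 1431647652 itself does.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of libcudf): walks one thread's grid-stride
// loop using 64-bit arithmetic and reports whether the index would ever
// exceed what a signed 32-bit integer can hold.
bool index_exceeds_int32(std::int64_t first_index,
                         std::int64_t stride,
                         std::int64_t output_size)
{
  constexpr std::int64_t int32_max = std::numeric_limits<std::int32_t>::max();
  for (std::int64_t index = first_index; index < output_size;) {
    index += stride;  // the grid-stride increment
    if (index > int32_max) { return true; }
  }
  return false;
}

// Assumed launch shape: enough blocks of `block_size` threads to cover
// every output byte, so the stride is the total number of launched threads.
std::int64_t grid_stride(std::int64_t output_size, std::int64_t block_size)
{
  std::int64_t const num_blocks = (output_size + block_size - 1) / block_size;
  return num_blocks * block_size;
}
```

With the failing size from #13771, `grid_stride(1431647652, 256)` is 1431647744, so the thread that starts at index 1431647651 steps to 2863295395, which is past the 32-bit signed maximum of 2147483647.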

bdice

This appears to be the same fix as #10344. Do we need any other explicit casts for bounds checking, like that PR had? I did not see anything immediately where that would be necessary but may be worth another inspection.

pentschev · 2023-08-09T13:50:22Z

This appears to be the same fix as #10344. Do we need any other explicit casts for bounds checking, like that PR had? I did not see anything immediately where that would be necessary but may be worth another inspection.

I'm assuming you're referring specifically to this line. AFAIU, this is already covered here, but please correct me if I'm wrong.

PointKernel

LGTM

bdice · 2023-08-09T17:15:10Z

@pentschev My question above seems fine / already covered. Thanks!

pentschev · 2023-08-10T13:42:48Z

I managed to write a proper test in 25712ec. This is the output if the fix from this PR is reverted:

[ RUN      ] StringColumnTest.ConcatenateColumnViewLarge
CUDA Error detected. cudaErrorIllegalAddress an illegal memory access was encountered
COPYING_TEST: /datasets/pentschev/miniconda3/envs/cudf-invalid-address-src/include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp:253: void rmm::mr::detail::stream_ordered_memory_resource<PoolResource, FreeListType>::do_deallocate(void*, std::size_t, rmm::cuda_stream_view) [with PoolResource = rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>; FreeListType = rmm::mr::detail::coalescing_free_list; std::size_t = long unsigned int]: Assertion `status__ == cudaSuccess' failed.
Aborted (core dumped)

It's a bit of a long-running test though (~3.5s); I hope this is OK.

GregoryKimball · 2023-08-16T04:30:45Z

cpp/src/strings/copying/concatenate.cu

@@ -121,7 +121,7 @@ __global__ void fused_concatenate_string_offset_kernel(column_device_view const*
                                                       bitmask_type* output_mask,
                                                       size_type* out_valid_count)
 {
-  size_type output_index     = threadIdx.x + blockIdx.x * blockDim.x;
+  int64_t output_index       = threadIdx.x + blockIdx.x * blockDim.x;


Would you please use the cudf::thread_index_type alias? Thank you again for diagnosing this issue.

Thanks @GregoryKimball, addressed that.
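As a rough host-side illustration of the index change discussed in this review thread, the sketch below models the thread index with a 64-bit type. The local alias stands in for cudf::thread_index_type and is assumed to be a signed 64-bit integer, consistent with the commit that replaced int64_t with it. The function names are hypothetical, and widening each operand before the multiply is a choice made in this sketch, not something taken from the PR's diff.

```cpp
#include <cstdint>

// Local stand-in for illustration only; the PR uses cudf::thread_index_type.
using thread_index_type = std::int64_t;

// Hypothetical host-side model of `threadIdx.x + blockIdx.x * blockDim.x`.
// CUDA's built-in index variables are 32-bit unsigned, so each operand is
// widened first to keep the whole expression in 64 bits.
thread_index_type global_thread_index(std::uint32_t thread_idx,
                                      std::uint32_t block_idx,
                                      std::uint32_t block_dim)
{
  return static_cast<thread_index_type>(thread_idx) +
         static_cast<thread_index_type>(block_idx) *
           static_cast<thread_index_type>(block_dim);
}

// Counts how many elements one thread visits in a grid-stride loop when the
// loop index is 64 bits wide, so `index += stride` cannot wrap for these sizes.
std::int64_t grid_stride_iterations(thread_index_type first_index,
                                    thread_index_type stride,
                                    thread_index_type output_size)
{
  std::int64_t iterations = 0;
  for (thread_index_type index = first_index; index < output_size; index += stride) {
    ++iterations;
  }
  return iterations;
}
```

With a 64-bit index, a thread starting at 1431647651 with a stride of 1431647744 visits exactly one element and then exits the loop, instead of wrapping around and continuing to read.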

pentschev · 2023-08-16T10:00:53Z

Reviews/questions were addressed and tests are passing. Anything else needed here or could we get it merged?

PointKernel · 2023-08-16T16:48:39Z

/merge

PointKernel · 2023-08-16T16:50:08Z

@pentschev If you have write access, feel free to merge the PR on your own after getting enough approvals.

pentschev requested a review from a team as a code owner August 9, 2023 11:39

pentschev requested review from PointKernel and nvdbaranec August 9, 2023 11:39

pentschev mentioned this pull request Aug 9, 2023

[BUG] Errors converting tables from arrow to cuDF #13771

Closed

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Aug 9, 2023

Fix read out of bounds in string concatenate

9e4b456

pentschev force-pushed the fix-string-concatenate branch from cd3c128 to 9e4b456 Compare August 9, 2023 11:56

bdice approved these changes Aug 9, 2023

bdice added bug Something isn't working non-breaking Non-breaking change labels Aug 9, 2023

PointKernel approved these changes Aug 9, 2023

GregoryKimball mentioned this pull request Aug 10, 2023

Use "ranger" to prevent grid stride loop overflow #10368

Open

Add large string concatenate test

25712ec

ttnghia approved these changes Aug 10, 2023

pentschev self-assigned this Aug 10, 2023

GregoryKimball reviewed Aug 16, 2023

pentschev added 2 commits August 15, 2023 23:43

Replace int64_t by cudf::thread_index_type

d983b14

Merge remote-tracking branch 'upstream/branch-23.10' into fix-string-…

c0f4d8d

…concatenate

rapids-bot bot merged commit 5d5032d into rapidsai:branch-23.10 Aug 16, 2023
54 checks passed
