Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix read out of bounds in string concatenate #13838

Merged
merged 4 commits into from
Aug 16, 2023

Conversation

pentschev
Copy link
Member

@pentschev pentschev commented Aug 9, 2023

Description

If data is sufficiently large, fused_concatenate_string_chars_kernel will attempt to read out of bounds and ultimately cause CUDA to raise cudaErrorIllegalAddress. Details on how the issue was encountered are in #13771, although this was an already known problem.

Fixes #13771 .

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pentschev pentschev requested a review from a team as a code owner August 9, 2023 11:39
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Aug 9, 2023
@pentschev
Copy link
Member Author

xrefing #13771 (comment) as I was unable to generate tests. For completeness, here's what I wrote there referring missing tests:

In any case, I've now put up a fix for the cudaErrorIllegalAddress in #13838. I confirmed it resolves that problem, but I spent some time (far too much) on it and failed to come up with the appropriate test. If I'm understanding the codeflow correctly, to hit reach that kernel in a way to make it fail we need data that makes the use_fused_kernel_heuristic evaluate to true and the total_bytes must be large enough to require a 64-bit integer for addressing. In my mind, the appropriate test would have a large enough number of columns (failing data has 5000) and the total_bytes must also be large (failing data is 1431647652 bytes long). I would appreciate if someone could help me creating some synthetic test data that we could use to test the kernel actually succeeds.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be the same fix as #10344. Do we need any other explicit casts for bounds checking, like that PR had? I did not see anything immediately where that would be necessary but may be worth another inspection.

@bdice bdice added bug Something isn't working non-breaking Non-breaking change labels Aug 9, 2023
@pentschev
Copy link
Member Author

This appears to be the same fix as #10344. Do we need any other explicit casts for bounds checking, like that PR had? I did not see anything immediately where that would be necessary but may be worth another inspection.

I'm assuming you're referring specifically to this line. AFAIU, this is already covered here, but please correct me if I'm wrong.

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bdice
Copy link
Contributor

bdice commented Aug 9, 2023

@pentschev My question above seems fine / already covered. Thanks!

@pentschev
Copy link
Member Author

I managed to write a proper test in 25712ec. This is the output if the fix from this PR is reverted:

[ RUN      ] StringColumnTest.ConcatenateColumnViewLarge
CUDA Error detected. cudaErrorIllegalAddress an illegal memory access was encountered
COPYING_TEST: /datasets/pentschev/miniconda3/envs/cudf-invalid-address-src/include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp:253: void rmm::mr::detail::stream_ordered_memory_resource<PoolResource, FreeListType>::do_deallocate(void*, std::size_t, rmm::cuda_stream_view) [with PoolResource = rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>; FreeListType = rmm::mr::detail::coalescing_free_list; std::size_t = long unsigned int]: Assertion `status__ == cudaSuccess' failed.
Aborted (core dumped)

It's a bit of a long running test though (~3.5s), I hope this is ok.

@pentschev pentschev self-assigned this Aug 10, 2023
@@ -121,7 +121,7 @@ __global__ void fused_concatenate_string_offset_kernel(column_device_view const*
bitmask_type* output_mask,
size_type* out_valid_count)
{
size_type output_index = threadIdx.x + blockIdx.x * blockDim.x;
int64_t output_index = threadIdx.x + blockIdx.x * blockDim.x;
Copy link
Contributor

@GregoryKimball GregoryKimball Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please use the cudf::thread_index_type alias? Thank you again for diagnosing this issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GregoryKimball , addressed that.

@pentschev
Copy link
Member Author

Reviews/questions were addressed and tests are passing. Anything else needed here or could we get it merged?

@PointKernel
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 5d5032d into rapidsai:branch-23.10 Aug 16, 2023
54 checks passed
@PointKernel
Copy link
Member

@pentschev If you have write access, feel free to merge the PR on your own after getting enough approvals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[BUG] Errors converting tables from arrow to cuDF
5 participants