Add stacktrace into cudf exception types #13298

Merged: 60 commits, Jun 9, 2023

@ttnghia ttnghia commented May 5, 2023

This PR implements stacktrace capture and attaches the stacktrace, as a string, to every exception thrown by cudf. Each exception therefore carries information about where it originated, so a downstream application can trace a failure back to its source with much less effort.

Closes #12422.

Example:

#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5

Usage

Retrieving a stacktrace with fully human-readable symbols requires adjusting some compile options. To make this convenient, a new CMake option, CUDF_BUILD_STACKTRACE_DEBUG, has been added. Set it to ON before building cudf and the stacktrace is ready to use.

Whenever a cudf exception is thrown, a downstream application can retrieve the stored stacktrace and use it however it needs. For example:

try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw;  // rethrow the original exception object instead of a copy
}
// Other cudf exception types can be caught in the same way.
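
To make the pattern above concrete without depending on cudf, here is a minimal, self-contained sketch of an exception type that records a stack trace when it is constructed, together with a catch site that reads it. Everything in the `demo` namespace is hypothetical and is not the cudf API. The capture uses `backtrace()` and `backtrace_symbols()` from `<execinfo.h>`, which is an assumption about the platform (glibc or macOS); elsewhere the sketch returns an empty trace.

```cpp
// Illustrative stand-in only: everything in namespace `demo` is hypothetical
// and is not the cudf API.
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>

#if defined(__has_include)
#if __has_include(<execinfo.h>)
#include <execinfo.h>  // backtrace(), backtrace_symbols() (glibc, macOS)
#define DEMO_HAS_BACKTRACE 1
#endif
#endif

namespace demo {

// Best-effort capture of the current call stack, one "#N: frame" line each.
// Returns an empty string where <execinfo.h> is unavailable.
inline std::string capture_stacktrace()
{
#ifdef DEMO_HAS_BACKTRACE
  constexpr int max_frames = 64;
  void* frames[max_frames];
  int const depth = backtrace(frames, max_frames);
  assert(depth >= 0 && depth <= max_frames);
  char** symbols = backtrace_symbols(frames, depth);
  if (symbols == nullptr) { return {}; }
  std::ostringstream out;
  for (int i = 0; i < depth; ++i) {
    out << '#' << i << ": " << symbols[i] << '\n';
  }
  std::free(symbols);  // backtrace_symbols() returns one malloc'd block
  return out.str();
#else
  return {};
#endif
}

// An exception type that records the stack once, at construction time.
class logic_error : public std::logic_error {
 public:
  explicit logic_error(std::string const& msg)
    : std::logic_error(msg), trace_(capture_stacktrace())
  {
  }
  char const* stacktrace() const noexcept { return trace_.c_str(); }

 private:
  std::string trace_;
};

// Catch site: build a report from the message and the stored trace.
template <typename Callable>
std::string run_and_report(Callable&& fn)
{
  try {
    fn();
  } catch (logic_error const& e) {
    return std::string(e.what()) + "\n" + e.stacktrace();
  }
  return {};
}

}  // namespace demo
```

Calling `demo::run_and_report([] { throw demo::logic_error("boom"); })` yields a string that starts with the message, followed by whatever frames the platform could resolve. How readable those frames are depends on how the binary was compiled and linked.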

Follow-up work

The next step would be to patch rmm so that rmm:: exceptions also carry a stacktrace. That would allow the various memory exceptions thrown from within libcudf to be debugged using their stacktrace.

Note:

  • This feature doesn't require libcudf to be built in Debug mode.
  • The flag CUDF_BUILD_STACKTRACE_DEBUG should not be turned on in production because it may affect code optimization. Use a libcudf built with this flag only when needed, that is, when debugging exceptions thrown by cudf.
  • This flag removes the build's current optimization flag (such as -O2 or -O3 in Release mode) and replaces it with -Og (optimize for debugging).
  • If this option is not set to ON, the stacktrace will not be available. This avoids an expensive stacktrace retrieval when the thrown exception is expected.
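
The last point can be sketched as a compile-time switch: when the definition is absent, the capture function compiles down to returning an empty string, so throwing an expected exception pays no stack-walking cost. This is only a sketch under assumptions. `DEMO_STACKTRACE_DEBUG` and the `demo` namespace are hypothetical stand-ins, and the sketch assumes the build option ends up as a preprocessor definition, which this page does not show.

```cpp
// Hypothetical sketch: DEMO_STACKTRACE_DEBUG stands in for whatever
// definition the real build option passes to the compiler.
#include <string>

#ifdef DEMO_STACKTRACE_DEBUG
#include <cstdlib>
#include <execinfo.h>  // glibc backtrace(); assumed available when enabled
#endif

namespace demo {

// True only when this translation unit was compiled with capture enabled.
constexpr bool stacktrace_enabled()
{
#ifdef DEMO_STACKTRACE_DEBUG
  return true;
#else
  return false;
#endif
}

// Returns the current call stack as text, or "" when capture is compiled out.
inline std::string get_stacktrace()
{
#ifdef DEMO_STACKTRACE_DEBUG
  constexpr int max_frames = 64;
  void* frames[max_frames];
  int const depth = backtrace(frames, max_frames);
  char** symbols  = backtrace_symbols(frames, depth);
  std::string out;
  if (symbols != nullptr) {
    for (int i = 0; i < depth; ++i) {
      out += symbols[i];
      out += '\n';
    }
    std::free(symbols);  // one malloc'd block holds all the strings
  }
  return out;
#else
  return {};  // no stack walk and no symbol lookup: throwing stays cheap
#endif
}

}  // namespace demo
```

Compiling with `-DDEMO_STACKTRACE_DEBUG` switches the capture on. Callers do not change either way; they simply receive an empty string when the feature is off.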

@ttnghia added the labels feature request, 2 - In Progress, libcudf, improvement, non-breaking on May 5, 2023
@ttnghia self-assigned this on May 5, 2023
@ttnghia changed the title from "Add stack trace into cudf error types" to "Add stack trace into cudf exception types" on May 5, 2023
@github-actions bot added the CMake label on May 5, 2023
@ttnghia removed the feature request label on May 5, 2023
@github-actions bot added the conda label on May 5, 2023
@github-actions bot added the Java label on May 5, 2023

harrism commented May 16, 2023

CMake changes look good

REALLY good.

@ttnghia ttnghia changed the base branch from branch-23.06 to branch-23.08 May 19, 2023 22:12

@vyasr vyasr left a comment

Apologies for taking so long to review this. The changes look solid; I have just a couple of small requests. Can we insert a call to get_stacktrace in stream_checking_resource_adaptor::verify_stream? I had wanted to extract this function anyway for that purpose, since I originally implemented the trace to help find improper stream usage.

/**
 * @brief Exception thrown when logical precondition is violated.
 *
 * This exception should not be thrown directly and is instead thrown by the
 * CUDF_EXPECTS macro.
 */
struct logic_error : public std::logic_error {

An alternative would be for each exception type to own a stacktrace_recorder that is constructed in its constructor. That approach would require ignoring 2 frames instead of just 1, though, and it would be more verbose. I don't feel strongly that this change is necessary here.
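
The two shapes being compared can be sketched as follows. This assumes the approach under review attaches the recorder as a shared base class, which the comment implies but this page does not show. All names live in a hypothetical `demo` namespace, the signatures are guesses, and `get_stacktrace()` is a placeholder that returns fixed text instead of walking the stack, so the sketch says nothing about how many frames either shape would need to skip.

```cpp
// Hypothetical sketch of the two designs; not the code under review.
#include <stdexcept>
#include <string>
#include <type_traits>

namespace demo {

// Placeholder for a real stack walk; fixed text keeps the sketch portable.
inline std::string get_stacktrace() { return "#0: <frame>\n"; }

// Records the stack at the point where the recorder is constructed.
class stacktrace_recorder {
 public:
  stacktrace_recorder() : trace_(get_stacktrace()) {}
  char const* stacktrace() const noexcept { return trace_.c_str(); }

 private:
  std::string trace_;
};

// Shape 1: the recorder is a base class, so stacktrace() is inherited.
struct inheriting_error : public std::logic_error, public stacktrace_recorder {
  explicit inheriting_error(std::string const& msg) : std::logic_error(msg) {}
};

// Shape 2: the recorder is a member, so every exception type has to
// forward the accessor itself. This is the extra verbosity.
struct owning_error : public std::logic_error {
  explicit owning_error(std::string const& msg) : std::logic_error(msg) {}
  char const* stacktrace() const noexcept { return recorder_.stacktrace(); }

 private:
  stacktrace_recorder recorder_;
};

// The two shapes differ only in how the recorder is attached.
static_assert(std::is_base_of<stacktrace_recorder, inheriting_error>::value,
              "shape 1 inherits the recorder");
static_assert(!std::is_base_of<stacktrace_recorder, owning_error>::value,
              "shape 2 holds the recorder as a member");

}  // namespace demo
```

Both shapes expose the same `stacktrace()` accessor to the catch site, so the choice mainly affects how much boilerplate each exception type carries.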

@ttnghia ttnghia requested a review from vyasr June 6, 2023 22:20
@ttnghia ttnghia requested a review from jlowe June 9, 2023 03:41
@ttnghia ttnghia requested a review from jlowe June 9, 2023 17:04

ttnghia commented Jun 9, 2023

/merge

@rapids-bot rapids-bot bot merged commit 69206d1 into rapidsai:branch-23.08 Jun 9, 2023
@ttnghia ttnghia deleted the stacktrace branch June 9, 2023 20:32
Linked issue: [FEA] When libcudf throws exception, it should also print out the stack trace
9 participants