Add stacktrace into cudf exception types #13298

Merged
merged 60 commits into from
Jun 9, 2023
Commits
7d7cb1e
Add `stacktrace`
ttnghia May 5, 2023
6b51a29
Update docs
ttnghia May 5, 2023
22e6553
Adjust frame index
ttnghia May 5, 2023
f4ae832
Adopt `get_stacktrace` in `identify_stream_usage.cpp`
ttnghia May 5, 2023
f641add
Rewrite stacktrace
ttnghia May 5, 2023
8d45ee3
Fix format
ttnghia May 5, 2023
aa2b963
Fix comment
ttnghia May 5, 2023
78d39f5
Update cmake
ttnghia May 5, 2023
cde4ec5
Merge branch 'branch-23.06' into stacktrace
ttnghia May 5, 2023
a03a497
Add header
ttnghia May 5, 2023
1d291fa
Rewrite stacktrace.cpp
ttnghia May 5, 2023
c88dc7f
Update cmake
ttnghia May 5, 2023
9085058
Rewrite stacktrace accessor
ttnghia May 5, 2023
b158d65
Update copyright year
ttnghia May 5, 2023
9b04caa
Update JNI to adopt stacktrace
ttnghia May 5, 2023
14ab74f
Change `stacktrace` return type
ttnghia May 5, 2023
c1d8dee
Update JNI
ttnghia May 5, 2023
4747704
Remove redundant definition
ttnghia May 5, 2023
97c8046
Rename variable
ttnghia May 5, 2023
353cac8
Optimize JNI
ttnghia May 5, 2023
eb324ca
Move header into `cudf/detail/utilities/`
ttnghia May 5, 2023
1ebde06
Fix `meta.yaml`
ttnghia May 5, 2023
a2c830b
Add compile option
ttnghia May 5, 2023
9b816e5
Change cmake
ttnghia May 6, 2023
0ca1bba
Change cast type
ttnghia May 6, 2023
ef7ebb9
Merge branch 'branch-23.06' into stacktrace
ttnghia May 6, 2023
b820f14
Update cpp/CMakeLists.txt
ttnghia May 8, 2023
e247cb8
Update copyright year
ttnghia May 8, 2023
4112288
Merge branch 'branch-23.06' into stacktrace
ttnghia May 8, 2023
3d7e337
Remove variable
ttnghia May 8, 2023
b24420b
Rewrite stacktrace
ttnghia May 8, 2023
5da984a
Change comments
ttnghia May 8, 2023
1033535
Using `c*` headers, and add `std::` prefix
ttnghia May 8, 2023
991acbc
Encapsulate headers in `#ifdef`
ttnghia May 8, 2023
ba8d97b
Add `// __GNUC__`
ttnghia May 8, 2023
06d7ce7
Reset
ttnghia May 9, 2023
83bd318
Other approach for cmake
ttnghia May 9, 2023
695bfa9
Link `cudf_backtrace` with stream test
ttnghia May 9, 2023
0656b3b
Remove cmake string replace for `_CXX_FLAGS_`
ttnghia May 9, 2023
3224559
Apply suggestions from code review
ttnghia May 9, 2023
27da4e4
Merge branch 'branch-23.06' into stacktrace
ttnghia May 11, 2023
7b2cca9
Merge branch 'branch-23.06' into stacktrace
ttnghia May 12, 2023
72859b8
Fix JNI bugs
ttnghia May 15, 2023
651ed1c
Add more default constructor for Java exception classes
ttnghia May 15, 2023
5d86822
Merge branch 'branch-23.06' into stacktrace
ttnghia May 15, 2023
7f1df5b
Merge branch 'branch-23.06' into stacktrace
ttnghia May 18, 2023
a36c191
Disable stacktrace completely by default
ttnghia May 18, 2023
f382ea6
Merge branch 'branch-23.08' into stacktrace
ttnghia May 19, 2023
0c312c8
Update cpp/src/utilities/stacktrace.cpp
ttnghia Jun 6, 2023
f25bf64
Merge branch 'branch-23.08' into stacktrace
ttnghia Jun 6, 2023
bd4085d
Add more stacktrace usage
ttnghia Jun 6, 2023
e08ba04
Merge branch 'branch-23.08' into stacktrace
ttnghia Jun 7, 2023
e44f2a2
Merge branch 'branch-23.08' into stacktrace
ttnghia Jun 9, 2023
e7c5967
Update cpp/CMakeLists.txt
ttnghia Jun 9, 2023
0e0399c
Change macro name
ttnghia Jun 9, 2023
25ff793
Not throwing new exception
ttnghia Jun 9, 2023
dc03826
Fix typo
ttnghia Jun 9, 2023
58ff5c7
Throw cuda exception but with error code
ttnghia Jun 9, 2023
6ebee14
Add default stacktrace message
ttnghia Jun 9, 2023
9ebe2a2
Add exception check macro
ttnghia Jun 9, 2023
1 change: 1 addition & 0 deletions conda/recipes/libcudf/meta.yaml
@@ -150,6 +150,7 @@ outputs:
- test -f $PREFIX/include/cudf/detail/utilities/linked_column.hpp
- test -f $PREFIX/include/cudf/detail/utilities/logger.hpp
- test -f $PREFIX/include/cudf/detail/utilities/pinned_host_vector.hpp
- test -f $PREFIX/include/cudf/detail/utilities/stacktrace.hpp
- test -f $PREFIX/include/cudf/detail/utilities/vector_factories.hpp
- test -f $PREFIX/include/cudf/detail/utilities/visitor_overload.hpp
- test -f $PREFIX/include/cudf/dictionary/detail/concatenate.hpp
56 changes: 53 additions & 3 deletions cpp/CMakeLists.txt
@@ -62,11 +62,18 @@ option(
stream to external libraries."
OFF
)
# Option to add all symbols to the dynamic symbol table in the library file, allowing retrieval of a
# human-readable stacktrace for debugging.
option(
CUDF_BUILD_STACKTRACE_DEBUG
"Replace the current optimization flags by the options '-rdynamic -Og', useful for debugging with stacktrace retrieval"
OFF
)
option(DISABLE_DEPRECATION_WARNINGS "Disable warnings generated from deprecated declarations." OFF)
# Option to enable line info in CUDA device compilation to allow introspection when profiling /
# memchecking
option(CUDA_ENABLE_LINEINFO
"Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler" OFF
"Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler)" OFF
)
option(CUDA_WARNINGS_AS_ERRORS "Enable -Werror=all-warnings for all CUDA compilation" ON)
# cudart can be statically linked or dynamically linked. The python ecosystem wants dynamic linking
@@ -94,13 +101,17 @@ message(VERBOSE "CUDF: Use a file cache for JIT compiled kernels: ${JITIFY_USE_C
message(VERBOSE "CUDF: Build and statically link Arrow libraries: ${CUDF_USE_ARROW_STATIC}")
message(VERBOSE "CUDF: Build and enable S3 filesystem support for Arrow: ${CUDF_ENABLE_ARROW_S3}")
message(VERBOSE "CUDF: Build with per-thread default stream: ${CUDF_USE_PER_THREAD_DEFAULT_STREAM}")
message(
VERBOSE
"CUDF: Replace the current optimization flags by the options '-rdynamic -Og' (useful for debugging with stacktrace retrieval): ${CUDF_BUILD_STACKTRACE_DEBUG}"
)
message(
VERBOSE
"CUDF: Disable warnings generated from deprecated declarations: ${DISABLE_DEPRECATION_WARNINGS}"
)
message(
VERBOSE
"CUDF: Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler: ${CUDA_ENABLE_LINEINFO}"
"CUDF: Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler): ${CUDA_ENABLE_LINEINFO}"
)
message(VERBOSE "CUDF: Statically link the CUDA runtime: ${CUDA_STATIC_RUNTIME}")

@@ -115,6 +126,10 @@ if(BUILD_TESTS AND NOT CUDF_BUILD_TESTUTIL)
)
endif()

if(CUDF_BUILD_STACKTRACE_DEBUG AND NOT CMAKE_COMPILER_IS_GNUCXX)
message(FATAL_ERROR "CUDF_BUILD_STACKTRACE_DEBUG is only supported with GCC compiler")
endif()

set(CUDF_CXX_FLAGS "")
set(CUDF_CUDA_FLAGS "")
set(CUDF_CXX_DEFINITIONS "")
@@ -608,6 +623,7 @@ add_library(
src/utilities/default_stream.cpp
src/utilities/linked_column.cpp
src/utilities/logger.cpp
src/utilities/stacktrace.cpp
src/utilities/traits.cpp
src/utilities/type_checks.cpp
src/utilities/type_dispatcher.cpp
@@ -646,6 +662,31 @@ target_compile_options(
"$<$<COMPILE_LANGUAGE:CUDA>:${CUDF_CUDA_FLAGS}>"
)

if(CUDF_BUILD_STACKTRACE_DEBUG)
# Remove any optimization level to avoid nvcc warning "incompatible redefinition for option
# 'optimize'".
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS}")
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS_RELEASE "${CMAKE_CUDA_FLAGS_RELEASE}")
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS_MINSIZEREL
"${CMAKE_CUDA_FLAGS_MINSIZEREL}"
)
string(REGEX REPLACE "(\-O[0123])" "" CMAKE_CUDA_FLAGS_RELWITHDEBINFO
"${CMAKE_CUDA_FLAGS_RELWITHDEBINFO}"
)

add_library(cudf_backtrace INTERFACE)
target_compile_definitions(cudf_backtrace INTERFACE CUDF_BUILD_STACKTRACE_DEBUG)
target_compile_options(
cudf_backtrace INTERFACE "$<$<COMPILE_LANGUAGE:CXX>:-Og>"
"$<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-Og>"
)
target_link_options(
cudf_backtrace INTERFACE "$<$<LINK_LANGUAGE:CXX>:-rdynamic>"
"$<$<LINK_LANGUAGE:CUDA>:-Xlinker=-rdynamic>"
)
target_link_libraries(cudf PRIVATE cudf_backtrace)
endif()
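As a rough illustration of what the `string(REGEX REPLACE ...)` calls above do to the CUDA flag variables, the sketch below applies the same pattern with `std::regex`. It assumes this simple pattern behaves the same in CMake's regex engine and in ECMAScript `std::regex`; the helper name `strip_optimization_flags` and the sample flag strings are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the CMake call
//   string(REGEX REPLACE "(\-O[0123])" "" <var> "${<var>}")
// Every "-O0", "-O1", "-O2" or "-O3" substring is deleted; everything else is kept verbatim.
std::string strip_optimization_flags(std::string const& flags)
{
  static std::regex const opt_level{"-O[0123]"};
  return std::regex_replace(flags, opt_level, "");
}
```

The last check below shows a limitation of a purely textual replacement: an optimization level nested inside another option, such as `-Xcompiler=-O2`, would leave a dangling `-Xcompiler=` behind.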

# Specify include paths for the current target and dependents
target_include_directories(
cudf
@@ -829,7 +870,9 @@ if(CUDF_BUILD_STREAMS_TEST_UTIL)
# depending via ctest and whether it has been updated to expose public stream APIs.
foreach(_mode cudf testing)
set(_tgt "cudf_identify_stream_usage_mode_${_mode}")
add_library(${_tgt} SHARED tests/utilities/identify_stream_usage.cpp)
add_library(
${_tgt} SHARED src/utilities/stacktrace.cpp tests/utilities/identify_stream_usage.cpp
)

set_target_properties(
${_tgt}
@@ -838,7 +881,14 @@ if(CUDF_BUILD_STREAMS_TEST_UTIL)
CXX_STANDARD_REQUIRED ON
POSITION_INDEPENDENT_CODE ON
)
target_compile_options(
${_tgt} PRIVATE "$<BUILD_INTERFACE:$<$<COMPILE_LANGUAGE:CXX>:${CUDF_CXX_FLAGS}>>"
)
target_include_directories(${_tgt} PRIVATE "$<BUILD_INTERFACE:${CUDF_SOURCE_DIR}/include>")
target_link_libraries(${_tgt} PUBLIC CUDA::cudart rmm::rmm)
if(CUDF_BUILD_STACKTRACE_DEBUG)
target_link_libraries(${_tgt} PRIVATE cudf_backtrace)
endif()
add_library(cudf::${_tgt} ALIAS ${_tgt})

if("${_mode}" STREQUAL "testing")
47 changes: 47 additions & 0 deletions cpp/include/cudf/detail/utilities/stacktrace.hpp
@@ -0,0 +1,47 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <string>

namespace cudf::detail {
/**
* @addtogroup utility_stacktrace
* @{
* @file
*/

/**
* @brief Specify whether the last stackframe is included in the stacktrace.
*/
enum class capture_last_stackframe : bool { YES, NO };

/**
* @brief Query the current stacktrace and return the whole stacktrace as one string.
*
* Depending on the value of the flag `capture_last_frame`, the caller that executes stacktrace
* retrieval can be included in the output result.
*
* @param capture_last_frame Flag to specify whether the current stackframe is included in
* the output
* @return A string storing the whole current stacktrace
*/
std::string get_stacktrace(capture_last_stackframe capture_last_frame);

/** @} */ // end of group

} // namespace cudf::detail
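To make the effect of `capture_last_stackframe` concrete, the following sketch models a captured stack as a vector of frame names (innermost first) and applies the same skip rule used in `stacktrace.cpp`: the frame of `get_stacktrace` itself is always dropped, and one more frame (its caller) is dropped for `NO`. The function `visible_frames` and the frame names are hypothetical; the real implementation works on raw `backtrace()` addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the enum declared in cudf/detail/utilities/stacktrace.hpp.
enum class capture_last_stackframe : bool { YES, NO };

// Hypothetical model: frames[0] is get_stacktrace itself, frames[1] is its caller,
// frames[2] is the caller's caller, and so on.
std::vector<std::string> visible_frames(std::vector<std::string> const& frames,
                                        capture_last_stackframe capture_last_frame)
{
  // Same arithmetic as stacktrace.cpp: skip our own frame, plus the caller's frame for NO.
  std::size_t const skip_depth =
    1 + (capture_last_frame == capture_last_stackframe::YES ? 0 : 1);
  if (skip_depth >= frames.size()) { return {}; }
  return std::vector<std::string>(frames.begin() + static_cast<std::ptrdiff_t>(skip_depth),
                                  frames.end());
}
```

With this model, `stacktrace_recorder` passing `NO` means its own constructor never shows up; the first reported frame is whoever constructed the recorder.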
30 changes: 27 additions & 3 deletions cpp/include/cudf/utilities/error.hpp
@@ -16,6 +16,8 @@

#pragma once

#include <cudf/detail/utilities/stacktrace.hpp>

#include <cuda.h>
#include <cuda_runtime_api.h>
#include <stdexcept>
@@ -29,13 +31,35 @@ namespace cudf {
* @file
*/

/**
* @brief A struct that captures the current stacktrace upon construction and stores it.
*/
struct stacktrace_recorder {
stacktrace_recorder()
// Exclude the current stackframe, as it is this constructor.
: _stacktrace{cudf::detail::get_stacktrace(cudf::detail::capture_last_stackframe::NO)}
{
}

public:
/**
* @brief Get the stored stacktrace captured during object construction.
*
* @return The pointer to a null-terminated string storing the output stacktrace
*/
char const* stacktrace() const { return _stacktrace.c_str(); }

protected:
std::string const _stacktrace; //!< The whole stacktrace stored as one string.
};

/**
* @brief Exception thrown when logical precondition is violated.
*
* This exception should not be thrown directly and is instead thrown by the
* CUDF_EXPECTS macro.
*/
struct logic_error : public std::logic_error {
Contributor

Folks should be aware that this is changing the size of the exception type even when compiling in non-stacktrace mode.

That isn't to say it would be a good idea to stop inheriting from stacktrace_recorder in non-stacktrace mode: different builds of libcudf would then have different sizes for the exception types, which could cause subtle ABI issues.

The ideal way to do this without changing the size of the exception type would be to include the stack trace in the what() string such that there is only a single string member.
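A minimal sketch of that alternative, assuming the trace is simply appended to the message handed to `std::logic_error`, so the exception keeps exactly one string. `traced_logic_error` and `capture_trace()` are hypothetical names; `capture_trace()` is a stub standing in for `cudf::detail::get_stacktrace`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudf::detail::get_stacktrace(); a real build would
// return the frames captured by backtrace().
inline std::string capture_trace() { return "#0: <frame>\n"; }

// The stacktrace is folded into what(), so the only string storage is the one owned by
// std::logic_error; on mainstream ABIs sizeof(traced_logic_error) then typically equals
// sizeof(std::logic_error).
struct traced_logic_error : public std::logic_error {
  explicit traced_logic_error(std::string const& message)
    : std::logic_error{message + "\nStacktrace:\n" + capture_trace()}
  {
  }
};
```

The trade-off is that every consumer that logs `what()` would then print the full trace as well.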

Contributor Author

The "what()" string is accessed for logging. If we append the stacktrace to it, downstream applications would print a very large message every time they log the error. This concern has been raised before; please see #12422 (comment).

Contributor

@davidwendt, May 15, 2023

Does the recorder need to be part of the signature?
The stack trace does not seem to have any real association with the exception object.
Could it not just be a free function used like this?

try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << stacktrace() << std::endl;
  throw;
}

Contributor Author

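Unfortunately not. A free stacktrace() function captures the call stack at the point where it is called, which here is inside the catch block. By then the stack has already been unwound past the frames where the exception was created, so those frames cannot be recovered.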
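The point can be illustrated without `backtrace()` by using a hypothetical shadow call stack: each function pushes its name on entry and pops it on exit through an RAII guard, so the recorded stack shrinks as the exception unwinds. Capturing at construction (as `stacktrace_recorder` does) still sees the throwing function, while capturing inside the `catch` block does not. `shadow_stack`, `frame_guard`, `snapshot`, `recorded_error` and `compare_captures` are illustrative names, not cudf APIs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shadow call stack, maintained by RAII guards.
inline std::vector<std::string>& shadow_stack()
{
  static std::vector<std::string> frames;
  return frames;
}

struct frame_guard {
  explicit frame_guard(std::string name) { shadow_stack().push_back(std::move(name)); }
  ~frame_guard() { shadow_stack().pop_back(); }  // runs during unwinding as well
  frame_guard(frame_guard const&)            = delete;
  frame_guard& operator=(frame_guard const&) = delete;
};

// Join the current shadow stack, outermost frame first.
inline std::string snapshot()
{
  std::string out;
  for (auto const& f : shadow_stack()) { out += f + ";"; }
  return out;
}

// Records the stack when constructed, like stacktrace_recorder does.
struct recorded_error : public std::runtime_error {
  recorded_error() : std::runtime_error{"failure"}, trace{snapshot()} {}
  std::string trace;
};

inline void thrower()
{
  frame_guard g{"thrower"};
  throw recorded_error{};
}

// Returns {trace captured at construction, trace captured inside the catch block}.
inline std::pair<std::string, std::string> compare_captures()
{
  frame_guard g{"caller"};
  try {
    thrower();
  } catch (recorded_error const& e) {
    return {e.trace, snapshot()};
  }
  return {};
}
```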

Contributor Author

> The stack trace does not seem to have any real association with the exception object.

In fact, the stack trace is created and attached to the exception object. The exception classes now derive from stacktrace_recorder, which captures the call stack immediately upon construction.
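A simplified sketch of that mixin pattern, assuming the exception types derive publicly from a recorder base as in this PR. Because the recorder is a public base, a handler that only sees `std::exception` can still reach the trace with a cross `dynamic_cast`. The namespace `sketch`, the stubbed trace text and the `describe` handler are hypothetical; the real recorder calls `cudf::detail::get_stacktrace()`.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

namespace sketch {

// Simplified stand-in for cudf::stacktrace_recorder with a stubbed capture.
struct stacktrace_recorder {
  stacktrace_recorder() : _stacktrace{"#0: <captured at construction>"} {}
  char const* stacktrace() const { return _stacktrace.c_str(); }

 protected:
  std::string const _stacktrace;
};

// Same shape as the PR: std::logic_error is listed first and constructed first; the
// recorder's constructor then runs while the throwing frame is still on the stack.
struct logic_error : public std::logic_error, public stacktrace_recorder {
  explicit logic_error(std::string const& message) : std::logic_error{message} {}
};

// Hypothetical generic handler: appends the trace when the exception carries one.
inline std::string describe(std::exception const& e)
{
  std::string out = e.what();
  if (auto const* rec = dynamic_cast<stacktrace_recorder const*>(&e)) {
    out += "\n";
    out += rec->stacktrace();
  }
  return out;
}

}  // namespace sketch
```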

Contributor

An alternative would be for each exception type to own a stacktrace_recorder that is constructed in its constructor. That approach would require ignoring 2 frames instead of just 1, though, and it would be more verbose. I don't have a strong opinion that that is a necessary change here.
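For comparison, a sketch of the composition alternative, in which each exception type owns a recorder as a data member and forwards the accessor by hand. Names are hypothetical and the capture is stubbed, so this shows only the shape of the code, not the frame-skipping difference mentioned above. One further consequence of this layout is that there is no common recorder base, so a generic handler would need to know each concrete exception type to reach the trace.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical recorder with a stubbed capture.
class stacktrace_recorder {
 public:
  stacktrace_recorder() : _stacktrace{"#0: <captured>"} {}
  char const* stacktrace() const { return _stacktrace.c_str(); }

 private:
  std::string _stacktrace;
};

// Composition: every exception type must declare the member and forward the accessor.
struct logic_error : public std::logic_error {
  explicit logic_error(std::string const& message) : std::logic_error{message} {}
  char const* stacktrace() const { return _recorder.stacktrace(); }

 private:
  stacktrace_recorder _recorder;  // constructed after the std::logic_error base
};

}  // namespace sketch
```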

struct logic_error : public std::logic_error, public stacktrace_recorder {
/**
* @brief Constructs a logic_error with the error message.
*
@@ -57,7 +81,7 @@ struct logic_error : public std::logic_error {
* @brief Exception thrown when a CUDA error is encountered.
*
*/
struct cuda_error : public std::runtime_error {
struct cuda_error : public std::runtime_error, public stacktrace_recorder {
/**
* @brief Construct a new cuda error object with error message and code.
*
@@ -92,7 +116,7 @@ struct fatal_cuda_error : public cuda_error {
* unsupported data_type. This exception should not be thrown directly and is
* instead thrown by the CUDF_EXPECTS or CUDF_FAIL macros.
*/
struct data_type_error : public std::invalid_argument {
struct data_type_error : public std::invalid_argument, public stacktrace_recorder {
/**
* @brief Constructs a data_type_error with the error message.
*
8 changes: 8 additions & 0 deletions cpp/include/cudf_test/stream_checking_resource_adaptor.hpp
@@ -17,8 +17,12 @@

#include <cudf_test/default_stream.hpp>

#include <cudf/detail/utilities/stacktrace.hpp>

#include <rmm/mr/device/device_memory_resource.hpp>

#include <iostream>

/**
* @brief Resource that verifies that the default stream is not used in any allocation.
*
@@ -162,6 +166,10 @@ class stream_checking_resource_adaptor final : public rmm::mr::device_memory_res
: (cstream != cudf::test::get_default_stream().value());

if (invalid_stream) {
// Exclude the current function from stacktrace.
std::cout << cudf::detail::get_stacktrace(cudf::detail::capture_last_stackframe::NO)
<< std::endl;

if (error_on_invalid_stream_) {
throw std::runtime_error("Attempted to perform an operation on an unexpected stream!");
} else {
88 changes: 88 additions & 0 deletions cpp/src/utilities/stacktrace.cpp
@@ -0,0 +1,88 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cudf/detail/utilities/stacktrace.hpp>

#if defined(__GNUC__) && defined(CUDF_BUILD_STACKTRACE_DEBUG)
#include <cxxabi.h>
#include <execinfo.h>

#include <cstdlib>
#include <cstring>
#include <sstream>
#endif // defined(__GNUC__) && defined(CUDF_BUILD_STACKTRACE_DEBUG)

namespace cudf::detail {

std::string get_stacktrace(capture_last_stackframe capture_last_frame)
{
#if defined(__GNUC__) && defined(CUDF_BUILD_STACKTRACE_DEBUG)
constexpr int max_stack_depth = 64;
void* stack[max_stack_depth];

auto const depth = backtrace(stack, max_stack_depth);
auto const modules = backtrace_symbols(stack, depth);

if (modules == nullptr) { return "No stacktrace could be captured!"; }

std::stringstream ss;

// Always skip the frame of this function; also skip the caller's frame if capture_last_frame is NO.
auto const skip_depth = 1 + (capture_last_frame == capture_last_stackframe::YES ? 0 : 1);
for (auto i = skip_depth; i < depth; ++i) {
// Each modules[i] string contains a mangled name in a format like the following:
// `module_name(function_name+0x012) [0x01234567890a]`
// We need to extract function name and function offset.
char* begin_func_name = std::strstr(modules[i], "(");
char* begin_func_offset = std::strstr(modules[i], "+");
char* end_func_offset = std::strstr(modules[i], ")");

auto const frame_idx = i - skip_depth;
if (begin_func_name && begin_func_offset && end_func_offset &&
begin_func_name < begin_func_offset) {
// Split `modules[i]` into separate null-terminated strings.
// After this, mangled function name will then be [begin_func_name, begin_func_offset), and
// function offset is in [begin_func_offset, end_func_offset).
*(begin_func_name++) = '\0';
*(begin_func_offset++) = '\0';
*end_func_offset = '\0';

// We need to demangle function name.
int status{0};
char* func_name = abi::__cxa_demangle(begin_func_name, nullptr, nullptr, &status);

ss << "#" << frame_idx << ": " << modules[i] << " : "
<< (status == 0 /*demangle success*/ ? func_name : begin_func_name) << "+"
<< begin_func_offset << "\n";
free(func_name);
} else {
ss << "#" << frame_idx << ": " << modules[i] << "\n";
}
}

free(modules);

return ss.str();
#else
#ifdef CUDF_BUILD_STACKTRACE_DEBUG
return "Stacktrace is only supported when built with a GNU compiler.";
#else
return "libcudf was not built with stacktrace support.";
#endif // CUDF_BUILD_STACKTRACE_DEBUG
#endif // __GNUC__
}

} // namespace cudf::detail
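The parsing loop above searches for `+` and `)` from the start of each line, so a module path that itself contains `+` (libstdc++ being a common case) fails the `begin_func_name < begin_func_offset` check and that frame is printed without demangling. The sketch below is one way to make the parse position-aware. `describe_frame` is a hypothetical helper that takes a single `backtrace_symbols()` line; it assumes a GCC or Clang toolchain providing `<cxxabi.h>`, and the sample lines in the checks follow the format described in the code comment rather than real captured output.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: turn one backtrace_symbols() line such as
//   "/usr/lib/libstdc++.so.6(_ZSt9terminatev+0x12) [0x7f0012345678]"
// into "<demangled name>+<offset>". '+' and ')' are searched only after '(', so a '+'
// inside the module path does not confuse the parser. Unparseable lines are returned as-is.
std::string describe_frame(std::string const& line)
{
  auto const open = line.find('(');
  if (open == std::string::npos) { return line; }
  auto const plus  = line.find('+', open);
  auto const close = line.find(')', open);
  if (plus == std::string::npos || close == std::string::npos || plus > close) { return line; }

  std::string const mangled = line.substr(open + 1, plus - open - 1);
  std::string const offset  = line.substr(plus + 1, close - plus - 1);
  if (mangled.empty()) { return line; }  // e.g. "module(+0x1234) [...]": no symbol available

  int status      = 0;
  char* demangled = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  // Fall back to the raw symbol (e.g. "main", which is not a mangled name).
  std::string const name =
    (status == 0 && demangled != nullptr) ? std::string{demangled} : mangled;
  std::free(demangled);  // __cxa_demangle allocates with malloc; free(nullptr) is a no-op
  return name + "+" + offset;
}
```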