[DocDB] Optimize stack trace collection and symbolization #19085

mbautin · 2023-09-11T23:33:52Z

Jira Link: DB-7896

Description

There are currently several ways to collect a stack trace (an array of raw program counters):

StackTrace::Collect
- Uses google::GetStackTrace in ASAN/TSAN
- Uses the standard backtrace function (not to be confused with libbacktrace) otherwise
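
For illustration, collecting raw program counters with the standard backtrace function could look roughly like the sketch below. CollectRawFrames and its max_frames parameter are hypothetical names chosen for this example. This is not the actual StackTrace::Collect code, and it assumes a platform that provides <execinfo.h> (glibc or macOS).

```cpp
#include <execinfo.h>  // backtrace(); assumed available (glibc or macOS)

#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not YugabyteDB's StackTrace::Collect.
// Returns up to max_frames raw return addresses of the calling thread,
// innermost frame first. No symbol lookup happens here.
std::vector<void*> CollectRawFrames(std::size_t max_frames = 64) {
  std::vector<void*> frames(max_frames);
  // backtrace() fills the buffer and returns the number of entries written.
  const int n = backtrace(frames.data(), static_cast<int>(frames.size()));
  frames.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
  return frames;
}
```

Keeping collection free of any symbol lookup is what lets the more expensive symbolization step run later, or not at all.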

To symbolize a stack trace (convert program counters to function names as well as file names / line numbers, if possible) in StackTrace::Symbolize we do one of the following:

- Use libbacktrace via the backtrace_pcinfo function and a callback
- Use google::Symbolize from glog ( https://github.com/yugabyte/glog/blob/v0.4.0-yb-5/src/symbolize.cc#L760 )

There is also absl::Symbolize, which is much faster than google::Symbolize according to @SrivastavaAnubhav. We should prefer that function.

libbacktrace has some issues (internal deadlocks under some circumstances), so we don't use it in release builds by default. Its use is controlled by FLAGS_libbacktrace, which defaults to true in debug mode and false in release mode on Linux, and is always false on macOS. libbacktrace is only supported on Linux.

Also, we have a separate implementation of a combined collect + symbolize function called GetStackTrace:

- Uses libbacktrace via backtrace_full with the same callback that StackTrace::Symbolize uses
- Otherwise, uses google::glog_internal_namespace_::DumpStackTraceToString ( https://github.com/yugabyte/glog/blob/v0.4.0-yb-5/src/utilities.cc#L125 )

We need to clean up this inconsistency and make GetStackTrace() the same as StackTrace::Collect() followed by Symbolize().
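
The intended relationship could be sketched as follows. StackTraceSketch and GetStackTraceSketch are hypothetical names, and backtrace_symbols from <execinfo.h> stands in for the real symbolizer (absl::Symbolize or google::Symbolize) only to keep the example free of external dependencies. The point is the structure, where the combined helper has no collection or symbolization logic of its own.

```cpp
#include <execinfo.h>  // backtrace(), backtrace_symbols(); assumed available

#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the StackTrace class discussed in this issue.
class StackTraceSketch {
 public:
  // Step 1: record raw program counters only.
  void Collect() {
    frames_.resize(kMaxFrames);
    const int n = backtrace(frames_.data(), kMaxFrames);
    frames_.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
  }

  // Step 2: convert the stored program counters to text, one frame per line.
  // backtrace_symbols() is a placeholder for the real symbolizer.
  std::string Symbolize() const {
    if (frames_.empty()) return "";
    char** symbols =
        backtrace_symbols(frames_.data(), static_cast<int>(frames_.size()));
    if (symbols == nullptr) return "";
    std::string out;
    for (std::size_t i = 0; i < frames_.size(); ++i) {
      out += symbols[i];
      out += '\n';
    }
    std::free(symbols);  // one allocation holds all the strings
    return out;
  }

  std::size_t size() const { return frames_.size(); }

 private:
  static constexpr int kMaxFrames = 64;
  std::vector<void*> frames_;
};

// The combined helper is a thin wrapper: collect, then symbolize.
std::string GetStackTraceSketch() {
  StackTraceSketch trace;
  trace.Collect();
  return trace.Symbolize();
}
```

With this shape, any change to the symbolizer or the output format is made in one place and applies to both entry points.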

A proposed format for stack traces is as follows (using function names provided by absl::Symbolize):

#1  | 0x55f482394ccc | yb/util/debug-util-test.cc:80                  | DebugUtilTest_TestStackTrace2_Test::TestBody()
#2  | 0x7fb8c29fdb8c | googletest-1.12.1/googletest/src/gtest.cc:2599 | testing::internal::HandleExceptionsInMethodIfSupported<>()
#3  | 0x7fb8c29fdb8c | googletest-1.12.1/googletest/src/gtest.cc:2635 | testing::internal::HandleExceptionsInMethodIfSupported<>()
#4  | 0x7fb8c29e3d87 | googletest-1.12.1/googletest/src/gtest.cc:2674 | testing::Test::Run()
#5  | 0x7fb8c29e4da6 | googletest-1.12.1/googletest/src/gtest.cc:2853 | testing::TestInfo::Run()
#6  | 0x7fb8c29e5a1e | googletest-1.12.1/googletest/src/gtest.cc:3012 | testing::TestSuite::Run()
#7  | 0x7fb8c29f51cd | googletest-1.12.1/googletest/src/gtest.cc:5870 | testing::internal::UnitTestImpl::RunAllTests()
#8  | 0x7fb8c29fe86c | googletest-1.12.1/googletest/src/gtest.cc:2599 | testing::internal::HandleExceptionsInMethodIfSupported<>()
#9  | 0x7fb8c29fe86c | googletest-1.12.1/googletest/src/gtest.cc:2635 | testing::internal::HandleExceptionsInMethodIfSupported<>()
#10 | 0x7fb8c29f4cdf | googletest-1.12.1/googletest/src/gtest.cc:5444 | testing::UnitTest::Run()
#11 | 0x7fb8c2f6a760 | uninstrumented/include/gtest/gtest.h:2293      | RUN_ALL_TESTS()
#12 | 0x7fb8c2f6a132 | yb/util/test_main.cc:110                       | main
#13 | 0x7fb8c0942cf2 | :0                                             | __libc_start_main
#14 | 0x55f482394bad | :0                                             | _start
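
A formatter producing this column layout might look like the sketch below. SymbolizedFrame, Pad, and FormatStackTrace are hypothetical names introduced for this example, and the sketch assumes the function name and file:line for each frame have already been obtained by whichever symbolizer is in use.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one already-symbolized frame.
struct SymbolizedFrame {
  std::uintptr_t pc;
  std::string location;  // "file:line", or ":0" when unknown
  std::string function;
};

// Right-pads s with spaces up to width; longer strings are left unchanged.
std::string Pad(std::string s, std::size_t width) {
  if (s.size() < width) s.resize(width, ' ');
  return s;
}

// Renders frames as "#N | 0xPC | file:line | function", padding the index,
// PC, and location columns to the width of their widest entry.
std::string FormatStackTrace(const std::vector<SymbolizedFrame>& frames) {
  std::vector<std::string> pcs;
  std::size_t pc_width = 0;
  std::size_t location_width = 0;
  for (const SymbolizedFrame& frame : frames) {
    char buf[2 + 16 + 1];  // "0x" + up to 16 hex digits + NUL
    std::snprintf(buf, sizeof(buf), "0x%llx",
                  static_cast<unsigned long long>(frame.pc));
    pcs.emplace_back(buf);
    pc_width = std::max(pc_width, pcs.back().size());
    location_width = std::max(location_width, frame.location.size());
  }
  // '#' plus the number of digits in the largest (1-based) index.
  const std::size_t index_width = std::to_string(frames.size()).size() + 1;

  std::string out;
  for (std::size_t i = 0; i < frames.size(); ++i) {
    out += Pad("#" + std::to_string(i + 1), index_width) + " | " +
           Pad(pcs[i], pc_width) + " | " +
           Pad(frames[i].location, location_width) + " | " +
           frames[i].function + "\n";
  }
  return out;
}
```

The function column is deliberately last and unpadded, so long names do not force padding on every other line.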

Also, libbacktrace produces really long function names/signatures with all the template arguments, and the resulting stack traces become unreadable. absl::Symbolize produces much more reasonable function names/signatures.

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.



Summary: Upgrade OpenSSL to 3.0.8 as 1.1.1 has reached EOL. Also pulls in the following thirdparty changes:
- Disable Linuxbrew builds (yugabyte/yugabyte-db-thirdparty@55eee2b)
- Update glog to use stack unwinding based on backtrace function (yugabyte/yugabyte-db-thirdparty@ec8ab75, #19085)

Jira: DB-8566

Test Plan: Jenkins

Reviewers: mbautin, rthallam

Reviewed By: rthallam

Subscribers: rthallam, yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29701

mbautin added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels Sep 11, 2023

mbautin self-assigned this Sep 11, 2023

yugabyte-ci added the kind/enhancement (This is an enhancement of an existing feature) and priority/medium (Medium priority issue) labels Sep 11, 2023

yugabyte-ci removed the status/awaiting-triage (Issue awaiting triage) label Sep 19, 2023

mbautin mentioned this issue Nov 2, 2023

Update glog to use stack unwinding based on backtrace function yugabyte/yugabyte-db-thirdparty#242

Merged

mbautin added a commit to yugabyte/yugabyte-db-thirdparty that referenced this issue Nov 2, 2023

Update glog to use stack unwinding based on backtrace function (#242)

ec8ab75

Also some improvements in handling the `--dev-repo` flag. Issue: yugabyte/yugabyte-db#19085
