Skip to content

Commit

Permalink
[#17993] docdb: Add stack trace tracking framework and I/O stack trac…
Browse files Browse the repository at this point in the history
…e counting

Summary:
This diff adds a framework for counting occurrences of  different stack traces, collected where ever `TrackStackTrace()` is executed. Unique stack traces are only ever symbolized at most once, and counts are collected in thread-local counters, in order to keep overhead moderate. Traces are placed into groups specified at the `TrackStackTrace()` call site, and an optional weight can be specified.

Read and write syscalls performed through the abstractions in env.cc/file_system.cc are tracked through this in the kReadIO and kWriteIO groups, and the newly added `/io-stack-traces` server UI endpoint exposes this information, consisting of number of calls and number of bytes read (or written) through each trace.

A separate new endpoint, `/debug-stack-traces` has also been added to list tracked stack traces in the kDebugging group, which is intended to be unused in committed code and added temporarily as needed through carefully placed `TrackStackTrace()` calls for debugging or investigation purposes.

A third endpoint, `/reset-stack-traces` has been added to reset counters on tracked stack traces.

Example output of `/io-stack-traces` endpoint:

{img https://github.com/yugabyte/yugabyte-db/assets/6742815/5d29a85c-ba7d-40b3-b306-4655ca34d15c}

All stack trace tracking is guarded behind the newly added `track_stack_traces` gflag, which is by default on in debug and off in release due to performance considerations (~5% degradation observed with sysbench).
Jira: DB-7054

Test Plan:
Jenkins.

Also added DebugUtilTest.TestTrackedStackTraces: `./yb_build.sh --cxx-test util_debug-util-test --gtest_filter DebugUtilTest.TestTrackedStackTraces`

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D29116
  • Loading branch information
es1024 committed May 6, 2024
1 parent 787ce95 commit a51a66d
Show file tree
Hide file tree
Showing 9 changed files with 397 additions and 0 deletions.
89 changes: 89 additions & 0 deletions src/yb/server/default-path-handlers.cc
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@
#include "yb/util/jsonwriter.h"
#include "yb/util/result.h"
#include "yb/util/status_log.h"
#include "yb/util/stack_trace_tracker.h"
#include "yb/util/url-coding.h"
#include "yb/util/version_info.h"
#include "yb/util/version_info.pb.h"
Expand All @@ -99,6 +100,7 @@ DEFINE_RUNTIME_uint32(max_prometheus_metric_entries, UINT32_MAX,
"adding a metric with all its entities would exceed the limit, then we will drop them all."
"Thus, the actual number of metric entries returned might be smaller than the limit.");

DECLARE_bool(track_stack_traces);
DECLARE_bool(TEST_mini_cluster_mode);

namespace yb {
Expand Down Expand Up @@ -669,6 +671,87 @@ static void HandleGetVersionInfo(
jw.EndObject();
}

static void IOStackTraceHandler(const Webserver::WebRequest& req, Webserver::WebResponse* resp) {
std::stringstream *output = &resp->output;

if (!GetAtomicFlag(&FLAGS_track_stack_traces)) {
(*output) << "track_stack_traces must be turned on to use this page.";
return;
}

Tags tags(false /* as_text */);

auto traces = GetTrackedStackTraces();
std::sort(traces.begin(), traces.end(),
[](const auto& left, const auto& right) { return left.weight > right.weight; });

(*output) << tags.header << "I/O stack traces" << tags.end_header;

(*output) << tags.table << tags.row
<< tags.table_header << "Type" << tags.end_table_header
<< tags.table_header << "Count" << tags.end_table_header
<< tags.table_header << "Bytes" << tags.end_table_header
<< tags.table_header << "Stack Trace" << tags.end_table_header
<< tags.end_row;
for (const auto& entry : traces) {
if (entry.count == 0 ||
(entry.group != StackTraceTrackingGroup::kReadIO &&
entry.group != StackTraceTrackingGroup::kWriteIO)) {
continue;
}
(*output) << tags.row
<< tags.cell
<< (entry.group == StackTraceTrackingGroup::kReadIO ? "Read" : "Write")
<< tags.end_cell
<< tags.cell << entry.count << tags.end_cell
<< tags.cell << HumanReadableNumBytes::ToString(entry.weight) << tags.end_cell
<< tags.cell << tags.pre_tag << EscapeForHtmlToString(entry.symbolized_trace)
<< tags.end_pre_tag << tags.end_cell
<< tags.end_row;
}

(*output) << tags.end_table;
}

static void DebugStackTraceHandler(const Webserver::WebRequest& req, Webserver::WebResponse* resp) {
std::stringstream *output = &resp->output;

if (!GetAtomicFlag(&FLAGS_track_stack_traces)) {
(*output) << "track_stack_traces must be turned on to use this page.";
return;
}

Tags tags(false /* as_text */);

auto traces = GetTrackedStackTraces();
std::sort(traces.begin(), traces.end(),
[](const auto& left, const auto& right) { return left.count > right.count; });

(*output) << tags.header << "Tracked stack traces" << tags.end_header;

(*output) << tags.table << tags.row
<< tags.table_header << "Count" << tags.end_table_header
<< tags.table_header << "Stack Trace" << tags.end_table_header
<< tags.end_row;
for (const auto& entry : traces) {
if (entry.count == 0 || entry.group != StackTraceTrackingGroup::kDebugging) {
continue;
}
(*output) << tags.row
<< tags.cell << entry.count << tags.end_cell
<< tags.cell << tags.pre_tag << EscapeForHtmlToString(entry.symbolized_trace)
<< tags.end_pre_tag << tags.end_cell
<< tags.end_row;
}

(*output) << tags.end_table;
}

static void ResetStackTraceHandler(const Webserver::WebRequest& req, Webserver::WebResponse* resp) {
ResetTrackedStackTraces();
resp->output << "Tracked stack traces reset.";
}

} // anonymous namespace

// Registered to handle "/memz", and prints out memory allocation statistics.
Expand Down Expand Up @@ -707,6 +790,12 @@ void AddDefaultPathHandlers(Webserver* webserver) {
"/api/v1/varz", "Flags", std::bind(&GetFlagsJsonHandler, _1, _2, webserver), false, false);
webserver->RegisterPathHandler("/api/v1/version-info", "Build Version Info",
HandleGetVersionInfo, false, false);
webserver->RegisterPathHandler("/io-stack-traces", "I/O Stack Traces",
IOStackTraceHandler, true, false);
webserver->RegisterPathHandler("/debug-stack-traces", "Debugging Stack Traces",
DebugStackTraceHandler, true, false);
webserver->RegisterPathHandler("/reset-stack-traces", "Reset Stack Traces",
ResetStackTraceHandler, true, false);

AddPprofPathHandlers(webserver);
}
Expand Down
1 change: 1 addition & 0 deletions src/yb/util/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,7 @@ set(UTIL_SRCS
spinlock_profiling.cc
split.cc
stack_trace.cc
stack_trace_tracker.cc
stats/iostats_context_imp.cc
stats/perf_level_imp.cc
status.cc
Expand Down
31 changes: 31 additions & 0 deletions src/yb/util/debug-util-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@
#include "yb/util/debug-util.h"
#include "yb/util/debug/long_operation_tracker.h"
#include "yb/util/scope_exit.h"
#include "yb/util/stack_trace_tracker.h"
#include "yb/util/test_macros.h"
#include "yb/util/test_util.h"
#include "yb/util/thread.h"
Expand All @@ -61,6 +62,7 @@ using std::vector;

using namespace std::literals;

DECLARE_bool(track_stack_traces);
DECLARE_bool(TEST_disable_thread_stack_collection_wait);

namespace yb {
Expand Down Expand Up @@ -565,4 +567,33 @@ TEST_F(DebugUtilTest, YB_DISABLE_TEST_ON_MACOS(TestGetStackTraceWhileCreatingThr
}
thread_holder.Stop();
}

TEST_F(DebugUtilTest, TestTrackedStackTraces) {
constexpr size_t kNumIterations = 100;
constexpr size_t kWeight = 10;

ANNOTATE_UNPROTECTED_WRITE(FLAGS_track_stack_traces) = true;

for (size_t i = 0; i < kNumIterations; ++i) {
TrackStackTrace(StackTraceTrackingGroup::kDebugging, kWeight);
}

DumpTrackedStackTracesToLog(StackTraceTrackingGroup::kDebugging);

auto traces = GetTrackedStackTraces();
// There may be I/O traces tracked during test setup, but kDebugging should never be in
// checked in code, so we can rely on there only be one kDebugging trace.
bool found_debugging_trace = false;
for (size_t i = 0; i < traces.size(); ++i) {
auto& trace = traces[i];
if (trace.group == StackTraceTrackingGroup::kDebugging) {
ASSERT_FALSE(found_debugging_trace);
found_debugging_trace = true;
ASSERT_EQ(trace.count, kNumIterations);
ASSERT_EQ(trace.weight, kNumIterations * kWeight);
}
}
ASSERT_TRUE(found_debugging_trace);
}

} // namespace yb
5 changes: 5 additions & 0 deletions src/yb/util/env_posix.cc
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@
#include "yb/util/path_util.h"
#include "yb/util/result.h"
#include "yb/util/slice.h"
#include "yb/util/stack_trace_tracker.h"
#include "yb/util/status.h"
#include "yb/util/status_format.h"
#include "yb/util/status_log.h"
Expand Down Expand Up @@ -502,6 +503,7 @@ class PosixWritableFile : public WritableFile {
return STATUS_IO_ERROR(filename_, errno);
}

TrackStackTrace(StackTraceTrackingGroup::kWriteIO, written);
UnwrittenRemaining(&remaining_iov, written, &remaining_count);
total_written += written;
}
Expand Down Expand Up @@ -697,6 +699,7 @@ class PosixDirectIOWritableFile final : public PosixWritableFile {
}
return STATUS_IO_ERROR(filename_, errno);
}
TrackStackTrace(StackTraceTrackingGroup::kWriteIO, written);
next_write_offset_ += written;

UnwrittenRemaining(&remaining_iov, written, &remaining_blocks);
Expand Down Expand Up @@ -791,6 +794,7 @@ class PosixRWFile final : public RWFile {
// An error: return a non-ok status.
return STATUS_IO_ERROR(filename_, errno);
}
TrackStackTrace(StackTraceTrackingGroup::kReadIO, r);
Slice this_result(dst, r);
DCHECK_LE(this_result.size(), rem);
if (this_result.size() == 0) {
Expand All @@ -814,6 +818,7 @@ class PosixRWFile final : public RWFile {
int err = errno;
return STATUS_IO_ERROR(filename_, err);
}
TrackStackTrace(StackTraceTrackingGroup::kWriteIO, written);

if (PREDICT_FALSE(written != implicit_cast<ssize_t>(data.size()))) {
return STATUS(IOError,
Expand Down
4 changes: 4 additions & 0 deletions src/yb/util/file_system_posix.cc
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
#include "yb/util/errno.h"
#include "yb/util/malloc.h"
#include "yb/util/result.h"
#include "yb/util/stack_trace_tracker.h"
#include "yb/util/stats/iostats_context_imp.h"
#include "yb/util/test_kill.h"
#include "yb/util/thread_restrictions.h"
Expand Down Expand Up @@ -125,6 +126,7 @@ Status PosixSequentialFile::Read(size_t n, Slice* result, uint8_t* scratch) {
do {
r = fread_unlocked(scratch, 1, n, file_);
} while (r == 0 && ferror(file_) && errno == EINTR);
TrackStackTrace(StackTraceTrackingGroup::kReadIO, r);
*result = Slice(scratch, r);
if (r < n) {
if (feof(file_)) {
Expand Down Expand Up @@ -191,6 +193,7 @@ Status PosixRandomAccessFile::Read(uint64_t offset, size_t n, Slice* result,
}
break;
}
TrackStackTrace(StackTraceTrackingGroup::kReadIO, r);
ptr += r;
offset += r;
left -= r;
Expand Down Expand Up @@ -306,6 +309,7 @@ Status PosixWritableFile::Append(const Slice& data) {
}
return STATUS_IO_ERROR(filename_, errno);
}
yb::TrackStackTrace(yb::StackTraceTrackingGroup::kWriteIO, done);
left -= done;
src += done;
}
Expand Down
13 changes: 13 additions & 0 deletions src/yb/util/stack_trace.cc
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <sys/syscall.h>
#endif

#include <algorithm>
#include <mutex>

#include "yb/gutil/casts.h"
Expand Down Expand Up @@ -379,6 +380,18 @@ uint64_t StackTrace::HashCode() const {
sizeof(frames_[0]) * num_frames_);
}

Result<StackTrace> StackTrace::MakeStackTrace(std::string_view frames) {
SCHECK_EQ(frames.size() % sizeof(void *), 0,
InvalidArgument, "frames should contain an array of pointers");
SCHECK(frames.size() / sizeof(void *) <= kMaxFrames, InvalidArgument, "too many frames");

StackTrace s;
s.num_frames_ = narrow_cast<int>(frames.size() / sizeof(void *));
std::memcpy(s.frames_, frames.data(), s.num_frames_ * sizeof(void *));
return s;
}


// ------------------------------------------------------------------------------------------------

Result<StackTrace> ThreadStack(ThreadIdForStack tid) {
Expand Down
6 changes: 6 additions & 0 deletions src/yb/util/stack_trace.h
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,12 @@ class StackTrace {
pointer_cast<const char*>(frames_ + num_frames_));
}

std::string_view as_string_view() const {
return std::string_view(pointer_cast<const char*>(frames_), sizeof(void *) * num_frames_);
}

static Result<StackTrace> MakeStackTrace(std::string_view frames);

private:
enum {
// The maximum number of stack frames to collect.
Expand Down
Loading

0 comments on commit a51a66d

Please sign in to comment.