-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
log request and duration for every RPC call #4036
Conversation
std::stringstream logMsg; | ||
logMsg << "RPC request processing duration = " | ||
<< std::chrono::duration_cast<std::chrono::microseconds>(duration) | ||
.count() | ||
<< " microseconds. request = " << request; | ||
if (std::chrono::duration_cast<std::chrono::seconds>(duration).count() >= | ||
10) | ||
{ | ||
JLOG(journal.error()) << logMsg.str(); | ||
} | ||
else if ( | ||
std::chrono::duration_cast<std::chrono::seconds>(duration).count() >= 1) | ||
{ | ||
JLOG(journal.warn()) << logMsg.str(); | ||
} | ||
else | ||
{ | ||
JLOG(journal.debug()) << logMsg.str(); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider trying this. This is off the top of the dome, so it might require some finessing.
auto const seconds = std::chrono::duration_cast<std::chrono::seconds>(duration).count();
using J = decltype(journal);
auto const level = (seconds >= 10) ? &J::error : (seconds >= 1) ? &J:warn : &J:debug;
JLOG(journal.*level()) << "RPC request processing duration = "
<< std::chrono::duration_cast<std::chrono::microseconds>(duration)
.count()
<< " microseconds. request = " << request;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const seconds =
std::chrono::duration_cast<std::chrono::seconds>(duration).count();
auto const level = (seconds >= 10)
? journal.error()
: (seconds >= 1) ? journal.warn() : journal.debug();
JLOG(level) << "RPC request processing duration = "
<< std::chrono::duration_cast<std::chrono::microseconds>(
duration)
.count()
<< " microseconds. request = " << request;
worked for me. Tested by logging a request that ran for > 1s, and got the appropriate warn() message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
level
can be inlined by wrapping it in an extra set of ()
, not that that's necessarily a good idea.
auto start = std::chrono::system_clock::now(); | ||
RPC::doCommand(context, jr[jss::result]); | ||
auto end = std::chrono::system_clock::now(); | ||
logDuration(jv, end - start, m_journal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this pattern is used in more places, it might be worth investing in a helper class that uses RAII in the style of scope_guard
.
auto const seconds = | ||
std::chrono::duration_cast<std::chrono::seconds>(duration).count(); | ||
|
||
auto const level = (seconds >= 10) | ||
? journal.error() | ||
: (seconds >= 1) ? journal.warn() : journal.debug(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be significantly simplified:
using namespace std::chrono_literals;
auto const level = (duration >= 10s)
? journal.error()
: (duration >= 1s) ? journal.warn() : journal.debug();
26a8203
to
1b676f1
Compare
1b676f1
to
0160194
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@cjcobb23 authored this commit
High Level Overview of Change
For every RPC, the request itself and the duration it took to process is logged. This helps immensely with debugging performance issues. The log level is determined by how long the request took:
Less than 1 second: debug
1 - 10 seconds: warning
Greater than 10 seconds: error
Context of Change
We were noticing some very long running calls on the reporting servers, and we wanted to determine exactly what was taking so long, and how often this was happening. The stats in grafana are just averages, and so the logging gives a much finer granularity, as well as a way to reproduce the long running requests, since the entire request is also logged.