Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log request and duration for every RPC call #4036

Closed
wants to merge 1 commit into from

Conversation

natenichols
Copy link
Contributor

@natenichols natenichols commented Dec 14, 2021

@cjcobb23 authored this commit

High Level Overview of Change

For every RPC, the request itself and the duration it took to process is logged. This helps immensely with debugging performance issues. The log level is determined by how long the request took:

Less than 1 second: debug
1 - 10 seconds: warning
Greater than 10 seconds: error

Context of Change

We were noticing some very long running calls on the reporting servers, and we wanted to determine exactly what was taking so long, and how often this was happening. The stats in grafana are just averages, and so the logging gives a much finer granularity, as well as a way to reproduce the long running requests, since the entire request is also logged.

@cjcobb23 cjcobb23 assigned pwang200 and ximinez and unassigned cjcobb23, pwang200 and ximinez Dec 14, 2021
@ximinez ximinez self-requested a review December 14, 2021 21:59
@ximinez ximinez self-assigned this Dec 14, 2021
Comment on lines 387 to 398
std::stringstream logMsg;
logMsg << "RPC request processing duration = "
<< std::chrono::duration_cast<std::chrono::microseconds>(duration)
.count()
<< " microseconds. request = " << request;
if (std::chrono::duration_cast<std::chrono::seconds>(duration).count() >=
10)
{
JLOG(journal.error()) << logMsg.str();
}
else if (
std::chrono::duration_cast<std::chrono::seconds>(duration).count() >= 1)
{
JLOG(journal.warn()) << logMsg.str();
}
else
{
JLOG(journal.debug()) << logMsg.str();
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider trying this. This is off the top of the dome, so it might require some finessing.

auto const seconds = std::chrono::duration_cast<std::chrono::seconds>(duration).count();
using J = decltype(journal);
auto const level = (seconds >= 10) ? &J::error : (seconds >= 1) ? &J:warn : &J:debug;
JLOG(journal.*level()) << "RPC request processing duration = "
           << std::chrono::duration_cast<std::chrono::microseconds>(duration)
                  .count()
           << " microseconds. request = " << request;

Copy link
Contributor Author

@natenichols natenichols Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    auto const seconds =
        std::chrono::duration_cast<std::chrono::seconds>(duration).count();

    auto const level = (seconds >= 10)
        ? journal.error()
        : (seconds >= 1) ? journal.warn() : journal.debug();

    JLOG(level) << "RPC request processing duration = "
                << std::chrono::duration_cast<std::chrono::microseconds>(
                       duration)
                       .count()
                << " microseconds. request = " << request;

worked for me. Tested by logging a request that ran for > 1s, and got the appropriate warn() message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

level can be inlined by wrapping it in an extra set of (), not that that's necessarily a good idea.

Comment on lines +489 to +483
auto start = std::chrono::system_clock::now();
RPC::doCommand(context, jr[jss::result]);
auto end = std::chrono::system_clock::now();
logDuration(jv, end - start, m_journal);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pattern is used in more places, it might be worth investing in a helper class that uses RAII in the style of scope_guard.

@natenichols natenichols added the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Dec 15, 2021
Comment on lines 387 to 392
auto const seconds =
std::chrono::duration_cast<std::chrono::seconds>(duration).count();

auto const level = (seconds >= 10)
? journal.error()
: (seconds >= 1) ? journal.warn() : journal.debug();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be significantly simplified:

    using namespace std::chrono_literals;
    auto const level = (duration >= 10s)
        ? journal.error()
        : (duration >= 1s) ? journal.warn() : journal.debug();

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This was referenced Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants