
Speed up logging once again #9981

Merged
merged 12 commits into paritytech:master from the master_faster_logging branch on Oct 21, 2021

Conversation

@koute (Contributor) commented Oct 8, 2021

This PR tries to speed up our logging machinery by introducing a buffered semi-asynchronous stderr writer.

(In our profiling on one of the validators we've seen ~5% of total time being spent writing to stderr.)

For more details, please read the doc comment included in stderr_writer.rs.

As far as benchmarks go, this is pretty hard to benchmark; depending on how the benchmark is set up it can show either zero improvement or a huge improvement. Nevertheless, in a microbenchmark which tries to replicate conditions similar to what we're running on our validator nodes (that is: the executable is launched as a systemd service and then just keeps on puking logs into the systemd journal), this PR cuts down the total execution time of the microbenchmark from ~8s to ~2s.

(While I'm at it I've also enabled the parking_lot feature for the tracing-subscriber crate, which measurably lowers the overhead in the case where we're not printing anything; there's still more work that can be done for that case, but that will come in another PR.)
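To give a concrete mental model before reading stderr_writer.rs, here's a minimal, self-contained sketch of the general pattern (not the actual implementation): log lines are appended to a shared, mutex-protected buffer, and a dedicated background thread flushes that buffer to stderr either on a short timer or as soon as the buffered data crosses a size threshold. The constant values, and the use of std's Mutex/Condvar instead of parking_lot, are simplifications for the example.

use std::io::Write;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

// Illustrative values only; the real constants live in stderr_writer.rs.
const ASYNC_FLUSH_THRESHOLD: usize = 16 * 1024;
const AUTOFLUSH_EVERY: Duration = Duration::from_millis(50);

static BUFFER: Mutex<Vec<u8>> = Mutex::new(Vec::new());
static FLUSH_NOW: Condvar = Condvar::new();

fn log_line(line: &str) {
    let mut buffer = BUFFER.lock().unwrap();
    buffer.extend_from_slice(line.as_bytes());
    buffer.push(b'\n');
    if buffer.len() >= ASYNC_FLUSH_THRESHOLD {
        // Too much buffered data: wake the flusher instead of waiting
        // for the next periodic flush.
        FLUSH_NOW.notify_one();
    }
}

fn flusher_thread() {
    let mut buffer = BUFFER.lock().unwrap();
    loop {
        // Sleep until either the autoflush interval elapses or a writer
        // signals that the size threshold was crossed.
        let (guard, _) = FLUSH_NOW.wait_timeout(buffer, AUTOFLUSH_EVERY).unwrap();
        buffer = guard;
        if !buffer.is_empty() {
            let data = std::mem::take(&mut *buffer);
            // Release the lock while doing the (potentially slow) write so
            // that logging threads aren't blocked on stderr.
            drop(buffer);
            let _ = std::io::stderr().write_all(&data);
            buffer = BUFFER.lock().unwrap();
        }
    }
}

fn main() {
    std::thread::spawn(flusher_thread);
    for n in 0..10 {
        log_line(&format!("log line #{}", n));
    }
    // Give the flusher a chance to run; the real writer also installs an
    // exit hook so nothing is lost on shutdown (see the review discussion).
    std::thread::sleep(Duration::from_millis(100));
}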

@koute requested review from gilescope and bkchr on October 8, 2021 12:43
@koute added the following labels on Oct 8, 2021: A0-please_review (Pull request needs code review), C1-low (PR touches the given topic and has a low impact on builders), D3-trivial 🧸 (PR contains trivial changes in a runtime directory that do not require an audit), I9-optimisation (An enhancement to provide better overall performance in terms of time-to-completion for a task).
@bkchr (Member) commented Oct 9, 2021

Why not use this: https://docs.rs/tracing-appender/0.1.2/tracing_appender/non_blocking/index.html ?

@koute (Contributor, Author) commented Oct 11, 2021

> Why not use this: https://docs.rs/tracing-appender/0.1.2/tracing_appender/non_blocking/index.html ?

That's a great question!

  1. It's slower. On the same benchmark where my logger takes ~2s, the one from tracing-appender takes ~5s. (That's assuming I turn on non-lossy mode so that it doesn't drop any logs; if I let it run in lossy mode so that it can drop logs it takes ~2.6s, so still slightly slower than my logger, which doesn't drop anything. See the usage sketch after this list.)
  2. It applies backpressure based only on the number of lines printed, and not on their length. And we do have places in our code where we essentially log untrusted user input verbatim, so while unlikely, in theory it could be a potential vector for a DoS attack.
  3. I'm not a total fan of the shutdown semantics; when that logger is shut down (that is: the guard is dropped) it stops working instead of just printing everything synchronously like mine does. It probably doesn't matter in practice, but still.
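For comparison, this is roughly what wiring up tracing-appender's non-blocking writer in non-lossy mode looks like (a sketch based on its documented non_blocking API; exact builder methods may vary slightly between versions):

use tracing_appender::non_blocking::NonBlockingBuilder;

fn main() {
    // Non-lossy mode: the writer applies backpressure (blocks) instead of
    // dropping lines once its internal channel fills up; this is the
    // configuration used for the ~5s number above.
    let (writer, _guard) = NonBlockingBuilder::default()
        .lossy(false)
        .finish(std::io::stderr());

    tracing_subscriber::fmt().with_writer(writer).init();

    tracing::info!("hello from the non-blocking writer");

    // `_guard` has to stay alive for the duration of the program; once it is
    // dropped the background worker shuts down and further log lines are
    // silently discarded, which is the shutdown behaviour mentioned in
    // point 3 above.
}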

@koute force-pushed the master_faster_logging branch from 3e3bf2d to a07611c on October 11, 2021 05:35
@koute (Contributor, Author) commented Oct 11, 2021

(Force-pushed to resolve conflicts.)

@koute added the B0-silent label (Changes should not be mentioned in any release notes) on Oct 11, 2021
@gilescope (Contributor):

A test would be nice to prove that, while spinning up n threads which are all logging, all generated log messages do get logged and none fall through any cracks.

@koute (Contributor, Author) commented Oct 11, 2021

> A test would be nice to prove that, while spinning up n threads which are all logging, all generated log messages do get logged and none fall through any cracks.

I've added an integration test which does just that.

@gilescope (Contributor):

Nice test. Good that nothing is going astray.

client/tracing/tests/logger.rs — 3 resolved review threads (outdated)
Comment on lines +164 to +174
if !ENABLE_ASYNC_LOGGING.load(Ordering::Relaxed) {
sync_flush_on_drop = true;
}
@bkchr (Member):

Why?

@koute (Contributor, Author):

If the logger was already deinitialized but something still wants to log for some reason, then instead of appending to a buffer which will never be flushed (because the background thread which does the flushing is most likely already dead at that point), it will just log synchronously.

Basically, a just-in-case handling of a potential corner case.
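In other words, the fallback roughly amounts to something like this (a self-contained sketch, not the actual code; ENABLE_ASYNC_LOGGING and append_to_shared_buffer are stand-ins for the real items):

use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};

static ENABLE_ASYNC_LOGGING: AtomicBool = AtomicBool::new(true);

fn append_to_shared_buffer(_line: &[u8]) {
    // In the real writer this appends to the mutex-protected buffer that the
    // background thread flushes; elided here.
}

fn write_log_line(line: &[u8]) {
    if ENABLE_ASYNC_LOGGING.load(Ordering::Relaxed) {
        append_to_shared_buffer(line);
    } else {
        // The flusher thread may already be gone, so block and write directly.
        let _ = std::io::stderr().write_all(line);
    }
}

fn main() {
    write_log_line(b"buffered asynchronously\n");
    ENABLE_ASYNC_LOGGING.store(false, Ordering::SeqCst);
    write_log_line(b"written synchronously\n");
}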

pub struct StderrWriter {
buffer: Option<parking_lot::lock_api::MutexGuard<'static, parking_lot::RawMutex, Vec<u8>>>,
sync_flush_on_drop: bool,
original_len: usize,
@bkchr (Member):

When is that not 0?

@koute (Contributor, Author):

The buffer is not always immediately flushed; as long as the amount of data in the buffer is less than ASYNC_FLUSH_THRESHOLD, it is only flushed every AUTOFLUSH_EVERY (50ms in this case). For example, if you call log!("Hello World!") twice in rapid succession, then on the second call original_len will be non-zero, since the buffer will still contain the data from the first call, which hasn't been flushed yet (neither has the amount of data exceeded ASYNC_FLUSH_THRESHOLD, nor has the time since the last flush exceeded AUTOFLUSH_EVERY, so the thread which does the flushing is still sleeping).
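A tiny self-contained illustration of that (using std's Mutex for brevity; the real writer uses parking_lot, and buffered_write is just a stand-in here):

use std::sync::Mutex;

static BUFFER: Mutex<Vec<u8>> = Mutex::new(Vec::new());

// Appends a line and returns the buffer length *before* the append,
// i.e. the equivalent of `original_len`.
fn buffered_write(line: &[u8]) -> usize {
    let mut buffer = BUFFER.lock().unwrap();
    let original_len = buffer.len();
    buffer.extend_from_slice(line);
    original_len
}

fn main() {
    // First call: the buffer is empty, so `original_len` is 0.
    assert_eq!(buffered_write(b"Hello World!\n"), 0);
    // Second call, before any flush: the first line (13 bytes) is still
    // buffered because neither ASYNC_FLUSH_THRESHOLD nor AUTOFLUSH_EVERY
    // has been hit, so `original_len` is now non-zero.
    assert_eq!(buffered_write(b"Hello World!\n"), 13);
}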

client/tracing/src/logging/stderr_writer.rs — resolved review thread (outdated)
Comment on lines +130 to +134
extern "C" fn on_exit() {
ENABLE_ASYNC_LOGGING.store(false, Ordering::SeqCst);

if let Some(buffer) = BUFFER.try_lock_for(ON_EXIT_FLUSH_TIMEOUT) {
flush_logs(buffer);
}
}
@bkchr (Member):

Why do we need this? Couldn't the Drop implementation just set ENABLE_ASYNC_LOGGING = false, and the thread would finish when it sees this?

@koute (Contributor, Author):

We need some way to detect that we're exiting, and yes, we could technically do that through Drop, but that would require us to actually make sure it gets dropped. We can't make it a static, as Rust never calls drop on statics. We could do it through the .fini or .dtors linker sections, but that's unportable (and possibly nightly-only; I haven't used those in a long time, so I don't remember whether they can already be used on stable Rust). We could use a thread local, check whether we're on the thread which initialized the logger, and run drop from there (since TLS destructors do get run), but that's kind of a hack. We could have the logger initializer return a handle which would control the lifetime, but that's inconvenient and potentially error-prone in this case.

And depending on which drop variant we went for, the logger might not be flushed if libc::exit is called, which, yes, probably doesn't normally happen, but I think it's a good idea to cover all of our bases.

So unless I'm missing something, what's left is registering a handler through the standard atexit.

(Of course this still doesn't cover a few cases, e.g. SIGSEGV from a segmentation fault or SIGABRT from abort being called, but I'm not sure we want to go that far, as that's a little more tricky since it involves signal handling.)

As for why we're doing the try_lock_for here: on exit the threads are just going to be forcibly killed, so we want to at least give the flush some time to complete, but not hang indefinitely if for some reason it can't (because, e.g., the other side of stderr stopped reading).
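For reference, registering such a handler boils down to something like this (a hedged sketch using the libc crate, not the PR's exact code; flush_pending_logs is a stand-in):

use std::sync::atomic::{AtomicBool, Ordering};

static ENABLE_ASYNC_LOGGING: AtomicBool = AtomicBool::new(true);

fn flush_pending_logs() {
    // In the real writer this tries to lock the shared buffer with a timeout
    // (`try_lock_for`) and writes whatever is left to stderr; elided here.
}

extern "C" fn on_exit() {
    // Switch every subsequent log call to the synchronous path...
    ENABLE_ASYNC_LOGGING.store(false, Ordering::SeqCst);
    // ...and flush whatever is still buffered.
    flush_pending_logs();
}

fn install_exit_hook() {
    // `atexit`-registered handlers run on normal process exit, including
    // `libc::exit`, but not on SIGSEGV/SIGABRT.
    unsafe {
        libc::atexit(on_exit);
    }
}

fn main() {
    install_exit_hook();
    // ... run the program; `on_exit` fires when it terminates normally.
}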

@bkchr (Member):

But StderrWriter is dropped at some point; couldn't that trigger the logic I described above?

@bkchr (Member):

Sorry, I mean the MakeStderrWriter.

@koute (Contributor, Author):

If drop were actually called for MakeStderrWriter then yeah, but it isn't, from what I can see. (Under the hood the whole structure which owns the MakeStderrWriter is assigned to a static mut, and like any other static in Rust it's never dropped.)

@bkchr (Member):

Yeah. Makes sense.

@bkchr (Member) left a comment

Looks really good. Good work 👍

The only problem I see is around aborts, but those happen very rarely anyway.

One last nitpick and then it is good to be merged.

client/tracing/src/logging/stderr_writer.rs — 2 resolved review threads (outdated)
@koute (Contributor, Author) commented Oct 21, 2021

Hmmm... it looks like the continuous-integration/gitlab-node-bench-regression-guard benchmark test for importing blocks is failing, which is... interesting, as this definitely shouldn't affect it. Apparently it's almost three times slower now, which makes no sense at all. (For one, the benchmark shouldn't be logging that much to begin with, and if it does then it'd be more of a logger + environment benchmark than a block import one; and even if it does log a lot, even in the worst case this should not be slower than what we had before.)

The only reason I can think of as to why it could have regressed is that I've also bumped tracing-subscriber to the newest version, and maybe that has some extra overhead when we're not logging?

I'll try to revert the tracing-subscriber bump (since we're going to bump it during #10072 anyway) and/or run the benches locally to see what's up.

@bkchr (Member) commented Oct 21, 2021

@koute no, don't do this. This test is failing everywhere; it depends on the machine it is executed on...

@koute (Contributor, Author) commented Oct 21, 2021

> @koute no, don't do this. This test is failing everywhere; it depends on the machine it is executed on...

Ah, okay, that's good to know! I was just about to run it locally. (:

@bkchr (Member) commented Oct 21, 2021

@koute merging master now should fix the benchmark test.

@koute force-pushed the master_faster_logging branch from 62981af to 949d1e1 on October 21, 2021 13:41
@koute (Contributor, Author) commented Oct 21, 2021

Okay, rebased on the newest master with no extra changes.

@bkchr (Member) commented Oct 21, 2021

@koute in the future please use merge commits. We squash-merge to master anyway, and merge commits make it easier for reviewers.

@koute (Contributor, Author) commented Oct 21, 2021

> @koute in the future please use merge commits. We squash-merge to master anyway, and merge commits make it easier for reviewers.

My apologies; force of habit. I'll be sure to just use merge commits when possible in the future.

@koute (Contributor, Author) commented Oct 21, 2021

bot merge

@paritytech-processbot (bot) merged commit 632b323 into paritytech:master on Oct 21, 2021
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
* Update `tracing`-related dependencies

* Enable `parking_lot` feature in `tracing-subscriber`

* Add an asynchronous stderr logger

* Make clippy happy

* Add an integration test for the logger

* Refactor `test_logger_filters`'s subprocess machinery into a separate function

* Use a child process instead of hooking into stderr for the test

* Add a doc comment for `MakeStderrWriter`

* Move the initialization into the `MakeStderrWriter`'s constructor

* Add an extra test case to trigger the logger's emergency flush mechanism

* Use the buffer's mutex for asynchronous flushes

* Remove vestigial `nix` dependency from one of the previous commits
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023