-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logging (2/2): Replace DelayDetector #9543
Conversation
The `lib.rs` file accumulated structs and functionality from different areas. Split the file into several smaller files.
…ration (#9539) * remove force_* features and move runtime selection to runtime configuration There are a couple of reasons, but the most major one is that these `force_*` fetures make it impossible to run tools with `--all-features` enabled on macOS (where use of wasmer0 would be forced somewhat arbitrarily.) That said, these options don't make sense, in the world of limited replayability, and with runtime configuration there is also now a fairly straightforward approach to forcing the use of a different runtime than the "default" (just chnge some yamls.) As an upside this also forced my hand to go and fix the Config::test nonsense, a little bit, hence Fixes #8202 and tests will now actually exercise the system in a configuration that is going to run in production most of the time! Yay! * Replace !0 with PROTOCOL_VERSION * Remove the wasmtime feature from runtime-params-estimator * Fix test expectations
* debug: Add #[perf] to more actors This allows to observe more details about actor utilization when compiling with `--features=performance_stats`. Running some loadtest with this reveals that `ClientActor` and `ShardsManagerActor` are the busiest in some cases, which would not be possible to observe without the instrumentation added in this commit. * cargo fmt
Chunks can be MBs in size. We should not force a read from DB when it's not cached already. Also alters the `chunk_exists` method to check the cache first. An `exists` call on the DB is still more expensive than looking up an `Arc` in the in-memory cache.
* Logging: Deprecate --verbose --verbose is not a bool, but a String argument, which is confusing. * Logging: Deprecate --verbose --verbose is not a bool, but a String argument, which is confusing. * Logging: Deprecate --verbose --verbose is not a bool, but a String argument, which is confusing.
* Logging: New delay detector * Logging: New delay detector * Logging: New delay detector * Logging: New delay detector * Rename to SpanDurationLogger * Use Durations instead of nanoseconds * refactor: Tracing (#9544) The `lib.rs` file accumulated structs and functionality from different areas. Split the file into several smaller files. * Logging: New delay detector * Rename to SpanDurationLogger * Use Durations instead of nanoseconds * Use Durations instead of nanoseconds * refactor: Tracing (#9544) The `lib.rs` file accumulated structs and functionality from different areas. Split the file into several smaller files. * Logging: New delay detector * refactor: Tracing (#9544) The `lib.rs` file accumulated structs and functionality from different areas. Split the file into several smaller files. * Use Durations instead of nanoseconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍 Nothing better than deleting old code while also improving what it was doing :D
I left one comment regarding the ClientActor::wrap
function. Feel free to ignore or address in a follow up in the undefined future.
@@ -263,8 +263,7 @@ impl ClientActor { | |||
) -> Res { | |||
let (_span, msg) = handler_debug_span!(target: "client", msg, msg_type); | |||
self.check_triggers(ctx); | |||
let _d = | |||
delay_detector::DelayDetector::new(|| format!("NetworkClientMessage {:?}", msg).into()); | |||
let _span_inner = tracing::debug_span!(target: "client", "NetworkClientMessage").entered(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, without the actual message printed, it becomes rather hard to figure out what is slow if this triggers.
Probably not a big problem. But if you have an idea how we can improve it, ideally without eagerly debug printing every time, it would be nice.
(If #[track_caller]
worked with tracing spans and report the callers location instead of this one inside the wrap
, I think that would be perfect. I don't know for sure but I think it doesn't work this way, however.)
Also, I think you can remove the std::fmt::Debug
requirement on the function if you don't use debug formatting anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is slow if this triggers
That's when Grafana Tempo should be useful. It (should and will eventually) ingest the tracing information, and will let us search for long running "NetworkClientMessage" spans.
Continues #9540 Deletes the `DelayDetector` class and replaces the uses of it with spans. --------- Co-authored-by: Anton Puhach <anton@near.org> Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me> Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Continues #9540 Deletes the `DelayDetector` class and replaces the uses of it with spans. --------- Co-authored-by: Anton Puhach <anton@near.org> Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me> Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Continues #9540 Deletes the `DelayDetector` class and replaces the uses of it with spans. --------- Co-authored-by: Anton Puhach <anton@near.org> Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me> Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Continues #9540
Deletes the
DelayDetector
class and replaces the uses of it with spans.