Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add metrics for the events in the network output channels #5597

Merged
merged 15 commits into from
Apr 9, 2020

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Apr 9, 2020

There seems to be a high peak memory usage in the channels that send out network events to the rest of the subsystems.

This PR adds a metric for each possible event type in the channels.

I'm compiling Polkadot with this change to see how it goes.

@tomaka tomaka added the A0-please_review Pull request needs code review. label Apr 9, 2020
@tomaka
Copy link
Contributor Author

tomaka commented Apr 9, 2020

This seems to stay at 0 continuously on my local test (since we're polling every 5 seconds).
Does it make sense to use a Histogram here as well, like for the notifications queue?

Copy link
Contributor

@twittner twittner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This multi-staged setup is rather complicated. The code would benefit from an spmc channel I think but in the absence of it this must do.

@arkpar
Copy link
Member

arkpar commented Apr 9, 2020

Looks good in general. The code seems to imply that the channels are always asynchronous. Is it absolutely guaranteed that there won't be a recursive deadlock on a mutex? E.g. unbounded_send that calls another unbounded_send?

@tomaka
Copy link
Contributor Author

tomaka commented Apr 9, 2020

The code seems to imply that the channels are always asynchronous. Is it absolutely guaranteed that there won't be a recursive deadlock on a mutex?

It's impossible because any order sent to the network also goes through an unbounded channel, but I changed the implementation to only lock at the end and make this more fool-proof.

Copy link
Contributor

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried by the impact the single serialization point (lock) might have.

We don't depend on these metrics being consistent among each other. Recording metrics for a single event does not need to be atomic. Why not add an OutEventsMetrics struct in out_events.rs with Prometheus Gauges and Counters that would be registered through a registry passed to OutChannels::new. Using atomics instead of locks should reduce the contention and is (in this case) wait-free.

This multi-staged setup is rather complicated. The code would benefit from an spmc channel I think but in the absence of it this must do.

I agree with the complexity this pull request introduces. I have not done a lot of research. Is there no future-aware spmc or mpmc broadcast channel implementation out there? Wouldn't https://docs.rs/multiqueue/0.3.2/multiqueue/index.html do?

/// Must be associated with an [`OutChannels`] before anything can be sent on it
///
/// > **Note**: Contrary to regular channels, this `Sender` is purposefully designed to not
/// > implement the `Clone` trait. If someone adds a `#[derive(Clone)]` below, it is **wrong**.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add in the note why you don't want multiple producers?

client/network/src/service/out_events.rs Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
@tomaka
Copy link
Contributor Author

tomaka commented Apr 9, 2020

I totally disagree on the complexity. How is it complicated? You create a channel, then put the sender in a collection and send messages to all the senders in the collection at once.
It is exactly the same as what is being done in service.rs before this PR, except in a separate module.

@tomaka
Copy link
Contributor Author

tomaka commented Apr 9, 2020

The code would benefit from an spmc channel I think but in the absence of it this must do.

Is there no future-aware spmc or mpmc broadcast channel implementation out there? Wouldn't https://docs.rs/multiqueue/0.3.2/multiqueue/index.html do?

That would mean tackling paritytech/polkadot-sdk#557 immediately, which is a different approach than what this PR does.

@tomaka
Copy link
Contributor Author

tomaka commented Apr 9, 2020

All right, I changed the PR to directly use Prometheus metrics. There is now a single Mutex that is locked only when pushing the sender in the collection and to clone an Arc.

@tomaka tomaka requested a review from mxinden April 9, 2020 15:56
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service.rs Outdated Show resolved Hide resolved
client/network/src/service.rs Outdated Show resolved Hide resolved
client/network/src/service.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
client/network/src/service/out_events.rs Outdated Show resolved Hide resolved
"sub_libp2p_out_events_num_channels",
"Number of internal active channels that broadcast network events",
)?, registry)?,
out_events_in: register(CounterVec::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not combine out_events_in and out_events_out into a single metric with a label direction and label values in and out?

)?, registry)?,
out_events_in: register(CounterVec::new(
Opts::new(
"sub_libp2p_out_events_notifications_opened_count",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this metric name is outdated, right? Shouldnt it be something like

Suggested change
"sub_libp2p_out_events_notifications_opened_count",
"sub_libp2p_out_events_notifications_count",

)?, registry)?,
out_events_out: register(CounterVec::new(
Opts::new(
"sub_libp2p_out_events_notifications_opened_count",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metric name seems outdated, see same name above.

out_events_out: register(CounterVec::new(
Opts::new(
"sub_libp2p_out_events_notifications_opened_count",
"Number of events pending in the channels that broadcast network events"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Number of events pending in the channels that broadcast network events"
"Number of events received from the channels that broadcast network events"


struct Metrics {
// This list is ordered alphabetically
out_events_num_channels: Gauge<U64>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why prefix each of the variable names with out_events?

client/network/service/out_events: Apply remaining suggestions
Copy link
Contributor

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bearing with me!

@tomaka tomaka added A8-mergeoncegreen and removed A0-please_review Pull request needs code review. labels Apr 9, 2020
@tomaka tomaka merged commit 7f12a9f into paritytech:master Apr 9, 2020
@tomaka tomaka deleted the metrics-net-out-channels branch April 9, 2020 18:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants