feat(prometheus,shred-collector): metrics improvements + add metrics to shred collector #306

dnut · 2024-10-07T13:32:27Z

The aim of this PR is to improve the observability of the shred collector with prometheus metrics. While doing so, I also made various improvements to the prometheus library, and I updated the rest of the codebase to take advantage of them.

Shred Collector

Added metrics to:

shred receiver
shred verifier
shred processor
repair service
repair peer provider

Prometheus

VariantCounter

I added a new metric type, VariantCounter, which separately counts the occurrences of each variant within an enum or error set. This is a composite metric, similar to a histogram, except that the "buckets" are discrete, unordered, and identified by name, instead of representing numeric ranges. Currently it's only used for counting errors, but it can also be used to count enum variants, since enums and errors are conceptually almost the same thing.
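To make the idea concrete, here is a minimal sketch of a variant counter for enums. It is not the PR's implementation: the names `VariantCounter(E)`, `init`, `observe`, and `get` are hypothetical, it assumes Zig 0.12 or later (`std.atomic.Value`, `std.enums.EnumArray`), and it only handles enums. Counting error-set members would additionally need a mapping from each error to an index, which is omitted here.

```zig
const std = @import("std");

/// Hypothetical sketch: keeps one atomic count per variant of the enum `E`,
/// like histogram buckets that are identified by name instead of numeric ranges.
pub fn VariantCounter(comptime E: type) type {
    return struct {
        counts: Counts,

        const Self = @This();
        const Counts = std.enums.EnumArray(E, std.atomic.Value(u64));

        /// Start every variant's count at zero.
        pub fn init() Self {
            return .{ .counts = Counts.initFill(std.atomic.Value(u64).init(0)) };
        }

        /// Record one occurrence of `variant` (safe to call from multiple threads).
        pub fn observe(self: *Self, variant: E) void {
            _ = self.counts.getPtr(variant).fetchAdd(1, .monotonic);
        }

        /// Read the current count for `variant`.
        pub fn get(self: *const Self, variant: E) u64 {
            return self.counts.getPtrConst(variant).load(.monotonic);
        }
    };
}
```

Because the variant set is known at comptime, the storage is a fixed-size array indexed by the enum, so no allocation or hashing is needed per observation.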

I was struggling to decide whether each error should be reported as a completely separate metric, or whether the VariantCounter as a whole should be a single metric with many labels. I ended up taking the label approach because I think it's more flexible, but I'd appreciate feedback on this. See this commit for the switch to labels: 0afc750
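For illustration, the two options differ in how the counts show up in the Prometheus text exposition format. The sketch below formats one sample each way; the metric name `shred_verifier_failures`, the label name `variant`, and both helper functions are hypothetical, not names used in the PR.

```zig
const std = @import("std");

/// Label approach: one metric name, one sample per variant, told apart by a label.
/// Produces e.g. `shred_verifier_failures{variant="invalid_signature"} 2`.
fn formatLabeled(buf: []u8, metric: []const u8, variant: []const u8, count: u64) ![]u8 {
    // `{{` and `}}` emit literal braces in std.fmt format strings.
    return std.fmt.bufPrint(buf, "{s}{{variant=\"{s}\"}} {d}", .{ metric, variant, count });
}

/// Separate-metric approach: the variant becomes part of the metric name.
/// Produces e.g. `shred_verifier_failures_invalid_signature 2`.
fn formatSeparate(buf: []u8, metric: []const u8, variant: []const u8, count: u64) ![]u8 {
    return std.fmt.bufPrint(buf, "{s}_{s} {d}", .{ metric, variant, count });
}
```

With labels, all variants share one metric name, so a PromQL query such as `sum(shred_verifier_failures)` totals them and `sum by (variant) (shred_verifier_failures)` breaks them down. With separate metrics, each variant has to be referenced by its own name.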

Initialize structs containing many metrics

We have some structs like this:

```zig
// Counter, Gauge, and Histogram are the metric types from the prometheus library.
const MyMetricsStruct = struct {
    event_count: *Counter,
    current_state: *Gauge(u64),
    my_hist: *Histogram,
};
```

Previously, each of these structs usually had its own custom init function that used comptime code to initialize all the fields. I moved all this logic into a single place, these methods on the Registry struct:

initStruct: Init an entire metrics struct that contains many metrics
initFields: Init metrics fields within a struct that has other data

Configuration:

Metric.metric_type: This is provided within the definitions for Counter, Gauge, etc. to identify the type of metric, so that initStruct and initFields can work properly.
MyMetricsStruct.prefix (optional): For namespacing of metrics. Within the struct containing metrics, this string will be prefixed to every metric name.
MyMetricsStruct.buckets (required for histograms): If the struct contains any histograms, you need to specify the buckets, either as a const or as a function that takes the field name and returns its buckets.
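A rough sketch of how a comptime `initStruct` can work is below. It is a simplified stand-in, not the Registry API from this PR: it takes an allocator instead of a registry, uses hypothetical non-generic `Counter` and `Gauge` types, ignores histograms and `buckets` entirely, and assumes Zig 0.12 or later. It shows the core mechanism: iterate over the struct's fields at comptime, check that each pointee declares `metric_type`, and build the metric name from the optional `prefix`.

```zig
const std = @import("std");

// Hypothetical minimal metric types. Each declares `metric_type`, mirroring
// `Metric.metric_type`, so generic code can recognize it as a metric.
const MetricType = enum { counter, gauge };

const Counter = struct {
    pub const metric_type: MetricType = .counter;
    name: []const u8,
    value: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),

    pub fn inc(self: *Counter) void {
        _ = self.value.fetchAdd(1, .monotonic);
    }
};

const Gauge = struct {
    pub const metric_type: MetricType = .gauge;
    name: []const u8,
    value: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),

    pub fn set(self: *Gauge, v: u64) void {
        self.value.store(v, .monotonic);
    }
};

/// Hypothetical stand-in for `Registry.initStruct`: allocates every `*Metric`
/// field of `T` and names it `T.prefix ++ "_" ++ field_name` if `T` declares
/// a `prefix`, otherwise just `field_name`. Allocations are not freed if a
/// later one fails, so pass an arena allocator.
fn initStruct(allocator: std.mem.Allocator, comptime T: type) !T {
    var result: T = undefined;
    inline for (std.meta.fields(T)) |field| {
        const Metric = std.meta.Child(field.type);
        if (!@hasDecl(Metric, "metric_type")) {
            @compileError("field '" ++ field.name ++ "' is not a metric");
        }
        // Comptime string concatenation; the untaken branch is never analyzed.
        const name: []const u8 = if (@hasDecl(T, "prefix"))
            T.prefix ++ "_" ++ field.name
        else
            field.name;
        const metric = try allocator.create(Metric);
        metric.* = .{ .name = name };
        @field(result, field.name) = metric;
    }
    return result;
}
```

Since the field walk happens at comptime, a struct that accidentally contains a non-metric field fails to compile instead of failing at startup.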

Existing metrics tweaks

Counter: remove set method
Gauge: add min and max methods
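Assuming `max(v)` sets the gauge to the larger of its current value and `v` (a high-water mark), and `min(v)` does the reverse, `std.atomic.Value` already provides `fetchMax` and `fetchMin`, so a hypothetical simplified gauge could implement them as single atomic operations. This is a sketch under that assumption (Zig 0.12 or later), not the library's code.

```zig
const std = @import("std");

/// Hypothetical simplified gauge with assumed min/max semantics.
pub fn Gauge(comptime T: type) type {
    return struct {
        value: std.atomic.Value(T) = std.atomic.Value(T).init(0),

        const Self = @This();

        pub fn set(self: *Self, v: T) void {
            self.value.store(v, .monotonic);
        }

        /// Atomically set the gauge to max(current, v).
        pub fn max(self: *Self, v: T) void {
            _ = self.value.fetchMax(v, .monotonic);
        }

        /// Atomically set the gauge to min(current, v).
        pub fn min(self: *Self, v: T) void {
            _ = self.value.fetchMin(v, .monotonic);
        }

        pub fn get(self: *const Self) T {
            return self.value.load(.monotonic);
        }
    };
}
```

Doing this with one read-modify-write avoids the race a separate `get` followed by `set` would have when several threads report values concurrently.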

0xNineteen

Looks like a great improvement. A few nits, plus a few comments that are more global questions of "how we name/organize metrics". That's more than this PR asked for, but I think it could greatly improve consistency if we decide these things here.


0xNineteen · 2024-10-09T10:29:47Z

Can we add the comments on metrics to contributing.md?

Call the struct Metrics if the context is already defined and there's no conflict in the file.
If the Metrics struct is small enough, it can be defined within the struct; otherwise it can be defined directly underneath the struct.

0xNineteen · 2024-10-09T18:42:56Z

Also, should this PR include a corresponding Grafana dashboard? It doesn't seem to be committed.


dnut · 2024-10-10T03:51:58Z

Also, should this PR include a corresponding Grafana dashboard? It doesn't seem to be committed.

I figured I would do this later when I actually need to gain insights from the metrics. For now I just wanted to expose them in the code. It would be great to also have a dashboard already, but I just didn't prioritize it at this point.


dnut · 2024-10-10T13:42:58Z

can we add the comments on metrics to contributing.md?

* call the struct `Metrics` if the context is already defined and there's no conflict in the file

* if the Metrics struct is small enough, it can be defined within the struct, otherwise it can be defined directly underneath the struct

9b17bfb

0xNineteen · 2024-10-10T14:15:37Z

Nice. I just committed a quick fix to the naming: in some cases the struct name was updated but not the field name (i.e., stats: GossipMetrics). Also, for AccountsDB, I reverted the changes to BankHashStatsMap (this struct isn't used for prometheus metrics).


dnut requested review from 0xNineteen and yewman October 8, 2024 14:16

0xNineteen requested changes Oct 8, 2024


dnut force-pushed the dnut/metrics-shred-collector branch from 9958d04 to 75f0806 October 9, 2024 21:02

dnut added 8 commits October 9, 2024 21:47

feat(prometheus,shred-collector): metrics improvements + add metrics to shred collector

fb829ce

refactor: remove unused imports

1fbf474

feat(prometheus): use labels for variants in variant counter, instead of separate metrics

22c2feb

refactor(prometheus): remove unused VariantIndex.get

9ce0bae

refactor(prometheus): use standard import style

b373a1d

refactor(geyser): use metrics prefix for reader and writer

f5c2e3c

docs(prometheus): fix typos/inaccuracies in VariantIndexer docs

a0d785d

feat(prometheus): more descriptive histogram bucket configuration and error messages

38c9d9c

dnut force-pushed the dnut/metrics-shred-collector branch from 75f0806 to 38c9d9c October 10, 2024 03:34

refactor(shred-collector): rename count variable to packet_count

b3e96e5

dnut added 4 commits October 9, 2024 23:55

refactor(shred-collector): more descriptive variable names in shred receiver

e6312ca

refactor(shred-collector): remove todo about metrics

ca6e0de

refactor: rename Stats structs to Metrics

90689ef

docs(contributing): add metrics naming to style guide

9b17bfb

dnut force-pushed the dnut/metrics-shred-collector branch from df884bc to 9b17bfb October 10, 2024 13:42

dnut requested a review from 0xNineteen October 10, 2024 13:44

fix: naming

0c561b4

0xNineteen approved these changes Oct 10, 2024


dnut merged commit 9c21b67 into main Oct 10, 2024
6 checks passed

0xNineteen deleted the dnut/metrics-shred-collector branch October 10, 2024 14:25

dnut added this to the Networking milestone Oct 11, 2024
