Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(prometheus,shred-collector): metrics improvements + add metrics to shred collector #306

Merged
merged 14 commits into from
Oct 10, 2024

Conversation

dnut
Copy link
Contributor

@dnut dnut commented Oct 7, 2024

The aim of this PR is to improve the observability of the shred collector with prometheus metrics. While doing so, I also made various improvements to the prometheus library. I also made changes throughout the rest of the codebase to take advantage of these improvements.

Shred Collector

Added metrics to:

  • shred receiver
  • shred verifier
  • shred processor
  • repair service
  • repair peer provider

Prometheus

VariantCounter

I added a new metric type: VariantCounter: This separately count the occurrence of each variant within an enum or error set. This is a composite metric, similar to a histogram, except the "buckets" are discrete, unordered, and identified by name, instead of representing numeric ranges. Currently it's only used for counting errors but it can also be used to count enum variants, since enums and errors are conceptually almost the same thing.

I was struggling to decide if each error should be reported as a completely separate metric, or if the VariantCounter as a whole should be a single metric with many labels. I ended up decided to take the label approach because I think it's more flexible, but I'd appreciate feedback on this. See this commit for switching to labels: 0afc750

Initiailize structs containing many metrics

We have some structs like this:

const MyMetricsStruct = struct {
    event_count: *Counter,
    current_state: *Gauge(u64),
    my_hist: *Histogram,
};

Usually we define a custom init function for each of these structs. These init functions use comptime code to initialize all the fields. I moved all this logic into a single place. It is now in these methods in the Registry struct:

  • initStruct: Init an entire metrics struct that contains many metrics
  • initFields: Init metrics fields within a struct that has other data

Configuration:

  • Metric.metric_type This is provided within the definitions for Counter, Gauge, etc. to identify the type of metric so initStruct and initFields can work properly.
  • MyMetricsStruct.prefix optional: For namespacing of metrics. Within the struct containing metrics, this will string will be prefixed to any metric names
  • MyMetricsStruct.buckets required for histograms. If the struct contains any histograms, you need to specify the buckets either with a const or a function that takes the field name and returns its buckets.

Existing metrics tweaks

  • Counter: remove set method
  • Gauge: add min and max methods

@dnut dnut requested review from 0xNineteen and yewman October 8, 2024 14:16
Copy link
Contributor

@0xNineteen 0xNineteen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a great improvement - a few nits - and a few comments that are more global questions of 'how we name/organize metrics' -- which is more than what this PR asked for but i think it has the chance to make a large improvement to consistency if we decide these things here

@0xNineteen
Copy link
Contributor

can we add the comments on metrics to contributing.md?

  • call the struct Metrics if the context is already defined and theres no conflict in the file
  • if the Metrics struct is small enough, it can be defined within the struct, otherwise it can be defined directly underneath the struct

@0xNineteen
Copy link
Contributor

also - should this PR include a corresponding grafana dashboard? doesnt seem to be commited

@dnut dnut force-pushed the dnut/metrics-shred-collector branch from 9958d04 to 75f0806 Compare October 9, 2024 21:02
@dnut dnut force-pushed the dnut/metrics-shred-collector branch from 75f0806 to 38c9d9c Compare October 10, 2024 03:34
@dnut
Copy link
Contributor Author

dnut commented Oct 10, 2024

also - should this PR include a corresponding grafana dashboard? doesnt seem to be commited

I figured I would do this later when I actually need to gain insights from the metrics. For now I just wanted to expose them in the code. It would be great to also have a dashboard already, but I just didn't prioritize it at this point.

@dnut dnut force-pushed the dnut/metrics-shred-collector branch from df884bc to 9b17bfb Compare October 10, 2024 13:42
@dnut
Copy link
Contributor Author

dnut commented Oct 10, 2024

can we add the comments on metrics to contributing.md?

* call the struct `Metrics` if the context is already defined and theres no conflict in the file

* if the Metrics struct is small enough, it can be defined within the struct, otherwise it can be defined directly underneath the struct

9b17bfb

@dnut dnut requested a review from 0xNineteen October 10, 2024 13:44
@0xNineteen
Copy link
Contributor

nice - just commited a quick fit to the naming -- some cases updated the struct name but not the field (ie, stats: GossipMetrics -- also for AccountsDB reverted the changes to BankHashStatsMap (this struct isnt used for prometheus metrics)

Copy link
Contributor

@0xNineteen 0xNineteen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@dnut dnut merged commit 9c21b67 into main Oct 10, 2024
6 checks passed
@0xNineteen 0xNineteen deleted the dnut/metrics-shred-collector branch October 10, 2024 14:25
@dnut dnut added this to the Networking milestone Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants