feat: prometheus metrics for RPC methods #4607

Merged
merged 7 commits into from
Aug 2, 2024

lemmih (Contributor) commented Aug 1, 2024

Summary of changes

Changes introduced in this pull request:

  • Expose how many times each RPC method has been called, how long each call took to execute, and how many calls failed.

The new metrics look like this:

rpc_processing_time_sum{method="Filecoin.NodeStatus"} 0.006975156
rpc_processing_time_count{method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.01",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.02",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.04",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.08",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.16",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.32",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.64",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="1.28",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="2.56",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="5.12",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="+Inf",method="Filecoin.NodeStatus"} 9
rpc_method_failure{method="Filecoin.WeDontSupporThisCall"} 1
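
To make the output above easier to read: the `le` bounds follow from the histogram's bucket layout (start 0.01 s, factor 2, 10 buckets), and Prometheus bucket counts are cumulative, so a sample is counted in every bucket whose bound it does not exceed. Below is a minimal standalone sketch of that logic. The function names `exponential_bounds` and `cumulative_counts` are hypothetical and written only for illustration; they are not the library's API.

```rust
/// Upper bounds of `count` buckets, each `factor` times the previous one.
/// Illustrative stand-in, not the metrics library's own helper.
pub fn exponential_bounds(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut bound = start;
    for _ in 0..count {
        bounds.push(bound);
        bound *= factor;
    }
    bounds
}

/// Cumulative count per bucket: how many samples are <= each bound.
/// The final entry stands for the `+Inf` bucket and equals the sample total.
pub fn cumulative_counts(bounds: &[f64], samples: &[f64]) -> Vec<u64> {
    let mut counts: Vec<u64> = bounds
        .iter()
        .map(|&le| samples.iter().filter(|&&s| s <= le).count() as u64)
        .collect();
    counts.push(samples.len() as u64);
    counts
}
```

In the sample output, nine calls summed to about 0.007 s, so each call took well under the first bound of 0.01 s. That is why every bucket, including `+Inf`, reports 9.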

Reference issue to close (if applicable)

Closes #3767

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

@lemmih lemmih marked this pull request as ready for review August 1, 2024 14:10
@lemmih lemmih requested a review from a team as a code owner August 1, 2024 14:10
@lemmih lemmih requested review from LesnyRumcajs and sudo-shashank and removed request for a team August 1, 2024 14:10
@LesnyRumcajs
Member

@lemmih is it WIP or ready to review?

@lemmih lemmih changed the title [WIP] feat: prometheus metrics for RPC methods feat: prometheus metrics for RPC methods Aug 1, 2024

@LesnyRumcajs LesnyRumcajs left a comment

rpc_processing_time_count{method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.01",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.02",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.04",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.08",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.16",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.32",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="0.64",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="1.28",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="2.56",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="5.12",method="Filecoin.NodeStatus"} 9
rpc_processing_time_bucket{le="+Inf",method="Filecoin.NodeStatus"} 9

Can we avoid so many records?

@lemmih
Contributor Author

lemmih commented Aug 2, 2024

Can we avoid so many records?

We could track just p50, p90 and p99, but we would still have hundreds of metrics. I say we wait and see whether the large number of metrics causes any problems or unexpected costs.
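
The trade-off in the reply above comes down to simple arithmetic. With the bucket layout from this PR, each method produces 13 series (10 finite buckets, `+Inf`, `_sum`, `_count`), while exporting only three percentiles would produce 3 per method. The sketch below works this out. The helper names are hypothetical, and the count of 100 RPC methods in the usage note is an assumption for illustration, not a figure from this PR.

```rust
/// Series exported per method by a histogram:
/// one per finite bucket, plus `+Inf`, `_sum`, and `_count`.
pub fn histogram_series_per_method(finite_buckets: usize) -> usize {
    finite_buckets + 3
}

/// Series exported per method if only selected quantiles were tracked
/// (for example p50, p90, p99), ignoring any extra sum or count series.
pub fn quantile_series_per_method(quantiles: &[f64]) -> usize {
    quantiles.len()
}

/// Total series across all instrumented methods.
pub fn total_series(per_method: usize, methods: usize) -> usize {
    per_method * methods
}
```

Under the assumed 100 methods, the histogram yields 1300 series and the percentile-only variant 300, which is still in the hundreds.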

@lemmih lemmih added this pull request to the merge queue Aug 2, 2024
Merged via the queue into main with commit 9d90d9e Aug 2, 2024
30 checks passed
@lemmih lemmih deleted the lemmih/rpc-metrics branch August 2, 2024 09:01
// Histogram with 10 buckets starting from 0.01s going to 5.12s, each bucket twice as big as the last.
Histogram::new(exponential_buckets(0.01, 2.0, 10))
});
crate::metrics::default_registry().register(
Contributor

Hi, I'm curious. Is there a particular reason we do not use DEFAULT_REGISTRY.write() here?

Member

The default_registry() accessor hides the detail that the registry sits behind an RwLock.

Contributor

But I see that all other metrics in this file use DEFAULT_REGISTRY.write().

Member

They likely shouldn't. :)
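
For readers following this thread, the accessor pattern under discussion can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's code. The `Registry` struct is a hypothetical stand-in for the real metrics registry, and the use of `OnceLock` with `std::sync::RwLock` is an assumption made to keep the sketch dependency-free.

```rust
use std::sync::{OnceLock, RwLock, RwLockWriteGuard};

/// Hypothetical stand-in for a metrics registry; it only records metric names.
#[derive(Default)]
pub struct Registry {
    names: Vec<String>,
}

impl Registry {
    /// Record a metric under `name`.
    pub fn register(&mut self, name: &str) {
        self.names.push(name.to_owned());
    }

    /// Number of metrics registered so far.
    pub fn count(&self) -> usize {
        self.names.len()
    }
}

/// The global registry. Callers are not meant to touch the lock directly.
static DEFAULT_REGISTRY: OnceLock<RwLock<Registry>> = OnceLock::new();

/// Accessor that hides the lock: callers get a write guard and never
/// need to know which lock type protects the registry.
pub fn default_registry() -> RwLockWriteGuard<'static, Registry> {
    DEFAULT_REGISTRY
        .get_or_init(|| RwLock::new(Registry::default()))
        .write()
        .expect("registry lock poisoned")
}
```

A call site then reads `default_registry().register("rpc_processing_time")`, with the guard released at the end of the statement. If the lock type or initialization strategy changes later, only the accessor has to change.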

Development

Successfully merging this pull request may close these issues.

Add Prometheus metrics for RPC endpoints
4 participants