Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Native] Add prometheus metric exporter using prometheus c++ library #21753

Closed

Conversation

jaystarshot
Copy link
Member

@jaystarshot jaystarshot commented Jan 22, 2024

Support Prometheus metrics using prometheus cpp library https://github.com/jupp0r/prometheus-cpp which is more stable

This is used in Uber, we are still not in production but we are in shadow mode

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

@jaystarshot jaystarshot changed the title Add prometheus metric exporter using prometheus c++ library [Native] Add prometheus metric exporter using prometheus c++ library Jan 22, 2024
@aditi-pandit
Copy link
Contributor

@jaystarshot : Thanks for this work. Its really good.

Curious if you ran some benchmarks (taking https://github.com/jupp0r/prometheus-cpp?tab=readme-ov-file#benchmarks as an example) with your code in Prestissimo ? We might be interested in how this holds up over long periods. Its possible that production servers run for upto a month before recycling the counters as well.

The other implementation was super simplistic but was very stream-lined for Ahana/IBM use-case.

@karteekmurthys : Would you and Jay be able to co-author an RFC which outlines details of the metrics, how the monitoring is configured, production characteristic. It would help us be more organized for this.

@jaystarshot
Copy link
Member Author

jaystarshot commented Jan 25, 2024

@aditi-pandit Thanks! But I have just started development this week and its not really in production yet. I am also new to cpp and it will take some time. My inital goal was to make it production-ready, deploy, test then update this PR here but I didn't want conflicting implementations in opensource, hence I reached out early.
I also think we need a way to control what metrics will be allowed like jmxExporter has in its yaml files.
Also I roughly think that this library can be used in a sidecar in future if we want a similar setup as jmxexporter. Not sure about the inter container protocol (i think jmxexporter talks to the presto jvm process in some unique protocol)

@jaystarshot
Copy link
Member Author

image
This is working fine on my local setup for count as of now

Summary:
1. Known limitation - Added unit test won't run in CI, this is because the prometheus flag is off in CI  since enabling it causes some issues in server tests

Test Plan:
Unit test + tested via deployment,
Metrics populated - https://ugrafana.uberinternal.comx

https://querybuilder.uberinternal.com/r/DjLbAMs1F/edit shadow tested

Reviewers: #ldap_velox-core, hitarth

Reviewed By: #ldap_velox-core, hitarth

Subscribers: hitarth, ptiwary, curt

JIRA Issues: PRESTO-6312

Differential Revision: https://code.uberinternal.com/D12756007
.Help("Summary for " + keyStr)
.Register(*registry);
auto& summary = summaryFamily.Add({{"cluster", cluster_}, {"node", node_}, {"host", host_}},
prometheus::Summary::Quantiles{{0.5, 0.05}});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.5 quantile is not the average, instead it is the median value. This is as good as reporting instantaneous value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
It has sum and count so avg can be computed as well

@jaystarshot
Copy link
Member Author

Closing since the other commit is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants