-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PrestoExchangeSource HttpClient histogram metric is blocking query #23338
Comments
@pramodsatya gathered some stats on total time spent at different locations in Presto CPP where RECORD_HISTOGRAM is called. The Q67 query was run for over an hour and
|
What if the implementation instead created a subprocess like the exposer mentioned https://github.com/jupp0r/prometheus-cpp. Resources can be controlled when starting the exposer. In uber we use the exposer as a separate process since we don't have to worry about reporting/maintainng the new http server |
I don't think this will help directly here^ since I could repro this internally. |
PrometheusStatsReporter updates worker stats synchronously. This is fine as most of the stats are updated through PeriodicTaskManager in a separate thread periodically without blocking queries.
But this specific metric is updated in the query path and is updated way too frequently.
The Q67 of SF10K TPCDS workload takes several hours to finish since we merged PrometheusStatsReporter. We are working on how to improve this in the stats reporter but we should also avoid reporting metrics in the query path if possible.
Request to review if
presto_cpp.http.client.presto_exchange_source.on_body_bytes
can be removed or made optional for tracking.We tested Q67 at SF10K by removing this metric and the query completes successfully.
We also have a workaround of updating histogram asynchronously in PrometheusStatsReporter. This solution was tested successfully as well against Q67
The text was updated successfully, but these errors were encountered: