PrestoExchangeSource HttpClient histogram metric is blocking query #23338

karteekmurthys · 2024-07-30T23:22:02Z

PrometheusStatsReporter updates worker stats synchronously. This is fine for most stats, because PeriodicTaskManager updates them periodically on a separate thread without blocking queries.

But this specific metric is updated in the query path, and far too frequently.

        RECORD_HISTOGRAM_METRIC_VALUE(
            kCounterHttpClientPrestoExchangeOnBodyBytes, bufferBytes);
      });

Q67 of the TPC-DS SF10K workload has been taking several hours to finish since we merged PrometheusStatsReporter. We are working on improving this in the stats reporter, but we should also avoid reporting metrics in the query path where possible.

Requesting a review of whether presto_cpp.http.client.presto_exchange_source.on_body_bytes can be removed or made optional.
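As a rough illustration of what "optional" could mean here, the sketch below gates a histogram update behind a runtime switch that is off by default. `OptionalHistogram`, `setEnabled`, and the `report` callback are hypothetical names used only for this example; they are not Presto or Velox APIs, and the real metric macro is not shown.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper (an assumption for illustration, not a Presto API):
// forwards a value to the reporter only when the metric is enabled.
class OptionalHistogram {
 public:
  explicit OptionalHistogram(
      std::function<void(int64_t)> report,
      bool enabled = false)
      : report_(std::move(report)), enabled_(enabled) {}

  // Turn the metric on or off at runtime.
  void setEnabled(bool enabled) {
    enabled_.store(enabled, std::memory_order_relaxed);
  }

  // Query-path call: when disabled, the cost is one relaxed atomic load.
  void record(int64_t value) const {
    if (enabled_.load(std::memory_order_relaxed)) {
      report_(value);
    }
  }

 private:
  std::function<void(int64_t)> report_;
  std::atomic<bool> enabled_;
};
```

With the switch off, the hot path never reaches the reporter, so a slow synchronous reporter cannot stall the exchange callback.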

We tested Q67 at SF10K by removing this metric and the query completes successfully.

We also have a workaround that updates the histogram asynchronously in PrometheusStatsReporter. This solution was also tested successfully against Q67.
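For readers unfamiliar with the idea, here is a minimal sketch of an asynchronous histogram update. It is not the code in PrometheusStatsReporter: `AsyncHistogramRecorder`, `record`, and the `Sink` callback are hypothetical names, and the sink stands in for whatever synchronous call actually updates the histogram. The query path only appends to a buffer under a short lock; a background thread drains the buffer and makes the slow call.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names come from Presto or Velox.
class AsyncHistogramRecorder {
 public:
  // Stand-in for the synchronous (possibly slow) histogram update.
  using Sink = std::function<void(const std::string&, int64_t)>;

  explicit AsyncHistogramRecorder(Sink sink)
      : sink_(std::move(sink)), worker_([this] { run(); }) {}

  // Signals the worker, lets it drain what is still queued, then joins it.
  ~AsyncHistogramRecorder() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Called on the query path: enqueue and return without calling the sink.
  void record(const std::string& key, int64_t value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.emplace_back(key, value);
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::vector<std::pair<std::string, int64_t>> batch;
    for (;;) {
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
        if (pending_.empty() && stopping_) {
          return;
        }
        batch.swap(pending_);  // take the whole queue in one short critical section
      }
      // The slow updates happen outside the lock, off the query path.
      for (const auto& entry : batch) {
        sink_(entry.first, entry.second);
      }
      batch.clear();
    }
  }

  Sink sink_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::vector<std::pair<std::string, int64_t>> pending_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members exist
};
```

The trade-off in this sketch is an unbounded queue: if the sink cannot keep up, memory grows, so a real implementation would likely need a size cap or aggregation before reporting.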


karteekmurthys · 2024-07-30T23:24:02Z

cc: @majetideepak @amitkdutta

karteekmurthys · 2024-07-31T01:31:03Z

@pramodsatya gathered some stats on the total time spent at the different locations in Presto CPP where RECORD_HISTOGRAM is called. The Q67 query was run for over an hour, and PrestoExchangeSource.cpp:117 took a total of 3256278906 us in an hour of query run.

reg/add   function                               line num                   total time (us)
Add       addRootPool                            Memory.cpp:248                          76
          clearThread                            Driver.h:153                        166326
          operator()                             ExchangeClient.cpp:163               87166
                                                 ExchangeClient.cpp:167               61760
                                                 ExchangeClient.cpp:170               31759
                                                 FileHandle.cpp:64                       45
                                                 PrestoExchangeSource.cpp:117    3256278906
          runInternal                            Driver.cpp:522                      348599
          updatePrestoExchangeSourceMemoryStats  PeriodicTaskManager.cpp:268          18196
          ~MemoryPoolImpl                        MemoryPool.cpp:462                      45
          ~ScopedArbitration                     SharedArbitrator.cpp:880              1110
                                                 SharedArbitrator.cpp:910                12

jaystarshot · 2024-07-31T19:52:21Z

What if the implementation instead ran the exposer as a subprocess, like the exposer described at https://github.com/jupp0r/prometheus-cpp? Resources can be controlled when starting the exposer. At Uber we run the exposer as a separate process, since that way we don't have to worry about reporting or maintaining the new HTTP server.

jaystarshot · 2024-07-31T21:31:58Z

I don't think the suggestion above will help directly here, since I could reproduce this internally.

karteekmurthys added the bug label Jul 30, 2024

karteekmurthys self-assigned this Jul 30, 2024

karteekmurthys mentioned this issue Aug 1, 2024

[native] Disable recording http.client.presto_exchange_source.on_body_bytes metric #23357

Merged

6 tasks
