Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processor: Splitting summary metrics into timeseries. #7134

Open
dashpole opened this issue Jan 11, 2022 · 7 comments
Open

Processor: Splitting summary metrics into timeseries. #7134

dashpole opened this issue Jan 11, 2022 · 7 comments
Labels
never stale Issues marked with this label will be never staled and automatically removed

Comments

@dashpole
Copy link
Contributor

dashpole commented Jan 11, 2022

Is your feature request related to a problem? Please describe.

In the googlecloud exporter, we currently split a summary metric into multiple time series before sending it, since Google Cloud Monitoring does not support summaries:

  • a count timeseries
  • a sum timeseries
  • one quantile timeseries for each quantile

An unfortunate consequence of this is that one summary data point can become a large number of timeseries. If an incoming batch of metrics has 200 data points, but contains a summary, it will exceed the 200 timeseries threshold for a batch we can send to cloud monitoring. As a result, we need to perform batching again after splitting summaries in the exporter to ensure we do not exceed 200 timeseries in a request to cloud monitoring.

Describe the solution you'd like

The base conclusion from our issue above is that we need to split summary metrics before we perform batching. One way to accomplish this would be to use a processor to split summary metrics, and insert it before the batching processor.

As it turns out, splitting summary metrics is very common in exporters other than googlecloud:

Most implementations split metrics by creating:

  • a counter timeseries for sum, suffixed with _sum
  • a counter timeseries for count, suffixed with _count
  • a gauge timeseries for each quantile with a quantile label set to the quantile. The suffix of the metric varies among implementations.

The implementations may have drifted too far apart, but it would be also be nice to bring the naming conventions for split metrics together.

We should consider adding a summaryadapter (feel free to suggest better names) that splits summary metrics into multiple parts, which can replace this functionality in exporters, and enable splitting metrics prior to batching them. It could have configuration for how to name the resulting metrics, to allow for backwards-compatibility with existing exporters. Existing exporters could then simply not support summary types in their exporter, and point users to the processor which does this.

Alternatives considered

While it does bring naming conventions for split summary metrics closer together, reduce the complexity of exporters, and solve google cloud's batching problem, it does make the configuration required for users somewhat more complex. If we think the extra complexity is not worth the benefits, then the google cloud exporter will just have to implement batching in the exporter.

cc @aabmass @jsuereth

@buptubuntu
Copy link

buptubuntu commented Jan 12, 2022

It seems the root cause is that the underlying timeseries database do not support summary and quatile metric friendly,so it is actually the exporter that should be responsible for splitting the metric, or maybe in the future the underlying timeseries storage support summary metric friendly , the exporter will just send the summary metric in one batch naturally.

For now,maybe we should first investigate the splitting rule in exporter, if the splitting rule is the same, we can create a processor with the same rule, otherwise it is quite confused for the user

@aabmass
Copy link
Member

aabmass commented Jan 14, 2022

I think this proposal would also solve the batching issue open-telemetry/opentelemetry-collector#4646. See point 4:

Allow customization of the "size" function. There were requests to support batching by the serialized size, we can offer this in the exporter helper much nicer since the custom sizer can actually work using the exporter wire format. So batching can happen after data serialization.

@CatherineF-dev
Copy link

Agree on summaryadapter proposal.

A CUJ is to rename Summary metrics:

  • Now: for a Summary metrics go_gc_duration_seconds, it will have go_gc_duration_seconds_summary_percentile after googlecloud exporter processing it.
  • Expected: use metrics name go_gc_duration_seconds_percentile instead of go_gc_duration_seconds_summary_percentile

It's hard to add a metrics_rename processor after googlecloud exporter.


Currently, googlecloud exporter shoulders two responsibilities:

  1. send data to googlecloud backend
  2. process Summary metrics (split it into three time series)

It will have more flexibilities on Summary metrics if #2 is splitted from googlecloud exporter into a processor.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 7, 2022
@github-actions
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jul 24, 2023
@dmitryax dmitryax added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Jul 24, 2023
@mx-psi
Copy link
Member

mx-psi commented Jan 10, 2024

We have some support for splitting a summary into timeseries on the transform processor (see summary here #30156 (comment) ). Would adding a function for quantiles to the transform processor solve this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
never stale Issues marked with this label will be never staled and automatically removed
Projects
None yet
Development

No branches or pull requests

6 participants