
Batch processor includes more than timeout time of telemetry #4442

Closed
dashpole opened this issue Nov 16, 2021 · 2 comments
Labels
bug (Something isn't working), Stale

Comments

@dashpole
Contributor

Describe the bug
The timeout for the batch processor is described as "Time duration after which a batch will be sent regardless of size." I interpret that to mean that each batch will include timeout worth of data.

I'm interested in this because of how it interacts with the prometheus receiver and the google cloud exporter. If I have a scrape interval of 10s and a timeout of 9s, I would expect each batch to include at most one scrape.

The batch processor resets the timer after the previous batch has been successfully sent. So, in my example, if I have a timeout of 9s, and it takes 2s for the exporter to send the metric, each batch will include a total of 11s of telemetry. With a 10s scrape interval, that means multiple scrapes will occasionally be included in the same batch, which is not what I expected.

Unfortunately, this causes issues for google cloud monitoring, which only allows a single data point per stream per request.

The solution I'd prefer would be to use a time.Ticker instead of a time.Timer so that batches are sent to the exporter at a consistent interval.
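A minimal Go sketch of the difference, assuming a 9s timeout and a 2s exporter send (the export helper and loop structure are illustrative, not the processor's actual code): resetting a Timer only after the previous send returns makes each cycle span timeout plus export latency, while a Ticker keeps batch boundaries on a fixed cadence.

```go
package main

import (
	"fmt"
	"time"
)

// export stands in for an exporter that takes a while to send a batch.
func export(batch []string) {
	time.Sleep(2 * time.Second) // simulated exporter latency
	fmt.Printf("%s flushed %d items\n", time.Now().Format("15:04:05"), len(batch))
}

func main() {
	timeout := 9 * time.Second
	var batch []string

	// Current behavior (simplified): the timer is reset only after the
	// export returns, so each cycle spans timeout + export latency
	// (~11s of telemetry per batch in the example above).
	timer := time.NewTimer(timeout)
	for i := 0; i < 3; i++ {
		<-timer.C
		export(batch)
		batch = nil
		timer.Reset(timeout) // reset happens *after* the send completes
	}

	// Proposed behavior: a ticker fires every `timeout` regardless of how
	// long the previous export took, so batch boundaries stay on a fixed
	// 9s cadence instead of drifting by the export time.
	ticker := time.NewTicker(timeout)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		export(batch)
		batch = nil
	}
}
```

In the ticker version, a 2s export does not push the next boundary back, because the next tick still arrives on the 9s schedule.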

@dashpole dashpole added the bug Something isn't working label Nov 16, 2021
@sincejune
Contributor

> The timeout for the batch processor is described as "Time duration after which a batch will be sent regardless of size." I interpret that to mean that each batch will include timeout worth of data.

timeout is useful when there isn't much data and send_batch_size is not reached. With a reasonable timeout, the batch processor can still deliver data promptly.

For your use case (if I understand it correctly), I'd recommend:

  1. Use send_batch_max_size on the batch processor, which can guarantee a single data point per request (a ResourceLog will only have one InstrumentationLibraryLog); see the sketch after this list.
  2. Grab metrics periodically at the receiver level, since most receivers have a similar feature.
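To make recommendation 1 concrete, here is a minimal sketch of the splitting a cap like send_batch_max_size implies (illustrative only; the Metric type and splitBatch helper are made-up names, not the batch processor's actual code): no outgoing request carries more than maxSize items.

```go
package main

import "fmt"

// Metric is a stand-in for a single data point; the real processor works on
// the Collector's internal pdata representation, but a plain struct keeps the
// sketch self-contained.
type Metric struct {
	Name  string
	Value float64
}

// splitBatch breaks a batch into chunks of at most maxSize items, which is
// the kind of guarantee a max-size cap provides: no single outgoing request
// exceeds the configured limit.
func splitBatch(items []Metric, maxSize int) [][]Metric {
	var chunks [][]Metric
	for len(items) > maxSize {
		chunks = append(chunks, items[:maxSize])
		items = items[maxSize:]
	}
	if len(items) > 0 {
		chunks = append(chunks, items)
	}
	return chunks
}

func main() {
	batch := make([]Metric, 450) // e.g. one large scrape
	for _, chunk := range splitBatch(batch, 200) {
		fmt.Println("request with", len(chunk), "data points") // 200, 200, 50
	}
}
```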

@dashpole
Contributor Author

> With a reasonable timeout, the batch processor can still deliver data promptly.

This actually runs into the same issue I'm describing. Let's say I would like the collector to contribute at most 30s of latency to any metric received at a receiver. That is quite difficult to do with the way the batch processor currently works if the exporter has any send latency. If I set a timeout of 30 seconds, a metric added at the very beginning of the interval would take 30s plus the exporter's send time to deliver.
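A back-of-the-envelope sketch of that bound (the helper name and the 2s exporter latency are assumptions for illustration, not collector code):

```go
package main

import (
	"fmt"
	"time"
)

// Under the current timer-reset behavior, a data point that arrives right
// after a flush can sit in the batch for the full timeout, and the export
// then adds its own send time on top.
func worstCaseAddedLatency(timeout, exportTime time.Duration) time.Duration {
	return timeout + exportTime
}

func main() {
	// The 30s budget above: the timeout alone consumes the whole budget,
	// and exporter latency (2s here) pushes it over.
	fmt.Println(worstCaseAddedLatency(30*time.Second, 2*time.Second)) // 32s
	// The original example: a 9s timeout with a 2s export gives ~11s per batch.
	fmt.Println(worstCaseAddedLatency(9*time.Second, 2*time.Second)) // 11s
}
```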

> Use send_batch_max_size on the batch processor, which can guarantee a single data point per request.

Unfortunately, I don't know the number of metrics that will be in any batch ahead of time. A prometheus receiver could send a single metric, or a thousand metrics. In my case, I set send_batch_max_size to 200, since that is the most that google cloud monitoring will accept.

@github-actions github-actions bot added the Stale label Nov 23, 2023
@github-actions github-actions bot closed this as not planned Dec 23, 2023