
Batch processor includes more than timeout time of telemetry #4442

Closed
dashpole opened this issue Nov 16, 2021 · 2 comments
Labels
bug (Something isn't working), Stale

Comments

@dashpole
Contributor

Describe the bug
The timeout for the batch processor is described as "Time duration after which a batch will be sent regardless of size." I interpret that to mean that each batch will include timeout worth of data.

I'm interested in this because of how it interacts with the prometheus receiver and the google cloud exporter. If I have a scrape interval of 10s and a timeout of 9s, I would expect each batch to include at most one scrape.

The batch processor resets the timer after the previous batch has been successfully sent. So, in my example, if I have a timeout of 9s, and it takes 2s for the exporter to send the metric, each batch will include a total of 11s of telemetry. With a 10s scrape interval, that means multiple scrapes will occasionally be included in the same batch, which is not what I expected.

Unfortunately, this causes issues for google cloud monitoring, which only allows a single data point per stream per request.

The solution I'd prefer would be to use a time.Ticker instead of a time.Timer so that batches are sent to the exporter at a consistent interval.
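A minimal Go sketch of the difference, assuming a 9s timeout and a 2s exporter send (the export helper and loop structure are illustrative, not the processor's actual code): resetting a Timer only after the previous send returns makes each cycle span timeout plus export latency, while a Ticker keeps batch boundaries on a fixed cadence.

```go
package main

import (
	"fmt"
	"time"
)

// export stands in for an exporter that takes a while to send a batch.
func export(batch []string) {
	time.Sleep(2 * time.Second) // simulated exporter latency
	fmt.Printf("%s flushed %d items\n", time.Now().Format("15:04:05"), len(batch))
}

func main() {
	timeout := 9 * time.Second
	var batch []string

	// Current behavior (simplified): the timer is reset only after the
	// export returns, so each cycle spans timeout + export latency
	// (~11s of telemetry per batch in the example above).
	timer := time.NewTimer(timeout)
	for i := 0; i < 3; i++ {
		<-timer.C
		export(batch)
		batch = nil
		timer.Reset(timeout) // reset happens *after* the send completes
	}

	// Proposed behavior: a ticker fires every `timeout` regardless of how
	// long the previous export took, so batch boundaries stay on a fixed
	// 9s cadence instead of drifting by the export time.
	ticker := time.NewTicker(timeout)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		export(batch)
		batch = nil
	}
}
```

In the ticker version, a 2s export does not push the next boundary back, because the next tick still arrives on the 9s schedule.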

@dashpole dashpole added the bug Something isn't working label Nov 16, 2021
@sincejune
Contributor

> The timeout for the batch processor is described as "Time duration after which a batch will be sent regardless of size." I interpret that to mean that each batch will include timeout worth of data.

timeout is useful when there isn't much data and send_batch_size is not reached. With a reasonable timeout, the batch processor can still deliver data promptly.

For your use case (if I understand it correctly), I'd recommend:

  1. Use send_batch_max_size on the batch processor, which can guarantee a single data point per request (a ResourceLog will only have one InstrumentationLibraryLog); see the sketch after this list.
  2. Grab metrics periodically at the receiver level, since most receivers have a similar feature.
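To make recommendation 1 concrete, here is a minimal sketch of the splitting a cap like send_batch_max_size implies (illustrative only; the Metric type and splitBatch helper are made-up names, not the batch processor's actual code): no outgoing request carries more than maxSize items.

```go
package main

import "fmt"

// Metric is a stand-in for a single data point; the real processor works on
// the Collector's internal pdata representation, but a plain struct keeps the
// sketch self-contained.
type Metric struct {
	Name  string
	Value float64
}

// splitBatch breaks a batch into chunks of at most maxSize items, which is
// the kind of guarantee a max-size cap provides: no single outgoing request
// exceeds the configured limit.
func splitBatch(items []Metric, maxSize int) [][]Metric {
	var chunks [][]Metric
	for len(items) > maxSize {
		chunks = append(chunks, items[:maxSize])
		items = items[maxSize:]
	}
	if len(items) > 0 {
		chunks = append(chunks, items)
	}
	return chunks
}

func main() {
	batch := make([]Metric, 450) // e.g. one large scrape
	for _, chunk := range splitBatch(batch, 200) {
		fmt.Println("request with", len(chunk), "data points") // 200, 200, 50
	}
}
```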

@dashpole
Contributor Author

> With a reasonable timeout, the batch processor can still deliver data promptly.

This actually runs into the same issue I'm describing. Let's say I would like the collector to contribute at most 30s of latency to any metric received at a receiver. That is quite difficult to do with the way the batch processor currently works if the exporter has any send latency. If I set a timeout of 30 seconds, a metric added at the very beginning of the interval would take 30s plus the exporter's send time to deliver.
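A back-of-the-envelope sketch of that bound (the helper name and the 2s exporter latency are assumptions for illustration, not collector code):

```go
package main

import (
	"fmt"
	"time"
)

// Under the current timer-reset behavior, a data point that arrives right
// after a flush can sit in the batch for the full timeout, and the export
// then adds its own send time on top.
func worstCaseAddedLatency(timeout, exportTime time.Duration) time.Duration {
	return timeout + exportTime
}

func main() {
	// The 30s budget above: the timeout alone consumes the whole budget,
	// and exporter latency (2s here) pushes it over.
	fmt.Println(worstCaseAddedLatency(30*time.Second, 2*time.Second)) // 32s
	// The original example: a 9s timeout with a 2s export gives ~11s per batch.
	fmt.Println(worstCaseAddedLatency(9*time.Second, 2*time.Second)) // 11s
}
```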

> Use send_batch_max_size on the batch processor, which can guarantee a single data point per request.

Unfortunately, I don't know the number of metrics that will be in any batch ahead of time. A prometheus receiver could send a single metric, or a thousand metrics. In my case, I set send_batch_max_size to 200, since that is the most that google cloud monitoring will accept.

@github-actions github-actions bot added the Stale label Nov 23, 2023
@github-actions github-actions bot closed this as not planned Dec 23, 2023