Describe the bug
The timeout for the batch processor is described as "Time duration after which a batch will be sent regardless of size." I interpret that to mean that each batch will include at most timeout worth of data.
I'm interested in this because of how it interacts with the prometheus receiver and the google cloud exporter. If I have a scrape interval of 10s and a timeout of 9s, I would expect each batch to include at most one scrape.
The batch processor resets the timer after the previous batch has been successfully sent. So, in my example, if I have a timeout of 9s, and it takes 2s for the exporter to send the metric, each batch will include a total of 11s of telemetry. With a 10s scrape interval, that means multiple scrapes will occasionally be included in the same batch, which is not what I expected.
Unfortunately, this causes issues for google cloud monitoring, which only allows a single data point for each stream per request.
The solution I'd prefer would be to use a time.Ticker instead of a time.Timer so that batches are sent to the exporter at a consistent interval.
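To make the difference concrete, here is a small standalone Go sketch, not the collector's actual code, that contrasts the two flush strategies. The 9s timeout and 2s export latency from the example above are scaled down to milliseconds so the program finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

const (
	timeout       = 90 * time.Millisecond // stands in for the 9s batch timeout from the example
	exportLatency = 20 * time.Millisecond // stands in for the 2s exporter send latency
)

func main() {
	// Timer-based flushing, with the timer reset only after the send returns
	// (the behavior described above): each cycle spans roughly timeout + exportLatency.
	start := time.Now()
	t := time.NewTimer(timeout)
	for i := 1; i <= 3; i++ {
		<-t.C
		time.Sleep(exportLatency) // simulate a synchronous export
		fmt.Printf("timer:  batch %d flushed at %v\n", i, time.Since(start).Round(time.Millisecond))
		t.Reset(timeout) // the next window only starts once the send has finished
	}

	// Ticker-based flushing (the proposal): flushes happen every timeout,
	// regardless of how long the previous export took, so each batch covers
	// a consistent window. The export itself could run in a separate goroutine.
	start = time.Now()
	tick := time.NewTicker(timeout)
	defer tick.Stop()
	for i := 1; i <= 3; i++ {
		<-tick.C
		fmt.Printf("ticker: batch %d flushed at %v\n", i, time.Since(start).Round(time.Millisecond))
	}
}
```

With the timer reset after each send, flushes land at roughly 110ms intervals (timeout plus send latency, i.e. the 11s windows in the example); with the ticker they land at a steady 90ms.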
> The timeout for the batch processor is described as "Time duration after which a batch will be sent regardless of size." I interpret that to mean that each batch will include at most timeout worth of data.
timeout is useful when there isn't much data and send_batch_size is not reached. With a reasonable timeout, the batch processor can still deliver data in a timely manner.
For your use case (if I understand it correctly), I'd recommend:
- Use send_batch_max_size of the batch processor, which can guarantee a single data point per request. (A ResourceLog will only have one InstrumentationLibraryLog.)
- Grab metrics periodically at the receiver level, as most receivers have a similar feature.
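To illustrate those two suggestions, here is a rough collector config sketch. The scrape_interval and send_batch_max_size values echo the numbers mentioned in this thread; the job name, target, and the assumption that the googlecloud exporter from collector-contrib is in use are placeholders, not the reporter's actual configuration.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app            # placeholder job
          scrape_interval: 10s             # grab metrics periodically at the receiver level
          static_configs:
            - targets: ["localhost:8888"]  # placeholder target

processors:
  batch:
    timeout: 9s                            # flush at least this often
    send_batch_max_size: 200               # cap the number of items per outgoing batch

exporters:
  googlecloud: {}                          # assumed exporter; settings omitted

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [googlecloud]
```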
> With a reasonable timeout, the batch processor can still deliver data in a timely manner.
This actually runs into the same issue I'm describing. Let's say I would like the collector to contribute at most 30s of latency to any metric received at a receiver. That is quite difficult to do with the way the batch processor currently works if the exporter has any send latency: if I set a timeout of 30 seconds, a metric added at the very beginning of the interval would take 30s plus the exporter's send time to go out (e.g. 32s with the 2s export from my example).
> Use send_batch_max_size of the batch processor, which can guarantee a single data point per request.
Unfortunately, I don't know ahead of time how many metrics will be in any batch. A Prometheus receiver could send a single metric or a thousand metrics. In my case, I set send_batch_max_size to 200, since that is the most that Google Cloud Monitoring will accept.