[pkg/stanza] Make batching behavior configurable #21184

dmitryax · 2023-04-26T05:53:27Z

Expose configuration options to control the batching behavior of the stanza receivers:

- `batch.max_batch_size`: The maximum number of log records to batch together.
- `batch.timeout`: The maximum amount of time to wait before sending a batch.

Providing configuration options for batching in the receivers makes it possible to replace the batch processor and prevent data loss when an exporter returns a retriable error.


djaglowski · 2023-04-26T13:01:55Z

These settings were removed prior to declaring the filelog receiver beta, because it was unclear whether we should be batching at all. (context)

I think before we re-add these parameters, we should resolve the question of whether pkg/stanza should be batching at all. It's not clear to me that there are any real benefits, but it certainly complicates the codebase quite a bit.

djaglowski · 2023-05-10T15:02:37Z

One more reason we may want to remove the complicated conversion logic altogether: #21740 (comment)

djaglowski

I want to make sure we do not expose configuration for batching until we determine whether or not we are keeping the behavior. See #21184 (comment).

I will open an issue to propose that we remove batching from the stanza adapter. If that is rejected, then we will have to add configuration.

djaglowski · 2023-04-26T12:54:24Z

receiver/tcplogreceiver/README.md

+| `encoding`             | `utf-8`            | The encoding of the file being read. See the list of supported encodings below for available options                                                                                                                               |
+| `operators`            | []                 | An array of [operators](../../pkg/stanza/docs/operators/README.md#what-operators-are-available). See below for more details                                                                                                        |
+| `batch.max_batch_size` | `100`              | Maximum number of log records per emitted batch.                                                                                                                                                                                   |
+| `batch.timeout`        | `100 milliseconds` | Maximum duration to wait before sending out the log records batch.                                                                                                                                                                 |


dmitryax · 2023-05-17T05:11:22Z

My motivation for this was to establish a pipeline that cannot drop any logs. After #20511, the only part of a typical pipeline dropping data is the batch processor. Removing it drops the batch size to 100 (hardcoded in stanza receivers), which significantly slows down the whole pipeline.

This change would give me a short-term solution. The long-term one would be open-telemetry/opentelemetry-collector#4646. I'm good with rejecting this PR in favor of the long-term solution.
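
To put rough numbers on the batch-size concern above, here is a small sketch that counts how many batches (and so how many downstream consume calls) are needed to deliver a fixed number of log records. The record count and the larger batch sizes are illustrative assumptions, not measurements from this PR, and `callsNeeded` is a hypothetical helper.

```go
package main

import "fmt"

// callsNeeded returns how many batches are required to deliver `records`
// log records when each batch holds at most `batchSize` records.
func callsNeeded(records, batchSize int) int {
	return (records + batchSize - 1) / batchSize // ceiling division
}

func main() {
	const records = 1_000_000 // illustrative workload size
	// 100 is the hardcoded size mentioned above; the others are examples.
	for _, size := range []int{100, 1000, 8192} {
		fmt.Printf("batch size %d -> %d calls\n", size, callsNeeded(records, size))
	}
}
```

For one million records, a batch size of 100 means 10,000 calls, against 123 calls at a batch size of 8192, so any fixed per-call overhead is paid roughly 80 times more often.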

djaglowski · 2023-05-17T13:19:25Z

@dmitryax, I've done some preliminary benchmarking in #21929 to evaluate the impact of the batch size. While I tentatively agree we will need to add batching controls, my intuition is that the adapter codebase can be simplified and that doing so may have performance benefits or impacts on the eventual configuration. To that end, I think it would be reasonable to add the new parameters behind a feature gate for now, but I'd like to spend a little more time evaluating this possibility before we commit to adding configuration that may later be changed. If you're able to review #21928 and #21929, this would help me move forward on the refactoring I'd like to try.

github-actions · 2023-06-01T05:20:18Z