-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[testbed] Add batcher performance tests #36206
Conversation
@dmitryax @sfc-gh-sili I'd love some feedback as to whether this is useful for your purposes, and if I should broaden the scope. |
35c4618
to
89f3353
Compare
By the way, I'm not convinced that the testbed is appropriately sophisticated to measure and describe the differences between the two batchers. The amount of work being done is the same, so we expect the same amount of compute resource. There are still a few salient differences between the two forms of batching that would not be teased apart by this benchmark, for example the way that exporter-batching has to serialize multiple requests and can't benefit from the queue for concurrency when batches are reduced in size. The effect is on individual request latency, which the testbed doesn't measure. I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be an positive impact from the batch processor because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times w/ the same amount of data. I also claim we'll never get rid of the batch processor and should fix it. As a shining example, the |
@jmacd any idea on how I could simulate a CPU-heavy processor here? The processors I did add to my benchmark were intended to simulate the average use case, but I'd be happy to have a more skewed benchmark there as well. |
Please rebase, fix conflicts and add changelog. |
89f3353
to
ace990a
Compare
c57c82c
to
e6a49ca
Compare
e6a49ca
to
7061eeb
Compare
@swiatekm Thank you, Mikołaj! This is great.
|
7061eeb
to
f872ba4
Compare
Can we merge this PR as-is, and I'll add another benchmark in a follow-up? @MovieStoreGuy @jmacd @sfc-gh-sili I can see the argument that we could get better performance on some processing tasks because batch size can affect CPU cache saturation. I'm not 100% sure how to verify this though. Maybe spawn 100 transform processors doing the same thing? In that case though, I think we'd see the overhead of nested function calls more than anything else though. |
8fff77a
to
be0715a
Compare
# Conflicts: # cmd/oteltestbedcol/builder-config.yaml # Conflicts: # cmd/oteltestbedcol/builder-config.yaml
be0715a
to
f08d866
Compare
Reran the benchmark on current main:
I also ran it with the
Looks like the new batcher is indeed slightly more performant in these simple scenarios. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Description
Add basic batching benchmarks. The primary intent of these is to help verify that we're not introducing performance regressions with open-telemetry/opentelemetry-collector#8122.
I've taken the the 10k DPS benchmark for logs, and ran it in different configurations:
I've reduced the input batch size to 10 to better capture the effect of having no batching at the start of the pipeline on processor performance, which is one of the concerns with moving batching to the exporter.
For now, I'd like to get some comments on whether this is sufficient, or if I should expand my scope to other signal types and/or different parameters for the benchmarks.
Current results
It looks like the new batcher is a bit less performant if the pipeline doesn't contain any processors, but is in fact faster if processors are present, which is surprising to me. But this does assuage our fears that we'd tank processor performance by moving batching to the end of the pipeline.
Link to tracking issue
Fixes open-telemetry/opentelemetry-collector#10836