Flaky Test - load-test (TestLog10kDPS) #9094

TylerHelmuth · 2022-04-05T18:01:43Z

Describe the bug
The TestLog10kDPS test within the load-test workflow is failing intermittently. Here are some examples:

Steps to reproduce
Run the load-test workflow

What did you expect to see?
The TestLog10kDPS tests passing unless code had been modified in a way that would degrade performance.

What did you see instead?
The TestLog10kDPS tests failing intermittently.

Additional context
There are other tests in load-test workflow that are failing, but I noticed TestLog10kDPS failing the most.

jpkrohling · 2022-04-06T13:53:36Z

@open-telemetry/collector-contrib-approvers, @open-telemetry/collector-approvers, the load tests have always been a source of test failures for us. All we ever do is increase the limits once we start hitting them more frequently. I don't remember ever seeing a true positive out of those tests. Are those tests providing value to us? Or are they just sources of noise?

mx-psi · 2022-04-06T14:04:55Z

I agree that the current tests don't add a lot of value and are a frequent source of flakes.

For something unrelated, I recently had a look at the Rust compiler's process for this, which may be interesting for this discussion. The process is as follows:

PRs are benchmarked out-of-band, and a bot posts a comment when there are regressions and tags the PR with ‘perf-regression’ (example)
The benchmark compares with the previous commit and reports the relative change (example). It has defined criteria for when a change is considered relevant, which are based on basic statistics
Someone goes through all PRs tagged as regressions to investigate if they are justified (presumably before a release?)

Reproducing this in our setup would need additional infra to run benchmarks and store historical data, so I don't think this is something we can do without significant effort, but still I think it's interesting to see what other projects do for inspiration.

tigrannajaryan · 2022-04-06T14:37:44Z

> I don't remember ever seeing a true positive out of those tests. Are those tests providing value to us? Or are they just sources of noise?

I remember they once caught a performance regression when the Protobuf dependency was updated to a new version and caused significant slow down.

CNCF provided us with dedicated EC2 instances where we could run the perf tests in a more stable environment, but we never had time to work on this.

djaglowski · 2022-04-06T16:34:10Z

Responding narrowly to the immediate issue - this specific test expectation was "fixed" in #9023, with the standard approach of raising the limit.

TylerHelmuth · 2022-04-06T16:51:09Z

@djaglowski it looks like that PR "fixed" the test named "filelog" within the TestLog10kDPS suite, but the failures listed in this issue are for the test named "OTLP".

Should I submit a PR to up the OTLP test limit and continue the discussion around a more proper solution in a different issue?

djaglowski · 2022-04-06T17:00:51Z

@TylerHelmuth, you're right, I was too broad in my assertion there.

I think what you've suggested makes sense. A larger discussion on these tests deserves its own issue, but we can loosen these test constraints immediately.

TylerHelmuth · 2022-04-06T17:05:32Z

Ok, I will do that. Should I go ahead and up all the values (memory and CPU) for all the tests in this suite? It would probably go a long way toward making PRs have green checks instead of red Xs.

djaglowski · 2022-04-06T17:12:52Z

The rule of thumb that has been followed so far is that when establishing the limits, set them at roughly 15% above observed values. We should probably stick to that until there's consensus on the larger question.
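As a rough illustration of that rule of thumb, the limit can be derived from the peak observed value plus 15% headroom, rounded up. This is only a sketch: `limitWithHeadroom` is a hypothetical helper, not part of the testbed, and the sample measurements are invented.

```go
package main

import (
	"errors"
	"fmt"
)

// limitWithHeadroom is a hypothetical helper: it returns the highest observed
// value increased by headroomPct percent, rounded up to the next integer.
func limitWithHeadroom(observed []uint32, headroomPct uint32) (uint32, error) {
	if len(observed) == 0 {
		return 0, errors.New("no observed values")
	}
	peak := observed[0]
	for _, v := range observed[1:] {
		if v > peak {
			peak = v
		}
	}
	// Integer ceiling of peak * (100 + headroomPct) / 100.
	// The product is computed in uint64 so it cannot overflow.
	scaled := uint64(peak) * (100 + uint64(headroomPct))
	return uint32((scaled + 99) / 100), nil
}

func main() {
	// Invented measurements from a few recent runs of one test case.
	cpuPercent := []uint32{26, 31, 29}
	ramMiB := []uint32{88, 95, 91}

	cpuLimit, _ := limitWithHeadroom(cpuPercent, 15)
	ramLimit, _ := limitWithHeadroom(ramMiB, 15)
	fmt.Printf("CPU limit: %d%%, RAM limit: %d MiB\n", cpuLimit, ramLimit)
}
```

Using the peak rather than the average keeps the limit above every run seen so far, which matches the goal of reducing flakes.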

djaglowski · 2022-04-07T01:16:55Z

I've opened #9107 to capture the higher level discussion here. This issue will be closed by #9105.

TylerHelmuth added the bug label Apr 5, 2022

TylerHelmuth mentioned this issue Apr 6, 2022

Promote @djaglowski to maintainer role #9104

Merged

This was referenced Apr 6, 2022

Bump limits on performance tests #9105

Merged

[CI] Reassess value of load tests #9107

Open

jpkrohling closed this as completed in #9105 Apr 7, 2022

TylerHelmuth mentioned this issue Apr 14, 2022

CI Benchmark discussion open-telemetry/opentelemetry-go#2791

Open
