TelemetryAPIJourney - Retrieving the telemetry payload (cached vs. fresh) #211

afharo · 2022-01-05T14:32:31Z

Summary

This PR adds a Telemetry API Journey with 3 scenarios:

Scenario 1: First hit - non-cached encrypted usage

The first scenario covers users hitting the telemetry endpoint for the first time in the past 4 hours, or after a new node is added. It retrieves the non-cached encrypted usage data.

Scenario 2: Second+ hit - cached encrypted usage

The second scenario covers users hitting the endpoint for the second time or more within the past 4 hours. It retrieves the cached encrypted usage data.
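To illustrate the difference between scenarios 1 and 2, here is a minimal sketch of a time-limited cache. It is not Kibana's implementation: the class name `CachedPayload`, its constructor parameters, and the in-memory storage are assumptions made for illustration. Only the 4-hour window comes from the description above.

```typescript
// Hypothetical sketch, not Kibana code: a payload cache with a 4-hour lifetime.
const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

class CachedPayload<T> {
  // Held in memory, so a newly added node would start with an empty cache.
  private entry?: { value: T; storedAt: number };

  constructor(
    private readonly fetchFresh: () => Promise<T>,
    private readonly ttlMs: number = FOUR_HOURS_MS,
    // The clock is injectable so the expiry logic can be exercised without waiting.
    private readonly now: () => number = () => Date.now()
  ) {}

  async get(): Promise<{ value: T; cached: boolean }> {
    const entry = this.entry;
    if (entry !== undefined && this.now() - entry.storedAt < this.ttlMs) {
      // Scenario 2: a hit within the window returns the stored payload.
      return { value: entry.value, cached: true };
    }
    // Scenario 1: first hit, or the stored payload has expired.
    const value = await this.fetchFresh();
    this.entry = { value, storedAt: this.now() };
    return { value, cached: false };
  }
}
```

With a cache like this, only the first request in each window pays the cost of generating the payload, which is why the two scenarios are expected to tolerate very different numbers of concurrent users.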

Scenario 3: Example flyout and stats API - non-cached non-encrypted usage, check collectors status

The third scenario tests grabbing a fresh copy of non-encrypted usage. This happens when Kibana explicitly asks for this data (the stats API, and example flyout). This grabs non-cached unencrypted usage data.

This scenario also fails when any collector has failed, timed out, or is not ready.
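The collector check can be pictured with the sketch below. The `CollectorStats` shape and its field names are hypothetical and do not reflect the actual response format; the point is only that the scenario passes when all three lists are empty.

```typescript
// Hypothetical shape: field names are illustrative, not Kibana's response format.
interface CollectorStats {
  failed: string[];
  timedOut: string[];
  notReady: string[];
}

// Returns one human-readable entry per unhealthy collector.
// An empty result means the scenario's collector check would pass.
function unhealthyCollectors(stats: CollectorStats): string[] {
  return [
    ...stats.failed.map((name) => `${name}: failed`),
    ...stats.timedOut.map((name) => `${name}: timed out`),
    ...stats.notReady.map((name) => `${name}: not ready`),
  ];
}
```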

Notes

All scenarios run against a freshly installed Kibana instance, so the indices are as small as they can be.

dmlemeshko · 2022-01-05T15:41:51Z

@afharo the code looks good. I started a job to see it in the Jenkins/kibana-stats cluster: https://kibana-ci.elastic.co/view/Kibana/job/elastic+kibana+load-testing/883/

Do you want these simulations to be executed on a daily basis, with results in the Kibana-stats cluster, like we do for the other ones?
If so, you need to add them here

Of course, you can always run it manually whenever there is a need.

dmlemeshko · 2022-01-05T17:49:50Z

@afharo I got a Slack alert for your simulation, with results showing 692 request failures:

Scenario: org.kibanaLoadTest.simulation.branch.TelemetryAPIJourney
Users count: 400
Load testing branch: TelemetryAPIJourney
Kibana branch: main
Elasticsearch: 8.1.0-SNAPSHOT / 2022-01-04T15:05:08.795979329Z / 8ab0d40cb550f3156bd643ae73eabcadfaa607a1
Failed requests: 692 of 1912
Response time (ms):
* 75th percentile: 60000
* 95th percentile: 60000
* 99th percentile: 60001
* Maximum: 60002

It is possible to get Gatling reports from Jenkins, but it looks like 400 users is too many without the cache.

I also have a general question about your use case: is this endpoint triggered by individual Kibana users, or is it a single call that Kibana makes to the stats cluster once in a while? Is there any reason behind the 400-user threshold?

afharo · 2022-01-06T05:15:09Z

@dmlemeshko thanks for the ping!

It's actually good news that it fails with that number of users. We suspected that this endpoint could cause issues if too many users requested it at once.

This load test was built to prove that elastic/kibana#121656 is needed.

dmlemeshko · 2022-01-06T08:37:44Z

@afharo thank you for the explanation. My understanding now is that there is no need to run these simulations on a daily basis. Is that correct?

We can merge it as is and you can run it anytime on CI. Soon it will also be possible to comment on a Kibana PR to trigger load simulations based on the Team label.

afharo · 2022-01-06T08:40:03Z

Thank you! Before merging I'd like someone from @elastic/kibana-core to share their thoughts as well 😇

afharo · 2022-01-06T08:43:43Z

I think that we could apply an additional improvement in the telemetry report generation logic to aggregate multiple concurrent requests into the same promise. Then we can run these journeys on the daily performance checks.

I'm AFK right now. But I'll create an issue on Monday.
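A minimal sketch of the idea of aggregating concurrent requests into the same promise follows. It is an illustration under assumptions, not the change later tracked in the Kibana issue: the class name `SharedReport` and the `generate` callback are hypothetical.

```typescript
// Hypothetical sketch, not Kibana code: concurrent callers share one in-flight promise.
class SharedReport<T> {
  private inFlight?: Promise<T>;

  constructor(private readonly generate: () => Promise<T>) {}

  get(): Promise<T> {
    if (this.inFlight === undefined) {
      const pending = this.generate();
      this.inFlight = pending;
      // Clear the slot once the work settles (success or failure),
      // so that a later request triggers a new generation.
      const clear = () => {
        if (this.inFlight === pending) {
          this.inFlight = undefined;
        }
      };
      pending.then(clear, clear);
    }
    // Every caller that arrives while the work is running gets the same promise.
    return this.inFlight;
  }
}
```

With this pattern, many users requesting the report at the same moment would trigger a single generation rather than one per request. A rejection is delivered to every waiting caller, and the next request starts a fresh attempt.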

afharo · 2022-01-10T17:52:26Z

> I think that we could apply an additional improvement in the telemetry report generation logic to aggregate multiple concurrent requests into the same promise. Then we can run these journeys on the daily performance checks.
>
> I'm AFK right now. But I'll create an issue on Monday.

Issue elastic/kibana#122572 created!

Bamieh · 2022-02-23T11:27:56Z

I've updated this PR to include one more scenario that checks for failed collectors, and I merged the journeys into one with three scenarios. @dmlemeshko what are the next steps here? I think we need to:

* Test the journey on CI
* Make sure our concurrent-user thresholds make sense for the different scenarios: 250 for cached, 30 for non-cached
* Agree on whether we want to add a config to Jenkins, and merge the PR

dmlemeshko · 2022-02-23T16:47:29Z

@Bamieh I ran it on CI and I think the thresholds are reasonable.

The code looks good as well.

However, I would not add it to our regular run (4 times a day). We are currently working on reducing noise on the bare metal worker and keeping the number of scenarios limited. You can run it manually at any time and compare the results in Kibana stats. Let me know if that makes sense.

TelemetryAPIJourney

9531b52

afharo requested a review from a team January 6, 2022 08:40

mshustov reviewed Jan 6, 2022

src/test/scala/org/kibanaLoadTest/simulation/branch/TelemetryAPICachedJourney.scala (outdated)

mshustov reviewed Jan 6, 2022

src/test/scala/org/kibanaLoadTest/simulation/branch/TelemetryAPICachedJourney.scala (outdated)

mshustov reviewed Jan 6, 2022

src/test/scala/org/kibanaLoadTest/scenario/TelemetryAPI.scala

Remove misleading comments

4374a3b

afharo mentioned this pull request Jan 10, 2022

[Telemetry] Share promise for concurrent telemetry report generation elastic/kibana#122572

Closed

This was referenced Jan 11, 2022

[Telemetry] Report collector success/fail rate elastic/kibana#122636

Closed

[Meta][Telemetry] Reduce telemetry footprint elastic/kibana#119466

Closed

Bamieh added 2 commits February 22, 2022 14:51

Merge branch 'main' of github.com:elastic/kibana-load-testing into Te…

1fe1776

…lemetryAPIJourney

add collector stats testing + tidy up scenarios

a54097e

Bamieh requested review from dmlemeshko and mshustov February 23, 2022 11:15

dmlemeshko approved these changes Feb 23, 2022

Bamieh merged commit 11eb3ca into elastic:main Feb 24, 2022

Bamieh deleted the TelemetryAPIJourney branch February 24, 2022 10:51
