Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telemetry] Use server's lastReported on the browser #121656

Merged

Conversation

afharo
Copy link
Member

@afharo afharo commented Dec 20, 2021

Summary

Resolves #87846.

This PR normalizes the logic to decide whether to send telemetry or not:

  • Moves isReportIntervalExpired to common so both the server and public ends use the same logic to decide whether the lastReport date is expired.
  • On the UI, it requests the stored value on the server-side so, if the server or any other browser has already reported telemetry in the expected interval (24h), it will skip it on this browser (cc @thesmallestduck @elastic/infra-telemetry).
  • On the server, the stored value also applies now. So, the same behaviour applies to the UI: if another Kibana instance or any browser has reported telemetry in the last 24h, it won't report again.

NOTE: All these changes are done in favour of reducing the number of requests, hence, the load of Kibana and Elasticsearch due to the generation of the telemetry report multiple times per day.

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
It may fail to store the lastReported updated value after a successful send if the process goes down, the tab is closed or there's a network issue right between the successful report and the SO update. Low Low A new report may be generated if the "in-memory" value says so. However, this is not an issue per-se. It will only momentarily add some extra load to the servers.
Having one sender per day reduces some visibility in our usage: it'll be harder to know how many different instances of Kibana are running for the same cluster, we'll get fewer reports, meaning fewer variability in the user-agents High Low We should actually collect that information explicitly instead.

For maintainers

@afharo afharo added enhancement New value added to drive a business result Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.1.0 auto-backport Deprecated - use backport:version if exact versions are needed v8.0.0 labels Dec 20, 2021
{ savedObjects, elasticsearch }: CoreStart,
{ telemetryCollectionManager }: FetcherTaskDepsStart
) {
public start({ savedObjects }: CoreStart, { telemetryCollectionManager }: FetcherTaskDepsStart) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only removed the unused elasticsearch

if (!this.lastReported || Date.now() - this.lastReported > REPORT_INTERVAL_MS) {
// Check both: in-memory and SO-driven value.
// This will avoid the server retrying over and over when it has issues with storing the state in the SO.
if (isReportIntervalExpired(this.lastReported) && isReportIntervalExpired(lastReported)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies the 2nd risk I listed in the description of the PR. I hope we're OK with it.

@afharo afharo force-pushed the telemetry/browser-uses-servers-lastReported branch from a3292cc to fb9e7aa Compare December 20, 2021 17:20
@afharo afharo marked this pull request as ready for review December 21, 2021 10:33
@afharo afharo requested a review from a team as a code owner December 21, 2021 10:33
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@afharo
Copy link
Member Author

afharo commented Dec 23, 2021

@elasticmachine merge upstream

Copy link
Member

@Bamieh Bamieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR we're basically removing any redundancy in reporting telemetry usage. since we have no way to confirm that we actually got the data other than the status: 200 to check if the endpoint is reachable we'd be risking losing usage due to any connectivity issues.

Do you think adding another counter redundancyFactor that allows reporting usage 3 times a day for example before shouldSendUsage starts returning false? This way we have some redundancy while reducing the number of telemetry calls.

We'd also need product approval (@thesmallestduck ) that we are OK with losing any kind of redundancy we have.

Note that we do cache usage now every 4 hours so the cost of redundancy is greatly reduced.

@afharo
Copy link
Member Author

afharo commented Dec 23, 2021

since we have no way to confirm that we actually got the data other than the status: 200

Shouldn't this highlight an issue on the receiving side instead? IMO, if the request made it to the receiving end, that's a success from the Kibana POV. IMO, we shouldn't overload Kibana at the cost of performance to overcome issues on the receiving end.

If the users find that Telemetry is heavily decreasing the performance of their deployments, they'll simply disable it. I'd rather lose 1 day worth of telemetry (if the receiving end falsely replies 200) vs. missing the entire dataset.

We'd also need product approval (@thesmallestduck ) that we are OK with losing any kind of redundancy we have.

I completely agree! Adding him as a reviewer to make sure we wait for his approval.

Note that we do cache usage now every 4 hours so the cost of redundancy is greatly reduced.

Indeed it does! However, IMO, there are some cases that are not covered by the caching solution: big deployments with multiple instances of Kibana behind a load balancer, serving a high number of users. Each additional Kibana instance in the deployment will come with the cost of one "empty cache". This PR allows scalability at zero cost.

The cache mechanism is still useful for any retries we may need due to the receiver being down.

@afharo
Copy link
Member Author

afharo commented Jan 5, 2022

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
telemetry 28 29 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
telemetry 24.3KB 24.7KB +457.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@thesmallestduck
Copy link

Reducing to 1 browser-transmitted payload per day is okay by me.

@afharo afharo merged commit 2d17554 into elastic:main Jan 6, 2022
@afharo afharo deleted the telemetry/browser-uses-servers-lastReported branch January 6, 2022 13:37
@kibanamachine
Copy link
Contributor

💔 Backport failed

The backport operation could not be completed due to the following error:
You must specify a valid Github repository

You can specify it via either:

The backport PRs will be merged automatically after passing CI.

To backport manually run:
node scripts/backport --pr 121656

@afharo
Copy link
Member Author

afharo commented Jan 10, 2022

💚 All backports created successfully

Status Branch Result
8.0

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

afharo added a commit to afharo/kibana that referenced this pull request Jan 10, 2022
afharo added a commit that referenced this pull request Jan 10, 2022
gbamparop pushed a commit to gbamparop/kibana that referenced this pull request Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed enhancement New value added to drive a business result Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.0.0 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Telemetry] Use lastReported timestamp on the 'browser' as well
6 participants