[Telemetry] Use server's lastReported
on the browser
#121656
Conversation
-    { savedObjects, elasticsearch }: CoreStart,
-    { telemetryCollectionManager }: FetcherTaskDepsStart
-  ) {
+  public start({ savedObjects }: CoreStart, { telemetryCollectionManager }: FetcherTaskDepsStart) {
Only removed the unused `elasticsearch` parameter.
-  if (!this.lastReported || Date.now() - this.lastReported > REPORT_INTERVAL_MS) {
+  // Check both: in-memory and SO-driven value.
+  // This will avoid the server retrying over and over when it has issues with storing the state in the SO.
+  if (isReportIntervalExpired(this.lastReported) && isReportIntervalExpired(lastReported)) {
This implies the 2nd risk I listed in the description of the PR. I hope we're OK with it.
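A minimal standalone sketch of the dual check discussed in this thread. Assumptions not confirmed by the diff: the 24-hour value of `REPORT_INTERVAL_MS`, and the helper name `shouldSend` (the actual Kibana class and module layout are simplified away here):

```typescript
// Hypothetical sketch; in the PR, isReportIntervalExpired lives in `common`
// and REPORT_INTERVAL_MS is the telemetry report interval (24h assumed).
const REPORT_INTERVAL_MS = 24 * 60 * 60 * 1000;

function isReportIntervalExpired(lastReported: number | undefined): boolean {
  // Expired when there is no previous report, or it is older than the interval.
  return !lastReported || Date.now() - lastReported > REPORT_INTERVAL_MS;
}

// Fetcher-side dual check: both the in-memory value and the one read from the
// Saved Object must be expired, so a failed SO write does not trigger retries
// on every check.
function shouldSend(inMemory?: number, fromSavedObject?: number): boolean {
  return isReportIntervalExpired(inMemory) && isReportIntervalExpired(fromSavedObject);
}
```

The `&&` is what makes a failed SO update harmless: as long as the in-memory timestamp is fresh, the fetcher stays quiet even if the persisted value is stale.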
Force-pushed from a3292cc to fb9e7aa (…er-uses-servers-lastReported)
Pinging @elastic/kibana-core (Team:Core)
@elasticmachine merge upstream
With this PR we're basically removing any redundancy in reporting telemetry usage. Since we have no way to confirm that the receiving end actually got the data, other than the `status: 200` that only tells us the endpoint is reachable, we'd risk losing usage data due to any connectivity issues.
What do you think about adding another counter, `redundancyFactor`, that allows reporting usage 3 times a day (for example) before `shouldSendUsage` starts returning `false`? This way we keep some redundancy while reducing the number of telemetry calls.
We'd also need product approval (@thesmallestduck) that we are OK with losing any kind of redundancy we have.
Note that we now cache usage every 4 hours, so the cost of redundancy is greatly reduced.
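For illustration, the counter suggested above might look roughly like this. A sketch only: `redundancyFactor`, `sendsThisInterval`, and the reset logic are hypothetical names for this example, not code from the PR:

```typescript
// Hypothetical redundancy counter: allow up to `redundancyFactor` sends per
// report interval before shouldSendUsage starts returning false.
const REPORT_INTERVAL_MS = 24 * 60 * 60 * 1000;
const redundancyFactor = 3; // e.g. report up to 3 times a day

let sendsThisInterval = 0;
let intervalStart: number | undefined;

function shouldSendUsage(now: number = Date.now()): boolean {
  if (intervalStart === undefined || now - intervalStart > REPORT_INTERVAL_MS) {
    // A new interval started: reset the budget.
    intervalStart = now;
    sendsThisInterval = 0;
  }
  if (sendsThisInterval >= redundancyFactor) {
    return false; // redundancy budget exhausted for this interval
  }
  sendsThisInterval++;
  return true;
}
```

The trade-off: each extra send per interval multiplies request volume across the fleet, which is exactly the load this PR tries to reduce.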
Shouldn't this highlight an issue on the receiving side instead? IMO, if the request made it to the receiving end, that's a success from the Kibana POV. We shouldn't overload Kibana at the cost of performance to overcome issues on the receiving end. If users find that telemetry is heavily decreasing the performance of their deployments, they'll simply disable it. I'd rather lose 1 day's worth of telemetry (if the receiving end falsely replies
I completely agree! Adding him as a reviewer to make sure we wait for his approval.
Indeed it does! However, IMO, there are some cases that are not covered by the caching solution: big deployments with multiple instances of Kibana behind a load balancer, serving a high number of users. Each additional Kibana instance in the deployment comes with the cost of one "empty cache". This PR allows scalability at zero cost. The cache mechanism is still useful for any retries we may need due to the receiver being down.
@elasticmachine merge upstream
💚 Build Succeeded
Reducing to 1 browser-transmitted payload per day is okay by me.
💔 Backport failed. The backport operation could not be completed due to the following error. You can specify it via either:
The backport PRs will be merged automatically after passing CI. To backport manually, run:
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
(cherry picked from commit 2d17554)
Summary
Resolves #87846.
This PR normalizes the logic that decides whether to send telemetry or not: it moves `isReportIntervalExpired` to `common` so both the `server` and `public` ends use the same logic to decide whether the `lastReport` date is expired.
NOTE: All these changes are done in favour of reducing the number of requests and, hence, the load on Kibana and Elasticsearch caused by generating the telemetry report multiple times per day.
Checklist
Delete any items that are not applicable to this PR.
Risk Matrix
Delete this section if it is not applicable to this PR.
Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.
When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:
The `lastReported` updated value may be lost after a successful send if the process goes down, the tab is closed, or there's a network issue right between the successful report and the SO update.
For maintainers