Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telemetry plugin] Do not await for getOptInStatus() on stop() #134735

Merged

Conversation

gsoldevila
Copy link
Contributor

In the scope of graceful shutdown, we want to ensure all async operations are correctly finished when stopping the Kibana server.

@gsoldevila gsoldevila added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting v8.4.0 labels Jun 20, 2022
@gsoldevila gsoldevila marked this pull request as ready for review June 23, 2022 14:20
@gsoldevila gsoldevila requested a review from a team as a code owner June 23, 2022 14:20
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the proposed changes will trigger the optIn call in the setup method. Can we proactively test that?

That said, a huge ++ to removing the await of this.optInPromise in the stop method.

Comment on lines 223 to 226
const internalRepositoryClient = await firstValueFrom(
// if the Observable completes without emitting, firstValueFrom will trigger an exception
this.savedObjectsInternalClient$.pipe(catchError((err) => of(undefined)))
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const internalRepositoryClient = await firstValueFrom(
// if the Observable completes without emitting, firstValueFrom will trigger an exception
this.savedObjectsInternalClient$.pipe(catchError((err) => of(undefined)))
);
const internalRepositoryClient = await firstValueFrom(
this.savedObjectsInternalClient$,
{ defaultValue: undefined },
);

Comment on lines 241 to 244
const configMaybe = await withTimeout<TelemetryConfigType>({
promise: firstValueFrom(this.config$),
timeoutMs: 3000,
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 3s? AFAIK, this.config$ should be immediate (we use the subscriber to account for config reloads)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the config does not change no new emissions will happen. Thus, the RxJS pipeline defined in the constructor will get stuck, cause we are using exhaustMap.

I was using this 3 second timeout to unblock the pipeline and proceed to the next emission of the timer(), which will takeUntil(this.pluginStop$) and complete the RxJS pipeline.

Perhaps another option would be to use switchMap instead of exhaustMap, this way, we would be cancelling the firstValueFrom(this.config$) every OPT_IN_POLL_INTERVAL_MS if it hasn't finished. WDYT?

Copy link
Contributor Author

@gsoldevila gsoldevila Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the config$ Observable that we obtain from ConfigService should complete on stop, but this does not seem to be the case ATM:

https://github.com/elastic/kibana/blob/main/packages/kbn-config/src/config_service.ts#L250-L255

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I just tried the switchMap approach and it works. It is a bit simpler and does not require this 3 seconds timeout, so I updated the PR with these changes.

src/plugins/telemetry/server/plugin.ts Show resolved Hide resolved
@gsoldevila
Copy link
Contributor Author

I'm not sure the proposed changes will trigger the optIn call in the setup method. Can we proactively test that?

Good catch! Just added a unit test for that.

@gsoldevila gsoldevila requested a review from afharo June 27, 2022 11:24
@@ -206,19 +206,22 @@ export class TelemetryPlugin implements Plugin<TelemetryPluginSetup, TelemetryPl
this.startFetcher(core, telemetryCollectionManager);

return {
getIsOptedIn: async () => this.isOptedIn$.value === true,
getIsOptedIn: async () => this.isOptedIn === true,
Copy link
Contributor Author

@gsoldevila gsoldevila Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed we could remove the async keyword from this function, but it would alter the setup contract. IDK if it's worth creating a separate task WDYT?

Copy link
Member

@afharo afharo Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to deprecate that method in favour of the observable: #122632 Let's keep it as is for now :)

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern about switchMap: IMO, it may lead to accumulating calls to getOptInStatus if OPT_IN_POLL_INTERVAL_MS is faster than the request.

What do you think?

// Poll for the opt-in status
this.isOptedIn$ = timer(0, OPT_IN_POLL_INTERVAL_MS).pipe(
takeUntil(this.pluginStop$),
switchMap(() => this.getOptInStatus()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this lead to memory leaks?

If getOptInStatus takes longer than OPT_IN_POLL_INTERVAL_MS to execute, switchMap will trigger a new call on every tick of the timer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've got a very valid point here. From this article I thought that the inner Observable was cancelled with each new higher order emission, including cancelling ongoing HTTP calls. Then, I performed a test to make sure, and realised it must be the Angular HTTP client doing the magic behind the scenes (using an abort signal or similar).

image

I'll try to find another way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched back to exhaustMap and moved the takeUntil operator after the exhaustMap. This way, the RxJS pipeline seems to be completed() correctly.

@gsoldevila gsoldevila requested a review from afharo June 29, 2022 10:10
@kibana-ci
Copy link
Collaborator

kibana-ci commented Jun 29, 2022

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #6 / APMEventClient cancels a search when a request is aborted

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! As long as the CI goes green :)

@gsoldevila gsoldevila merged commit d1ac92c into elastic:main Jun 29, 2022
@gsoldevila gsoldevila changed the title Do not await for getOptInStatus() on stop() [Telemetry plugin] Do not await for getOptInStatus() on stop() Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants