Task Manager ignores the loss of connectivity with ES and can potentially prevent Kibana from loading #75501
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
The suggested approach sounds right. But I'm curious about:
Do you mean via the plugin APIs? We already have a
What about tasks that are running when the connection is lost? I guess they'll get retried if we just drop them, but some are highly likely to work, so you wouldn't want to retry them (e.g., Slack, email), and some are highly likely not to work, so you would want to retry them (anything touching ES). "It's complicated".
Just tried to wait for
The gate only applies when Kibana is starting up from scratch. It doesn't apply when ES connectivity is lost and a new Kibana index is being set up, which is what broke things for @azasypkin. We can probably block the requests until ES is back 🤔 but as most operations are requests to schedule things for future execution, I'm not sure we have to.
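For illustration, blocking schedule requests until ES is back could look roughly like the sketch below. It is a minimal sketch, not Task Manager's actual code: `BufferedScheduler`, `markReady`, `markUnavailable`, and the `persist` callback are hypothetical names, and the readiness signal is assumed to come from whatever reports that ES and the mappings are usable.

```typescript
// Hypothetical sketch: none of these names exist in Kibana's Task Manager.
// Holds schedule requests while the store is not ready and flushes them,
// in arrival order, once it becomes ready.
class BufferedScheduler<T> {
  private ready = false;
  private readonly pending: T[] = [];
  private readonly persist: (task: T) => void;

  constructor(persist: (task: T) => void) {
    this.persist = persist;
  }

  // Persist immediately when ready, otherwise queue the request.
  schedule(task: T): void {
    if (this.ready) {
      this.persist(task);
    } else {
      this.pending.push(task);
    }
  }

  // Call when ES is reachable and the mappings exist.
  markReady(): void {
    this.ready = true;
    // splice(0) empties the queue and returns its contents in order.
    for (const task of this.pending.splice(0)) {
      this.persist(task);
    }
  }

  // Call when connectivity is lost; later requests are queued again.
  markUnavailable(): void {
    this.ready = false;
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}
```

Since most requests only schedule future work, a queue like this trades a short delay for not rejecting callers. An unbounded queue is a simplification here; a real implementation would likely need a size limit or timeout.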
If ES goes down but Kibana keeps running, Task Manager continues to poll, ignoring the downtime.
This introduces three issues:
[error][plugins][taskManager][taskManager] Failed to poll for work: Error: No Living connections
No field found for [task.retryAt] in mapping, as the SO mapping hasn't yet been defined (a more detailed error is below).
Suggested approach:
Task Manager should stop polling the moment ES becomes unavailable and listen in for the online$: Observable<OnlineStatusRetryScheduler> that Platform exposes so that it can start again when it becomes available. We must also await the SavedObject mapping's creation. As the Start event happens before the mappings are created, new requests could come into Task Manager before it's ready, so it should buffer these and respond to them when it can.
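The stop-and-resume part of the suggestion can be sketched as follows. This is only an illustration under assumptions: `GatedPoller`, `Availability`, and `onStatus` are hypothetical names, and the status values are assumed to be fed in from a subscription to the availability stream that Platform exposes, whose real shape is not shown here.

```typescript
// Hypothetical sketch: not Kibana's actual Task Manager or Platform API.
type Availability = "available" | "unavailable";

class GatedPoller {
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly pollForWork: () => void;
  private readonly intervalMs: number;

  constructor(pollForWork: () => void, intervalMs: number) {
    this.pollForWork = pollForWork;
    this.intervalMs = intervalMs;
  }

  // Feed every status change from the availability stream into this method.
  onStatus(status: Availability): void {
    if (status === "available") {
      this.start();
    } else {
      this.stop();
    }
  }

  get isPolling(): boolean {
    return this.timer !== undefined;
  }

  private start(): void {
    if (this.timer !== undefined) return; // already polling, nothing to do
    this.timer = setInterval(this.pollForWork, this.intervalMs);
  }

  private stop(): void {
    if (this.timer === undefined) return; // already stopped
    clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

Making `start` and `stop` idempotent means repeated identical status events cannot stack timers, so the poller never runs more than one interval at a time.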