Ability to make task manager run a task immediately #50214

Closed
mikecote opened this issue Nov 11, 2019 · 11 comments
Labels: Feature:Task Manager, Team:ResponseOps, v7.6.0

Comments

@mikecote
Contributor

mikecote commented Nov 11, 2019

Follow up task as result of #45152

We'll add an API to Task Manager that allows us to run a task immediately, unless it is currently running. This will allow us to force a refresh of scheduled tasks manually.
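
A minimal sketch of the behaviour described above, assuming a hypothetical in-memory store. `InMemoryTaskStore`, `TaskRecord`, and `requestRunNow` are illustrative names only and are not Task Manager's actual API; the idea is simply to pull a task's `runAt` forward to "now" and refuse when the task is already running.

```typescript
// Hypothetical types for illustration; not Task Manager's real data model.
type TaskStatus = 'idle' | 'running' | 'failed';

interface TaskRecord {
  id: string;
  status: TaskStatus;
  runAt: Date;
}

class InMemoryTaskStore {
  private tasks = new Map<string, TaskRecord>();

  add(task: TaskRecord): void {
    this.tasks.set(task.id, task);
  }

  get(id: string): TaskRecord | undefined {
    return this.tasks.get(id);
  }
}

// Moves the task's runAt to `now` so it is picked up on the next poll,
// unless the task is missing or currently running.
function requestRunNow(
  store: InMemoryTaskStore,
  id: string,
  now: Date = new Date()
): TaskRecord {
  const task = store.get(id);
  if (!task) {
    throw new Error(`Task "${id}" does not exist`);
  }
  if (task.status === 'running') {
    throw new Error(`Task "${id}" is currently running`);
  }
  task.runAt = now;
  return task;
}
```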

@elasticmachine
Contributor

Pinging @elastic/kibana-stack-services (Team:Stack Services)

@gmmorris
Contributor

gmmorris commented Dec 3, 2019

Here's a curiosity... if you try to call runNow for a task that has failed... should we retry it, or throw an error? 🤔

I can see someone trying to rerun a task that has failed to see if its behaviour is now valid (if, for example, its result relies on a response from an external service).
This is a behavioural change, as we do not currently allow a failed task to be rerun in any way (we reschedule several times before giving up; failed means we ran out of attempts and they all failed).

Any thoughts, @mikecote @peterschretlen?

@bmcconaghy
Contributor

I would say runNow should run it regardless of it being in the failed state. Presumably the person calling this would have reason to believe it should now succeed and it doesn't really hurt anything if we try and it just fails again.

@gmmorris
Contributor

gmmorris commented Dec 3, 2019

That's my instinct as well.
Need to run through the implications within the TM lifecycle, but it makes sense.

@gmmorris
Contributor

I have an open issue in my PR, which I'm not actually sure needs to be addressed, but I'd like to hear your thoughts.

If the task being run by runNow fails, it is treated the same as any task that fails: it is rescheduled to try again, assuming there are more attempts available to it. This means a call to runNow might report a failure, and the task might then succeed minutes later. Some thought needs to be put into how to address/communicate this.

What this means is that if you schedule a task with a runAt in the future, and then call runNow, it will try to run it now instead of at the runAt.
Now, suppose the task run fails: you'll get a response from runNow saying it has failed to run.
But, as this is just a normal task run, Task Manager will reschedule the task to try it again.
This means that, by default, Task Manager will rerun the task 5 minutes later, and that time it might pass.

This means you would have had a runNow API call that failed, and 5 minutes later, out of nowhere, it passes.

This could be confusing. On the other hand, it is Task Manager's normal behaviour, so I'm not sure this is actually a problem.

Any thoughts?
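
The retry flow described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not Task Manager's implementation: `RetryableTask`, `handleFailedRun`, and the `MAX_ATTEMPTS` value are hypothetical, and the 5 minute delay is taken from the default mentioned above.

```typescript
// Hypothetical shape for illustration only.
interface RetryableTask {
  id: string;
  attempts: number;
  status: 'idle' | 'failed';
  runAt: Date;
}

const MAX_ATTEMPTS = 3; // assumed value, chosen for the example
const RETRY_DELAY_MS = 5 * 60 * 1000; // the "5 minutes" default from the thread

// Called after a run fails, whether it was triggered by the schedule or by runNow.
// The caller of runNow has already been told about the failure at this point.
function handleFailedRun(task: RetryableTask, now: Date): RetryableTask {
  const attempts = task.attempts + 1;
  if (attempts < MAX_ATTEMPTS) {
    // Attempts remain: reschedule, so the task may still succeed later.
    return {
      ...task,
      attempts,
      status: 'idle',
      runAt: new Date(now.getTime() + RETRY_DELAY_MS),
    };
  }
  // Out of attempts: the task is marked as failed and is not rescheduled.
  return { ...task, attempts, status: 'failed' };
}
```

With these assumptions, a runNow call that reports a failure on the first attempt leaves the task idle with a `runAt` five minutes out, which is exactly the "fails now, passes later" situation raised above.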

@gmmorris
Contributor

Another question: what should a successful runNow return?
My instinct is that the state of the task is private, and a run should simply result in success (the promise has resolved) or failure (the promise is rejected with an appropriate error, depending on why it failed), but we could return the state of the task if we wished... what do we think?

@mikecote
Contributor Author

> What this means is that if you schedule a task with a runAt in the future, and then call runNow, it will try to run it now instead of at the runAt.
> Now, suppose the task run fails: you'll get a response from runNow saying it has failed to run.
> But, as this is just a normal task run, Task Manager will reschedule the task to try it again.
> This means that, by default, Task Manager will rerun the task 5 minutes later, and that time it might pass.

I think we're missing one issue that would solve this question. Based on #39349, alerts that fail running would just try again at the next interval. I think TM supports this when providing 'interval' but possibly not alerting's usage. So we may have to create an issue to cover this gap now that we're not moving over to TM's interval.

This would solve the question where the alert would just run again at its next interval.

> Another question: what should a successful runNow return?

I think simply resolving the promise for now is good enough, we won't be doing anything with the result at this time.

@gmmorris
Contributor

> > What this means is that if you schedule a task with a runAt in the future, and then call runNow, it will try to run it now instead of at the runAt.
> > Now, suppose the task run fails: you'll get a response from runNow saying it has failed to run.
> > But, as this is just a normal task run, Task Manager will reschedule the task to try it again.
> > This means that, by default, Task Manager will rerun the task 5 minutes later, and that time it might pass.
>
> I think we're missing one issue that would solve this question. Based on #39349, alerts that fail running would just try again at the next interval. I think TM supports this when providing 'interval' but possibly not alerting's usage. So we may have to create an issue to cover this gap now that we're not moving over to TM's interval.
>
> This would solve the question where the alert would just run again at its next interval.

Yes, that's true: when using TM's interval it'll default to rerunning at that point.
As long as the returned runAt from alerting takes the next interval into account, I think that'll be fine, but I'll double check.
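
As a rough illustration of a returned runAt that "takes the next interval into account", the sketch below always schedules the next run one interval after the current run started, whether or not the executor threw. `runAlertTask`, `nextRunAt`, and the millisecond interval are hypothetical simplifications and are not how alerting or Task Manager actually represent intervals.

```typescript
// Result a task runner hands back to the scheduler (hypothetical shape).
interface AlertRunResult {
  runAt: Date;
  error?: Error;
}

// Next run is one interval after the current run started.
function nextRunAt(startedAt: Date, intervalMs: number): Date {
  return new Date(startedAt.getTime() + intervalMs);
}

// Runs the executor and reports the next runAt even when the executor fails,
// so a failed alert simply tries again at its next interval.
function runAlertTask(
  executor: () => void,
  startedAt: Date,
  intervalMs: number
): AlertRunResult {
  const runAt = nextRunAt(startedAt, intervalMs);
  try {
    executor();
    return { runAt };
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    return { runAt, error };
  }
}
```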

> > Another question: what should a successful runNow return?
>
> I think simply resolving the promise for now is good enough, we won't be doing anything with the result at this time.

Cool, I'll keep the ID in there for clarity.
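
A sketch of the contract agreed on here, assuming a hypothetical `runNowSketch` wrapper: the promise resolves with just the task ID on success and rejects with the underlying error on failure, keeping the task's state private. The `RunNowResult` shape and the injected `run` callback are assumptions made for the example.

```typescript
// Hypothetical result shape: only the ID is exposed, not the task state.
interface RunNowResult {
  id: string;
}

// `run` stands in for whatever actually executes the task.
async function runNowSketch(
  id: string,
  run: (taskId: string) => Promise<void>
): Promise<RunNowResult> {
  // If `run` rejects, the rejection propagates to the caller unchanged.
  await run(id);
  return { id };
}
```

A caller would then write something like `runNowSketch(taskId, run).then(({ id }) => ...)` and handle a rejection as a failed run.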

@bmcconaghy bmcconaghy added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) and removed Team:Stack Services labels Dec 12, 2019
@LeeDr

LeeDr commented Jan 16, 2020

We're past 7.6.0 Feature Freeze so if this isn't a bug it should probably bump to v7.7.0.

@pmuellr
Member

pmuellr commented Jan 16, 2020

This is marked in the GH project as done, so I think it should be closed, but @gmmorris would know for sure.

@gmmorris
Contributor

Yup, this was done, not sure why the PR didn't close this at the time.

@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
7 participants