[ResponseOps] retry rule runs when retryable errors occur #138124

Open
pmuellr opened this issue Aug 4, 2022 · 2 comments
Labels
  • Feature:Alerting/RulesFramework: Issues related to the Alerting Rules Framework
  • R&D: Research and development ticket (not meant to produce code, but to make a decision)
  • resilience: Issues related to Platform resilience in terms of scale, performance & backwards compatibility
  • Team:ResponseOps: Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Member

pmuellr commented Aug 4, 2022

Currently, when a rule executor throws an error, we don't retry the execution. However, there are cases where enabling retries would probably be worthwhile. One example is a transient networking error such as a "socket hang up". If we know the rule is "safe" to run again and we detect a transient error, a retry after a short delay has a good chance of succeeding.
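The retry idea above can be sketched as a wrapper around the executor. This is a hypothetical illustration, not existing Kibana code: `isTransientError`, `runWithRetry`, the `retrySafe` flag, and the particular error codes treated as transient are all assumptions. It retries only when the rule is marked safe to re-run and the error looks transient, with an exponentially growing delay.

```typescript
// Hypothetical sketch: none of these names are real Kibana alerting APIs.

// Error codes and messages assumed (for illustration) to indicate a transient
// network failure that is worth retrying.
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EPIPE"]);

function isTransientError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  if (typeof code === "string" && TRANSIENT_CODES.has(code)) return true;
  return typeof message === "string" && /socket hang up/i.test(message);
}

interface RetryOptions {
  retrySafe: boolean; // assumed per-rule-type flag: "safe to run again"
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the first retry
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `execute`, retrying only when the rule is marked safe AND the error
// looks transient; the delay doubles after each failed attempt.
async function runWithRetry<T>(
  execute: () => Promise<T>,
  { retrySafe, maxAttempts, baseDelayMs }: RetryOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await execute();
    } catch (err) {
      if (!retrySafe || attempt >= maxAttempts || !isTransientError(err)) {
        throw err;
      }
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Keeping the "is it safe?" decision separate from the "is it transient?" decision mirrors the two conditions in the comment: a transient error alone is not enough if re-running the rule could write duplicate or inconsistent data.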

Lots of things to consider:

  • is the rule writing data anywhere, such that a retry would overwrite that data or otherwise result in confusing or corrupt data being written
  • is the rule sensitive to its interval, runAt time, etc., such that it would calculate inappropriate time intervals on a retry
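For the second consideration, one way a rule could stay retry-safe is to derive its query window from the run's originally scheduled start rather than from the wall clock. The sketch below is hypothetical: `queryWindow`, `wallClockWindow`, and the assumption that a retry can see the original scheduled time are illustrative, not how existing rule types are known to work.

```typescript
// Hypothetical sketch: names are illustrative, not Kibana APIs.
interface TimeWindow {
  from: Date;
  to: Date;
}

// Window anchored on the *scheduled* start: a retry a few seconds later
// queries exactly the same range as the failed attempt.
function queryWindow(scheduledRunAt: Date, intervalMs: number): TimeWindow {
  return {
    from: new Date(scheduledRunAt.getTime() - intervalMs),
    to: new Date(scheduledRunAt.getTime()),
  };
}

// Window anchored on the current time: shifts on every attempt, so a
// retry would skip part of the original range and overlap the next run.
function wallClockWindow(intervalMs: number, now: number = Date.now()): TimeWindow {
  return { from: new Date(now - intervalMs), to: new Date(now) };
}
```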

This probably also applies to the ES index connector, but it is harder there: by definition that connector writes data, and we don't want to write it twice! Alternatively, we could instrument the ES index connector itself for retry support, and not have to worry about it at the generic "connector level".
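One way to avoid writing a document twice is to give each document a deterministic `_id` and index it with the Elasticsearch bulk `create` action, which rejects a document whose `_id` already exists instead of overwriting it. The sketch below only builds the bulk request body; `buildIdempotentBulk` and the `executionId` (a stable identifier assumed to be shared by the original attempt and its retries) are hypothetical, not part of the existing connector.

```typescript
// Hypothetical sketch: `executionId` stands in for whatever stable identifier
// a connector run would carry across retries; it is not an existing field.
type BulkLine = Record<string, unknown>;

function buildIdempotentBulk(
  index: string,
  executionId: string,
  docs: BulkLine[],
): BulkLine[] {
  const body: BulkLine[] = [];
  docs.forEach((doc, i) => {
    // Same execution + same position => same _id on every attempt, so a
    // retried `create` for an already-written doc fails instead of duplicating.
    body.push({ create: { _index: index, _id: `${executionId}-${i}` } });
    body.push(doc);
  });
  return body;
}
```

With this approach a retry would need to treat per-document "already exists" conflicts as success rather than as a new failure.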

@pmuellr pmuellr added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework labels Aug 4, 2022
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@pmuellr pmuellr added the resilience Issues related to Platform resilience in terms of scale, performance & backwards compatibility label Aug 4, 2022
@mikecote mikecote moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors Aug 4, 2022
@mikecote
Contributor

mikecote commented Aug 8, 2022

Linking with #50215 (see comment) for running rules ad hoc. Once either piece of work is done, we should be able to address the other issue using the same solution.

@mikecote mikecote added the R&D Research and development ticket (not meant to produce code, but to make a decision) label Sep 23, 2022
@doakalexi doakalexi self-assigned this Sep 29, 2022
@doakalexi doakalexi moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Sep 29, 2022
@doakalexi doakalexi removed their assignment Sep 30, 2022
@doakalexi doakalexi moved this from In Progress to Todo in AppEx: ResponseOps - Execution & Connectors Sep 30, 2022
@ymao1 ymao1 self-assigned this Oct 10, 2022
@ymao1 ymao1 moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Oct 10, 2022
@ymao1 ymao1 removed their assignment Oct 10, 2022
@ymao1 ymao1 moved this from In Progress to Todo in AppEx: ResponseOps - Execution & Connectors Oct 10, 2022
Projects
No open projects
Development

No branches or pull requests

5 participants