[ResponseOps] retry rule runs when retryable errors occur #138124

pmuellr · 2022-08-04T14:18:27Z

Currently, when a rule executor throws an error, we don't retry the execution. However, there are cases where enabling retries would probably help. One example is a transient networking error such as a "socket hang up". If we know the rule is "safe" to run again and we detect a transient error, a retry after a short delay has a good chance of succeeding.
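A minimal sketch of the idea in TypeScript, assuming the rule declares whether it is safe to re-run and that transient errors can be recognized by their Node.js error codes (a "socket hang up" from Node's HTTP client typically carries `ECONNRESET`). `runWithRetry`, `isTransientError`, `RetryOptions` and the list of codes are hypothetical, not part of the Kibana alerting framework.

```typescript
// Hypothetical sketch: none of these names are Kibana APIs.

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  delayMs: number; // fixed delay between attempts
}

// Assumption: connection-level Node.js error codes are treated as transient.
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EPIPE"]);

function isTransientError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWithRetry<T>(
  execute: () => Promise<T>,
  isRetrySafe: boolean,
  { maxAttempts, delayMs }: RetryOptions
): Promise<T> {
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return await execute();
    } catch (err) {
      // Retry only if the rule is declared safe to re-run, the error looks
      // transient, and attempts remain; otherwise rethrow the original error.
      if (!isRetrySafe || !isTransientError(err) || attempt >= maxAttempts) {
        throw err;
      }
      await sleep(delayMs);
    }
  }
}
```

A fixed delay keeps the sketch short; exponential backoff with jitter is a common alternative when many rules might fail against the same overloaded endpoint at once.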

Lots of things to consider:

Is the rule writing data anywhere, such that a retry would overwrite that data or otherwise result in confusing or corrupt data being written?
Is the rule sensitive to its interval, runAt time, etc., such that it would calculate inappropriate time intervals on a retry?
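For the second concern, one way to keep a retry from computing a different time range is to derive the rule's query window from the run's originally scheduled time rather than from the wall clock at the moment the (re)attempt executes. This is a sketch under the assumption that each run knows its scheduled time and the previous one; `ScheduledRun` and `queryWindowFor` are hypothetical names.

```typescript
// Hypothetical sketch: ScheduledRun and queryWindowFor are not Kibana APIs.
// The window depends only on the schedule, so the original attempt and any
// retry evaluate exactly the same time range.

interface ScheduledRun {
  scheduledAt: Date; // when this run was originally due
  previousScheduledAt: Date | null; // when the previous run was due, if any
  intervalMs: number; // the rule's configured interval
}

interface TimeWindow {
  from: Date;
  to: Date;
}

function queryWindowFor(run: ScheduledRun): TimeWindow {
  const to = run.scheduledAt;
  // First run: fall back to one interval before the scheduled time.
  const from = run.previousScheduledAt ?? new Date(to.getTime() - run.intervalMs);
  return { from, to };
}
```

By contrast, a rule that computes its window as "now minus interval" would shift the range by however long the retry delay was, which is exactly the kind of inappropriate interval the question above is about.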

This probably also applies to the ES index connector, but it is harder there: by definition the connector writes data, and we don't want to write it twice! Alternatively, we could instrument the ES index connector itself for retry support, and not have to worry about it at the "connector level".
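One common way to make writes safe to repeat (an illustrative technique swapped in here; the issue does not prescribe it) is to derive the document ID deterministically from the run, so a retried write replaces the first copy instead of adding a second. In Elasticsearch, indexing a document with an explicit `_id` overwrites any existing document with that ID. `documentIdFor` and `InMemoryIndex` are hypothetical; the in-memory map only stands in for an index.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: documentIdFor and InMemoryIndex are illustrative only.

// Same rule + same scheduled run + same alert => same ID on every attempt.
// JSON.stringify of an array avoids ambiguity between the joined parts.
function documentIdFor(ruleId: string, scheduledAt: Date, alertId: string): string {
  return createHash("sha256")
    .update(JSON.stringify([ruleId, scheduledAt.toISOString(), alertId]))
    .digest("hex");
}

// Stand-in for an index: writing a document with an existing ID replaces it.
class InMemoryIndex {
  private readonly docs = new Map<string, Record<string, unknown>>();

  index(id: string, doc: Record<string, unknown>): void {
    this.docs.set(id, doc);
  }

  get size(): number {
    return this.docs.size;
  }
}
```

This only deduplicates if every input to the ID is stable across attempts; anything that varies per attempt (such as the retry's own wall-clock time) must stay out of the ID.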

elasticmachine · 2022-08-04T14:18:29Z

Pinging @elastic/response-ops (Team:ResponseOps)

mikecote · 2022-08-08T20:28:22Z

Linking with #50215 (comment) for running rules ad hoc. Once one of these pieces of work is done, we should be able to automatically enable the other issue using the same solution.

pmuellr added the Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams) and Feature:Alerting/RulesFramework (Issues related to the Alerting Rules Framework) labels Aug 4, 2022

pmuellr added this to AppEx: ResponseOps - Execution & Connectors Aug 4, 2022

pmuellr moved this to Awaiting Triage in AppEx: ResponseOps - Execution & Connectors Aug 4, 2022

pmuellr added the resilience label (Issues related to Platform resilience in terms of scale, performance & backwards compatibility) Aug 4, 2022

mikecote moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors Aug 4, 2022

mikecote mentioned this issue in Ability run an alert immediately #50215 (Closed) Aug 8, 2022

mikecote added the R&D label (Research and development ticket: not meant to produce code, but to make a decision) Sep 23, 2022

doakalexi self-assigned this Sep 29, 2022

doakalexi moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Sep 29, 2022

doakalexi removed their assignment Sep 30, 2022

doakalexi moved this from In Progress to Todo in AppEx: ResponseOps - Execution & Connectors Sep 30, 2022

ymao1 self-assigned this Oct 10, 2022

ymao1 moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Oct 10, 2022

ymao1 removed their assignment Oct 10, 2022

ymao1 moved this from In Progress to Todo in AppEx: ResponseOps - Execution & Connectors Oct 10, 2022
