[Security Solution] Ensure alerts are scheduled when rule times out #128276
@elasticmachine merge upstream
💚 Build Succeeded
The changes here LGTM as is. There are some other areas we may want to address re: cancellation in addition, though.
- We should remove the `experimentalFeatures.securityRulesCancelEnabled` flag and choose an appropriate value (possibly `undefined`) for `ruleTaskTimeout` based on the outcome of https://github.com/elastic/security-team/issues/3415. This can be a follow-up PR, but we should target 8.2 for that work if possible.
- Threat match rules have a separate function, `buildExecutionIntervalValidator`, that implements timeout functionality as well. If we can remove that and rely on the search timeout instead, that would unify the rule types more. We'll need to modify some of the logic in the threat match executor so it breaks out of the loop when `searchAfterBulkCreate` returns an error, though.
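A rough sketch of what breaking out of the loop could look like. Everything here is an assumption for illustration: `SearchResult`, `SearchChunk`, and `runThreatMatchChunks` are hypothetical names, and the real `searchAfterBulkCreate` result and threat match executor carry more state than this.

```typescript
// Hypothetical, simplified result shape (an assumption, not the real
// searchAfterBulkCreate return type).
interface SearchResult {
  success: boolean;
  errors: string[];
  createdSignalsCount: number;
}

// Hypothetical stand-in for one searchAfterBulkCreate call per chunk of
// threat indicators.
type SearchChunk = (chunkIndex: number) => Promise<SearchResult>;

// Runs the chunks in order and stops at the first chunk that reports an
// error (for example a search timeout) instead of continuing the loop.
async function runThreatMatchChunks(
  totalChunks: number,
  searchChunk: SearchChunk
): Promise<SearchResult> {
  const combined: SearchResult = { success: true, errors: [], createdSignalsCount: 0 };
  for (let i = 0; i < totalChunks; i++) {
    const result = await searchChunk(i);
    combined.createdSignalsCount += result.createdSignalsCount;
    combined.errors.push(...result.errors);
    if (!result.success || result.errors.length > 0) {
      combined.success = false;
      break; // stop issuing further searches once one has failed
    }
  }
  return combined;
}
```

With this shape the executor would no longer need its own interval check: a timed-out search surfaces as an error and ends the loop, while the alerts created by earlier chunks are still counted.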
id: alertId,
kibanaSiemAppUrl: (meta as { kibana_siem_app_url?: string } | undefined)
  ?.kibana_siem_app_url,
outputIndex: ruleDataClient.indexNameWithNamespace(spaceId),
ruleId,
esClient: services.scopedClusterClient.asCurrentUser,
If we try to schedule throttled notifications after a rule is cancelled, the search executed inside this function will also be cancelled and we won't be able to schedule the actions. We may need the alerting framework to provide a secondary "un-cancellable" client that we can use during the actions scheduling process.
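To make the concern concrete, here is a minimal sketch of the idea, not the alerting framework's actual API. `CancellationToken`, `SearchClient`, `cancellableClient`, and `scheduleThrottledNotifications` are all hypothetical names chosen for illustration.

```typescript
// Hypothetical cancellation token (an assumption; the framework's real
// cancellation mechanism is not shown in this thread).
class CancellationToken {
  private cancelled = false;
  cancel(): void {
    this.cancelled = true;
  }
  get isCancelled(): boolean {
    return this.cancelled;
  }
}

// Hypothetical minimal client interface.
interface SearchClient {
  search(query: string): Promise<string[]>;
}

// Wraps a client so that every search rejects once the rule is cancelled.
function cancellableClient(inner: SearchClient, token: CancellationToken): SearchClient {
  return {
    async search(query: string): Promise<string[]> {
      if (token.isCancelled) {
        throw new Error('search cancelled: rule timed out');
      }
      return inner.search(query);
    },
  };
}

// Hypothetical scheduling step: it has to search for alerts before it can
// schedule actions, so it fails if handed a cancelled client.
async function scheduleThrottledNotifications(client: SearchClient): Promise<number> {
  const alerts = await client.search('alerts since last notification');
  return alerts.length; // number of alerts that would be included in actions
}
```

In this sketch, passing the wrapped client after `token.cancel()` makes scheduling reject, while passing the unwrapped (un-cancellable) client still succeeds, which is the behaviour a secondary client would provide.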
Ah, good point!
Friendly reminder: Looks like this PR hasn’t been backported yet.
Summary
Fixes: #121559
Related: #120506
NOTE: Tests are very difficult to write for this case. There are tests in the alerting framework for the constructs used. Below is the method I used for manual testing, in detail.
To test:
1. Start an HTTP server on localhost: `sudo python3 -m http.server 5605`, or use `netcat`.
2. Create a rule that creates alerts and then times out. The easiest way I have found to accomplish this is below:

![image](https://user-images.githubusercontent.com/611653/159723590-cf9167fa-a887-4735-afc6-670787d9d54c.png)

If you create more than 2 alerts, the first 2 alerts will be created and then the task will stall and eventually time out.

For maintainers