[Alerting] Telemetry for potential rule execution guardrails #122535
Comments
Pinging @elastic/response-ops (Team:ResponseOps)
Prior conversation on capturing the maximum number of alerts created during an execution: #116047.
Side discussion with Mike: we're thinking of capturing percentiles rather than min/max. For instance, we can easily compare p50 with p90 and determine whether the p90 value is an outlier relative to p50, that is, much larger than p50. If the two are similar, p50 is probably not an outlier. We're less interested in outliers than in things that are more consistently "not good".
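A minimal TypeScript sketch of the p50-vs-p90 comparison, assuming nearest-rank percentiles computed over a list of per-execution values (for example, execution durations in milliseconds). The function names `percentile` and `isP90Outlier`, the ratio threshold of 3, and the sample numbers are hypothetical choices for illustration, not part of Kibana's telemetry code.

```typescript
// Nearest-rank percentile: the smallest sample value such that at least
// p percent of the samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("percentile of an empty sample is undefined");
  }
  if (!(p > 0 && p <= 100)) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = values.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point rounding surprises.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical heuristic: flag the series when p90 is at least
// `ratioThreshold` times p50, i.e. the slow tail is far from the typical run.
function isP90Outlier(values: number[], ratioThreshold = 3): boolean {
  const p50 = percentile(values, 50);
  const p90 = percentile(values, 90);
  if (p50 === 0) {
    // Avoid dividing by zero: any non-zero p90 counts as much larger than 0.
    return p90 > 0;
  }
  return p90 / p50 >= ratioThreshold;
}

// Example: per-execution durations in milliseconds (made-up numbers).
const spiky = [120, 130, 125, 140, 135, 128, 132, 138, 126, 2400];
const skewed = [100, 100, 100, 100, 100, 100, 100, 100, 900, 1000];
console.log(percentile(spiky, 50), percentile(spiky, 90), isP90Outlier(spiky)); // prints "130 140 false"
console.log(percentile(skewed, 50), percentile(skewed, 90), isP90Outlier(skewed)); // prints "100 900 true"
```

A single 2400 ms spike leaves p90 at 140, close to p50, so it is not flagged; when a larger share of executions is slow, p90 moves far above p50. Nearest-rank always returns an observed value; interpolating percentile methods would give slightly different numbers.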
I'm wondering if we can split this issue into several deliverables (no need for multiple issues, just multiple PRs).
We should split this into smaller deliverables. There may already be an issue for "How much time was spent in Elasticsearch searches?".
I'll leave it to whoever picks up the issue to decide whether to tackle both telemetry questions at once, or to split them up and open a separate issue for one of them.
It might be worth using percentiles and perhaps breaking them down by rule type.
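Breaking percentiles down by rule type could look like the following sketch. It assumes each execution is reported as a record carrying a rule type id and a duration; the `ExecutionSample` shape, its field names, the `summarizeByRuleType` helper, and the choice of p50/p90/p99 are all hypothetical, not the actual telemetry schema.

```typescript
// Hypothetical shape of one execution record; not the real telemetry schema.
interface ExecutionSample {
  ruleTypeId: string;
  durationMs: number;
}

interface DurationSummary {
  count: number;
  p50: number;
  p90: number;
  p99: number;
}

// Nearest-rank percentile over an already sorted, non-empty array (0 < p <= 100).
function nearestRank(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Group durations by rule type, then summarize each group with percentiles.
function summarizeByRuleType(
  samples: ExecutionSample[],
): Record<string, DurationSummary> {
  // Null-prototype objects so any rule type id is stored as an ordinary key.
  const groups: Record<string, number[]> = Object.create(null);
  for (const sample of samples) {
    if (!groups[sample.ruleTypeId]) {
      groups[sample.ruleTypeId] = [];
    }
    groups[sample.ruleTypeId].push(sample.durationMs);
  }

  const summaries: Record<string, DurationSummary> = Object.create(null);
  for (const ruleTypeId of Object.keys(groups)) {
    // Every group has at least one sample, so nearestRank gets a non-empty array.
    const sorted = groups[ruleTypeId].slice().sort((a, b) => a - b);
    summaries[ruleTypeId] = {
      count: sorted.length,
      p50: nearestRank(sorted, 50),
      p90: nearestRank(sorted, 90),
      p99: nearestRank(sorted, 99),
    };
  }
  return summaries;
}

// Example with made-up rule type ids and durations in milliseconds.
const samples: ExecutionSample[] = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
  .map((d) => ({ ruleTypeId: "rule-type-a", durationMs: d }))
  .concat([500, 700, 600].map((d) => ({ ruleTypeId: "rule-type-b", durationMs: d })));
console.log(JSON.stringify(summarizeByRuleType(samples)));
```

With these made-up samples, `rule-type-a` summarizes to p50 50, p90 90, p99 100 over 10 executions, and `rule-type-b` to p50 600, p90 700, p99 700 over 3 executions. Reporting the count alongside each percentile helps, because percentiles from a handful of executions say little.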
Before implementing guardrails and limits on alerting rules, we should gather data to validate where guardrails are needed during rule execution. The following would be interesting to gather, along with others as we think of them.
Copied from #60315; we can start with the following: