[Metrics UI] Add support for severity levels to threshold alerts #88591

sorantis · 2021-01-18T12:27:22Z

User story
As a DevOps engineer, I want to be able to set multiple threshold values based on severity levels, so that I can channel notifications based on severity.

Describe the feature:
Today our threshold alerts support only one default severity level. Customers are asking us to introduce multiple severity levels, e.g. warning and critical. The Kibana alerting team has been working on some design mockups for multi-level notifications, which can be found here.

In 7.11, the alerting team has added a generic UI for defining conditions for Action Groups. We will need to incorporate multiple severity levels into our alerts. Note that this can (and probably will) apply to other observability alert types as well.

Currently the following thresholds are interesting from the Inventory/Metric alerting perspective: a warning threshold and an alert/critical threshold. However, other solutions make different assumptions, so this needs to be aligned across them.
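To make the warning/critical split concrete, here is a minimal sketch of a single condition that carries an optional warning threshold next to the critical one. The `Condition` shape, its field names, and `severityFor` are hypothetical illustrations, not the Metrics UI's actual alert params, and the sketch assumes one comparator applies to both levels.

```typescript
// Hypothetical comparator set and condition shape; not Kibana's actual types.
type Comparator = ">" | ">=" | "<" | "<=";
type Severity = "critical" | "warning";

interface Condition {
  metric: string;
  comparator: Comparator; // assumed to apply to both severity levels
  criticalThreshold: number;
  warningThreshold?: number; // omitted when no warning level is configured
}

function compare(value: number, comparator: Comparator, threshold: number): boolean {
  switch (comparator) {
    case ">":
      return value > threshold;
    case ">=":
      return value >= threshold;
    case "<":
      return value < threshold;
    case "<=":
      return value <= threshold;
  }
}

// Returns the most severe level the value reaches, or null if it reaches none.
// Critical is checked first so a value past both thresholds reports "critical".
function severityFor(value: number, condition: Condition): Severity | null {
  if (compare(value, condition.comparator, condition.criticalThreshold)) {
    return "critical";
  }
  if (
    condition.warningThreshold !== undefined &&
    compare(value, condition.comparator, condition.warningThreshold)
  ) {
    return "warning";
  }
  return null;
}

// Hypothetical example condition: CPU usage as a fraction of 1.
const cpu: Condition = {
  metric: "cpu",
  comparator: ">",
  criticalThreshold: 0.9,
  warningThreshold: 0.75,
};
```

With this condition, `severityFor(0.8, cpu)` yields `"warning"` and `severityFor(0.95, cpu)` yields `"critical"`; each severity would then map onto its own action group.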

cc @cyrille-leclerc, @mukeshelastic.

elasticmachine · 2021-01-18T12:27:23Z

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

Zacqary · 2021-01-26T22:10:52Z

The design mockups don't easily translate to Metrics alerts since they're using the full-width layout:

We're still debating how to incorporate that layout into the multi-condition alert types that we use in Observability; it seems like we're targeting 7.13 for it: #69233

I can throw something together that works with the way we do multiple conditions right now, probably by including a Critical/Warning/Minor/etc. threshold component within each conditional expression. I don't expect it to look pretty, though. Is that a blocker, or do we just want to get this feature out even if the UI looks kind of janky?

Zacqary · 2021-01-29T19:05:27Z

Here's what I've got implemented so far:

@katefarrar Thoughts?

Not sure I like the look of the Add warning threshold button in particular, but I'm not sure what to tweak about it.

Zacqary · 2021-01-29T19:09:28Z

Is there any reason why we should make it possible to select a different Comparator for the Critical vs. Warning threshold? Or should we have a single Comparator value and lock it to both of them?

It's currently a bit of a hassle to get the <ThresholdExpression> component to update in response to an outside param change rather than user input, so before I do the work to fix that I'd like to know if it's necessary.
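One consequence of locking a single comparator to both levels is that the two thresholds must be ordered so the warning level is reached before the critical one; otherwise Warning can never fire on its own. A minimal sketch of that check follows. `SeverityThresholds` and `validateThresholds` are hypothetical names, not part of `<ThresholdExpression>` or any existing component.

```typescript
// Hypothetical params for one condition with a single shared comparator.
type Comparator = ">" | ">=" | "<" | "<=";

interface SeverityThresholds {
  comparator: Comparator; // shared by the critical and warning levels
  critical: number;
  warning?: number;
}

// Returns an error message, or null when the thresholds are usable.
// If the warning threshold is not crossed before the critical one (in the
// comparator's direction), every value meeting warning also meets critical,
// so Warning would never fire by itself.
function validateThresholds(t: SeverityThresholds): string | null {
  if (t.warning === undefined) return null;
  const upward = t.comparator === ">" || t.comparator === ">=";
  if (upward && t.warning >= t.critical) {
    return "Warning threshold must be lower than the critical threshold";
  }
  if (!upward && t.warning <= t.critical) {
    return "Warning threshold must be higher than the critical threshold";
  }
  return null;
}
```

With a separate comparator per level, this check would have to reason about two directions at once, which is arguably one point in favor of a single shared comparator.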

Zacqary · 2021-01-29T21:23:54Z

If we're going to be switching to using action groups to differentiate between threshold states, we might want to remove the Alert on No Data checkbox and instead handle No Data states with action groups.

This change would break backwards compatibility with existing No Data alerts, so I'm not sure how we should approach releasing it. I think Alerting is out of beta now, but I really don't think we should wait for a breaking release to make this kind of change, especially if Warning thresholds are going to use their own separate action group.
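A minimal sketch of what handling No Data through an action group could look like, together with one possible migration that would soften the compatibility break. The group IDs (`fired`, `warning`, `noData`), the `Action` shape, and the idea of copying existing Fired actions into the No Data group are all hypothetical, not the Alerting framework's API. A `>` comparator is assumed throughout.

```typescript
// Hypothetical action group IDs; the IDs actually registered with Alerting may differ.
type ActionGroupId = "fired" | "warning" | "noData";

// Picks the action group for one evaluation; null means no actions are scheduled.
// Assumes a ">" comparator for brevity.
function actionGroupFor(
  value: number | null,
  critical: number,
  warning?: number,
): ActionGroupId | null {
  if (value === null) return "noData"; // takes over the "Alert on No Data" checkbox
  if (value > critical) return "fired";
  if (warning !== undefined && value > warning) return "warning";
  return null;
}

interface Action {
  group: ActionGroupId;
  id: string; // connector ID
}

// Hypothetical migration: alerts that had "Alert on No Data" checked keep
// notifying on No Data by copying their Fired actions into the noData group.
function migrateNoDataActions(actions: Action[], alertOnNoData: boolean): Action[] {
  if (!alertOnNoData) return actions;
  const noDataCopies = actions
    .filter((action) => action.group === "fired")
    .map((action): Action => ({ ...action, group: "noData" }));
  return [...actions, ...noDataCopies];
}
```

Under this assumption, users who never checked the box would see no change, while those who did would keep receiving the same notifications, now routed through a group they can reconfigure separately.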

Zacqary · 2021-01-29T21:30:55Z

Currently the following thresholds are interesting from the Inventory/Metric alerting perspective: warning threshold, alert/critical threshold

I feel like I want to call it the "Critical" threshold if the Warning threshold is present, but this would create a weird discrepancy with action group naming. Assuming we want to change the name of the Fired action group, it could be jarring for users without a Warning threshold configured to see this:

So maybe Alert and Warning are better choices.

Zacqary · 2021-01-29T21:37:58Z

Oof, hmm, I don't like "Run when Alert".

Currently the action group is called "Fired" so it says "Run when Fired" and "Run when Recovered," but I don't think we want to do:

"Run when Critical" sounds fine, but see above for why I don't think it's a good idea to do that either.

I'm checking with the Alerting team to see if it's possible to dynamically change the contents of the Run when menu depending on the configured Alert parameters; if so, we could just call that action group Fired when no Warning threshold is configured, and switch it to Critical when it is. But barring that solution, what should we do about this linguistic problem?

EDIT: We do not have that option yet. Tracking it here: #89898
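Until #89898 lands, the relabeling can only be described, not implemented. As a sketch of the logic it would need, the hypothetical `runWhenOptions` below derives the "Run when" menu labels from the configured params: the primary group keeps the "Fired" label when no Warning threshold exists and becomes "Critical" once one does. The labels and the function are illustrative assumptions only.

```typescript
// Hypothetical criterion shape: only the field relevant to labeling is included.
interface Criterion {
  warningThreshold?: number;
}

// Labels for the "Run when" menu, derived from the configured alert params.
function runWhenOptions(criteria: Criterion[]): string[] {
  const hasWarning = criteria.some((c) => c.warningThreshold !== undefined);
  return hasWarning
    ? ["Critical", "Warning", "Recovered"]
    : ["Fired", "Recovered"];
}
```

A design like this would presumably keep the underlying action group ID stable and change only its display name, so actions already attached to the group keep working when a Warning threshold is added or removed.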

Zacqary · 2021-01-29T23:16:10Z

I remember we did a lot of work figuring out the correct language for alert previews, but now I'm not sure what to do about multiple severity thresholds.

There were 5 critical and 2 warning instances that satisfied the conditions of this alert in the last hour

Seems closest to the current language but it feels off. "Satisfied the conditions of this alert" feels redundant when you enumerate the severity thresholds.

There were 5 instances that satisfied the critical conditions, and 2 instances that satisfied the warning conditions, of this alert in the last hour

Feels better but also needlessly verbose.

Zacqary · 2021-02-01T18:58:36Z

With the current logic, this Warning threshold would never fire, because ALL conditions need to have their thresholds met. Should we even support this use case, in which one condition has a Warning threshold but the others don't, or should we force all conditions to have a Warning threshold if you enable it on one of them?
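The problem can be reproduced with a minimal sketch of AND evaluation across conditions. `meetsWarningStrict` models the current reading, where each condition must meet its own warning threshold; `meetsWarningLenient` is one hypothetical alternative that falls back to a condition's critical threshold when it has no warning threshold. All names are illustrative, and a `>` comparator is assumed.

```typescript
interface Condition {
  value: number; // current metric value for this condition
  critical: number; // critical threshold (">" comparator assumed)
  warning?: number; // optional warning threshold
}

type Level = "critical" | "warning" | "ok";

const meetsCritical = (c: Condition): boolean => c.value > c.critical;

// Strict reading: a condition only counts toward Warning if it has its own
// warning threshold and the value exceeds it.
const meetsWarningStrict = (c: Condition): boolean =>
  c.warning !== undefined && c.value > c.warning;

// Hypothetical lenient alternative: a condition without a warning threshold
// falls back to its critical threshold.
const meetsWarningLenient = (c: Condition): boolean =>
  c.value > (c.warning ?? c.critical);

// AND semantics: a level fires only if every condition meets it.
function evaluate(
  conditions: Condition[],
  meetsWarning: (c: Condition) => boolean,
): Level {
  if (conditions.every(meetsCritical)) return "critical";
  if (conditions.every(meetsWarning)) return "warning";
  return "ok";
}

const cpu: Condition = { value: 0.8, critical: 0.9, warning: 0.7 };
const memory: Condition = { value: 0.95, critical: 0.9 }; // no warning threshold
```

Here `evaluate([cpu, memory], meetsWarningStrict)` returns `"ok"` no matter how `memory` behaves, because `memory` can never meet a warning threshold it doesn't have. The lenient variant returns `"warning"`. Forcing every condition to have a Warning threshold once one does is the other way out.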

katefarrar · 2021-02-01T22:31:16Z

Here's what I've got implemented so far:

@katefarrar Thoughts?

Not sure I like the look of the Add warning threshold button in particular, but I'm not sure what to tweak about it.

We're going to add the plusInCircleFilled icon before "Add warning threshold" to make it a little clearer.

katefarrar · 2021-02-01T22:35:17Z

For the alerting language, I think this works well...

There were 5 instances that satisfied the critical conditions, and 2 instances that satisfied the warning conditions of this alert in the last hour.

I'd rather err on the side of verbose than unclear 🙂

Zacqary · 2021-02-01T22:40:11Z

For Group By alerts we currently say:

There were 5 instances across 3 hosts that satisfied the conditions...

What do we think of switching this to:

Across 3 hosts, there were 5 instances that satisfied the critical conditions, and 2 instances that satisfied the warning conditions...

I feel like that'd be less of a headache than continuing to put the number of hosts afterward

EDIT: Actually this would be a huge i18n burden to correctly handle the conditional capital letter at the beginning, so I think we should stick with the host number in the middle
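The agreed wording, with the host count kept in the middle of the sentence, can be sketched as a small English-only builder. It reproduces the sentence proposed above for the two-severity case and falls back to the existing "satisfied the conditions" wording when no Warning threshold is configured. `previewSentence` and `PreviewCounts` are hypothetical. Placing "across N hosts" after the critical count is an assumption, and real Kibana copy would go through its i18n tooling rather than string concatenation.

```typescript
interface PreviewCounts {
  critical: number;
  warning?: number; // undefined when no Warning threshold is configured
  groups?: number; // number of hosts, for Group By alerts only
}

function count(n: number, singular: string, plural: string): string {
  return `${n} ${n === 1 ? singular : plural}`;
}

function previewSentence(c: PreviewCounts): string {
  const verb = c.critical === 1 ? "was" : "were";
  // The host count stays mid-sentence, so no clause needs a conditional capital letter.
  const hosts = c.groups !== undefined ? ` across ${count(c.groups, "host", "hosts")}` : "";
  const level = c.warning !== undefined ? "the critical conditions" : "the conditions";
  let sentence = `There ${verb} ${count(c.critical, "instance", "instances")}${hosts} that satisfied ${level}`;
  if (c.warning !== undefined) {
    sentence += `, and ${count(c.warning, "instance", "instances")} that satisfied the warning conditions`;
  }
  return `${sentence} of this alert in the last hour.`;
}
```

For example, `previewSentence({ critical: 5, warning: 2 })` produces "There were 5 instances that satisfied the critical conditions, and 2 instances that satisfied the warning conditions of this alert in the last hour."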

sorantis added the enhancement, Feature:Metrics UI, and Team:Infra Monitoring UI - DEPRECATED labels Jan 18, 2021

sgrodzicki added this to the Metrics UI 7.12 milestone Jan 18, 2021

Zacqary self-assigned this Jan 19, 2021

Zacqary added the metrics-ui:alerts label Jan 21, 2021

Zacqary removed their assignment Jan 21, 2021

Zacqary removed the metrics-ui:alerts label Jan 21, 2021

sgrodzicki assigned Zacqary Feb 1, 2021

Zacqary mentioned this issue Feb 1, 2021

[Metrics UI] Add support for severity levels to Inventory alerts #89912 (Closed)

Zacqary mentioned this issue Feb 2, 2021

[Metrics UI] Add warning severity to Metric Alerts #90070 (Merged)

Zacqary closed this as completed in #90070 Feb 9, 2021

EamonnTP mentioned this issue Feb 11, 2021

Document severity level for threshold alerts elastic/observability-docs#384 (Closed)
