Feature Request - Notification also from degraded to active #11

jayhding · 2016-08-12T05:17:33Z

Right now a notification is only sent when a service has been degraded for a while, but we would also like to receive a notification when it recovers from degraded status. That way we know the service's final status.

ndelitski · 2016-08-12T07:40:59Z

How do you think we should notify when a service periodically jumps between the degraded and active states? Is it OK to receive that many notifications? The current logic is that when a service becomes degraded you receive only one notification, regardless of later status changes. Maybe we should have a specific setting to enable this feature?

jayhding · 2016-08-12T09:08:29Z

That's exactly what we often see: a service flapping between active and degraded. I actually changed the code to notify in both directions, and we have been using it for some time.

It is true that it generates more emails, which is why we switched to sending the notifications as Slack messages.

But considering how convenient Slack is to access, we can easily tell whether a service is back to its normal state without connecting to the private network after hours.

It is definitely fine to control this feature with a specific flag.

We can see whether @SydOps shares the same opinion.

ozbillwang · 2016-08-12T23:01:09Z

I don't really care about the recovered status. I agree with @ndelitski: there is no need for too many notifications. On a Rancher server plus hosts, especially in an enterprise, we will run thousands of containers; if there are too many notifications, operators will simply ignore them.

Second, we don't use Rancher Alarms as our main alarm system. We have others, such as Sensu, Dynatrace, etc. Those alarm systems report high-level application and service health rather than container health. If one container is unhealthy but the HA/ELB or the website works fine, we don't spend time on the problem immediately. For me, rancher-alarms is only for operators or developers who want quick notifications for particular Rancher container services. Only notify when it is needed.

Ideally, since messages can be deleted in Slack, a smart enough Slack bot could delete the previous degraded message once it determines that the broken container is back and active. But I don't know how difficult that would be to code.

Recovery notification is a good feature if we can add the code, but make sure there is an option to turn it on/off easily.

flaccid · 2016-08-14T01:17:15Z

We all have different desires and use cases; however, the degraded->active notification has been very useful for detecting flapping. Control by settings, yes please.

Too many notifications to Slack isn't really an issue, particularly if you use a dedicated channel. The concept of DevOps/agile/CD here is to stop work and fix things to keep the pipeline going. The spice must flow!

I doubt you can delete messages posted by a webhook, as it's not a real bot/user; worth checking though. In my opinion, however, deletion is changing history, whereas a log of those messages can help in a post-mortem of certain events.

What we have found is that if you get a Rancher alarm, something is wrong, so an operator really should look at it straight away, especially in production. It's very different from getting noise from something like host alerting on 'busy cpu', which is informative and can be ignored.

ndelitski · 2016-08-14T09:41:47Z

For a start, if we implement an option like notifyWhenRecovered=true, which is disabled by default, is everybody OK with it? It will be configurable per target (email|slack...).
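A minimal sketch of how such a per-target flag could gate recovery notifications, written in Python purely for illustration. The `Target` class, the `notify_when_recovered` field, and the `targets_to_notify` function are hypothetical names, not rancher-alarms code, and the sketch leaves out the "degraded for a while" delay mentioned earlier in the thread:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    """A notification target such as email or Slack (hypothetical model)."""
    name: str
    # Hypothetical per-target equivalent of notifyWhenRecovered; off by default.
    notify_when_recovered: bool = False


def targets_to_notify(previous: str, current: str, targets: List[Target]) -> List[str]:
    """Return the names of targets that should be notified for a state change."""
    if previous == current:
        return []  # no transition, nothing to send
    if current == "degraded":
        # Every target is told when a service becomes degraded.
        return [t.name for t in targets]
    if previous == "degraded" and current == "active":
        # Recovery is only reported to targets that opted in.
        return [t.name for t in targets if t.notify_when_recovered]
    return []


targets = [Target("email"), Target("slack", notify_when_recovered=True)]
print(targets_to_notify("active", "degraded", targets))
print(targets_to_notify("degraded", "active", targets))
```

With the flag off by default, existing setups would keep the single degraded notification, and only targets that opt in (Slack in this example) would also hear about the recovery.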

flaccid · 2016-08-14T22:49:48Z

Absolutely. For a first version that is great, using the same template I'd assume.

ndelitski mentioned this issue Aug 14, 2016

release 0.2.0 #15

Open

ndelitski added the feature label Aug 14, 2016
