Feature: alerta timeout field and durationField question #1264

Closed
m4ce opened this issue Mar 17, 2017 · 7 comments

Comments

@m4ce
Contributor

m4ce commented Mar 17, 2017

Hi,

I was looking to specify a timeout for an alert which doesn't have a recovery state. It should clear out after a certain period of time.

I came across the durationField field:

DurationField Optional field key to add the alert duration to the data. The duration is always in units of nanoseconds.

node.durationField(value string)
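For context, a minimal sketch of how this property might be set on an alert node. The measurement name `errors`, the field `count`, and the key `alert_duration` are hypothetical placeholders, not taken from this issue:

```
// Sketch only: measurement, field, and key names are assumptions.
stream
    |from()
        .measurement('errors')
    |alert()
        .crit(lambda: "count" > 0)
        // Adds the alert duration to the data under this key.
        // Per the docs quoted above, the value is in nanoseconds,
        // so 5 minutes would appear as 300000000000.
        .durationField('alert_duration')
```

As described, this only records the duration in the alert data; nothing in the quoted documentation suggests it expires or clears the alert.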

Would durationField help me achieve that? If so, could the Alerta timeout field be set to the same value?

Thanks,
Matteo

@m4ce
Contributor Author

m4ce commented Mar 21, 2017

@nathanielc, sorry to bug you here. I guess I can add support for this timeout in the Alerta alertNode myself. I will test it and open a PR afterward. However, I'm not sure what durationField is really for. Searching through the code, I can't see what it currently does besides exposing its value in the alert data.

@nathanielc
Contributor

@m4ce You are correct: the duration field is informational only. It represents how long the event has been triggering.

A PR would be welcome. Does the timeout depend on the data, or would defining a static timeout in the TICKscript be sufficient?

@m4ce
Contributor Author

m4ce commented Mar 21, 2017

Defining a static timeout under the alertNode would be sufficient IMHO:

|alert()
    .timeout(5m)

This would expire the alert after 5 minutes. Currently, I have alerts that will never recover because no recovery state can be defined. Alerta has this concept in its alert definition, but Kapacitor does not.

@nathanielc
Contributor

@m4ce That seems reasonable to me, but let's make it specific to the Alerta output rather than all alert handlers, since not all of them can support timeouts. So something like this:

|alert()
   .crit(...)
   .alerta()
       .timeout(5m)
       ...
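Filled out a little, the shape suggested above might look like the sketch below. It assumes the proposed `.timeout()` property exists on the Alerta handler as described in this thread. The measurement name, the always-true condition, and the `resource`/`event` templates are illustrative assumptions:

```
// Sketch only: assumes the .timeout() property proposed in this thread.
stream
    |from()
        .measurement('errors')
    |alert()
        // Every point is treated as critical; there is no recovery state.
        .crit(lambda: TRUE)
        .alerta()
            .resource('{{ .Name }}')
            .event('{{ .ID }}')
            // Ask Alerta to expire the alert after 5 minutes.
            .timeout(5m)
```

Scoping the timeout to the handler keeps the setting out of handlers that have no notion of alert expiry.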

@m4ce
Contributor Author

m4ce commented May 12, 2017

@nathanielc - I haven't had time to get back to this yet, but I have a question on this topic. As I said, I have many alerts going through Kapacitor that have no state transitions (e.g. ERROR -> OK, OK -> ERROR); they are always errors. The problem is that these errors will never recover in Kapacitor. Is there any way to expire an alert after a certain amount of time, such as a TTL applied to the alert?

@m4ce
Contributor Author

m4ce commented Aug 30, 2017

@nathanielc, I'd like to resume the discussion on this issue. As discussed, I have alerts that will never recover on their own because they are one-off error messages. How should I go about this in Kapacitor? Is there any way to expire alerts after some time? What would your advice be on this matter?

@nathanielc
Contributor

Fixed in #1545
