Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PagerDuty Enhancement #291

Merged
merged 1 commit into from
Mar 9, 2016
Merged

PagerDuty Enhancement #291

merged 1 commit into from
Mar 9, 2016

Conversation

rossmcdonald
Copy link
Contributor

Improvements to the PagerDuty service integration:

  • Adds the concept of alert levels that correspond to the PagerDuty trigger and resolved events
  • Includes the alert ID in the PagerDuty requests, instead of opening a new incident with every alert

Fixes #288

@rossmcdonald rossmcdonald force-pushed the ross-pagerduty-update branch 2 times, most recently from 06a59c5 to 968764e Compare March 8, 2016 20:06
@rossmcdonald
Copy link
Contributor Author

@nathanielc I think this is ready to go. The current functionality has changed so that a kapacitor.WarnAlert or kapacitor.CritAlert will send a trigger event to the PagerDuty API (opening an alert). All other kapacitor.AlertLevel types will send a resolve, resolving the alert.

Testing this with the PagerDuty API, on a critical alert:

{
  "client": "kapacitor",
  "client_url": "http://[::]:9092",
  "description": "cpu/kapacitor1 is CRITICAL",
  "details": "{\"Series\":[{\"name\":\"cpu\",\"tags\":{\"cpu\":\"cpu0\",\"host\":\"kapacitor1\"},\"columns\":[\"time\",\"time_guest\",\"time_guest_nice\",\"time_idle\",\"time_iowait\",\"time_irq\",\"time_nice\",\"time_softirq\",\"time_steal\",\"time_system\",\"time_user\",\"usage_guest\",\"usage_guest_nice\",\"usage_idle\",\"usage_iowait\",\"usage_irq\",\"usage_nice\",\"usage_softirq\",\"usage_steal\",\"usage_system\",\"usage_user\"],\"values\":[[\"2016-03-09T20:57:30Z\",0,0,174491.01,11.72,41.94,5.68,107.41,0,341.67,167.67,0,0,21.72131147540617,0,1.2295081967237784,0,34.73360655744749,0,42.213114754184005,0.1024590163935572]]}],\"Err\":null}",
  "event_type": "trigger",
  "incident_key": "cpu/kapacitor1",
  "service_key": "<SERVICE KEY>"
}

And then on an okay:

{
  "client": "kapacitor",
  "client_url": "http://[::]:9092",
  "description": "cpu/kapacitor1 is OK",
  "details": "{\"Series\":[{\"name\":\"cpu\",\"tags\":{\"cpu\":\"cpu0\",\"host\":\"kapacitor1\"},\"columns\":[\"time\",\"time_guest\",\"time_guest_nice\",\"time_idle\",\"time_iowait\",\"time_irq\",\"time_nice\",\"time_softirq\",\"time_steal\",\"time_system\",\"time_user\",\"usage_guest\",\"usage_guest_nice\",\"usage_idle\",\"usage_iowait\",\"usage_irq\",\"usage_nice\",\"usage_softirq\",\"usage_steal\",\"usage_system\",\"usage_user\"],\"values\":[[\"2016-03-09T20:57:50Z\",0,0,174499.04,11.78,42.13,5.68,112.7,0,347.57,167.71,0,0,81.52284263953392,0.30456852791859523,0.406091370558127,0,8.527918781720883,0,9.238578680197262,0]]}],\"Err\":null}",
  "event_type": "resolve",
  "incident_key": "cpu/kapacitor1",
  "service_key": "<SERVICE KEY>"
}

The only downside I can see here is that an .info will also resolve the alert, but I would assume that people won't want to be alerted on .info.

@rossmcdonald rossmcdonald changed the title WIP - PagerDuty Enhancement PagerDuty Enhancement Mar 9, 2016
@rossmcdonald rossmcdonald added this to the v0.11 milestone Mar 9, 2016
@nathanielc
Copy link
Contributor

@rossmcdonald What if info alerts where just ignored by PagerDuty? Maybe log an error saying infos are ignore and then ignore it? That way infos are not side effecting and the concept of an informational alert doesn't work with PagerDuty since it will page someone. Thoughts?

@nathanielc
Copy link
Contributor

Overall looks great, thanks for digging in.

@rossmcdonald
Copy link
Contributor Author

@nathanielc I completely agree. Erring out will be much more straight-forward and ensure an alert is not accidentally resolved prematurely.

How does this look?

[cpu_alert:alert2] 2016/03/09 22:32:40 E! failed to send alert data to PagerDuty: AlertLevel 'info' is currently ignored by the PagerDuty service

@nathanielc
Copy link
Contributor

👍

rossmcdonald added a commit that referenced this pull request Mar 9, 2016
@rossmcdonald rossmcdonald merged commit c9d0f34 into master Mar 9, 2016
@rossmcdonald rossmcdonald deleted the ross-pagerduty-update branch March 9, 2016 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PagerDuty Service Enhancement
2 participants