Skip to content

Latest commit

 

History

History
90 lines (60 loc) · 3.6 KB

README.md

File metadata and controls

90 lines (60 loc) · 3.6 KB

azure-alerts

Pinja.Azure.Alerts

This is common baseline solution to apply good default alerts to any azure deployment with few simple commands. Alerts are applied based on resource types to all matching resources in resource group.

Goal is also to support more complicated alerts when needed by extending, overwriting existing alerts or by creting new ones. When none of these are applicaple you can still use underlaying Az powershell to create that specific alert using same infrastructure like action groups.

Requirements

Applying good defaults

This is basic setup that cause alerts trigger in most common situations.

You have to define receiver ActionGroupReceiver, which is part that sends alerts to channels like email or webhooks.

Install-Module Pinja.Azure.Alerts
Import-Module Pinja.Azure.Alerts

$receiver = New-AzActionGroupReceiver `
    -Name 'alerta-webhook' `
    -WebhookReceiver `
    -ServiceUri "http://your.alerta.domain/webhooks/azuremonitor" `
    -UseCommonAlertSchema

Get-DefaultAlertRules | Set-AlertRules -ResourceGroup [Your resource group] -ActionGroupReceiver $receiver

Note that Set-AlertRules supports -WhatIf parameter for dry runs that makes developing alert rules much easier.

For full documentation see:

Get-Help Set-AlertRules -Full

Overwriting special cases

Its common that there is good baseline alert for type but there are exception that either requires addional documentation or different limits.

Naming few:

  • Additional fix or validation steps to documentation.
  • Replace documentation with custom steps.
  • Different configuration for criteria. For example maybe one of api will have problems if CPU goes over 50% istead of default provided.
  • And so on...

Documenting alerts

Idea is that with alert there is builtin documentation for each alert sent in description that contains information how to validate and fix possible situation.

Fix steps are often common, like restarting or upscaling web application and this repository is maintaining common fix steps for those situation.

Validation is usually defined by project as example it may require to login or test actual web application in user perspective how it behaves after alert is triggered before further actions are done.

For this reason there is support to easily extend documentation for specific alerts of resource. As example if payment releated api have increased error rate it is usually good routine to point to test payments instead of something else.

$rules = Get-DefaultAlertRules
$overWrites = New-AlertRuleOverwrite `
        -ResourceType "Microsoft.Web/Sites" `
        -Name "Few Server errors" `
        -FixSteps "https://youAdditionalSteps.com" `
        -ResourceFilter { $_.Name -like "*my-web-api*" } `
        -FixStepsLocation Before

$rules |
    Set-AlertRules -ResourceGroup [Your resource group] -ActionGroupReceiver $receiver -OverWrites $overWrites

Adds additional documentation to alert rule Microsoft.Web/Sites > Few Server errors on web site where resource name matches *my-web-api*.

Others

See New-AlertRuleOverwrite help for full documentation how to override defaults for specific resources.

Get-Help New-AlertRuleOverwrite -Full