Alertmonitor is a web application for displaying alerts from Prometheus or other data sources. Alerts are received from Alertmanager (push notifications) or retrieved directly from Prometheus (polling).
Combining the push and poll mechanisms keeps the state of alerts up to date and protects against lost notifications.
Alertmonitor receives alerts from Alertmanager on the HTTP endpoint /alertmonitor/webhook.
Alternatively, if a webhook receiver is not configured, Alertmonitor can pull alerts directly from Prometheus.
Ideally, both approaches should be used in combination. This way you receive each alert as soon as it fires, while periodic synchronization with Prometheus recovers any alert (or clear) that has been lost.
Alertmonitor automatically correlates firing and resolving alerts to display the current state of active alarms.
Alertmonitor displays monitored objects as Instances or SmartTargets. An Instance is one particular exporter on a server, while a SmartTarget combines all instances on the same server and displays all active alerts on that server.
Alertmonitor supports PromQL queries and range queries.
Tags provide a quick way of filtering alerts.
Alertmonitor can store alerts in a MongoDB database. It stores all incoming messages and the whole journal (history) of alerts.
Deploy Alertmonitor on Docker:
docker run -d -p 8080:8080 matjaz99/alertmonitor:latest
Open a web browser and go to: http://hostname:8080/alertmonitor/
There is also a docker-compose.yml file available for deployment in a Swarm cluster.
Docker images are available on Docker Hub: https://hub.docker.com/r/matjaz99/alertmonitor
Alertmonitor relies heavily on properly configured labels in alert rules. Additional labels in alert rules enrich the information that an alert carries, such as severity, metric labels, current metric value, alert tags or group name.
Alertmonitor recognizes the following labels:
Label | Description |
---|---|
severity | Optional (default=indeterminate). Severity is the weight of the event. Possible values: critical, major, minor, warning, clear and informational |
priority | Optional (default=low). Priority tells how urgent the alarm is. Possible values: high, medium, low |
info | Mandatory. Detailed information about the alert. Important: info must not contain variables that change over time (such as the current metric value), because every change creates a new alert time series and the correlation will not work. |
instance | Optional. instance is usually already included in the metric, but if the alert rule doesn't return instance, you can provide its value here by any other means. Usually the IP address and port of the exporter. |
nodename | Optional. Name of this instance. |
tags | Optional. Custom tags that describe the alert (comma separated). Tags are used for quick filtering in Alertmonitor. |
group | Optional. Custom group name. |
url | Optional. Custom URL that is related to alert. |
eventType | Optional. Event type according to the ITU-T X.733 recommendation |
probableCause | Optional. Probable cause according to the ITU-T X.733 recommendation |
description | Optional. Additional description. The value is read from the labels section if it exists there, otherwise from the annotations section. |
currentValue | Optional. Current metric value. Get it with: {{ humanize $value }}. Append units (e.g. % or MB) if needed. Important: currentValue must not be placed in the labels section of the alert rule; put it in the annotations section. |
An alert's correlationId is defined by alertname, info, instance and job. A clear event should produce the same correlationId.
It is highly recommended to write alert rules in such a manner that instance, job and info are always present in the resulting alert.
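The correlation rule can be illustrated with a minimal Python sketch. The function name `correlation_id` and the SHA-256 hash over the four labels are assumptions made for illustration; the text above only states which labels define the ID, not how Alertmonitor computes it.

```python
import hashlib

# Labels that define the correlationId, as listed above.
CORRELATION_LABELS = ("alertname", "info", "instance", "job")


def correlation_id(labels: dict) -> str:
    """Derive a stable ID from the correlation labels (hypothetical hashing scheme)."""
    key = "|".join(labels.get(name, "") for name in CORRELATION_LABELS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


firing = {
    "alertname": "High CPU usage",
    "info": "CPU alert for Node 'node1:9100'",
    "instance": "node1:9100",
    "job": "node",
    "severity": "critical",
}
# A clear event with the same correlation labels; other labels may differ.
clear = dict(firing, severity="clear")
# Putting a changing value into info breaks the correlation.
drifting = dict(firing, info="CPU usage is 93%")

print(correlation_id(firing) == correlation_id(clear))     # True
print(correlation_id(firing) == correlation_id(drifting))  # False
```

The second comparison shows why info should not contain the current metric value: every new value would produce a different ID, so the clear event could no longer be matched to the alert it resolves.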
Example of an alert rule in Prometheus:
groups:
- name: my-alerts
  rules:
  - alert: High CPU usage
    expr: sum(rate(process_cpu_seconds_total[5m])) by (instance) * 100 > 80
    for: 1m
    labels:
      severity: critical
      info: CPU alert for Node '{{ $labels.instance }}'
      tags: hardware, server, cpu, overload
      url: 'http://${GRAFANA_HOSTNAME}/dashboard/'
      description: Node {{ $labels.instance }}
    annotations:
      currentValue: '{{ $value }}%'
To receive alerts, configure an Alertmonitor receiver endpoint in the alertmanager.yml configuration file:
route:
  receiver: alertmonitor
  group_by: [alertname]
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 3h
receivers:
- name: alertmonitor
  webhook_configs:
  - url: http://alertmonitor:8080/alertmonitor/webhook
    send_resolved: true
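For orientation, the sketch below parses a notification in the shape that Alertmanager sends to webhook receivers and separates firing from resolved alerts. The sample values and the helper name `split_by_status` are invented for illustration; this is not Alertmonitor's own parsing code.

```python
import json

# A trimmed example of an Alertmanager webhook notification (sample values are invented).
payload = json.loads("""
{
  "version": "4",
  "status": "firing",
  "receiver": "alertmonitor",
  "groupLabels": {"alertname": "High CPU usage"},
  "alerts": [
    {"status": "firing",
     "labels": {"alertname": "High CPU usage", "instance": "node1:9100", "severity": "critical"},
     "annotations": {"currentValue": "93%"},
     "startsAt": "2024-01-01T10:00:00Z"},
    {"status": "resolved",
     "labels": {"alertname": "High CPU usage", "instance": "node2:9100", "severity": "critical"},
     "annotations": {"currentValue": "12%"},
     "startsAt": "2024-01-01T09:00:00Z",
     "endsAt": "2024-01-01T09:30:00Z"}
  ]
}
""")


def split_by_status(notification: dict):
    """Return (firing, resolved) alert lists from one webhook notification."""
    firing = [a for a in notification["alerts"] if a["status"] == "firing"]
    resolved = [a for a in notification["alerts"] if a["status"] == "resolved"]
    return firing, resolved


firing, resolved = split_by_status(payload)
print([a["labels"]["instance"] for a in firing])    # ['node1:9100']
print([a["labels"]["instance"] for a in resolved])  # ['node2:9100']
```

Resolved alerts are only delivered when send_resolved is set to true in the receiver configuration.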
Since version 3, Alertmonitor supports data providers. Data providers are sources that send data to Alertmonitor. It is possible to define several Prometheus servers or other sources.
Alertmonitor supports the following data providers:
- Prometheus
- Eventlogger
Data providers are configured in the providers.yml configuration file.
If no providers are defined, a single data provider (called .default) still exists. It is constructed from the environment variables and is bound to the /alertmonitor/webhook URI endpoint. All requests received on that endpoint are processed by the .default data provider.
All URI endpoints must start with /alertmonitor/webhook.
It is not recommended to configure two or more providers to send events to the same URI endpoint!
Alertmonitor offers the following views.
The active alerts view shows all currently active alerts.
Active alerts can be filtered by selecting one or more tags. Deselect all tags to show all alerts.
Search (in the upper right corner) allows searching alerts by instance, alertname, info, job and description. Search can be used in combination with tags.
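The filtering behaviour can be sketched roughly as follows. This is an assumption-laden illustration, not Alertmonitor's code: the function `filter_alerts` is hypothetical, an alert is assumed to match when it carries any of the selected tags, and the search is assumed to be a case-insensitive substring match.

```python
SEARCH_FIELDS = ("instance", "alertname", "info", "job", "description")


def filter_alerts(alerts, selected_tags=(), search=""):
    """Filter alerts by tags and free-text search (assumed semantics, see lead-in)."""
    selected = {t.strip() for t in selected_tags}
    needle = search.lower()
    result = []
    for alert in alerts:
        tags = {t.strip() for t in alert.get("tags", "").split(",") if t.strip()}
        # No selected tags means "show all"; otherwise any shared tag matches.
        if selected and not (selected & tags):
            continue
        if needle and not any(needle in alert.get(f, "").lower() for f in SEARCH_FIELDS):
            continue
        result.append(alert)
    return result


alerts = [
    {"alertname": "High CPU usage", "instance": "node1:9100", "job": "node",
     "info": "CPU alert", "tags": "hardware, server, cpu"},
    {"alertname": "Disk full", "instance": "node2:9100", "job": "node",
     "info": "Disk alert", "tags": "hardware, disk"},
]

print(len(filter_alerts(alerts)))                                # 2
print(len(filter_alerts(alerts, selected_tags=["cpu"])))         # 1
print(len(filter_alerts(alerts, ["hardware"], search="node2")))  # 1
```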
The Query view provides a GUI for executing PromQL queries. Two types of queries are supported: query and query_range.
When executing a range query, the start/end date and step are configurable.
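The two query types correspond to the `/api/v1/query` and `/api/v1/query_range` endpoints of the Prometheus HTTP API. The sketch below only builds the request URLs; the helper names are hypothetical and the server address is assumed to be the default value of ALERTMONITOR_PROMETHEUS_SERVER.

```python
from urllib.parse import urlencode

# Assumed server address; matches the default of ALERTMONITOR_PROMETHEUS_SERVER.
PROMETHEUS = "http://localhost:9090"


def instant_query_url(expr: str) -> str:
    """URL for an instant query (the `query` type)."""
    return f"{PROMETHEUS}/api/v1/query?" + urlencode({"query": expr})


def range_query_url(expr: str, start: int, end: int, step: str) -> str:
    """URL for a range query (the `query_range` type); start/end are Unix timestamps."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{PROMETHEUS}/api/v1/query_range?" + urlencode(params)


print(instant_query_url("up"))
# http://localhost:9090/api/v1/query?query=up
print(range_query_url("up", 1700000000, 1700003600, "60s"))
# http://localhost:9090/api/v1/query_range?query=up&start=1700000000&end=1700003600&step=60s
```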
The journal view shows the full history of received events. The size of the journal is limited by the ALERTMONITOR_JOURNAL_SIZE parameter.
When the journal reaches its maximum size, the oldest events are removed (first in, first out).
Remark: all data in Alertmonitor is based on journal events. For example, Alertmonitor can only show targets that have at least one alert recorded in the journal.
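The first in, first out behaviour of a size-limited journal can be pictured with a bounded queue. This is a minimal sketch with a deliberately tiny, made-up limit; the real limit comes from ALERTMONITOR_JOURNAL_SIZE and the actual data structure used by Alertmonitor is not described here.

```python
from collections import deque

# Hypothetical small limit for demonstration only.
JOURNAL_SIZE = 3

journal = deque(maxlen=JOURNAL_SIZE)
for event_id in range(1, 6):
    journal.append(f"event-{event_id}")  # appending to a full deque drops the oldest entry

print(list(journal))  # ['event-3', 'event-4', 'event-5']
```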
This view shows raw messages as they were received by the HTTP webhook.
Alertmonitor strips the protocol and port from the instance label; what remains is the target's hostname, IP address or FQDN.
Alertmonitor then groups together the alerts whose hostnames match.
Each target shows the number of active alerts and an icon indicating the highest severity among its raised alerts.
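A rough sketch of this normalization, assuming instance labels look like `host:port` or a full URL. The function name `smart_target` is hypothetical and the real implementation may handle more cases.

```python
from collections import Counter
from urllib.parse import urlsplit


def smart_target(instance: str) -> str:
    """Strip protocol, port and path from an instance label, keeping only the host."""
    # urlsplit only recognises the host part when the value has a "//" prefix.
    url = instance if "://" in instance else "//" + instance
    return urlsplit(url).hostname or instance


instances = ["node1.example.com:9100", "http://node1.example.com:8080/metrics", "192.168.1.10:9104"]
active_per_target = Counter(smart_target(i) for i in instances)

print(active_per_target["node1.example.com"])  # 2
print(active_per_target["192.168.1.10"])       # 1
```

Two exporters on the same server end up under one target, which is what allows all active alerts of that server to be shown together.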
This view shows the configured data providers.
Quick statistics are shown for each data provider:
- number of alerts by severity
- number of received messages/alerts
- timers (up time, time since last event...)
- synchronization success rate
Report is an overview of the whole monitoring system. It shows some statistics of Alertmonitor
and Prometheus. It also shows the status of instances, average availability and other KPIs.
Most KPIs and metrics are queried from Prometheus.
Here it is possible to change some configuration parameters at runtime.
Application meta data, version, build info, maintainers, public channels...
Alertmonitor supports the following data storage options:
- In memory
- MongoDB
- Kafka (experimental)
The in-memory option works out of the box without any configuration.
To enable MongoDB data storage, set the environment variable ALERTMONITOR_MONGODB_ENABLED to true and configure ALERTMONITOR_MONGODB_CONNECTION_STRING. MongoDB is disabled by default.
Data is deleted according to the retention policy (ALERTMONITOR_DATA_RETENTION_DAYS).
Alertmonitor automatically creates a new database named alertmonitor in MongoDB if it doesn't exist yet.
Kafka is an experimental feature. If enabled, Alertmonitor sends all alerts to a Kafka topic. This can be useful for further processing and analytics of alarms.
Alertmonitor can be configured with environment variables. Variables starting with ALERTMONITOR_* relate to the behaviour of the application, while other variables may be used for other purposes such as log configuration or custom environment variable substitution.
With environment variables, only the .default data provider (Prometheus) can be configured. Other providers are configured in providers.yml (see Data Providers). If the providers config file is not found, the default data provider is constructed from environment variables.
A list of supported environment variables:
EnvVar | Description |
---|---|
ALERTMONITOR_DATAPROVIDERS_CONFIG_FILE | File path to providers.yml configuration file. Default: /opt/alertmonitor/providers.yml |
ALERTMONITOR_DATAPROVIDERS_DEFAULT_PROVIDER_NAME | Name of default data provider. Default: .default |
ALERTMONITOR_DATA_RETENTION_DAYS | History data in days. Default: 7 |
ALERTMONITOR_SYNC_INTERVAL_SEC | Periodic synchronisation interval in seconds. Default: 900 |
ALERTMONITOR_PROMETHEUS_SERVER | The URL of Prometheus server. Default: http://localhost:9090 |
ALERTMONITOR_PROMETHEUS_CLIENT_POOL_SIZE | Pool size of http clients for communication with Prometheus API. Default: 1 |
ALERTMONITOR_HTTP_CLIENT_READ_TIMEOUT_SEC | Timeout of http client requests. Default: 120 |
ALERTMONITOR_HTTP_CLIENT_CONNECT_TIMEOUT_SEC | Connection timeout of http client. Default: 10 |
ALERTMONITOR_DATE_FORMAT | Date format for displaying in GUI. Default: yyyy/MM/dd H:mm:ss |
ALERTMONITOR_KAFKA_ENABLED | Enable or disable publishing to Kafka. This is an experimental feature! Default: false |
ALERTMONITOR_KAFKA_SERVER | Hostname and port for Kafka. Default: hostname:9092 |
ALERTMONITOR_KAFKA_TOPIC | Name of topic. Default: alertmonitor_notifications |
ALERTMONITOR_MONGODB_ENABLED | Enable or disable storing data to MongoDB. If disabled, data is stored in memory only. Default: false |
ALERTMONITOR_MONGODB_CONNECTION_STRING | The connection string for MongoDB (username, password and host). |
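The pattern of reading these variables with fallback defaults can be sketched as below. The function `load_settings` and the dictionary keys are hypothetical; only the variable names and default values are taken from the table.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read a few settings, falling back to the defaults from the table above."""
    return {
        "prometheus_server": env.get("ALERTMONITOR_PROMETHEUS_SERVER", "http://localhost:9090"),
        "sync_interval_sec": int(env.get("ALERTMONITOR_SYNC_INTERVAL_SEC", "900")),
        "data_retention_days": int(env.get("ALERTMONITOR_DATA_RETENTION_DAYS", "7")),
        "mongodb_enabled": env.get("ALERTMONITOR_MONGODB_ENABLED", "false").lower() == "true",
    }


settings = load_settings({"ALERTMONITOR_SYNC_INTERVAL_SEC": "300"})
print(settings["sync_interval_sec"])    # 300
print(settings["data_retention_days"])  # 7
print(settings["mongodb_enabled"])      # False
```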
Alertmonitor supports the http and https protocols for communication with data providers.
With https, the server certificate is not validated.
The HTTP client in Alertmonitor supports basic authentication when connecting to the data provider (only for https clients).
Currently, the username and password can only be provided via the ALERTMONITOR_PROMETHEUS_SERVER variable using the syntax https://username:password@hostname:port.
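To make the URL syntax concrete, the following sketch splits such a URL into a credential-free address and a standard HTTP Basic `Authorization` header. The helper `split_credentials` and the port 9090 are assumptions for illustration; how Alertmonitor handles the credentials internally is not documented here.

```python
import base64
from urllib.parse import urlsplit


def split_credentials(server_url: str):
    """Return (url_without_credentials, headers) for a URL with embedded credentials."""
    parts = urlsplit(server_url)
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    clean_url = f"{parts.scheme}://{host}{parts.path}"
    headers = {}
    if parts.username is not None:
        token = f"{parts.username}:{parts.password or ''}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(token).decode("ascii")
    return clean_url, headers


url, headers = split_credentials("https://username:password@hostname:9090")
print(url)                       # https://hostname:9090
print(headers["Authorization"])  # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```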
Prometheus doesn't support substitution of environment variables in alert rules. Alertmonitor does that for you.
Environment variables can be set at the system level or directly on Docker containers. Example in a docker-compose file:
environment:
  - GRAFANA_HOSTNAME=my.grafana.domain
Template in labels to be replaced: ${GRAFANA_HOSTNAME}.
Alertmonitor automatically replaces all occurrences of templates with the values of the corresponding environment variables.
This comes in handy when, for example, you want to link an alert to a Grafana dashboard by using the url label: http://${GRAFANA_HOSTNAME}/dashboard.
After substitution the label becomes url: http://my.grafana.domain/dashboard.
Environment variable substitution works on the following labels:
- nodename
- info
- tags
- url
- description
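The substitution can be pictured with a short sketch. The function `substitute_env` is hypothetical, and leaving a template untouched when no matching variable exists is an assumption, since that case is not described here.

```python
import os
import re

# Labels on which substitution is applied, as listed above.
SUBSTITUTED_LABELS = ("nodename", "info", "tags", "url", "description")
TEMPLATE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(labels: dict, env=os.environ) -> dict:
    """Replace ${NAME} templates in the supported labels with environment values."""
    def expand(value: str) -> str:
        # Assumption: templates without a matching variable are left unchanged.
        return TEMPLATE.sub(lambda m: env.get(m.group(1), m.group(0)), value)

    return {k: expand(v) if k in SUBSTITUTED_LABELS else v for k, v in labels.items()}


labels = {"url": "http://${GRAFANA_HOSTNAME}/dashboard", "severity": "critical"}
print(substitute_env(labels, {"GRAFANA_HOSTNAME": "my.grafana.domain"})["url"])
# http://my.grafana.domain/dashboard
```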
Metrics are available in Prometheus format on URI endpoint:
GET /alertmonitor/metrics
Alertmonitor supports the following metrics in Prometheus format:
alertmonitor_build_info
alertmonitor_providers_info
alertmonitor_webhook_requests_received_total
alertmonitor_journal_events_total
alertmonitor_active_alerts_count
alertmonitor_alerts_balance_factor
alertmonitor_last_event_timestamp
alertmonitor_prom_api_duration_seconds
alertmonitor_sync_success
alertmonitor_sync_interval_seconds
alertmonitor_db_inserts_total
alertmonitor_db_queries_total
alertmonitor_db_updates_total
alertmonitor_db_deletes_total
alertmonitor_db_failures_total
alertmonitor_memory_total_bytes
alertmonitor_memory_free_bytes
alertmonitor_memory_max_bytes
alertmonitor_available_processors
Inside the container, log files are located in the directory /opt/alertmonitor/log.
Configure the log file location with the environment variable SIMPLELOGGER_FILENAME=/opt/alertmonitor/log/alertmonitor.log
A rolling file policy can also be configured. For the complete configuration of simple-logger, visit https://github.com/matjaz99/simple-logger
Google group for Alertmonitor Users:
https://groups.google.com/g/alertmonitor-users
Alertmonitor is written in Java. It is a Maven project. It runs as a web app on the Apache Tomcat server and uses JSF 2.2 with PrimeFaces 12.0 for the frontend.
In version 1.5.1 I switched from Java 8 to Java 13. I had to add the javax.annotations dependency to the pom.xml file.
Sorry, simple-logger is not available in the Maven Central repository. You can either build it on your own or download the jar file and then manually install it into your local repository:
wget http://matjazcerkvenik.si/download/simple-logger-1.7.0.jar
mvn install:install-file -Dfile=simple-logger-1.7.0.jar -DgroupId=si.matjazcerkvenik.simplelogger -DartifactId=simple-logger -Dversion=1.7.0 -Dpackaging=jar
Run the project with Maven:
mvn tomcat7:run
Build the Docker image and push it to Docker Hub:
docker build -t {{namespace}}/{{image}}:{{tag}} .
docker push {{namespace}}/{{image}}:{{tag}}