Moved mixin to monitoring, removed Cassandra-specific alerts/panels
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
jpkrohling committed Aug 7, 2019
1 parent a809b49 commit a30b0a0
Showing 7 changed files with 121 additions and 84 deletions.
27 changes: 0 additions & 27 deletions examples/jaeger-mixin/README.md

This file was deleted.

95 changes: 95 additions & 0 deletions monitoring/jaeger-mixin/README.md
@@ -0,0 +1,95 @@
# Prometheus monitoring mixin for Jaeger

The Prometheus monitoring mixin for Jaeger provides a starting point for people wanting to monitor Jaeger using Prometheus, Alertmanager, and Grafana. To use it, you'll need [`jsonnet`](https://github.com/google/go-jsonnet) and [`jb` (jsonnet-bundler)](https://github.com/jsonnet-bundler/jsonnet-bundler). They can be installed using `go get`, as follows:

```console
$ go get github.com/google/go-jsonnet/cmd/jsonnet
$ go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
```

Your monitoring mixin can then be initialized as follows:

```console
$ jb init
$ jb install \
github.com/jaegertracing/jaeger/monitoring/jaeger-mixin@master \
github.com/grafana/jsonnet-libs/grafana-builder@master \
github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@master
```

In the directory where your mixin was initialized, create a new `monitoring-setup.jsonnet` file specifying what your monitoring stack should look like. This file is yours: any customizations to Prometheus, Grafana, or Alertmanager should take place here. A simple example providing only the Jaeger dashboard for Grafana would be:

```jsonnet
local jaegerDashboard = (import 'jaeger-mixin/mixin.libsonnet').grafanaDashboards;
{ ['dashboards-jaeger.json']: jaegerDashboard['jaeger.json'] }
```

The manifest files can be generated via the `jsonnet` command below. Once the command finishes, the file `manifests/dashboards-jaeger.json` should be available and can be loaded directly into Grafana.

```console
$ jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet
```
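
As a further sketch, the same `monitoring-setup.jsonnet` could emit the Jaeger alert rules next to the dashboard. The output file name `alerts-jaeger.json` is an arbitrary choice, and the snippet assumes that `prometheusAlerts` has the usual mixin shape (an object with a `groups` list), which Prometheus should accept as a rules file because JSON is valid YAML:

```jsonnet
// Sketch only: the output file names are illustrative, not defined by the mixin.
local jaegerAlerts = (import 'jaeger-mixin/alerts.libsonnet').prometheusAlerts;
local jaegerDashboard = (import 'jaeger-mixin/mixin.libsonnet').grafanaDashboards;

{
  // Grafana dashboard, same as in the simple example.
  'dashboards-jaeger.json': jaegerDashboard['jaeger.json'],
  // Prometheus alerting rules (assumed shape: { groups: [...] }).
  'alerts-jaeger.json': jaegerAlerts,
}
```

Running the same `jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet` command would then write both files under `manifests/`.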

An example producing the manifests for a complete monitoring stack is located in this directory, as `monitoring-setup.example.jsonnet`. The manifests include Prometheus, Grafana, and Alertmanager managed via the Prometheus Operator for Kubernetes.

```jsonnet
local jaegerAlerts = (import 'jaeger-mixin/alerts.libsonnet').prometheusAlerts;
local jaegerDashboard = (import 'jaeger-mixin/mixin.libsonnet').grafanaDashboards;

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
    },
    grafanaDashboards+:: {
      'jaeger.json': jaegerDashboard['jaeger.json'],
    },
    prometheusAlerts+:: jaegerAlerts,
  };

{ ['00namespace-' + name + '.json']: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name + '.json']: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name + '.json']: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name + '.json']: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name + '.json']: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name + '.json']: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name + '.json']: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name + '.json']: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```

The manifest files can be generated via `jsonnet` and passed directly to `kubectl`:

```console
$ jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet
$ kubectl apply -f manifests/
```

The resulting manifests include everything needed to run Prometheus, Alertmanager, and Grafana instances. Whenever a new alert rule or dashboard is needed, change your `monitoring-setup.jsonnet`, then re-generate and re-apply the manifests.
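
As an illustration of adding an alert rule, the sketch below appends a custom rule group to the alerts shipped with the mixin. The group name, alert name, and threshold are hypothetical placeholders; only the `jaeger_collector_queue_length` metric is taken from the mixin's dashboard. It also assumes that `prometheusAlerts` is an object with a `groups` list:

```jsonnet
local jaegerAlerts = (import 'jaeger-mixin/alerts.libsonnet').prometheusAlerts;

// Hypothetical extra rule group; the names and the threshold are placeholders.
local customAlerts = {
  groups+: [
    {
      name: 'jaeger_custom_alerts',
      rules: [
        {
          alert: 'JaegerCollectorQueueLong',
          // Same metric as the dashboard's "span queue length" panel.
          expr: 'max(jaeger_collector_queue_length) by (instance) > 1000',
          'for': '15m',
          labels: { severity: 'warning' },
          annotations: {
            message: 'Collector {{ $labels.instance }} has had more than 1000 spans queued for 15 minutes.',
          },
        },
      ],
    },
  ],
};

// `groups+:` appends the custom group to the groups defined by the mixin.
{ 'alerts-jaeger.json': jaegerAlerts + customAlerts }
```

In the complete setup, the same merged object could be used in place of `jaegerAlerts`, as in `prometheusAlerts+:: jaegerAlerts + customAlerts`.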

Make sure your Prometheus setup is properly scraping the Jaeger components, either by creating a `ServiceMonitor` (and the backing `Service` objects), or via `PodMonitor` resources, like:

```console
$ kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: tracing
  namespace: monitoring
spec:
  podMetricsEndpoints:
  - interval: 5s
    targetPort: 14269
  selector:
    matchLabels:
      app: jaeger
EOF
```

This `PodMonitor` tells Prometheus to scrape port `14269` on all pods that carry the label `app: jaeger`. If you run the Jaeger Collector, Agent, and Query in different pods, you might need to adjust this resource or create further `PodMonitor` resources to scrape metrics from the other ports.
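
One possible way to manage several `PodMonitor` resources is to generate them from `monitoring-setup.jsonnet` as well. The sketch below is not part of the mixin: the `podMonitor` helper, the `component` label, and the file names are hypothetical, and the Agent and Query ports (`14271` and `16687`) are assumed admin ports that should be checked against your deployment.

```jsonnet
// Hypothetical helper building a PodMonitor with the same fields as the YAML manifest.
local podMonitor(name, port, labels) = {
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'PodMonitor',
  metadata: { name: name, namespace: 'monitoring' },
  spec: {
    podMetricsEndpoints: [{ interval: '5s', targetPort: port }],
    selector: { matchLabels: labels },
  },
};

{
  // Port 14269 is the one used in this README.
  'podmonitor-jaeger-collector.json':
    podMonitor('jaeger-collector', 14269, { app: 'jaeger', component: 'collector' }),
  // The ports and labels below are assumptions; verify them for your setup.
  'podmonitor-jaeger-agent.json':
    podMonitor('jaeger-agent', 14271, { app: 'jaeger', component: 'agent' }),
  'podmonitor-jaeger-query.json':
    podMonitor('jaeger-query', 16687, { app: 'jaeger', component: 'query' }),
}
```

The resulting object can be added with `+` to the manifest objects at the end of `monitoring-setup.jsonnet`, so that the monitors are generated and applied together with the rest of the stack.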

This mixin was originally developed by [Grafana Labs](https://github.com/grafana/jsonnet-libs/tree/master/jaeger-mixin).

## Background

* For more information about monitoring mixins, see this [design doc](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/view).

@@ -109,30 +109,6 @@ local percentErrsWithTotal(metric_errs, metric_total) = '100 * sum(rate(%(metric
{{ $labels.job }} {{ $labels.instance }} is seeing {{ printf "%.2f" $value }}% query errors on {{ $labels.operation }}.
|||,
},
-}, {
-alert: 'JaegerCassandraWritesFailing',
-expr: percentErrsWithTotal('jaeger_cassandra_errors_total', 'jaeger_cassandra_attempts_total') + '> 1',
-'for': '15m',
-labels: {
-severity: 'warning',
-},
-annotations: {
-message: |||
-{{ $labels.job }} {{ $labels.instance }} is seeing {{ printf "%.2f" $value }}% query errors on {{ $labels.operation }}.
-|||,
-},
-}, {
-alert: 'JaegerCassandraReadsFailing',
-expr: percentErrsWithTotal('jaeger_cassandra_read_errors_total', 'jaeger_cassandra_read_attempts_total') + '> 1',
-'for': '15m',
-labels: {
-severity: 'warning',
-},
-annotations: {
-message: |||
-{{ $labels.job }} {{ $labels.instance }} is seeing {{ printf "%.2f" $value }}% query errors on {{ $labels.operation }}.
-|||,
-},
}],
},
],

@@ -29,8 +29,8 @@ local g = (import 'grafana-builder/grafana.libsonnet') + {

{
grafanaDashboards+: {
-'jaeger-write.json':
-g.dashboard('Jaeger / Write')
+'jaeger.json':
+g.dashboard('Jaeger')
.addRow(
g.row('Services')
.addPanel(
@@ -74,7 +74,7 @@ local g = (import 'grafana-builder/grafana.libsonnet') + {
)
)
.addRow(
-g.row('Collector - Queue Stats')
+g.row('Collector Queue')
.addPanel(
g.panel('span queue length') +
g.queryPanel('jaeger_collector_queue_length', '{{instance}}') +
@@ -85,23 +85,6 @@ local g = (import 'grafana-builder/grafana.libsonnet') + {
g.queryPanel('histogram_quantile(0.95, sum(rate(jaeger_collector_in_queue_latency_bucket[1m])) by (le, instance))', '{{instance}}')
)
)
-.addRow(
-g.row('Cassandra')
-.addPanel(
-g.panel('insert attempt rate') +
-g.qpsPanelErrTotal('jaeger_cassandra_errors_total', 'jaeger_cassandra_attempts_total') +
-g.stack
-)
-.addPanel(
-g.panel('% inserts erroring') +
-g.queryPanel('sum(rate(jaeger_cassandra_errors_total[1m])) by (instance) / sum(rate(jaeger_cassandra_attempts_total[1m])) by (instance)', '{{instance}}') +
-{ yaxes: g.yaxes({ format: 'percentunit', max: 1 }) } +
-g.stack,
-)
-),
-
-'jaeger-read.json':
-g.dashboard('Jaeger / Read')
.addRow(
g.row('Query')
.addPanel(
@@ -114,19 +97,6 @@ local g = (import 'grafana-builder/grafana.libsonnet') + {
g.queryPanel('histogram_quantile(0.99, sum(rate(jaeger_query_latency_bucket[1m])) by (le, instance))', '{{instance}}') +
g.stack
)
-)
-.addRow(
-g.row('Cassandra')
-.addPanel(
-g.panel('qps') +
-g.qpsPanelErrTotal('jaeger_cassandra_read_errors_total', 'jaeger_cassandra_read_attempts_total') +
-g.stack
-)
-.addPanel(
-g.panel('latency - 99 percentile') +
-g.queryPanel('histogram_quantile(0.99, sum(rate(jaeger_cassandra_read_latency_ok_bucket[1m])) by (le, instance))', '{{instance}}') +
-g.stack,
-)
),
},
}
File renamed without changes.
File renamed without changes.
23 changes: 23 additions & 0 deletions monitoring/jaeger-mixin/monitoring-setup.example.jsonnet
@@ -0,0 +1,23 @@
local jaegerAlerts = (import 'jaeger-mixin/alerts.libsonnet').prometheusAlerts;
local jaegerDashboard = (import 'jaeger-mixin/mixin.libsonnet').grafanaDashboards;

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
    },
    grafanaDashboards+:: {
      'jaeger.json': jaegerDashboard['jaeger.json'],
    },
    prometheusAlerts+:: jaegerAlerts,
  };

{ ['00namespace-' + name + '.json']: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name + '.json']: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name + '.json']: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name + '.json']: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name + '.json']: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name + '.json']: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name + '.json']: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name + '.json']: kp.grafana[name] for name in std.objectFields(kp.grafana) }
