This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

monitoring: cadvisor observables review (#17239)
Remove container fs inodes: disk metrics do not seem to be supported in OCI (google/cadvisor#2785), and the metrics cAdvisor reports in docker-compose feel rather dubious at times. Instead, ContainerIOUsage is now a shared observable, and services that made relevant use of inode monitoring now use it in its place.

Reworked container restart: use cAdvisor metrics to detect container restarts in all environments
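The commit message does not spell out the reworked query. As an illustration only, here is a minimal sketch of one cAdvisor-only approach. It assumes that the standard cAdvisor metric `container_last_seen` stops advancing for a container that has stopped or been replaced. The helper `restartQuerySketch`, the `name` matcher format, and the 60-second threshold are all hypothetical and are not Sourcegraph's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// restartQuerySketch returns a PromQL query that counts, per cAdvisor `name`
// label, series for the given container that have not been seen for more than
// thresholdSeconds. Hypothetical helper: the matcher format and threshold are
// assumptions, and containerName is assumed to contain no regex metacharacters.
func restartQuerySketch(containerName string, thresholdSeconds int) string {
	// Assumed matcher: cAdvisor's `name` label starts with the container name.
	matcher := "name=~" + strconv.Quote("^"+containerName+".*")
	return fmt.Sprintf(`count by(name) ((time() - container_last_seen{%s}) > %d)`, matcher, thresholdSeconds)
}

func main() {
	fmt.Println(restartQuerySketch("gitserver", 60))
}
```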

cAdvisor and monitoring documentation: inline documentation improvements and a new cAdvisor page in the docsite

Shared Group titles: titles now live in the `shared` package for consistency and ease of editing
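A minimal sketch of what the shared titles could look like, assuming each constant holds the literal title that the diffs below replace. The actual values in the `shared` package are not shown on this page, and `group` is a hypothetical stand-in for `monitoring.Group`.

```go
package main

import "fmt"

// Hypothetical shared group titles. The values are the literal titles the
// diffs replace; the real constants in the `shared` package may differ.
const (
	TitleContainerMonitoring    = "Container monitoring (not available on server)"
	TitleProvisioningIndicators = "Provisioning indicators (not available on server)"
	TitleGolangMonitoring       = "Golang runtime monitoring"
	TitleKubernetesMonitoring   = "Kubernetes monitoring (ignore if using Docker Compose or server)"
)

// group is a simplified, hypothetical stand-in for monitoring.Group.
type group struct {
	Title  string
	Hidden bool
}

func main() {
	// Every service definition references the same constants, so a title only
	// needs to be edited in one place.
	groups := []group{
		{Title: TitleContainerMonitoring, Hidden: true},
		{Title: TitleProvisioningIndicators, Hidden: true},
		{Title: TitleGolangMonitoring, Hidden: true},
		{Title: TitleKubernetesMonitoring, Hidden: true},
	}
	for _, g := range groups {
		fmt.Println(g.Title)
	}
}
```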
bobheadxi authored Jan 13, 2021
1 parent f85cdb2 commit 7a84129
Showing 28 changed files with 349 additions and 775 deletions.
504 changes: 94 additions & 410 deletions doc/admin/observability/alert_solutions.md

230 changes: 86 additions & 144 deletions doc/admin/observability/dashboards.md

34 changes: 34 additions & 0 deletions doc/dev/background-information/observability/cadvisor.md
@@ -0,0 +1,34 @@
# Sourcegraph cAdvisor

We ship a custom [cAdvisor](https://github.com/google/cadvisor) image as part of the standard Sourcegraph Kubernetes and docker-compose distribution.
cAdvisor exports container monitoring metrics scraped by [Prometheus](./prometheus.md) and visualized in [Grafana](./grafana.md).

The image is defined in [`docker-images/cadvisor`](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/docker-images/cadvisor).

## Monitoring

Monitoring on cAdvisor metrics is defined in the [monitoring generator](./monitoring-generator.md).
cAdvisor observables are generally defined as [shared observables](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/monitoring/definitions/shared).

When adding monitoring on cAdvisor metrics, please ensure that the [metric can be identified](#identifying-containers) (if not, it is likely the [metric is not supported](#available-metrics)).
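To illustrate the shared-observable pattern behind calls such as `shared.ContainerRestarts(name, owner).Observable()`, here is a minimal sketch. The types `observable`, `sharedObs`, and `sharedObservable`, and the `containerIOUsageSketch` constructor, are hypothetical simplifications rather than the monitoring package's API. The query reuses the gitserver-only `fs_io_operations` query removed in this commit, with an assumed `name` matcher in place of `CadvisorNameMatcher`.

```go
package main

import (
	"fmt"
	"strconv"
)

// observable is a simplified, hypothetical stand-in for monitoring.Observable.
type observable struct {
	Name        string
	Description string
	Query       string
	Owner       string
}

// sharedObs wraps an observable so a caller can adjust a shared definition
// before converting it into the final observable.
type sharedObs observable

// Observable converts the shared definition into a plain observable.
func (o sharedObs) Observable() observable { return observable(o) }

// sharedObservable builds an observable for a specific container and owner.
type sharedObservable func(containerName, owner string) sharedObs

// containerIOUsageSketch is a hypothetical shared observable for container disk I/O.
var containerIOUsageSketch sharedObservable = func(containerName, owner string) sharedObs {
	// Assumed matcher: cAdvisor's `name` label starts with the container name.
	matcher := "name=~" + strconv.Quote("^"+containerName+".*")
	return sharedObs{
		Name:        "fs_io_operations",
		Description: "filesystem reads and writes rate by instance over 1h",
		Query: fmt.Sprintf(
			`sum by(name) (rate(container_fs_reads_total{%[1]s}[1h]) + rate(container_fs_writes_total{%[1]s}[1h]))`,
			matcher,
		),
		Owner: owner,
	}
}

func main() {
	// Mirrors the call shape used in the definitions, e.g.
	// shared.ContainerIOUsage("gitserver", owner).Observable().
	o := containerIOUsageSketch("gitserver", "cloud").Observable()
	fmt.Println(o.Query)
}
```

Defining the observable once per concern and parameterizing it by container name means each service only needs one line per observable, which is the shape of the `shared.ContainerIOUsage(...)` call added to the gitserver definition.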

## Identifying containers

How relevant containers are identified from exported cAdvisor metrics is documented in [`CadvisorNameMatcher`](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+type:symbol+CadvisorNameMatcher&patternType=literal), which generates the label matcher for [monitoring observables](#monitoring).

Because cAdvisor runs on a *machine* and exports *container* metrics, standard strategies for identifying which container a metric belongs to (such as Prometheus scrape target labels) cannot be used: all the metrics appear to belong to cAdvisor itself.
To complicate matters, the way containers are identified differs between environments (namely Kubernetes and docker-compose), sometimes due to characteristics of the environments and sometimes due to naming inconsistencies within Sourcegraph.
Variations in how cAdvisor generates its `name` label also make things difficult (in some environments, it cannot generate one at all!), so we might have to create a custom naming strategy.
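The exact strategy `CadvisorNameMatcher` uses is documented at the link above and is not reproduced here. As a hypothetical sketch of the simplest possible strategy, assuming cAdvisor's `name` label begins with the container name, a matcher could be generated as follows. `nameMatcherSketch` and the container name `pgsql.primary` are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// nameMatcherSketch returns a PromQL label matcher selecting cAdvisor series
// whose `name` label starts with containerName. Hypothetical: real container
// names can carry environment-specific prefixes that this ignores.
func nameMatcherSketch(containerName string) string {
	// Escape regex metacharacters (for example "." in a name), then quote the
	// pattern so its backslashes survive PromQL string parsing.
	pattern := "^" + regexp.QuoteMeta(containerName) + ".*"
	return "name=~" + strconv.Quote(pattern)
}

func main() {
	fmt.Println(nameMatcherSketch("gitserver"))     // name=~"^gitserver.*"
	fmt.Println(nameMatcherSketch("pgsql.primary")) // name=~"^pgsql\\.primary.*"
}
```

Quoting with `strconv.Quote` matters because PromQL processes escape sequences in double-quoted strings as Go does, so a regex escape such as `\.` has to appear as `\\.` inside the label matcher.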

## Available metrics

Exported metrics are documented in the [cAdvisor Prometheus metrics list](https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics).
In the list, the column `-disable_metrics parameter` indicates the "group" the metric belongs to.

Compatibility with container runtimes and deployment environments seems to follow these groups. Before using a metric, ensure that it is supported in all relevant environments (for example, both the Docker and `containerd` container runtimes).
Support is generally poorly documented, but a search through the [cAdvisor repository issues](https://github.com/google/cadvisor/issues) might provide some hints.

### Known issues

- `disk` metrics are not available in `containerd`: [cadvisor#2785](https://github.com/google/cadvisor/issues/2785)
- `diskIO` metrics do not seem to be available in Kubernetes: [sourcegraph#12163](https://github.com/sourcegraph/sourcegraph/issues/12163)
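One lightweight way to keep these known issues in view when adding monitoring is to record them as data and check candidate metric groups against the environments you care about. This is a hypothetical sketch: the `knownUnsupported` table, the environment strings, and the `unsupportedEnvironments` helper are not part of the monitoring generator.

```go
package main

import "fmt"

// knownUnsupported mirrors the known issues: cAdvisor metric group (the
// -disable_metrics value) -> environments where it is reportedly unavailable.
// Hypothetical structure; keep it in sync with the linked issues.
var knownUnsupported = map[string][]string{
	"disk":   {"containerd"},
	"diskIO": {"kubernetes"},
}

// unsupportedEnvironments returns which of the target environments are known
// not to report metrics from the given group.
func unsupportedEnvironments(group string, targets []string) []string {
	var out []string
	for _, bad := range knownUnsupported[group] {
		for _, t := range targets {
			if t == bad {
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	targets := []string{"docker", "containerd", "kubernetes"}
	for _, g := range []string{"cpu", "disk", "diskIO"} {
		fmt.Println(g, unsupportedEnvironments(g, targets))
	}
}
```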
1 change: 1 addition & 0 deletions doc/dev/background-information/observability/index.md
@@ -31,4 +31,5 @@ Observability at Sourcegraph includes:
- [Monitoring generator](./monitoring-generator.md)
- [Sourcegraph Grafana](./grafana.md)
- [Sourcegraph Prometheus](./prometheus.md)
- [Sourcegraph cAdvisor](./cadvisor.md)
- [Observability for site administrators](../../../admin/observability/index.md)
3 changes: 3 additions & 0 deletions docker-images/cadvisor/README.md
@@ -0,0 +1,3 @@
# Sourcegraph cAdvisor

Learn more about the Sourcegraph cAdvisor distribution in the [cAdvisor documentation](https://docs.sourcegraph.com/dev/background-information/observability/cadvisor).

This file was deleted.

9 changes: 4 additions & 5 deletions monitoring/definitions/executor_queue.go
@@ -100,7 +100,7 @@ func ExecutorQueue() *monitoring.Container {
},
},
{
-Title: "Container monitoring (not available on server)",
+Title: shared.TitleContainerMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -109,12 +109,11 @@ func ExecutorQueue() *monitoring.Container {
},
{
shared.ContainerRestarts("executor-queue", monitoring.ObservableOwnerCodeIntel).Observable(),
-shared.ContainerFsInodes("executor-queue", monitoring.ObservableOwnerCodeIntel).Observable(),
},
},
},
{
-Title: "Provisioning indicators (not available on server)",
+Title: shared.TitleProvisioningIndicators,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -128,7 +127,7 @@ func ExecutorQueue() *monitoring.Container {
},
},
{
-Title: "Golang runtime monitoring",
+Title: shared.TitleGolangMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -138,7 +137,7 @@ func ExecutorQueue() *monitoring.Container {
},
},
{
-Title: "Kubernetes monitoring (ignore if using Docker Compose or server)",
+Title: shared.TitleKubernetesMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
9 changes: 4 additions & 5 deletions monitoring/definitions/frontend.go
@@ -720,7 +720,7 @@ func Frontend() *monitoring.Container {
},
},
{
-Title: "Container monitoring (not available on server)",
+Title: shared.TitleContainerMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -729,12 +729,11 @@ func Frontend() *monitoring.Container {
},
{
shared.ContainerRestarts(containerName, monitoring.ObservableOwnerCloud).Observable(),
-shared.ContainerFsInodes(containerName, monitoring.ObservableOwnerCloud).Observable(),
},
},
},
{
-Title: "Provisioning indicators (not available on server)",
+Title: shared.TitleProvisioningIndicators,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -748,7 +747,7 @@ func Frontend() *monitoring.Container {
},
},
{
-Title: "Golang runtime monitoring",
+Title: shared.TitleGolangMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -758,7 +757,7 @@ func Frontend() *monitoring.Container {
},
},
{
-Title: "Kubernetes monitoring (ignore if using Docker Compose or server)",
+Title: shared.TitleKubernetesMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
20 changes: 4 additions & 16 deletions monitoring/definitions/git_server.go
@@ -1,7 +1,6 @@
package definitions

import (
-"fmt"
"time"

"github.com/sourcegraph/sourcegraph/monitoring/definitions/shared"
@@ -96,7 +95,7 @@ func GitServer() *monitoring.Container {
},
},
{
-Title: "Container monitoring (not available on server)",
+Title: shared.TitleContainerMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -105,23 +104,12 @@ func GitServer() *monitoring.Container {
},
{
shared.ContainerRestarts("gitserver", monitoring.ObservableOwnerCloud).Observable(),
-shared.ContainerFsInodes("gitserver", monitoring.ObservableOwnerCloud).Observable(),
-},
-{
-{
-Name: "fs_io_operations",
-Description: "filesystem reads and writes rate by instance over 1h",
-Query: fmt.Sprintf(`sum by(name) (rate(container_fs_reads_total{%[1]s}[1h]) + rate(container_fs_writes_total{%[1]s}[1h]))`, shared.CadvisorNameMatcher("gitserver")),
-Warning: monitoring.Alert().GreaterOrEqual(5000),
-Panel: monitoring.Panel().LegendFormat("{{name}}"),
-Owner: monitoring.ObservableOwnerCloud,
-PossibleSolutions: "none",
-},
+shared.ContainerIOUsage("gitserver", monitoring.ObservableOwnerCloud).Observable(),
},
},
},
{
-Title: "Provisioning indicators (not available on server)",
+Title: shared.TitleProvisioningIndicators,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -139,7 +127,7 @@ func GitServer() *monitoring.Container {
},
},
{
-Title: "Golang runtime monitoring",
+Title: shared.TitleGolangMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
9 changes: 4 additions & 5 deletions monitoring/definitions/github_proxy.go
@@ -32,7 +32,7 @@ func GitHubProxy() *monitoring.Container {
},
},
{
-Title: "Container monitoring (not available on server)",
+Title: shared.TitleContainerMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -41,12 +41,11 @@ func GitHubProxy() *monitoring.Container {
},
{
shared.ContainerRestarts("github-proxy", monitoring.ObservableOwnerCloud).Observable(),
-shared.ContainerFsInodes("github-proxy", monitoring.ObservableOwnerCloud).Observable(),
},
},
},
{
-Title: "Provisioning indicators (not available on server)",
+Title: shared.TitleProvisioningIndicators,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -60,7 +59,7 @@ func GitHubProxy() *monitoring.Container {
},
},
{
-Title: "Golang runtime monitoring",
+Title: shared.TitleGolangMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -70,7 +69,7 @@ func GitHubProxy() *monitoring.Container {
},
},
{
-Title: "Kubernetes monitoring (ignore if using Docker Compose or server)",
+Title: shared.TitleKubernetesMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
4 changes: 2 additions & 2 deletions monitoring/definitions/postgres.go
@@ -96,7 +96,7 @@ func Postgres() *monitoring.Container {
},
},
{
-Title: "Provisioning indicators (not available on server)",
+Title: shared.TitleProvisioningIndicators,
Hidden: true,
// See docstring for databaseContainerNames
Rows: []monitoring.Row{
@@ -111,7 +111,7 @@ func Postgres() *monitoring.Container {
},
},
{
-Title: "Kubernetes monitoring (ignore if using Docker Compose or server)",
+Title: shared.TitleKubernetesMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
9 changes: 4 additions & 5 deletions monitoring/definitions/precise_code_intel_indexer.go
@@ -136,7 +136,7 @@ func PreciseCodeIntelIndexer() *monitoring.Container {
},
},
{
-Title: "Container monitoring (not available on server)",
+Title: shared.TitleContainerMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -145,12 +145,11 @@ func PreciseCodeIntelIndexer() *monitoring.Container {
},
{
shared.ContainerRestarts("precise-code-intel-worker", monitoring.ObservableOwnerCodeIntel).Observable(),
-shared.ContainerFsInodes("precise-code-intel-worker", monitoring.ObservableOwnerCodeIntel).Observable(),
},
},
},
{
-Title: "Provisioning indicators (not available on server)",
+Title: shared.TitleProvisioningIndicators,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -164,7 +163,7 @@ func PreciseCodeIntelIndexer() *monitoring.Container {
},
},
{
-Title: "Golang runtime monitoring",
+Title: shared.TitleGolangMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{
@@ -174,7 +173,7 @@ func PreciseCodeIntelIndexer() *monitoring.Container {
},
},
{
-Title: "Kubernetes monitoring (ignore if using Docker Compose or server)",
+Title: shared.TitleKubernetesMonitoring,
Hidden: true,
Rows: []monitoring.Row{
{