feature: Add text boxes and descriptions to reads and writes dashboards #324

Merged
merged 38 commits into from
Jun 22, 2021
4642b5c
feature: add some text boxes and descriptions
darrenjaneczek Jun 8, 2021
77f8609
fix: text replacements, repair addRows
darrenjaneczek Jun 9, 2021
c4db3e1
fix: changelog
darrenjaneczek Jun 9, 2021
9e6c2f4
Changing copy to add 'latency' as well.
Jun 13, 2021
7a7b13c
Cut down on text from initial PR. Tucked existing text from the compa…
Jun 13, 2021
acc320a
Getting rid of a few space/comma errors.
Jun 13, 2021
8368248
Update CHANGELOG.md
darrenjaneczek Jun 15, 2021
6ad57cd
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
c33303a
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
cb7054c
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
357db43
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
19cb601
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
4735870
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
fa48a91
fix: formatting - limit to 4 panels per row
darrenjaneczek Jun 15, 2021
dafb212
Merge branch 'darrenjaneczek/dashboard-descriptions-reads-writes' of …
darrenjaneczek Jun 15, 2021
c7b7871
fmt
darrenjaneczek Jun 15, 2021
6c0066c
fix: remove accidental line
darrenjaneczek Jun 15, 2021
773926a
Update cortex-mixin/dashboards/dashboard-utils.libsonnet
darrenjaneczek Jun 15, 2021
a12d815
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
b335df9
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
73e65cf
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
13f0fa3
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
6c0ebb8
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
ea7d87d
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
4aed696
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
c411115
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
0c17f02
fix: Requests per second
darrenjaneczek Jun 15, 2021
b8ccacc
fix: text
darrenjaneczek Jun 15, 2021
d5b14c1
Apply suggestions from code review as per @osg-grafana
darrenjaneczek Jun 15, 2021
b22d22e
fix: clarity
darrenjaneczek Jun 15, 2021
dffe62a
Apply suggestions from code review as per @osg-grafana
darrenjaneczek Jun 15, 2021
2aae011
Merge branch 'darrenjaneczek/dashboard-descriptions-reads-writes' of …
darrenjaneczek Jun 15, 2021
eafdbfc
fix: query formatting to aid in merge
darrenjaneczek Jun 17, 2021
dddd6e7
fix: query formatting to aid in merge
darrenjaneczek Jun 17, 2021
fcc4896
fix: consistent labelling
darrenjaneczek Jun 17, 2021
513b096
fix: ensure panel titles are consistent
darrenjaneczek Jun 17, 2021
5794607
fix: resolve review feedback
darrenjaneczek Jun 21, 2021
4fb7275
Merge branch 'main' into darrenjaneczek/dashboard-descriptions-reads-…
pracucci Jun 22, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@
* [CHANGE] Renamed `CortexInconsistentConfig` alert to `CortexInconsistentRuntimeConfig` and increased severity to `critical`. #335
* [CHANGE] Increased `CortexBadRuntimeConfig` alert severity to `critical` and removed support for `cortex_overrides_last_reload_successful` metric (was removed in Cortex 1.3.0). #335
* [ENHANCEMENT] cortex-mixin: Make `cluster_namespace_deployment:kube_pod_container_resource_requests_{cpu_cores,memory_bytes}:sum` backwards compatible with `kube-state-metrics` v2.0.0. #317
* [ENHANCEMENT] Added documentation text panels and descriptions to reads and writes dashboards. #324
* [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
* [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
* [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335
6 changes: 6 additions & 0 deletions cortex-mixin/config.libsonnet
@@ -58,5 +58,11 @@

// The label used to differentiate between different nodes (i.e. servers).
per_node_label: 'instance',

// Whether certain dashboard description headers should be shown
show_dashboard_descriptions: {
writes: true,
reads: true,
},
},
}
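A minimal sketch of how a consumer of the mixin might switch one of these flags off. The `mixin` object and the `descriptionRow` helper below are hypothetical stand-ins, not part of the diff; only the `show_dashboard_descriptions` field and its `writes`/`reads` keys come from the change above.

```jsonnet
// Hypothetical stand-in for the mixin; only the new config block is modelled.
local mixin = {
  _config:: {
    show_dashboard_descriptions: {
      writes: true,
      reads: true,
    },
  },

  // Hypothetical consumer of the flags: returns a description row only
  // when the flag for the named dashboard is enabled.
  descriptionRow(name)::
    if $._config.show_dashboard_descriptions[name]
    then ['%s description' % name]
    else [],
};

// Override a single flag; `+:` keeps the sibling `reads` key intact.
local customised = mixin {
  _config+:: {
    show_dashboard_descriptions+: {
      writes: false,
    },
  },
};

assert customised.descriptionRow('writes') == [];
assert customised.descriptionRow('reads') == ['reads description'];

{
  writes: customised.descriptionRow('writes'),
  reads: customised.descriptionRow('reads'),
}
```

Using `+:` on the nested object rather than replacing it means that flags added later keep their defaults.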
62 changes: 37 additions & 25 deletions cortex-mixin/dashboards/compactor.libsonnet
@@ -6,12 +6,6 @@ local utils = import 'mixin-utils/utils.libsonnet';
.addClusterSelectorTemplates()
.addRow(
$.row('Summary')
.addPanel(
$.textPanel('', |||
- **Per-instance runs**: number of times a compactor instance triggers a compaction across all tenants its shard manage.
- **Tenants compaction progress**: in a multi-tenant cluster it shows the progress of tenants compacted while compaction is running. Reset to 0 once the compaction run is completed for all tenants in the shard.
|||),
)
.addPanel(
$.startedCompletedFailedPanel(
'Per-instance runs / sec',
@@ -20,7 +14,13 @@ local utils = import 'mixin-utils/utils.libsonnet';
'sum(rate(cortex_compactor_runs_failed_total{%s}[$__rate_interval]))' % $.jobMatcher($._config.job_names.compactor)
) +
$.bars +
{ yaxes: $.yaxes('ops') },
{ yaxes: $.yaxes('ops') } +
$.panelDescription(
'Per-instance runs',
|||
Number of times a compactor instance triggers a compaction across all tenants that it manages.
|||
),
)
.addPanel(
$.panel('Tenants compaction progress') +
@@ -31,42 +31,55 @@ local utils = import 'mixin-utils/utils.libsonnet';
cortex_compactor_tenants_skipped{%s}
) / cortex_compactor_tenants_discovered{%s}
||| % [$.jobMatcher($._config.job_names.compactor), $.jobMatcher($._config.job_names.compactor), $.jobMatcher($._config.job_names.compactor), $.jobMatcher($._config.job_names.compactor)], '{{%s}}' % $._config.per_instance_label) +
{ yaxes: $.yaxes({ format: 'percentunit', max: 1 }) },
{ yaxes: $.yaxes({ format: 'percentunit', max: 1 }) } +
$.panelDescription(
'Tenants compaction progress',
|||
In a multi-tenant cluster, display the progress of tenants that are compacted while compaction is running.
Reset to <tt>0</tt> after the compaction run is completed for all tenants in the shard.
|||
),
)
)
.addRow(
$.row('')
.addPanel(
$.textPanel('', |||
- **Compacted blocks**: number of blocks generated as a result of a compaction operation.
- **Per-block compaction duration**: time taken to generate a single compacted block.
|||),
)
.addPanel(
$.panel('Compacted blocks / sec') +
$.queryPanel('sum(rate(prometheus_tsdb_compactions_total{%s}[$__rate_interval]))' % $.jobMatcher($._config.job_names.compactor), 'blocks') +
{ yaxes: $.yaxes('ops') },
{ yaxes: $.yaxes('ops') } +
$.panelDescription(
'Compacted blocks / sec',
|||
Rate of blocks that are generated as a result of a compaction operation.
|||
),
)
.addPanel(
$.panel('Per-block compaction duration') +
$.latencyPanel('prometheus_tsdb_compaction_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor))
$.latencyPanel('prometheus_tsdb_compaction_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor)) +
$.panelDescription(
'Per-block compaction duration',
|||
Display the amount of time that it has taken to generate a single compacted block.
|||
),
)
)
.addRow(
$.row('')
.addPanel(
$.textPanel('', |||
- **Average blocks / tenant**: the average number of blocks per tenant.
- **Tenants with largest number of blocks**: the 10 tenants with the largest number of blocks.
|||),
)
.addPanel(
$.panel('Average blocks / tenant') +
$.queryPanel('avg(max by(user) (cortex_bucket_blocks_count{%s}))' % $.jobMatcher($._config.job_names.compactor), 'avg'),
)
.addPanel(
$.panel('Tenants with largest number of blocks') +
$.queryPanel('topk(10, max by(user) (cortex_bucket_blocks_count{%s}))' % $.jobMatcher($._config.job_names.compactor), '{{user}}'),
$.queryPanel('topk(10, max by(user) (cortex_bucket_blocks_count{%s}))' % $.jobMatcher($._config.job_names.compactor), '{{user}}') +
$.panelDescription(
'Tenants with largest number of blocks',
|||
The 10 tenants with the largest number of blocks.
|||
),
)
)
.addRow(
@@ -103,6 +116,5 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.latencyPanel('cortex_compactor_meta_sync_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor)),
)
)
.addRow($.objectStorePanels1('Object Store', 'compactor'))
.addRow($.objectStorePanels2('', 'compactor')),
.addRows($.getObjectStoreRows('Object Store', 'compactor')),
}
146 changes: 118 additions & 28 deletions cortex-mixin/dashboards/dashboard-utils.libsonnet
@@ -14,6 +14,24 @@ local utils = import 'mixin-utils/utils.libsonnet';
then self.addRow(row)
else self,

addRowsIf(condition, rows)::
if condition
then
local reduceRows(dashboard, remainingRows) =
if (std.length(remainingRows) == 0)
then dashboard
else
reduceRows(
dashboard.addRow(remainingRows[0]),
std.slice(remainingRows, 1, std.length(remainingRows), 1)
)
;
reduceRows(self, rows)
else self,

addRows(rows)::
self.addRowsIf(true, rows),
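
The recursive `reduceRows` helper above is a left fold over the list of rows. As a sketch only, the same behaviour can be written with `std.foldl`; the `dashboard` object here is a hypothetical minimal stand-in whose `addRow` simply appends to a `rows` array, not the real Grafana builder dashboard.

```jsonnet
// Hypothetical minimal dashboard: addRow returns a copy with the row appended.
local dashboard = {
  rows: [],
  addRow(row):: self { rows+: [row] },

  // Same contract as addRowsIf in the diff, expressed as a fold:
  // apply addRow to each row in order, starting from the current dashboard.
  addRowsIf(condition, rows)::
    if condition
    then std.foldl(function(d, row) d.addRow(row), rows, self)
    else self,

  addRows(rows):: self.addRowsIf(true, rows),
};

local rows = [{ title: 'Object Store' }, { title: '' }];

assert std.length(dashboard.addRows(rows).rows) == 2;
assert dashboard.addRows(rows).rows[0].title == 'Object Store';
assert dashboard.addRowsIf(false, rows).rows == [];

{
  added: dashboard.addRows(rows).rows,
  skipped: dashboard.addRowsIf(false, rows).rows,
}
```

Both forms preserve row order; the fold avoids the explicit `std.slice` bookkeeping.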

addClusterSelectorTemplates(multi=true)::
local d = self {
tags: $._config.tags,
@@ -43,7 +61,6 @@ local utils = import 'mixin-utils/utils.libsonnet';
else d
.addTemplate('cluster', 'cortex_build_info', 'cluster')
.addTemplate('namespace', 'cortex_build_info{cluster=~"$cluster"}', 'namespace'),

},

// The mixin allows specialization of the job selector depending on whether it's a single binary
@@ -274,7 +291,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
type: 'text',
} + options,

objectStorePanels1(title, component)::
getObjectStoreRows(title, component):: [
super.row(title)
.addPanel(
$.panel('Operations / sec') +
@@ -288,62 +305,135 @@ local utils = import 'mixin-utils/utils.libsonnet';
{ yaxes: $.yaxes('percentunit') },
)
.addPanel(
$.panel('Op: Attributes') +
$.panel('Latency of Op: Attributes') +
Review comment (Contributor): Where does the user see this information?
Reply (Contributor Author): Attached screenshot, @osg-grafana [image]

$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="attributes"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: Exists') +
$.panel('Latency of Op: Exists') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="exists"}' % [$.namespaceMatcher(), component]),
),

// Second row of Object Store stats
objectStorePanels2(title, component)::
super.row(title)
$.row('')
.addPanel(
$.panel('Op: Get') +
$.panel('Latency of Op: Get') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="get"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: GetRange') +
$.panel('Latency of Op: GetRange') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="get_range"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: Upload') +
$.panel('Latency of Op: Upload') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="upload"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: Delete') +
$.panel('Latency of Op: Delete') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="delete"}' % [$.namespaceMatcher(), component]),
),
],

thanosMemcachedCache(title, jobName, component, cacheName)::
local config = {
jobMatcher: $.jobMatcher(jobName),
component: component,
cacheName: cacheName,
};
super.row(title)
.addPanel(
$.panel('QPS') +
$.queryPanel('sum by(operation) (rate(thanos_memcached_operations_total{%s,component="%s",name="%s"}[$__rate_interval]))' % [$.jobMatcher(jobName), component, cacheName], '{{operation}}') +
$.panel('Requests / sec') +
$.queryPanel(
Review comment (Collaborator): Note to reviewers: just reformatted.
|||
sum by(operation) (
rate(
thanos_memcached_operations_total{
%(jobMatcher)s,
component="%(component)s",
name="%(cacheName)s"
}[$__rate_interval]
)
)
||| % config,
'{{operation}}'
) +
$.stack +
{ yaxes: $.yaxes('ops') },
{ yaxes: $.yaxes('ops') }
)
.addPanel(
$.panel('Latency (getmulti)') +
$.latencyPanel('thanos_memcached_operation_duration_seconds', '{%s,operation="getmulti",component="%s",name="%s"}' % [$.jobMatcher(jobName), component, cacheName])
$.latencyPanel(
Review comment (Collaborator): Note to reviewers: just reformatted.
'thanos_memcached_operation_duration_seconds',
|||
{
%(jobMatcher)s,
operation="getmulti",
component="%(component)s",
name="%(cacheName)s"
}
||| % config
)
)
.addPanel(
$.panel('Hit ratio') +
$.queryPanel('sum(rate(thanos_cache_memcached_hits_total{%s,component="%s",name="%s"}[$__rate_interval])) / sum(rate(thanos_cache_memcached_requests_total{%s,component="%s",name="%s"}[$__rate_interval]))' %
[
$.jobMatcher(jobName),
component,
cacheName,
$.jobMatcher(jobName),
component,
cacheName,
], 'items') +
{ yaxes: $.yaxes('percentunit') },
$.queryPanel(
Review comment (Collaborator): Note to reviewers: just reformatted.
|||
sum(
rate(
thanos_cache_memcached_hits_total{
%(jobMatcher)s,
component="%(component)s",
name="%(cacheName)s"
}[$__rate_interval]
)
)
/
sum(
rate(
thanos_cache_memcached_requests_total{
%(jobMatcher)s,
component="%(component)s",
name="%(cacheName)s"
}[$__rate_interval]
)
)
||| % config,
'items'
) +
{ yaxes: $.yaxes('percentunit') }
),
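
The reformatted queries in `thanosMemcachedCache` replace positional `%s` placeholders with named `%(key)s` placeholders filled from the `config` object. A small sketch of why the two forms are interchangeable; the matcher and cache values below are made-up sample values, not taken from the mixin.

```jsonnet
// Sample values only; in the mixin these come from $.jobMatcher(jobName)
// and the function arguments.
local config = {
  jobMatcher: 'job=~"(cortex/)?store-gateway"',
  component: 'store-gateway',
  cacheName: 'chunks-cache',
};

// Positional formatting: the array order must match the placeholder order.
local positional =
  'thanos_memcached_operations_total{%s,component="%s",name="%s"}'
  % [config.jobMatcher, config.component, config.cacheName];

// Named formatting: each placeholder looks up a field of the object,
// so a value used several times is passed only once.
local named =
  'thanos_memcached_operations_total{%(jobMatcher)s,component="%(component)s",name="%(cacheName)s"}'
  % config;

assert positional == named : 'both forms should render the same selector';

{ selector: named }
```

The named form pays off most in queries such as the hit ratio, where the same three values appear twice.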

filterNodeDiskContainer(containerName)::
|||
ignoring(%s) group_right() (label_replace(count by(%s, %s, device) (container_fs_writes_bytes_total{%s,container="%s",device!~".*sda.*"}), "device", "$1", "device", "/dev/(.*)") * 0)
||| % [$._config.per_instance_label, $._config.per_node_label, $._config.per_instance_label, $.namespaceMatcher(), containerName],
ignoring(%s) group_right() (
Review comment (Collaborator): Note to reviewers: just reformatted.
label_replace(
count by(
%s,
%s,
device
)
(
container_fs_writes_bytes_total{
%s,
container="%s",
device!~".*sda.*"
}
),
"device",
"$1",
"device",
"/dev/(.*)"
) * 0
)
||| % [
$._config.per_instance_label,
$._config.per_node_label,
$._config.per_instance_label,
$.namespaceMatcher(),
containerName,
],

panelDescription(title, description):: {
description: |||
### %s
%s
||| % [title, description],
},
}
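
A sketch of how `panelDescription` composes with a panel through object addition, as in the compactor dashboard changes. The `panel` helper here is a hypothetical simplification of `$.panel`; `panelDescription` mirrors the definition in the diff.

```jsonnet
local utils = {
  // Mirrors the helper added in the diff: a mixin object carrying only
  // a Markdown `description` field.
  panelDescription(title, description):: {
    description: |||
      ### %s
      %s
    ||| % [title, description],
  },

  // Hypothetical, simplified stand-in for $.panel.
  panel(title):: { title: title, type: 'graph' },
};

// Adding the two objects merges the description into the panel.
local p =
  utils.panel('Compacted blocks / sec') +
  utils.panelDescription(
    'Compacted blocks / sec',
    'Rate of blocks that are generated as a result of a compaction operation.'
  );

assert p.title == 'Compacted blocks / sec';
assert std.startsWith(p.description, '### Compacted blocks / sec\n');

p
```

Because the helper returns a plain object, it can be appended to any panel expression without changing how the panel itself is built.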