Import cortex mixin from upstream #373

Merged 535 commits on Oct 19, 2021.
Commits
fbab4f2
Increased CortexAllocatingTooMuchMemory alert threshold
pracucci Jan 27, 2021
c7a4115
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/256…
pracucci Jan 27, 2021
bbc5c34
Add alert for etcd memory limits close
gouthamve Feb 25, 2021
0ff411b
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/261…
gouthamve Feb 25, 2021
d5ad43a
the distributor now supports push via GRPC (https://github.com/grafan…
replay Mar 5, 2021
adb82b6
Fixed CortexQuerierHighRefetchRate alert
pracucci Mar 9, 2021
667ca36
Fixed label matcher
pracucci Mar 10, 2021
5c75158
Sort legend descending in the CPU/memory panels
pracucci Mar 12, 2021
97806d9
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/271…
pracucci Mar 12, 2021
353f403
Merge branch 'main' into fix-refetch-alert
pracucci Mar 12, 2021
d281bd6
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/268…
pracucci Mar 12, 2021
a7bdd2e
Add slow queries dashboard
pracucci Mar 16, 2021
419eaba
Added tenant ID field to the table
pracucci Mar 16, 2021
c616398
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/272…
pracucci Mar 16, 2021
0774870
Add recording rules to calculate Cortex scaling
tomwilkie Mar 18, 2021
fe87d90
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/278…
tomwilkie Mar 19, 2021
8486777
Increased CortexRequestErrors alert severity
pracucci Mar 23, 2021
1c4dec6
Fixed "Disk Writes" and "Disk Reads" panels
pracucci Mar 23, 2021
aaaefee
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/280…
pracucci Mar 23, 2021
4ff33a3
Pre-compute aggregations to optimize scaling recording rules
pracucci Mar 30, 2021
7ba8424
Removed 5m step from subquery
pracucci Mar 31, 2021
f1fb713
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/284…
pracucci Mar 31, 2021
5c36a63
Add function to customize compactor statefulset
pracucci Apr 2, 2021
d7fbc23
Use the job name in compactor alerts too
pracucci Apr 2, 2021
2715796
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/287…
pracucci Apr 2, 2021
73c6770
Fixed CortexCompactorRunFailed threshold
pracucci Apr 2, 2021
5862286
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/288…
pracucci Apr 2, 2021
a103a95
Added Cortex Rollout progress dashboard
pracucci Apr 2, 2021
b0d4587
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/289…
pracucci Apr 6, 2021
efa84d4
Fix 'Unhealthy pods' in Cortex Rollout dashboard
pracucci Apr 7, 2021
319ecb8
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/290…
pracucci Apr 7, 2021
6f9fbc0
Simplify compactor alerts
gouthamve Apr 20, 2021
7b9dc6b
Use the right metric
gouthamve Apr 20, 2021
c11d9e6
Apply suggestions from code review
gouthamve Apr 20, 2021
8722867
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/292…
gouthamve Apr 20, 2021
7bbb172
Fix CortexCompactorHasNotSuccessfullyRunCompaction to avoid false pos…
pracucci Apr 21, 2021
91eb55e
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/279…
pracucci Apr 21, 2021
139acd3
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/294…
pracucci Apr 21, 2021
8c30820
Introduce ingester instance limits to configuration, and add alerts. …
pstibrany Apr 22, 2021
67ee413
Improve CortexRulerFailedRingCheck alert
pracucci May 4, 2021
f745408
Added example Loki query to CortexTenantHasPartialBlocks playbook
pracucci May 4, 2021
a6c0e8d
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/298…
pracucci May 4, 2021
278ced6
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/297…
pracucci May 4, 2021
e2113d7
Default dashboards to Cortex blocks storage only
pracucci May 11, 2021
a1af0bf
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/302…
pracucci May 12, 2021
3e0ea6b
Add missing memberlist components to alerts
simonswine May 13, 2021
2c33b1f
mixin: Add gateway to valid job names (for GEM)
umamialex May 19, 2021
49e5bd8
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/310…
pracucci May 19, 2021
d9e5b31
Only show namespaces from selected cluster. "All" works thanks to usi…
pstibrany May 19, 2021
d29b27e
Fixed CortexIngesterHasNotShippedBlocks alert false positive
pracucci May 24, 2021
ac1de90
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/312…
pracucci May 27, 2021
37b2881
Fixed mixin linter
pracucci May 26, 2021
b960469
Add placeholders to make the linter pass
pracucci May 26, 2021
2e6ea09
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/314…
pracucci May 31, 2021
193597f
cortex-mixin: Use kube_pod_container_resource_{requests,limits} metrics
jvrplmlmn May 31, 2021
4a5f52a
cortex-mixin: Make the recording rules backwards compatible
jvrplmlmn May 31, 2021
f9f8cc2
refactor: functions to reduce code duplication
darrenjaneczek Jun 1, 2021
c99479d
fix: lint
darrenjaneczek Jun 1, 2021
0ff9164
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/317…
pracucci Jun 1, 2021
7f00375
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/318…
pracucci Jun 1, 2021
9d9c739
refactor: config for job aggregation strings
darrenjaneczek Jun 2, 2021
ec05ad6
lint
darrenjaneczek Jun 2, 2021
46b2c11
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 3, 2021
1671e38
fix: syntax
darrenjaneczek Jun 3, 2021
3299e55
refactor: added a group_config
darrenjaneczek Jun 4, 2021
4a5b372
refactor: added a group_config
darrenjaneczek Jun 4, 2021
3b6693d
refactor: added a group_config
darrenjaneczek Jun 4, 2021
df6a760
Lower CortexIngesterRestarts severity
pracucci Jun 7, 2021
5958a65
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/305…
pracucci Jun 8, 2021
ac5a656
Merge branch 'main' into lower-ingester-restarts-severity
pracucci Jun 8, 2021
edd68a4
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/321…
pracucci Jun 8, 2021
77718f5
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/319…
pracucci Jun 10, 2021
2df257a
feature: add some text boxes and descriptions
darrenjaneczek Jun 8, 2021
143eb01
fix: text replacements, repair addRows
darrenjaneczek Jun 9, 2021
6d04a83
Changing copy to add 'latency' as well.
Jun 13, 2021
98dfc2d
Cut down on text from initial PR. Tucked existing text from the compa…
Jun 13, 2021
1d5daac
Getting rid of a few space/comma errors.
Jun 13, 2021
48e8168
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
427d787
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
2a4cfd2
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
290ea24
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
08f2f32
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
2c3a117
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
da98abc
fix: formatting - limit to 4 panels per row
darrenjaneczek Jun 15, 2021
e57039d
Merge branch 'darrenjaneczek/dashboard-descriptions-reads-writes' of …
darrenjaneczek Jun 15, 2021
35f9e73
fmt
darrenjaneczek Jun 15, 2021
877e06f
fix: remove accidental line
darrenjaneczek Jun 15, 2021
b46277a
Update cortex-mixin/dashboards/dashboard-utils.libsonnet
darrenjaneczek Jun 15, 2021
2395da8
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
302c8ae
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
da1744f
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
58dc25b
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
99c0b8a
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
9e8b0a9
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
c3c4c68
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
3bbcb8a
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
4160279
fix: Requests per second
darrenjaneczek Jun 15, 2021
3f131f4
fix: text
darrenjaneczek Jun 15, 2021
aee69f1
Apply suggestions from code review as per @osg-grafana
darrenjaneczek Jun 15, 2021
299099f
fix: clarity
darrenjaneczek Jun 15, 2021
7d5a0e1
Apply suggestions from code review as per @osg-grafana
darrenjaneczek Jun 15, 2021
2ff44e4
Merge branch 'darrenjaneczek/dashboard-descriptions-reads-writes' of …
darrenjaneczek Jun 15, 2021
1c214d3
Add a simple playbook for ingester series limit alert.
cstyan Jun 16, 2021
c59c9b6
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/330…
pracucci Jun 17, 2021
d062e94
Add cortex-gw-internal to watched gateway metrics (https://github.com…
johannaratliff Jun 17, 2021
1ed03d6
fix: query formatting to aid in merge
darrenjaneczek Jun 17, 2021
c24a79a
fix: query formatting to aid in merge
darrenjaneczek Jun 17, 2021
677b9c4
fix: consistent labelling
darrenjaneczek Jun 17, 2021
e56f0e1
fix: ensure panel titles are consistent
darrenjaneczek Jun 17, 2021
c857e01
Improved CortexIngesterReachingSeriesLimit playbook and added CortexI…
pracucci Jun 21, 2021
70d0bf6
Better formatting for ingester_instance_limits+ example
pracucci Jun 21, 2021
b3fe9d5
Clarify which alerts apply to chunks storage only
pracucci Jun 21, 2021
b511c4e
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/333…
pracucci Jun 21, 2021
a4b9505
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/332…
pracucci Jun 21, 2021
4e75e0f
Improve compactor alerts and playbooks
pracucci Jun 21, 2021
02eaf92
Addressed review comments
pracucci Jun 21, 2021
7b96c22
Update cortex-mixin/docs/playbooks.md
pracucci Jun 21, 2021
e676f7c
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/334…
pracucci Jun 21, 2021
6f2542d
Fixed and improved runtime config alerts and playbooks
pracucci Jun 21, 2021
8984245
fix: resolve review feedback
darrenjaneczek Jun 21, 2021
7a75f25
Update cortex-mixin/docs/playbooks.md
pracucci Jun 22, 2021
585582d
Update cortex-mixin/docs/playbooks.md
pracucci Jun 22, 2021
46a8a0e
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/335…
pracucci Jun 22, 2021
6fe9763
Mark CortexTableSyncFailure and CortexOldChunkInMemory alerts as chunk…
pracucci Jun 22, 2021
edecc3d
Merge branch 'main' into darrenjaneczek/dashboard-descriptions-reads-…
pracucci Jun 22, 2021
fd3df9a
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/324…
pracucci Jun 22, 2021
c891992
Fixed whitespace noise
pracucci Jun 22, 2021
339b410
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/337…
pracucci Jun 22, 2021
6aba412
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/336…
pracucci Jun 22, 2021
447bc0a
refactor: resources dashboard container functions
darrenjaneczek Jun 16, 2021
a777065
revert: matching spacing format of main
darrenjaneczek Jun 22, 2021
4399d9b
lint: white noise
darrenjaneczek Jun 22, 2021
1be26db
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/331…
pracucci Jun 22, 2021
a3e9b28
Add playbook for CortexRequestErrors and config option to exclude spe…
pracucci Jun 23, 2021
2af795f
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/338…
pracucci Jun 23, 2021
c090874
Change min-step to 15s to show better detail.
bboreham Jun 25, 2021
00e2c38
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/340…
bboreham Jun 30, 2021
18c3cb9
Added playbook for CortexFrontendQueriesStuck and CortexSchedulerQuer…
pracucci Jul 1, 2021
38aabca
Remove CortexQuerierCapacityFull alert
pracucci Jul 1, 2021
53adf94
Added playbook for CortexProvisioningTooManyWrites
pracucci Jul 1, 2021
f4b5dd0
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/343…
pracucci Jul 1, 2021
bc50863
Added playbook for CortexAllocatingTooMuchMemory
pracucci Jul 2, 2021
b592c8b
Address review feedback
pracucci Jul 2, 2021
7eb687e
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/345…
pracucci Jul 2, 2021
9d80934
Replaced CortexCacheRequestErrors with CortexMemcachedRequestErrors
pracucci Jul 2, 2021
f08854b
Replace ruler alerts, and add playbooks.
pstibrany Jul 2, 2021
1ba6047
Addressed review comments
pracucci Jul 2, 2021
7377e55
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/341…
pracucci Jul 2, 2021
dec2b14
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/342…
pracucci Jul 2, 2021
357876b
Fix white space.
pstibrany Jul 2, 2021
a1465fb
Better alert messages.
pstibrany Jul 2, 2021
ec605f2
Merge branch 'main' into ruler-alerts
pstibrany Jul 2, 2021
027e654
Improve CortexIngesterReachingSeriesLimit playbook
pracucci Jul 2, 2021
348a00d
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/348…
pracucci Jul 2, 2021
fb98b9a
Add playbook for CortexProvisioningTooManyActiveSeries
pracucci Jul 2, 2021
77d4b45
Merge branch 'main' into playbook-for-CortexCacheRequestErrors
pracucci Jul 2, 2021
7f33efb
Improve messaging.
pstibrany Jul 2, 2021
8480176
Merge remote-tracking branch 'origin/ruler-alerts' into ruler-alerts
pstibrany Jul 2, 2021
d876f21
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/346…
pracucci Jul 2, 2021
a815207
Merge branch 'main' into ruler-alerts
pracucci Jul 2, 2021
9de2964
Fixed formatting
pracucci Jul 2, 2021
5cd7ca0
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/347…
pracucci Jul 2, 2021
e9a89a7
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/350…
pracucci Jul 2, 2021
66e36d8
Improved alert messages with Cortex cluster
pracucci Jul 2, 2021
17bc2eb
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/351…
pracucci Jul 2, 2021
c0625cf
Improved CortexRequestLatency playbook
pracucci Jul 5, 2021
03cfca3
Added 'Per route p99 latency' to ruler configuration API
pracucci Jul 5, 2021
c9f5db8
Addressed review comments
pracucci Jul 5, 2021
d97836f
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/352…
pracucci Jul 5, 2021
106e70c
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/353…
pracucci Jul 5, 2021
acb4a07
Added object storage metrics for Ruler and Alertmanager
pracucci Jul 6, 2021
82d1e11
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/354…
pracucci Jul 6, 2021
97a918c
Add playbook entry for CortexGossipMembersMismatch.
stevesg Jul 14, 2021
61362fd
Clarify data loss related to 'not healthy index found' issue
pracucci Jul 14, 2021
661082b
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/357…
pracucci Jul 14, 2021
dd543cc
Review comments.
stevesg Jul 15, 2021
35a1249
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/356…
stevesg Jul 15, 2021
3248aae
Improve CortexIngesterReachingSeriesLimit playbook
pracucci Jul 22, 2021
65c91af
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/360…
pracucci Jul 22, 2021
ec2f95c
Increased CortexIngesterReachingSeriesLimit critical alert threshold …
pracucci Jul 27, 2021
c5d98a9
Increase CortexIngesterReachingSeriesLimit warning `for` duration
beorn7 Jul 26, 2021
1005fcd
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/362…
pracucci Jul 28, 2021
f63182f
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/363…
pracucci Jul 28, 2021
35b2479
Fix scaling dashboard to work on multi-zone ingesters
pracucci Jul 28, 2021
cec1a40
Simplified cluster_namespace_deployment:actual_replicas:count recordi…
pracucci Jul 28, 2021
3d0e6f5
Added a comment to explain '.*?'
pracucci Jul 28, 2021
2a02111
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/365…
pracucci Jul 28, 2021
432c4a3
Fix rollout dashboard to work with multi-zone deployments
pracucci Jul 29, 2021
a5950dc
Fixed legends
pracucci Jul 29, 2021
ae25c9a
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/366…
pracucci Jul 29, 2021
4498eb1
Extend Alertmanager dashboard with currently unused metrics.
stevesg May 25, 2021
00d7414
Review comments + fix latency panel.
stevesg May 27, 2021
b0e76f9
Review comments.
stevesg Jul 30, 2021
05ef90e
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/313…
stevesg Jul 30, 2021
918a35f
Clarify the gsutil mv command for moving corrupted blocks
Aug 16, 2021
98d38cc
Modify log message to fit example command
Aug 17, 2021
54a5fa9
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/370…
Aug 17, 2021
bf14ff5
Update grafana-builder from Mar 2019 to Feb 2021
bboreham Aug 18, 2021
3031864
Match query-frontend/query-scheduler/querier custom deployments by de…
pracucci Aug 24, 2021
fcd07d8
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/376…
pracucci Aug 24, 2021
0ea95cd
Create playbooks for sharded alertmanager
grobinson-grafana Aug 25, 2021
05cdd4a
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/378…
stevesg Aug 25, 2021
fd26edb
Add new alerts for alertmanager sharding mode of operation.
stevesg Aug 24, 2021
cea7f02
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/377…
stevesg Aug 25, 2021
a283035
fix(rules): upstream recording rule switched to sum_irate
Duologic Aug 25, 2021
a0e1967
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/379…
gouthamve Aug 25, 2021
03ffdeb
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/372…
bboreham Aug 26, 2021
cad752b
Fix CortexIngesterReachingSeriesLimit playbook
aknuds1 Aug 26, 2021
1ea2087
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/382…
aknuds1 Aug 26, 2021
64b9842
feat: Allow configuration of ring members in gossip alerts
jdbaldry May 11, 2021
24a3c91
fix: Add store-gateway and compactor ring_members
jdbaldry May 11, 2021
2b39d4a
fix: Match all ingester workloads and avoid matching the cortex-gateway
jdbaldry May 24, 2021
10e0dfd
feat: Optionally allow use of array or string to configure ring members
jdbaldry Jun 3, 2021
2bbc3c7
address review feedback
jdbaldry Jul 20, 2021
cbed1b6
fix: Correct ingester and querier regexps
jdbaldry Aug 27, 2021
5b6235c
Fixes to initial state sync panels on alertmanager dashboard.
stevesg Sep 2, 2021
fe18187
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/385…
stevesg Sep 2, 2021
f2774b8
Add rate back to Alertmanager dashboard initial syncs panel.
stevesg Sep 7, 2021
7a84065
Make the overrides metric name configurable.
gouthamve Sep 7, 2021
177067d
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/389…
gouthamve Sep 8, 2021
50dc732
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/390…
stevesg Sep 14, 2021
7d042ed
Improve Cortex / Queries dashboard
pracucci Sep 14, 2021
699334b
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/393…
pracucci Sep 14, 2021
a861947
Add recording rules for speeding up Alertmanager dashboard.
stevesg Sep 2, 2021
43c1423
Fixes from testing.
stevesg Sep 15, 2021
1033b9d
Move rules to their own group.
stevesg Sep 15, 2021
c0d2408
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/387…
stevesg Sep 22, 2021
17e71c4
Split `cortex_api` recording rule group into three groups.
stevesg Oct 4, 2021
2e7fe0f
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/401…
pracucci Oct 4, 2021
6787211
Update gsutil installation playbook
pracucci Oct 6, 2021
d5a3188
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/402…
pracucci Oct 6, 2021
bbb9b34
Use `$._config.job_names.gateway` in resources dashboards.
stevesg Oct 12, 2021
ca7cc8a
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/403…
stevesg Oct 12, 2021
c180588
Fine tune CortexIngesterReachingSeriesLimit alert
pracucci Oct 13, 2021
eebc529
Add CortexRolloutStuck alert
pracucci Oct 13, 2021
ea3274f
Fixed playbook
pracucci Oct 13, 2021
4de2e29
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/404…
pracucci Oct 14, 2021
fd975db
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/405…
pracucci Oct 14, 2021
dcb3306
Added CortexFailingToTalkToConsul alert
pracucci Oct 13, 2021
d79b304
Fixed alert message
pracucci Oct 13, 2021
d02bb6b
Update alert to be generic to KV stores
pracucci Oct 14, 2021
859efc9
Merge pull request https://github.com/grafana/cortex-jsonnet/pull/406…
pracucci Oct 14, 2021
1680cc8
Merge commit '859efc9' into 20211014_import-cortex-mixin
simonswine Oct 18, 2021
6f1bc18
Add README
simonswine Oct 18, 2021
4e65adf
Add mimir-mixin CI checks
simonswine Oct 18, 2021
8b4b942
Update build image
simonswine Oct 19, 2021
c625205
Move to operations folder
simonswine Oct 19, 2021
61250ef
Add missing zip to build-image
simonswine Oct 19, 2021
e7b4eab
Run prettifier on playbooks.md
simonswine Oct 19, 2021
93f9b88
Update build-image
simonswine Oct 19, 2021
2 changes: 2 additions & 0 deletions .github/workflows/test-build-deploy.yml
@@ -36,6 +36,8 @@ jobs:
        run: make BUILD_IN_CONTAINER=false check-protos
      - name: Check Generated Documentation
        run: make BUILD_IN_CONTAINER=false check-doc
      - name: Check Mixin
        run: make BUILD_IN_CONTAINER=false check-mixin
      - name: Check White Noise.
        run: make BUILD_IN_CONTAINER=false check-white-noise
      - name: Check License Header
35 changes: 34 additions & 1 deletion Makefile
@@ -2,7 +2,7 @@
# WARNING: do not commit to a repository!
-include Makefile.local

.PHONY: all test integration-tests cover clean images protos exes dist doc clean-doc check-doc push-multiarch-build-image license check-license format
.PHONY: all test integration-tests cover clean images protos exes dist doc clean-doc check-doc push-multiarch-build-image license check-license format check-mixin check-mixin-jb check-mixin-mixtool check-mixin-playbook build-mixin format-mixin
.DEFAULT_GOAL := all

# Version number
@@ -25,6 +25,12 @@ GIT_REVISION := $(shell git rev-parse --short HEAD)
GIT_BRANCH := $(shell git rev-parse --abbrev-ref HEAD)
UPTODATE := .uptodate

# path to jsonnetfmt
JSONNET_FMT := jsonnetfmt

# path to the mimir/mixin
MIXIN_PATH := jsonnet/mimir-mixin

.PHONY: image-tag
image-tag:
	@echo $(IMAGE_TAG)
@@ -313,6 +319,33 @@ clean-white-noise:
check-white-noise: clean-white-noise
	@git diff --exit-code --quiet -- '*.md' || (echo "Please remove trailing whitespaces running 'make clean-white-noise'" && false)

check-mixin: format-mixin check-mixin-jb check-mixin-mixtool check-mixin-playbook
	@git diff --exit-code --quiet -- $(MIXIN_PATH) || (echo "Please format mixin by running 'make format-mixin'" && false)

	@cd $(MIXIN_PATH) && \
	jb install && \
	mixtool lint mixin.libsonnet

check-mixin-jb:
	@cd $(MIXIN_PATH) && \
	jb install

check-mixin-mixtool: check-mixin-jb
	@cd $(MIXIN_PATH) && \
	mixtool lint mixin.libsonnet

check-mixin-playbook: build-mixin
	@$(MIXIN_PATH)/scripts/lint-playbooks.sh

build-mixin: check-mixin-jb
	@rm -rf $(MIXIN_PATH)/out && mkdir $(MIXIN_PATH)/out
	@cd $(MIXIN_PATH) && \
	mixtool generate all --output-alerts out/alerts.yaml --output-rules out/rules.yaml --directory out/dashboards mixin.libsonnet && \
	zip -q -r mimir-mixin.zip out

format-mixin:
	@find $(MIXIN_PATH) -type f -name '*.libsonnet' -print -o -name '*.jsonnet' -print | xargs $(JSONNET_FMT) -i

web-serve:
	cd website && hugo --config config.toml --minify -v server

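The `format-mixin` recipe relies on `find` operator precedence: in `-type f -name '*.libsonnet' -print -o -name '*.jsonnet' -print`, the `-type f` test binds only to the first pattern, so a directory whose name ends in `.jsonnet` would also be handed to the formatter. A minimal sketch of a grouped alternative, assuming a POSIX `find`; `list_mixin_sources` is a hypothetical helper, not part of this PR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the Jsonnet sources a formatter should rewrite.
# The escaped parentheses group both -name tests, so -type f applies to each
# of them and directories are never printed.
list_mixin_sources() {
  local root="$1"
  find "$root" -type f \( -name '*.libsonnet' -o -name '*.jsonnet' \) -print | sort
}

# Usage sketch (assumes jsonnetfmt is on PATH):
#   list_mixin_sources jsonnet/mimir-mixin | xargs jsonnetfmt -i
```

Sorting the output is only for stable, reviewable listings; the formatter itself does not depend on order.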
3 changes: 3 additions & 0 deletions jsonnet/mimir-mixin/.gitignore
@@ -0,0 +1,3 @@
/out/
/vendor/
/mimir-mixin.zip
18 changes: 18 additions & 0 deletions jsonnet/mimir-mixin/README.md
@@ -0,0 +1,18 @@
# Monitoring for Mimir

## Usage

To generate the Grafana dashboards and Prometheus alerts for Mimir:

```console
$ GO111MODULE=on go get github.com/monitoring-mixins/mixtool/cmd/mixtool
$ GO111MODULE=on go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
$ git clone https://github.com/grafana/mimir.git
$ cd mimir
$ make build-mixin
```

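This leaves the generated alerts (`out/alerts.yaml`), recording rules (`out/rules.yaml`) and dashboards (`out/dashboards`) in `jsonnet/mimir-mixin/out`, plus a zipped copy of that directory in `jsonnet/mimir-mixin/mimir-mixin.zip`.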

## Known Problems

If you get an error like `cannot use cli.StringSliceFlag literal (type cli.StringSliceFlag) as type cli.Flag in slice literal` when installing [mixtool](https://github.com/monitoring-mixins/mixtool/issues/27), make sure you set `GO111MODULE=on` before `go get`.
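According to the `build-mixin` target, `mixtool generate all` writes `out/alerts.yaml`, `out/rules.yaml` and an `out/dashboards` directory, and the whole `out` directory is then zipped into `mimir-mixin.zip`. A minimal sketch of a post-build sanity check, assuming exactly that layout; `check_mixin_output` is a hypothetical helper, not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical post-build check for the layout produced by `make build-mixin`.
# Prints "ok" on success; otherwise reports the first missing artifact on
# stderr and returns non-zero.
check_mixin_output() {
  local dir="$1"   # e.g. jsonnet/mimir-mixin
  local f
  for f in out/alerts.yaml out/rules.yaml mimir-mixin.zip; do
    # -s: the file must exist and be non-empty.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  # The dashboards directory must exist and contain at least one entry.
  if [ -z "$(ls -A "$dir/out/dashboards" 2>/dev/null)" ]; then
    echo "no dashboards in: $dir/out/dashboards" >&2
    return 1
  fi
  echo "ok"
}
```

Usage sketch from the repository root: `make build-mixin && check_mixin_output jsonnet/mimir-mixin`.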
13 changes: 13 additions & 0 deletions jsonnet/mimir-mixin/alerts.libsonnet
@@ -0,0 +1,13 @@
{
  prometheusAlerts+::
    (import 'alerts/alerts.libsonnet') +
    (import 'alerts/alertmanager.libsonnet') +

    (if std.member($._config.storage_engine, 'blocks')
     then
       (import 'alerts/blocks.libsonnet') +
       (import 'alerts/compactor.libsonnet')
     else {}) +

    { _config:: $._config + $._group_config },
}
98 changes: 98 additions & 0 deletions jsonnet/mimir-mixin/alerts/alertmanager.libsonnet
@@ -0,0 +1,98 @@
{
  groups+: [
    {
      name: 'alertmanager_alerts',
      rules: [
        {
          alert: 'CortexAlertmanagerSyncConfigsFailing',
          expr: |||
            rate(cortex_alertmanager_sync_configs_failed_total[5m]) > 0
          |||,
          'for': '30m',
          labels: {
            severity: 'critical',
          },
          annotations: {
            message: |||
              Cortex Alertmanager {{ $labels.job }}/{{ $labels.instance }} is failing to read tenant configurations from storage.
            |||,
          },
        },
        {
          alert: 'CortexAlertmanagerRingCheckFailing',
          expr: |||
            rate(cortex_alertmanager_ring_check_errors_total[2m]) > 0
          |||,
          'for': '10m',
          labels: {
            severity: 'critical',
          },
          annotations: {
            message: |||
              Cortex Alertmanager {{ $labels.job }}/{{ $labels.instance }} is unable to check tenant ownership via the ring.
            |||,
          },
        },
        {
          alert: 'CortexAlertmanagerPartialStateMergeFailing',
          expr: |||
            rate(cortex_alertmanager_partial_state_merges_failed_total[2m]) > 0
          |||,
          'for': '10m',
          labels: {
            severity: 'critical',
          },
          annotations: {
            message: |||
              Cortex Alertmanager {{ $labels.job }}/{{ $labels.instance }} is failing to merge partial state changes received from a replica.
            |||,
          },
        },
        {
          alert: 'CortexAlertmanagerReplicationFailing',
          expr: |||
            rate(cortex_alertmanager_state_replication_failed_total[2m]) > 0
          |||,
          'for': '10m',
          labels: {
            severity: 'critical',
          },
          annotations: {
            message: |||
              Cortex Alertmanager {{ $labels.job }}/{{ $labels.instance }} is failing to replicate partial state to its replicas.
            |||,
          },
        },
        {
          alert: 'CortexAlertmanagerPersistStateFailing',
          expr: |||
            rate(cortex_alertmanager_state_persist_failed_total[15m]) > 0
          |||,
          'for': '1h',
          labels: {
            severity: 'critical',
          },
          annotations: {
            message: |||
              Cortex Alertmanager {{ $labels.job }}/{{ $labels.instance }} is unable to persist full state snapshots to remote storage.
            |||,
          },
        },
        {
          alert: 'CortexAlertmanagerInitialSyncFailed',
          expr: |||
            increase(cortex_alertmanager_state_initial_sync_completed_total{outcome="failed"}[1m]) > 0
          |||,
          labels: {
            severity: 'critical',
          },
          annotations: {
            message: |||
              Cortex Alertmanager {{ $labels.job }}/{{ $labels.instance }} was unable to obtain some initial state when starting up.
            |||,
          },
        },
      ],
    },
  ],
}
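Most of these rules pair `rate(...) > 0` with a `for` clause, so a short burst of failures does not page: Prometheus keeps the alert pending until the expression has returned a result at every evaluation for the whole `for` duration, and resets it as soon as an evaluation returns nothing. `CortexAlertmanagerInitialSyncFailed` has no `for`, so it fires on the first evaluation where its `increase(...[1m]) > 0` expression matches. A simplified sketch of that state machine for a single series, assuming a fixed evaluation interval and ignoring newer features such as `keep_firing_for`; `simulate_alert_state` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical simulation of a Prometheus alerting rule for one series.
#   $1: evaluation interval in seconds
#   $2: `for` duration in seconds (0 when the rule has no `for`)
#   remaining args: 1 if the expression returned a sample at that evaluation, else 0
# Prints one state per evaluation: inactive, pending or firing.
simulate_alert_state() {
  local interval="$1" hold="$2"
  shift 2
  local t=0 active_at=-1 sample
  local -a states=()
  for sample in "$@"; do
    if [ "$sample" -eq 1 ]; then
      # The first matching evaluation starts the pending period.
      if [ "$active_at" -lt 0 ]; then
        active_at=$t
      fi
      # Fire once the expression has held for at least the `for` duration.
      if [ $((t - active_at)) -ge "$hold" ]; then
        states+=(firing)
      else
        states+=(pending)
      fi
    else
      # An evaluation without a sample resets the alert.
      active_at=-1
      states+=(inactive)
    fi
    t=$((t + interval))
  done
  echo "${states[*]}"
}
```

Assuming a 1-minute evaluation interval (the interval is configured in Prometheus, not in this file), `simulate_alert_state 60 600 ...` shows that `CortexAlertmanagerRingCheckFailing` first reports `firing` on the eleventh consecutive matching evaluation, ten minutes after the first one, while `simulate_alert_state 60 0 ...` fires immediately.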