Extend Alertmanager dashboard with currently unused metrics. #313
Very good job! It definitely improves visibility into the new Alertmanager sharding. I left a few comments and would be glad if you could take a look.
Thanks!
Good job! I left a few final nits, but consider it already approved. Please test it in a dev environment before merging. Thanks!
a16eacf to d998b25
144e088 to 629d288
Updated and tested.
…er-sharding Extend Alertmanager dashboard with currently unused metrics.
What this PR does:

Metrics for general operation:
- Added "Tenants" stat panel using:
  - `cortex_alertmanager_tenants_discovered`
- Added "Tenant Configuration Sync" row using:
  - `cortex_alertmanager_sync_configs_failed_total`
  - `cortex_alertmanager_sync_configs_total`
  - `cortex_alertmanager_ring_check_errors_total`

Metrics specific to sharding operation:
- Added "Sharding Initial State Sync" row using:
  - `cortex_alertmanager_state_initial_sync_completed_total`
  - `cortex_alertmanager_state_initial_sync_completed_total`
  - `cortex_alertmanager_state_initial_sync_duration_seconds`
- Added "Sharding State Operations" row using:
  - `cortex_alertmanager_state_fetch_replica_state_total`
  - `cortex_alertmanager_state_fetch_replica_state_failed_total`
  - `cortex_alertmanager_state_replication_total`
  - `cortex_alertmanager_state_replication_failed_total`
  - `cortex_alertmanager_partial_state_merges_total`
  - `cortex_alertmanager_partial_state_merges_failed_total`
  - `cortex_alertmanager_state_persist_total`
  - `cortex_alertmanager_state_persist_failed_total`

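
Several of the metrics listed above come in `_total` / `_failed_total` pairs, which lend themselves to failure-ratio panels. The following is a minimal sketch of how such queries could be generated, not the queries this PR actually uses: the `failure_ratio` helper, the `job="alertmanager"` selector, and the `5m` rate window are all illustrative assumptions. Only the metric names are taken from the list.

```python
# Sketch only: the helper name, selector, and window are assumptions,
# not taken from the PR. Metric names come from the PR description.

def failure_ratio(total: str, failed: str,
                  selector: str = 'job="alertmanager"',
                  window: str = "5m") -> str:
    """Return a PromQL expression for failed operations / all operations."""
    failed_rate = f"sum(rate({failed}{{{selector}}}[{window}]))"
    total_rate = f"sum(rate({total}{{{selector}}}[{window}]))"
    return f"{failed_rate} / {total_rate}"


# (total counter, failed counter) pairs named in the PR description.
PAIRS = [
    ("cortex_alertmanager_sync_configs_total",
     "cortex_alertmanager_sync_configs_failed_total"),
    ("cortex_alertmanager_state_fetch_replica_state_total",
     "cortex_alertmanager_state_fetch_replica_state_failed_total"),
    ("cortex_alertmanager_state_replication_total",
     "cortex_alertmanager_state_replication_failed_total"),
    ("cortex_alertmanager_partial_state_merges_total",
     "cortex_alertmanager_partial_state_merges_failed_total"),
    ("cortex_alertmanager_state_persist_total",
     "cortex_alertmanager_state_persist_failed_total"),
]

# One failure-ratio query per pair, keyed by the total counter's name.
QUERIES = {total: failure_ratio(total, failed) for total, failed in PAIRS}

if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(f"{name}:\n  {query}")
```

Note that a ratio of two rates evaluates to NaN in PromQL while the denominator is zero, so a panel built this way would show no value for idle tenants.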
I did not add a configuration to enable/disable the sharding-specific dashboards as the resulting jsonnet is somewhat messy, but I am happy to add it if deemed necessary.
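
For reference, the kind of toggle discussed above could look roughly like the sketch below. It models the idea in Python rather than jsonnet, and the `sharding_enabled` flag and `dashboard_rows` function are hypothetical names, not part of this PR. The row titles are the ones added here.

```python
from typing import List

# Rows that apply to every Alertmanager deployment.
GENERAL_ROWS = ["Tenant Configuration Sync"]

# Rows that are only meaningful when sharding is in use.
SHARDING_ROWS = ["Sharding Initial State Sync", "Sharding State Operations"]


def dashboard_rows(sharding_enabled: bool) -> List[str]:
    """Return the row titles to render; `sharding_enabled` is a hypothetical flag."""
    return GENERAL_ROWS + (SHARDING_ROWS if sharding_enabled else [])
```

With the flag off, only the general rows would be emitted, which is the behaviour such a configuration option would need to reproduce in the dashboard definition.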
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`