-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added 'Mimir / Overview resources' dashboard #3481
Conversation
@@ -526,7 +526,7 @@ | |||
"steppedLine": false, | |||
"targets": [ | |||
{ | |||
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_written_bytes_total[$__rate_interval]\n )\n)\n+\nignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=\"alertmanager\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n", | |||
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_written_bytes_total[$__rate_interval]\n )\n)\n+\nignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"alertmanager\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: few other dashboards have this change to disk I/O query because I've changed the container="..."
matcher to a regex matcher container=~"..."
@@ -690,7 +690,7 @@ | |||
"steppedLine": false, | |||
"targets": [ | |||
{ | |||
"expr": "max by(persistentvolumeclaim) (\n kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} /\n kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\nand\ncount by(persistentvolumeclaim) (\n kube_persistentvolumeclaim_labels{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n label_name=\"alertmanager\"\n }\n)\n", | |||
"expr": "max by(persistentvolumeclaim) (\n kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} /\n kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\nand\ncount by(persistentvolumeclaim) (\n kube_persistentvolumeclaim_labels{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n label_name=~\"(alertmanager).*\"\n }\n)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: to simplify the multi-zone matching, I've applied the .*
suffix to all label_name
matchers.
|
||
resourcesPanelQueries(metric, instanceName):: | ||
resourceUtilizationQuery(metric, instanceName):: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: introduced a new utility function to generate the resource utilization query (without request/limit queries).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose to not have requests/limits? I think we found them helpful when they were added on the other resource dashboards
I imagine because those will be different when this dashboard is used on a microservices deployment. In that case we can repeat the panel for every different container we have.
Alternatively, we can display the memory utilization as a % of the limit instead of an absolute value. I think I prefer this option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine because those will be different when this dashboard is used on a microservices deployment.
For this reason.
In that case we can repeat the panel for every different container we have.
Well, that goes against the idea of having an "Overview". If you want to go deep, you open the "Writes resources" or "Reads resources" dashboard.
$._config.resources_panel_queries[$._config.deployment_type]['%s_usage' % metric] % { | ||
instance: $._config.per_instance_label, | ||
namespace: $.namespaceMatcher(), | ||
instanceName: instanceName, | ||
}, | ||
|
||
resourceUtilizationAndLimitQueries(metric, instanceName):: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: I renamed resourcesPanelQueries()
to resourceUtilizationAndLimitQueries()
to clarify it also include resources request/limit.
ad54cdf
to
ca43ec6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
resourcesPanelQueries(metric, instanceName):: | ||
resourceUtilizationQuery(metric, instanceName):: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose to not have requests/limits? I think we found them helpful when they were added on the other resource dashboards
I imagine because those will be different when this dashboard is used on a microservices deployment. In that case we can repeat the panel for every different container we have.
Alternatively, we can display the memory utilization as a % of the limit instead of an absolute value. I think I prefer this option.
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
ca43ec6
to
f8c95c2
Compare
…oards Signed-off-by: Marco Pracucci <marco@pracucci.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for addressing my feedback
@@ -163,7 +163,7 @@ | |||
"height": "250px", | |||
"panels": [ | |||
{ | |||
"content": "These panels show an overview on the write path. \nVisit the following specific dashboards to drill down into the write path:\n\n- <a target=\"_blank\" href=\"./d/8280707b8f16e7b87b840fc1cc92d4c5/mimir-writes?${__url_time_range}&${__all_variables}\">Writes</a>\n- <a target=\"_blank\" href=\"./d/bc9160e50b52e89e0e49c840fea3d379/mimir-writes-resources?${__url_time_range}&${__all_variables}\">Writes resources</a>\n- <a target=\"_blank\" href=\"./d/978c1cb452585c96697a238eaac7fe2d/mimir-writes-networking?${__url_time_range}&${__all_variables}\">Writes networking</a>\n", | |||
"content": "These panels show an overview on the write path. \nVisit the following specific dashboards to drill down into the write path:\n\n- <a target=\"_blank\" href=\"./d/8280707b8f16e7b87b840fc1cc92d4c5/mimir-writes?${__url_time_range}&${__all_variables}\">Writes</a>\n- <a target=\"_blank\" href=\"./d/bc9160e50b52e89e0e49c840fea3d379/mimir-writes-resources?${__url_time_range}&${__all_variables}\">Writes resources</a>\n- <a target=\"_blank\" href=\"./d/978c1cb452585c96697a238eaac7fe2d/mimir-writes-networking?${__url_time_range}&${__all_variables}\">Writes networking</a>\n- <a target=\"_blank\" href=\"./d/a9b92d3c4d1af325d872a9e9a7083d71/mimir-overview-resources?${__url_time_range}&${__all_variables}\">Overview resources</a>\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic because "writes" is a verb and a noun in this context. Is it "Writes’ resources"? or is it "[Something] writes resources"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My impression is that this is multiple writes’ resources rather than a single write’s resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please simplify the multiple prepositions (down into) in "Visit the following specific dashboards to drill down into the write path:" Suggestion: To examine the write path in detail, see a specific dashboard:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: #3488
* Added 'Mimir / Overview resources' dashboard Signed-off-by: Marco Pracucci <marco@pracucci.com> * Simplified resourceUtilizationQuery() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Do not allow to select multiple namespaces like other resources dashboards Signed-off-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Marco Pracucci <marco@pracucci.com>
What this PR does
In this PR I'm proposing a new dashboard called "Mimir / Overview resources" which shows an overview over a Mimir cluster resources utilization (CPU, memory, disk). The Mimir components are grouped into "Writes", "Reads" and "Backend" and the dashboard both work with microservices and read-write deployment mode.
Note to reviewers:
Example of microservices deployment:
Example of read-write deployment:
Which issue(s) this PR fixes or relates to
Part of #3361
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]