Disk usage metrics for containerd #2785
* Remove container fs inodes: disk metrics are not supported in OCI, it seems (google/cadvisor#2785), and the metrics it reports in docker-compose feel rather dubious at times. Instead, make ContainerIOUsage a shared observable; the services that had relevant uses for the inodes monitoring now have this instead.
* Reworked container restart: use cAdvisor metrics to detect container restarts in all environments
* cAdvisor and monitoring documentation: inline documentation improvements and a new cAdvisor page in the docsite
* Shared Group titles: titles are now in `shared` package for consistency and ease of editing
I have experienced the same issue.
There is some conversation about these metrics in containerd/containerd#678. I suppose that containerd provides this information. |
A PR was submitted: #2872 |
Is there any timeline for a fix of this issue? |
+1 on looking for any update or timeline regarding this issue - these metrics are pretty important for observability and workload behavior. |
Adding onto this ticket since we're blocked on switching to the |
@bobbypage Do you have an update on this? Best I can follow is that there is a possibly-working version in the Alternatively it seems like work has gone into not using cadvisor for container stats and k8s 1.23 has an alpha feature-gate which uses the cri stats provider ( |
Enabling the |
It appears that this won't be addressed any time soon, as KEP-2371 moves most of the stats collection out of cadvisor into the CRI interface. Is there an interim solution for users that need these stats? |
The workaround for now is to use the |
Is that branch being actively maintained? Do you know if it still works normally with other runtimes? |
Yes, it is maintained; we just pushed the latest v0.45.0 changes to this branch. We need this separate branch because getting the disk usage metrics on containerd requires importing the CRI API into cAdvisor. However, we can't import the CRI API into cAdvisor, because cAdvisor is imported by k8s and k8s itself includes the CRI API, which results in a circular dependency. So the workaround for now is to have this separate branch, which includes the CRI API (see #2872 (comment) for that discussion).
Yes, it will work with other runtimes as well, but if containerd is not used there is no benefit of using it. |
Ah hmm. So I take it the circular dependency prohibits this branch from being embedded in the kubelet, and there's no easy path towards doing so? Running a standalone deployment of cadvisor isn't particularly palatable, as asking our users to retool their monitoring stacks to make use of that would be a non-trivial amount of work. I'm honestly surprised that we got this far into the dockershim deprecation with cadvisor missing feature parity for one of the most popular replacement runtimes. |
@brandond are you referring to the fact that most folks are using the existing |
Would it be possible to list a config somewhere that only gathers the disk metrics? GKE controls our normal cadvisor, so running a minimal "container disk metrics only" daemonset seems like a simple workaround. |
|
Refreshing my memory on this issue, I realised we didn't link to the exporter @ribbybibby wrote to address this: https://github.com/utilitywarehouse/kube-summary-exporter. We have been running it for nearly 2 years now. |
Any updates on this? containerd is the default and recommended runtime for GKE, but there is still no support for |
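For readers who want to see what such an exporter would have to read: a minimal Go sketch, assuming the exporter draws on the kubelet Summary API (as its name suggests) and that the JSON uses the `pods[].containers[].rootfs.usedBytes` layout of that API. The `summary` type and `rootfsUsage` function are hypothetical names for illustration, not code from the exporter, and the sample document in `main` is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// summary mirrors only the fields of a kubelet Summary API response
// that are needed here (assumed layout; everything else is ignored).
type summary struct {
	Pods []struct {
		PodRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"podRef"`
		Containers []struct {
			Name   string `json:"name"`
			Rootfs *struct {
				UsedBytes *uint64 `json:"usedBytes"`
			} `json:"rootfs"`
		} `json:"containers"`
	} `json:"pods"`
}

// rootfsUsage returns used bytes of each container's root filesystem,
// keyed by "namespace/pod/container". Containers without rootfs stats
// are skipped rather than reported as zero.
func rootfsUsage(doc []byte) (map[string]uint64, error) {
	var s summary
	if err := json.Unmarshal(doc, &s); err != nil {
		return nil, err
	}
	usage := make(map[string]uint64)
	for _, p := range s.Pods {
		for _, c := range p.Containers {
			if c.Rootfs == nil || c.Rootfs.UsedBytes == nil {
				continue
			}
			key := p.PodRef.Namespace + "/" + p.PodRef.Name + "/" + c.Name
			usage[key] = *c.Rootfs.UsedBytes
		}
	}
	return usage, nil
}

func main() {
	// Made-up sample document, not real cluster output.
	doc := []byte(`{"pods":[{"podRef":{"name":"web-0","namespace":"default"},
		"containers":[{"name":"app","rootfs":{"usedBytes":110600000}}]}]}`)
	usage, err := rootfsUsage(doc)
	if err != nil {
		panic(err)
	}
	for k, v := range usage {
		fmt.Printf("%s %d\n", k, v)
	}
}
```

In a real cluster the document would come from the kubelet rather than a literal; how to reach and authenticate against the kubelet is deliberately left out of this sketch.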
It appears at least on
|
Any hope for this to be implemented soon? |
* Add dashboards
* Introduce new value `IsGardenCluster`
* Add dashboard providers configmap
* Add datasource configMap
* Add service
* Add dashboard configMaps
* Add deployment
* Add ingress
* Move helper function at the end
* Deploy oidc dashboard only if authentication webhook is enabled
* Integrate plutono in gop flow
* Adapt seed plutono
* Adapt shoot plutono
* Integrate vali
* Adapt test
* Adapt integration and e2e test
* --------------Empty separator commit---------------
* Reuse dashboard among shoot and garden
* Change datasource name from `cluster-prometheus` to `prometheus` Update plutono.go
* Adapt apiserver-overview dashboard to make it reusable. Rename dashboard variable "apiserver" to "pod" Add 2 variables: job and pod Add the pod variable to the promql expressions
* Reuse `apiserver overview` dashboard
* Reuse `apiserver` related other dashboard
* make default selection all
* Reuse apiserver-request-duration-and-response dashboard Old shoot dashboard had some random stuff also
* Add pod logs to kubernetes pods dashboard
* Remove Pod file system usage metrics ref - google/cadvisor#2785
* Adapt PC doc
* Address review
* Use same port for all use case
* Drop special handling for OIDC webhook
* Allow garden dashboard to have additional dashboards
* Adapt test
* Use wildcard cert for ingress in runtime cluster
* Address review
* Address review
* Update docs/usage/trusted-tls-for-garden-runtime.md
* Update docs/README.md
---------
Co-authored-by: Tim Usner <tim.usner@sap.com>
Any updates? |
I tried to rebase the containerd-cri branch (https://github.com/google/cadvisor/tree/containerd-cri) on v0.48.0 (and v0.47.1); in both cases the resource usage blows up 🙁. I do see values for the metrics, but I didn't validate that they are correct; other comments report that they are not. |
I doubt this is going to be fixed, given the work in progress to move stats into the CRI API, and use the CRI stats to replace the data currently served at the cadvisor metrics endpoint - as discussed above.
It looks like containerd's cgroupv2 manager does not currently support filesystem utilization stats; it only returns data for PIDs, CPU, memory, block IO, RDMA, and HugeTLB. |
Are there any updates? |
The crictl tool can report container filesystem usage, e.g. crictl stats:
CONTAINER        CPU %    MEM        DISK       INODES
0674440a33dbd    0.00     1.438MB    102.4kB    24
2e2f101e7ce72    0.06     62.43MB    114.7kB    29
37ed67b1e33cf    1.58     346.1MB    110.6MB    41
The data shown in the DISK column comes from the WritableLayer field of this struct:
type ContainerStats struct {
    // Information of the container.
    Attributes *ContainerAttributes `protobuf:"bytes,1,opt,name=attributes,proto3" json:"attributes,omitempty"`
    // CPU usage gathered from the container.
    Cpu *CpuUsage `protobuf:"bytes,2,opt,name=cpu,proto3" json:"cpu,omitempty"`
    // Memory usage gathered from the container.
    Memory *MemoryUsage `protobuf:"bytes,3,opt,name=memory,proto3" json:"memory,omitempty"`
    // Usage of the writable layer.
    WritableLayer *FilesystemUsage `protobuf:"bytes,4,opt,name=writable_layer,json=writableLayer,proto3" json:"writable_layer,omitempty"`
    // Swap usage gathered from the container.
    Swap *SwapUsage `protobuf:"bytes,5,opt,name=swap,proto3" json:"swap,omitempty"`
    XXX_NoUnkeyedLiteral struct{} `json:"-"`
    XXX_sizecache int32 `json:"-"`
}
So containerd is able to report container filesystem usage. Why doesn't cAdvisor call this ListContainerStats API? |
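For anyone who needs these numbers in the meantime, the DISK and INODES columns above can be scraped from the `crictl stats` output. A minimal Go sketch, assuming the five-column row layout shown above and decimal (base-1000) size suffixes; `ContainerDisk`, `parseSize` and `parseStats` are hypothetical names for illustration, not part of crictl or cAdvisor.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// ContainerDisk holds the filesystem columns of one `crictl stats` row.
type ContainerDisk struct {
	ID        string
	DiskBytes uint64
	Inodes    uint64
}

// parseSize converts a human-readable size such as "102.4kB" into bytes.
// Assumption: suffixes are decimal (1 kB = 1000 B), as in the sample output.
func parseSize(s string) (uint64, error) {
	units := []struct {
		suffix string
		factor float64
	}{
		{"TB", 1e12}, {"GB", 1e9}, {"MB", 1e6}, {"kB", 1e3}, {"B", 1},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, err
			}
			return uint64(math.Round(n * u.factor)), nil
		}
	}
	return 0, fmt.Errorf("unrecognized size %q", s)
}

// parseStats extracts disk usage and inode counts from `crictl stats` text.
// The header splits into six tokens ("CPU %" is two), so only lines with
// exactly five fields are treated as data rows.
func parseStats(out string) ([]ContainerDisk, error) {
	var rows []ContainerDisk
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) != 5 || f[0] == "CONTAINER" {
			continue
		}
		disk, err := parseSize(f[3])
		if err != nil {
			return nil, err
		}
		inodes, err := strconv.ParseUint(f[4], 10, 64)
		if err != nil {
			return nil, err
		}
		rows = append(rows, ContainerDisk{ID: f[0], DiskBytes: disk, Inodes: inodes})
	}
	return rows, sc.Err()
}

func main() {
	// Rows copied from the output quoted above.
	sample := "CONTAINER CPU % MEM DISK INODES\n" +
		"0674440a33dbd 0.00 1.438MB 102.4kB 24\n" +
		"37ed67b1e33cf 1.58 346.1MB 110.6MB 41\n"
	rows, err := parseStats(sample)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Printf("%s disk=%dB inodes=%d\n", r.ID, r.DiskBytes, r.Inodes)
	}
}
```

Parsing the human-readable table loses precision compared with reading the WritableLayer field directly over the CRI API, so this is only a stopgap.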
I believe that is gated on the Have you tried enabling it on your node? |
It looks like that breaks other things: kubernetes/kubernetes#111276 |
+1 Is there any solution now? |
What happened to the usage metric? |
Is there any solution to find disk usage metrics for containerd via Prometheus? |
@robini - quoting @george-angel :
|
Yep also been running that for about a year. |
Is there any timeline for a fix of this issue? 🙏 |
Hi all - when can we expect a fix for this? |
When switching from docker to containerd as my container runtime in Kubernetes, I noticed that `container_fs_usage_bytes` metrics were no longer being exported for my containers. It looks like disk usage metrics aren't implemented for containerd, as noted by this comment: https://github.com/google/cadvisor/blob/v0.38.6/container/containerd/handler.go#L164-L165.
Disk usage is a pretty important metric to monitor, so I think, if possible, this should be added.
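A quick way to confirm the symptom on a given node is to look for `container_fs_usage_bytes` samples in the text a metrics endpoint returns. A minimal Go sketch, assuming the standard Prometheus text exposition format; `samplesFor` is a hypothetical helper, and the sample text in `main` (including its label values) is made up rather than captured from a real node.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// samplesFor returns the sample lines for exactly one metric name from
// Prometheus text exposition output. Comment lines (# HELP, # TYPE) are
// skipped, and a name must be followed by "{" or a space so that a longer
// metric name sharing the same prefix does not match.
func samplesFor(exposition, metric string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, metric) {
			continue
		}
		rest := line[len(metric):]
		if strings.HasPrefix(rest, "{") || strings.HasPrefix(rest, " ") {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Made-up exposition text for illustration only.
	text := "# TYPE container_fs_usage_bytes gauge\n" +
		"container_fs_usage_bytes{device=\"/dev/sda1\",id=\"/\"} 1024\n" +
		"container_cpu_usage_seconds_total{id=\"/\"} 3.5\n"
	found := samplesFor(text, "container_fs_usage_bytes")
	fmt.Printf("%d sample(s) found\n", len(found))
	for _, s := range found {
		fmt.Println(s)
	}
}
```

If the returned slice contains no entries for your workload containers, the node is showing the behaviour described in this issue.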