
Update heuristic for container creation time #2800

Merged
merged 3 commits into google:master on Mar 3, 2021

Conversation

@odinuge (Contributor) commented Feb 7, 2021

The file "cgroup.clone_children" is not part of cgroup v2, so use
"cgroup.events" instead. This heuristic is quite bad, so also check the
mtimes of the cgroup directories themselves.

Due to this, the old metric based on it is also constantly increasing, making it more or less useless:

[Screenshot from 2021-02-07 17-06-27]
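For illustration, a minimal Go sketch of the mtime-based heuristic described above; the function name, the fallback to the directory mtime, and the choice of taking the earliest timestamp are assumptions for this sketch, not the actual cadvisor code:

```go
package heuristic

import (
	"os"
	"path/filepath"
	"time"
)

// containerCreationTime approximates a container's creation time from the
// mtime of a per-version indicator file in its cgroup directories.
// "cgroup.clone_children" exists only on cgroup v1; "cgroup.events" only on v2.
func containerCreationTime(cgroupPaths []string, isCgroupV2 bool) time.Time {
	indicator := "cgroup.clone_children" // cgroup v1
	if isCgroupV2 {
		indicator = "cgroup.events" // the v1 file does not exist on v2
	}
	var lowest time.Time
	for _, p := range cgroupPaths {
		fi, err := os.Stat(filepath.Join(p, indicator))
		if err != nil {
			// Fall back to the mtime of the cgroup directory itself.
			if fi, err = os.Stat(p); err != nil {
				continue
			}
		}
		if lowest.IsZero() || fi.ModTime().Before(lowest) {
			lowest = fi.ModTime()
		}
	}
	return lowest
}
```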

@google-cla google-cla bot added the cla: yes label Feb 7, 2021
@k8s-ci-robot (Collaborator) commented:

Hi @odinuge. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@iwankgb (Collaborator) commented Feb 7, 2021

/ok-to-test

@iwankgb (Collaborator) left a review comment:

@odinuge, can you provide a test for your change, please?

@odinuge (Contributor, Author) commented Feb 10, 2021

> @odinuge, can you provide a test for your change, please?

Yeah! I thought I would see if you were interested before writing tests. This function doesn't have a test yet, but I can definitely add one.

@iwankgb (Collaborator) commented Feb 10, 2021

The chart you added to this PR clearly shows a bug: a monotonically increasing start time for a running container :D

@bobbypage (Collaborator) commented Feb 12, 2021

Thanks @odinuge for the fix. Do you think this will fix kubernetes/kubernetes#98494 as well?

I'm curious how well the ModTime of the cgroup path works as a heuristic there, as you proposed.

@odinuge (Contributor, Author) commented Feb 12, 2021

> Do you think this will fix kubernetes/kubernetes#98494 as well?

Ahh, I see. With the current code, and also with my new patch, that would probably not be the case. In cadvisor, the CreationTime field

// Time at which the container was created.
CreationTime time.Time `json:"creation_time,omitempty"`

clearly states that it is the time of the container's creation (which is not the same as a container's start time). I think that CreationTime should keep its value as long as the cgroup dir is not removed, and with that logic, this patch works.

However, if you want the start time from the CRI, I think we need another (and better) heuristic, preferably provided by the CRI and not derived by reading cgroup files. I really don't think it is a good idea to track system daemon start times by reading mod times on cgroup files, compared to just providing them from the system or the CRI. That is, however, another discussion.

> I'm curious how well the ModTime of the cgroup path works as a heuristic there, as you proposed.

If we are talking about the "container creation time", I think this patch is better than the old one, but this heuristic can never be 100% accurate by nature, because cgroups aren't designed for this. One could look at the start time of the oldest proc in a cgroup, or something like that.
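For what it's worth, here is a hedged sketch of that "oldest proc" idea (not part of this PR): it reads the PIDs from cgroup.procs and takes the smallest starttime, which is field 22 of /proc/<pid>/stat. The function name and error handling are illustrative only:

```go
package procs

import (
	"bufio"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// oldestProcStart returns the smallest "starttime" (clock ticks since boot,
// field 22 of /proc/<pid>/stat) among the processes listed in cgroup.procs.
func oldestProcStart(cgroupDir string) (uint64, error) {
	f, err := os.Open(filepath.Join(cgroupDir, "cgroup.procs"))
	if err != nil {
		return 0, err
	}
	defer f.Close()

	var oldest uint64
	s := bufio.NewScanner(f)
	for s.Scan() {
		pid := strings.TrimSpace(s.Text())
		raw, err := os.ReadFile(filepath.Join("/proc", pid, "stat"))
		if err != nil {
			continue // the process may have exited already
		}
		// comm (field 2) may contain spaces, so split after the last ')'.
		data := string(raw)
		i := strings.LastIndexByte(data, ')')
		if i < 0 {
			continue
		}
		fields := strings.Fields(data[i+1:])
		if len(fields) < 20 {
			continue
		}
		// fields[0] is field 3 (state), so starttime (field 22) is fields[19].
		st, err := strconv.ParseUint(fields[19], 10, 64)
		if err != nil {
			continue
		}
		if oldest == 0 || st < oldest {
			oldest = st
		}
	}
	return oldest, s.Err()
}
```

Note that starttime is measured in clock ticks since boot, so a caller would still need the boot time and the tick rate (sysconf(_SC_CLK_TCK), typically 100) to turn it into a wall-clock timestamp.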

Also, I started looking at tests, but I'm not really sure how useful they will be. I guess I have to mock a filesystem, or use temp dirs.

@bobbypage (Collaborator) commented Feb 17, 2021

Thanks @odinuge. Agree, the goal here is to track CreationTime, not last start time.

> However, if you want the start time from the CRI, I think we need another (and better) heuristic, preferably provided by the CRI and not derived by reading cgroup files. I really don't think it is a good idea to track system daemon start times by reading mod times on cgroup files, compared to just providing them from the system or the CRI. That is, however, another discussion.

Definitely, agree the tracking of that should be moved into the CRI. I think the confusion stemmed from the fact that the kubelet summary API calls this field startTime while it is in fact CreationTime.

LGTM on your changes here. Only one question: did you have a chance to test out the modTime of the cgroup and see if it results in a more stable CreationTime? I hope the results are a bit better than the graph you posted in #2800 (comment), since we should expect CreationTime to remain static for a running container :)

@harche (Contributor) commented Feb 17, 2021

Thanks @odinuge. I see failures reported in the /stats/summary test with respect to container start time. I have documented them here: #2801 (comment)

@harche (Contributor) commented Feb 17, 2021

I just verified by running the Summary API test with this change, and it seems like this PR addresses the start-time-related failures in /stats/summary reported in #2801 (comment).

Thanks @odinuge for addressing this.

@harche (Contributor) commented Feb 17, 2021

LGTM

@bobbypage self-assigned this Feb 17, 2021
@odinuge (Contributor, Author) commented Feb 17, 2021

Thanks @bobbypage!

> LGTM on your changes here. Only one question: did you have a chance to test out the modTime of the cgroup and see if it results in a more stable CreationTime? I hope the results are a bit better than the graph you posted in #2800 (comment), since we should expect CreationTime to remain static for a running container :)

I will test this later today and tomorrow, and report back as soon as I have more details. 😄

> Definitely, agree the tracking of that should be moved into the CRI. I think the confusion stemmed from the fact that the kubelet summary API calls this field startTime while it is in fact CreationTime.

Yeah! If k8s wants a better "started" time, they could look into it separately. 👍

> @harche: Thanks @odinuge for addressing this.

Thanks! I am working on this because I want more mature cgroup v2 support in Kubernetes, so that I/we can start using it in production. 😄

@odinuge (Contributor, Author) commented Feb 22, 2021

I have only had time to test with cgroup v1 (I have no clusters with Prometheus and cgroup v2), but it looks like it works well there!

Pods on a given node:

[Screenshot from 2021-02-22 09-36-33]

The lowest line is kubepods.slice, and the timestamp corresponds to boot.

Here is a zoomed-out one where one can clearly see the ever-increasing metric before it drops down with this patch:

[Screenshot from 2021-02-22 09-41-22]


The only issue I see so far is the root cgroup (this only applies to cgroup v1; it looks correct for v2). You can see it as the bottom line here, with a timestamp corresponding to the install date of the OS (I assume, since it is a date in early 2020). Maybe we can use boot time for the root cgroup?

edit: fixed by using boot time as the lowest possible value

[Screenshot from 2021-02-22 09-36-46]

@harche (Contributor) commented Feb 22, 2021

> The only issue I see so far is the root cgroup (this only applies to cgroup v1; it looks correct for v2). You can see it as the bottom line here, with a timestamp corresponding to the install date of the OS (I assume, since it is a date in early 2020). Maybe we can use boot time for the root cgroup?

Makes sense to use boot time.

@odinuge (Contributor, Author) commented Feb 22, 2021

> Makes sense to use boot time.

Nice! I added a check in the last commit to instead always use max(creation_time, boot). Does that look OK? The fix is working, and the root cgroup gets the boot time as "CreatedAt".
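A minimal sketch of that clamp, assuming a bootTime helper that parses the "btime" line of /proc/stat; the actual commit may be structured differently:

```go
package boottime

import (
	"bufio"
	"os"
	"strconv"
	"strings"
	"time"
)

// bootTime reads the kernel boot time from the "btime" line of /proc/stat.
func bootTime() (time.Time, error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return time.Time{}, err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) == 2 && fields[0] == "btime" {
			sec, err := strconv.ParseInt(fields[1], 10, 64)
			if err != nil {
				return time.Time{}, err
			}
			return time.Unix(sec, 0), nil
		}
	}
	return time.Time{}, s.Err()
}

// clampToBoot never lets the heuristic report a creation time earlier than
// boot, which fixes the root-cgroup case discussed above.
func clampToBoot(t time.Time) time.Time {
	if b, err := bootTime(); err == nil && t.Before(b) {
		return b
	}
	return t
}
```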

@harche (Contributor) commented on this hunk:

}
}

for _, cgroupPath := range cgroupPaths {

Why are we iterating over cgroupPaths again when we are already doing it at line 79? Can't we merge the two loops and just iterate once? The only difference I see between the first loop and the second is that you update the cgroupPath variable based on whether it's v1 or v2.

@odinuge (Contributor, Author) replied Feb 23, 2021:

I just thought it would be cleaner, to make the flow simpler to understand, but I can merge them. (If you are thinking about performance, I have no idea what the Go compiler will do with cases like this.) Agree that the duplicated logic is kinda stupid, so I guess I can fix that anyway. Thanks! 👍

@odinuge (Contributor, Author) replied:

Pushed a new change now. Does that look better @harche?

@harche (Contributor) replied:

Yes, thanks @odinuge

@harche (Contributor) left a review:

LGTM

On some systems, the root cgroup might report the time when the folder
/sys/fs/cgroup/subsys was created, so limit it to boot time.
@bobbypage (Collaborator) commented:

LGTM

@bobbypage bobbypage merged commit 1fcfaa6 into google:master Mar 3, 2021
@odinuge deleted the container_creation_time branch March 8, 2021
chenchun pushed a commit to chenchun/kubernetes that referenced this pull request Mar 20, 2024
Port upstream changes to kubelet and cadvisor, fixing metric collection and statistics issues when using cgroup v2:
1. port cadvisor pr google/cadvisor#2839 reading cpu stats on cgroup v2
2. port cadvisor pr google/cadvisor#2837 read "max" value for cgroup v2
3. port cadvisor pr google/cadvisor#2801 gathering of stats for root cgroup on v2
4. port cadvisor pr google/cadvisor#2800: Update heuristic for container creation time
5. Fix cgroup handling for systemd with cgroup v2
6. test: adjust summary test for cgroup v2