Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing metrics on cgroups v2 #3007

Merged
merged 2 commits into from
Nov 9, 2021

Conversation

mhermida
Copy link
Contributor

@mhermida mhermida commented Nov 6, 2021

Path lookups fail in the cgroups v2 cgroupPaths map (common/helpers.go, getSpecInternal) that only has key "" and some metrics are missing (cpu cfs related ones, for instance).

@google-cla google-cla bot added the cla: yes label Nov 6, 2021
@k8s-ci-robot
Copy link
Collaborator

Hi @mhermida. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobbypage
Copy link
Collaborator

/cc @kolyshkin @giuseppe

@@ -195,7 +195,8 @@ func getSpecInternal(cgroupPaths map[string]string, machineInfoFactory info.Mach
}

// Processes, read it's value from pids path directly
pidsRoot, ok := cgroupPaths["pids"]
//pidsRoot, ok := cgroupPaths["pids"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line

@bobbypage
Copy link
Collaborator

LGTM on the change overall, I tested out cadvisor on cgroupv2 image without this change and added some debug logging of the cgroupsPath and they looked like this:

cgroupsPath:map[:/sys/fs/cgroup/system.slice/systemd-update-utmp.service]
cgroupsPath:map[:/sys/fs/cgroup/system.slice/containerd.service]
cgroupsPath:map[:/sys/fs/cgroup/system.slice/containerd.service]
cgroupsPath:map[:/sys/fs/cgroup/system.slice/docker.socket]

so as a result, none of the subcontrollers were found. So I think change here is appropriate, on cgroupv2 we will only have a single cgroupsPath with key "" and from there we can get the individual controller files.

@kolyshkin
Copy link
Contributor

Overall SGTM, although I don't like the complication of having to add cgroup2UnifiedMode to every call. Perhaps this can be dropped, and getControllerPath should determine whether we use v2 all by itself (the call is very cheap as we cache it).

I'm on PTO, let me take a closer look later this week.

@mhermida
Copy link
Contributor Author

mhermida commented Nov 9, 2021

Hi @bobbypage, @kolyshkin
thanks for your reviews and comments.

...although I don't like the complication of having to add cgroup2UnifiedMode to every call.

yes, I don't like it much either, but the only way of avoiding it I found was to call cgroups.IsCgroup2UnifiedMode() within getControllerPath and I wasn't very convinced about that one too, so whatever you think is best.

@giuseppe
Copy link
Contributor

giuseppe commented Nov 9, 2021

the cgroups.IsCgroup2UnifiedMode() result is memoized, so it should be fine to call it from getControllerPath()

@mhermida
Copy link
Contributor Author

mhermida commented Nov 9, 2021

Thanks @giuseppe!!
The thing dropping the parameter cgroup2UnifiedMode from getControllerPath (which could also be dropped from getSpecInternal) and calling cgroups.IsCgroup2UnifiedMode() from the functions is that testing becomes a bit more complex as we need to adapt the behaviour of cgroups.IsCgroup2UnifiedMode() to cgroups v1 or cgroups v2 depending on which case are the tests for. Not sure how to do that...

@giuseppe
Copy link
Contributor

giuseppe commented Nov 9, 2021

I am fine with the current version. Deferring to @bobbypage and @kolyshkin :)

@bobbypage
Copy link
Collaborator

bobbypage commented Nov 9, 2021

ack, I'm happy with change here too, we can always come back and refactor the cgroupv2 check. Thank you for the fix.

@bobbypage
Copy link
Collaborator

/ok-to-test

@bobbypage bobbypage merged commit b0be1ac into google:master Nov 9, 2021
@kolyshkin
Copy link
Contributor

The thing dropping the parameter cgroup2UnifiedMode from getControllerPath (which could also be dropped from getSpecInternal) and calling cgroups.IsCgroup2UnifiedMode() from the functions is that testing becomes a bit more complex as we need to adapt the behaviour of cgroups.IsCgroup2UnifiedMode() to cgroups v1 or cgroups v2 depending on which case are the tests for. Not sure how to do that...

I see two approaches.

The first one would be to check whether you have v1 or v2 right inside the test, and only use test cases for the available version. This relies on the fact that CI is executed in both cgroup v1 and cgroup v2 environments (which AFAIK is not entirely true at the moment, but this can and should be fixed). The upside is it's trivial to implement, the downside you need different test environments (which, to my mind, is OK).

The alternative, somethat more complicated, is to add something like

// isV2 can be overridden by some unit tests.
var isV2 = cgroups.IsCgroup2UnifiedMode

to the code, and use the new isV2() everywhere you need it.

Now, you can override it on a per-test basis, something like (note this is entirely untested):

func TestThis(t *testing.T) {
  isV2Saved := isV2
  t.Cleanup(func() {isV2 = isV2Saved})

  // Tests for cgroup v1.
  isV2 = func() bool {return false}
  ....
 
  // Tests for cgroup v2.
  isV2 = func() bool {return true} 
  ....

The downsides are (1) this brings in some complexity into the test code and (2) you can'd execute such tests in parallel (if you want to, this calls for more complexity).

Overall, I'd choose the first solution, but I'm OK with the code as it is.

@kolyshkin
Copy link
Contributor

This relies on the fact that CI is executed in both cgroup v1 and cgroup v2 environments (which AFAIK is not entirely true at the moment, but this can and should be fixed).

Filed #3009. Should be easy to implement.

@mhermida mhermida deleted the missing_metrics_cgroupsv2 branch November 10, 2021 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants