Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroupv2: fix swap accounting #3001

Merged
merged 1 commit into from
Nov 9, 2021

Conversation

bobrik
Copy link
Contributor

@bobrik bobrik commented Nov 2, 2021

Since runsc v1.0.0-rc94 stats.MemoryStats.SwapUsage on cgroupv2 includes memory usage:

This results in wrong metrics:

container_memory_rss{id="/system.slice/qs-dnsv2.service"} 9.173716992e+09 1635890983582
container_memory_swap{id="/system.slice/qs-dnsv2.service"} 1.249011712e+10 1635890983582
container_memory_usage_bytes{id="/system.slice/qs-dnsv2.service"} 1.249011712e+10 1635890983582

Here my service dosn't have any swapped pages, yet it reports all memory as swapped.

Since runsc v1.0.0-rc94 `stats.MemoryStats.SwapUsage` on cgroupv2 includes memory usage:

* opencontainers/runc@b99ca25ad0f

This results in wrong metrics:

```
container_memory_rss{id="/system.slice/qs-dnsv2.service"} 9.173716992e+09 1635890983582
container_memory_swap{id="/system.slice/qs-dnsv2.service"} 1.249011712e+10 1635890983582
container_memory_usage_bytes{id="/system.slice/qs-dnsv2.service"} 1.249011712e+10 1635890983582
```

Here my service dosn't have any swapped pages, yet it reports all memory as swapped.
@google-cla google-cla bot added the cla: yes label Nov 2, 2021
@k8s-ci-robot
Copy link
Collaborator

Hi @bobrik. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobrik
Copy link
Contributor Author

bobrik commented Nov 2, 2021

I'm not really sure if it's something that needs to be fixed or if I'm expected to do the math myself at query time.

@bobrik
Copy link
Contributor Author

bobrik commented Nov 2, 2021

The metric is correct on cgroupv1, so I think the right way is to merge this fix for cgroupv2.

@bobbypage
Copy link
Collaborator

/cc @kolyshkin

@bobbypage
Copy link
Collaborator

makes sense, LGTM.

/cc @kolyshkin who made the runc change

@kolyshkin
Copy link
Contributor

Yeah this is weird. cgroup v1 reports mem+swap in this field, so we had to make cgroup v2 report the same. Not sure if there's a better way to address this.

The patch LGTM though.

@bobbypage
Copy link
Collaborator

Thanks, LGTM!

@bobbypage
Copy link
Collaborator

/ok-to-test

@bobbypage
Copy link
Collaborator

/retest

@bobbypage bobbypage merged commit 3c6e309 into google:master Nov 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants