
Wrong memory limit stats from podman's remote stats API #14676

Closed
pjknkda opened this issue Jun 21, 2022 · 4 comments · Fixed by #14677
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@pjknkda

pjknkda commented Jun 21, 2022

Description

podman's remote stats API reports a wrong "memory_limit" value for a memory-limited container when the container is launched with crun. I am not sure whether the issue comes from podman or crun, but I decided to report it here because the issue goes away if I change the runtime to runc.

How to reproduce

  1. Run rootful podman API daemon

    podman system service -t 0 tcp:127.0.0.1:12345
    
  2. Launch a container with latest crun runtime with memory limit

    podman run --rm -it -m 512m --runtime=/crun-1.4.5-linux-amd64 --name test busybox
    
  3. Check the output from podman's remote stats API

    curl http://127.0.0.1:12345/containers/test/stats?stream=false
    
  4. Confirm that memory_stats.limit is wrongly reported.

    {
        "read": "2022-06-21T05:27:32.81008054Z",
        "preread": "0001-01-01T00:00:00Z",
        "pids_stats": ...,
        "blkio_stats": ...,
        "num_procs": ...,
        "storage_stats": ...,
    "cpu_stats": ...,
        "precpu_stats": ...,
        "memory_stats": {
            "usage": 589824,
            "max_usage": 18446744073709551615,
            "limit": 18446744073709551615
        },
        "name": "test",
        "Id": ...,
        "networks": ...
    }
    
  5. Repeat the same procedure with runc runtime

    podman run --rm -it -m 512m --runtime=/home/ubuntu/runc-1.1.3.amd64 --name test busybox
    
  6. Confirm that memory_stats.limit is correctly reported.

    {
        "read": "2022-06-21T05:33:54.710659699Z",
        "preread": "0001-01-01T00:00:00Z",
        "pids_stats": ...,
        "blkio_stats": ...,
        "num_procs": ...,
        "storage_stats": ...,
    "cpu_stats": ...,
        "precpu_stats": ...,
        "memory_stats": {
            "usage": 335872,
            "max_usage": 536870912,
            "limit": 536870912
        },
        "name": "test",
        "Id": ...,
        "networks": ...
    }
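For reference, the broken value is not random: 18446744073709551615 is 2^64 − 1 (UINT64_MAX), i.e. what gets reported when no memory.max limit is found in the cgroup being read, while the correct value is exactly the 512 MiB requested with `-m 512m`:

```shell
# -m 512m as recorded in the container spec, converted to bytes:
echo $((512 * 1024 * 1024))
# prints 536870912, the limit the runc-backed stats correctly report
```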
    

Environment

root@machine:/home/ubuntu# podman info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 16
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: {masked}
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.0-1004-aws
  linkmode: dynamic
  logDriver: journald
  memFree: 1502658560
  memTotal: 67403698176
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.6.1
  swapFree: 0
  swapTotal: 0
  uptime: {masked}
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: {masked}
    paused: {masked}
    running: {masked}
    stopped: {masked}
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: {masked}
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.17.3
  OsArch: linux/amd64
  Version: 3.4.4
@giuseppe giuseppe transferred this issue from containers/crun Jun 21, 2022
@giuseppe giuseppe transferred this issue from containers/podman Jun 21, 2022
@flouthoc
Collaborator

Hi @pjknkda, thanks for creating the issue.

One difference I found is that on cgroup v2, crun creates the cgroup with a suffix while runc creates it without one. Hence crun writes the memory limit to the sub-cgroup while runc writes it to the original parent, but podman only reads the max memory limit from the parent cgroup.

I am unable to find why we have a suffix for cgroup v2, but the following diff fixes the issue for me. The reason the CLI shows the correct value is that the CLI fetches the max limit from the container spec instead of the cgroup. I can open a PR for this but am waiting for @giuseppe to confirm.

diff --git a/src/libcrun/cgroup-systemd.c b/src/libcrun/cgroup-systemd.c
index 4931fce..04fa943 100644
--- a/src/libcrun/cgroup-systemd.c
+++ b/src/libcrun/cgroup-systemd.c
@@ -885,9 +885,6 @@ find_systemd_subgroup (json_map_string_string *annotations, int cgroup_mode)
       return annotation;
     }
 
-  if (cgroup_mode == CGROUP_MODE_UNIFIED)
-    return "container";
-
   return NULL;
 }
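To make the sub-cgroup difference concrete, here is a sketch (the scope name is hypothetical, chosen for illustration) of where each runtime ends up writing memory.max under the systemd cgroup manager on cgroup v2:

```shell
# Hypothetical container scope under the systemd cgroup manager (cgroup v2):
scope=/sys/fs/cgroup/machine.slice/libpod-abcdef.scope
# runc writes the limit at the scope itself -- the path podman reads:
echo "$scope/memory.max"
# crun adds a "container" sub-cgroup (the suffix from find_systemd_subgroup),
# so the limit actually lives one level down:
echo "$scope/container/memory.max"
```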

@giuseppe
Member

I am looking at it; I am still not 100% sure where this should be fixed.

@giuseppe giuseppe transferred this issue from containers/crun Jun 21, 2022
giuseppe added a commit to giuseppe/libpod that referenced this issue Jun 21, 2022
use the memory limit specified for the container instead of reading it
from the cgroup.  It is not reliable to read it from the cgroup since
the container could have been moved to a different cgroup and in
general the OCI runtime might create a sub-cgroup (like crun does).

Closes: containers#14676

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
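The logic of the fix can be sketched as follows (illustrative values taken from the reproduction above, not podman's actual code): prefer the limit recorded in the container's spec, and fall back to the cgroup value only when the spec carries no limit.

```shell
# Illustrative sketch of the fixed logic, not podman's actual code:
spec_limit=536870912                # -m 512m, as recorded in the OCI spec
cgroup_limit=18446744073709551615   # what reading the wrong cgroup returned
if [ "$spec_limit" -gt 0 ]; then
    limit=$spec_limit               # trust the spec when a limit is set
else
    limit=$cgroup_limit             # no spec limit: fall back to the cgroup
fi
echo "$limit"
# prints 536870912
```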
@giuseppe
Member

opened a PR #14677

gbraad pushed a commit to gbraad-redhat/podman that referenced this issue Jul 13, 2022
@Gaurang2811

I am getting this error with v4.1.1: #16007

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 13, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2023
4 participants