runc 1.2.0-rc.1 -- "There's a frood who really knows where his towel is."

Pre-release
@cyphar cyphar released this 03 Apr 11:13
v1.2.0-rc.1
275e6d8

This is the first release candidate for the 1.2.0 branch of runc. It includes
all patches and bugfixes included in runc 1.1 patch releases (up to and
including 1.1.12). A fair few new features have been added, and some changes
have been made which may affect users. Please help us thoroughly test this
release before we release 1.2.0.

runc now requires a minimum of Go 1.20 to compile.

NOTE: runc currently will not work properly when compiled with Go 1.22 or
newer. This is due to some unfortunate glibc behaviour that Go 1.22
exacerbates in a way that results in containers not being able to start on
some systems. See this issue for more information.

Breaking

  • Several aspects of how mount options work have been adjusted in a way that
    could theoretically break users that have very strange mount option strings.
    This was necessary to fix glaring issues in how mount options were being
    treated. The key changes are:

    • Mount options on bind-mounts that clear a mount flag are now always
      applied. Previously, if a user requested a bind-mount with only clearing
      options (such as rw,exec,dev) the options would be ignored and the
      original bind-mount options would be set. Unfortunately this also means
      that container configurations which specified only clearing mount options
      will now actually get what they asked for, which could break existing
      containers (though it seems unlikely that a user who requested a specific
      mount option would consider it "broken" to get the mount options they
      asked for). This also allows us to silently add locked mount flags the
      user did not explicitly request to be cleared in rootless mode, allowing
      for easier use of bind-mounts for rootless containers. (#3967)

    • Container configurations using bind-mounts with superblock mount flags
      (i.e. filesystem-specific mount flags, referred to as "data" in
      mount(2), as opposed to VFS generic mount flags like MS_NODEV) will
      now return an error. This is because superblock mount flags will also
      affect the host mount (as the superblock is shared when bind-mounting),
      which is obviously not acceptable. Previously, these flags were silently
      ignored so this change simply tells users that runc cannot fulfil their
      request rather than just ignoring it. (#3990)

    If any of these changes cause problems in real-world workloads, please open
    an issue so we can adjust the behaviour to avoid compatibility issues.
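
To make the distinction between "setting" and "clearing" mount options in the first item above more concrete, here is a minimal Go sketch that classifies a comma-separated option string. The option table and the helpers `classify` and `onlyClearing` are hypothetical names introduced for illustration; this is not runc's own parsing code, and the table covers only a small subset of options.

```go
package main

import (
	"fmt"
	"strings"
)

// flagOptions maps a mount option name to whether it clears (true) or sets
// (false) a generic VFS mount flag. Illustrative subset, not runc's table.
var flagOptions = map[string]bool{
	"ro": false, "rw": true, // MS_RDONLY
	"noexec": false, "exec": true, // MS_NOEXEC
	"nodev": false, "dev": true, // MS_NODEV
	"nosuid": false, "suid": true, // MS_NOSUID
}

// classify splits a comma-separated option string into options that set a
// flag, options that clear a flag, and everything else (such as "bind").
func classify(opts string) (set, cleared, other []string) {
	for _, o := range strings.Split(opts, ",") {
		o = strings.TrimSpace(o)
		if o == "" {
			continue
		}
		clears, known := flagOptions[o]
		switch {
		case !known:
			other = append(other, o)
		case clears:
			cleared = append(cleared, o)
		default:
			set = append(set, o)
		}
	}
	return set, cleared, other
}

// onlyClearing reports whether the flag options present are all clearing
// options, which is the case whose handling changed in this release.
func onlyClearing(opts string) bool {
	set, cleared, _ := classify(opts)
	return len(cleared) > 0 && len(set) == 0
}

func main() {
	for _, opts := range []string{"bind,rw,exec,dev", "bind,ro,nosuid", "bind"} {
		fmt.Printf("%-18s only clearing: %v\n", opts, onlyClearing(opts))
	}
}
```

A configuration whose bind-mount options fall into the "only clearing" case is the kind most likely to behave differently after upgrading, so a check along these lines could be used to audit existing config.json files.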

Added

  • runc has been updated to OCI runtime-spec 1.2.0, and supports all Linux
    features with a few minor exceptions. See docs/spec-conformance.md for
    more details.
  • runc now supports id-mapped mounts for bind-mounts (with no restrictions on
    the mapping used for each mount). Other mount types are not currently
    supported. This feature requires MOUNT_ATTR_IDMAP kernel support (Linux
    5.12 or newer) as well as kernel support for the underlying filesystem used
    for the bind-mount. See mount_setattr(2) for a list of
    supported filesystems and other restrictions. (#3717, #3985, #3993)
  • Two new mechanisms for reducing the memory usage of our protections against
    CVE-2019-5736 have been introduced:
    • runc-dmz is a minimal binary (~8K) which acts as an additional execve
      stage, allowing us to only need to protect the smaller binary. It should
      be noted that there have been several compatibility issues reported with
      the usage of runc-dmz (namely related to capabilities and SELinux). As
      such, this mechanism is opt-in and can be enabled by running runc
      with the environment variable RUNC_DMZ=true (setting this environment
      variable in config.json will have no effect). This feature can be
      disabled at build time using the runc_nodmz build tag. (#3983, #3987)
    • contrib/memfd-bind is a helper daemon which will bind-mount a memfd copy
      of /usr/bin/runc on top of /usr/bin/runc. This entirely eliminates
      per-container copies of the binary, but requires care to ensure that
      upgrades to runc are handled properly, and requires a long-running daemon
      (unfortunately memfds cannot be bind-mounted directly and thus require a
      daemon to keep them alive). (#3987)
  • runc will now use cgroup.kill if available to kill all processes in a
    container (such as when doing runc kill). (#3135, #3825)
  • Add support for setting the umask for runc exec. (#3661)
  • libct/cg: support SCHED_IDLE for runc cgroupfs. (#3377)
  • checkpoint/restore: implement --manage-cgroups-mode=ignore. (#3546)
  • seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by
    default. (#3588)
  • libct/cg/sd: use systemd v240+ new MAJOR:* syntax. (#3843)
  • Support CFS bandwidth burst for CPU. (#3749, #3145)
  • Support time namespaces. (#3876)
  • Reduce the runc binary size by ~11% by updating
    github.com/checkpoint-restore/go-criu. (#3652)
  • Add --pidfd-socket to runc run and runc exec to allow for management
    processes to receive a pidfd for the new process, allowing them to avoid pid
    reuse attacks. (#4045)
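
As an illustration of the cgroup.kill item in the list above: cgroup.kill is a cgroup v2 interface file, and writing "1" to it asks the kernel to kill every process in that cgroup. The sketch below shows the idea in Go. The helper name `killCgroup` and the fallback message are assumptions made for this example; it does not reproduce runc's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// killCgroup asks the kernel to kill every process in the cgroup v2
// directory dir by writing "1" to its cgroup.kill file. The file is opened
// without O_CREATE so that a missing file is reported as an error.
func killCgroup(dir string) error {
	f, err := os.OpenFile(filepath.Join(dir, "cgroup.kill"), os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	if _, err := f.WriteString("1"); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: cgkill <cgroup-v2-directory>")
		return
	}
	err := killCgroup(os.Args[1])
	switch {
	case err == nil:
		fmt.Println("kill requested via cgroup.kill")
	case errors.Is(err, fs.ErrNotExist):
		// No cgroup.kill file here (for example on an older kernel); a
		// caller would have to signal each process individually instead.
		fmt.Println("cgroup.kill not available; fall back to per-process signals")
	default:
		fmt.Println("error:", err)
	}
}
```

Treating a missing file as "not available" rather than as a hard failure mirrors the "if available" wording of the changelog entry.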

Deprecated

  • runc option --criu is now ignored (with a warning), and the option will
    be removed entirely in a future release. Users who need a non-standard
    criu binary should rely on the standard way of looking up binaries in
    $PATH. (#3316)
  • runc kill option -a is now deprecated. Previously, it had to be specified
    to kill a container (with SIGKILL) which does not have its own private PID
    namespace (so that runc would send SIGKILL to all processes). Now, this is
    done automatically. (#3864, #3825)
  • github.com/opencontainers/runc/libcontainer/user is now deprecated, please
    use github.com/moby/sys/user instead. It will be removed in a future
    release. (#4017)
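
With --criu ignored, the criu binary is found through the ordinary $PATH search. A minimal Go sketch of that kind of lookup, using the standard library's exec.LookPath, is shown below. The helper name `resolveBinary` is hypothetical, and this is only an illustration of $PATH resolution, not runc's code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// resolveBinary looks up name in the directories listed in $PATH and
// returns the path of the first executable match.
func resolveBinary(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	path, err := resolveBinary("criu")
	if err != nil {
		fmt.Println("criu not found in $PATH:", err)
		return
	}
	fmt.Println("criu resolved to", path)
}
```

Under this model, a non-standard criu build can be selected by placing its directory earlier in $PATH for the runc invocation, for example `PATH=/opt/criu/bin:$PATH` (the directory here is a made-up example).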

Changed

  • When Intel RDT feature is not available, its initialization is skipped,
    resulting in slightly faster runc exec and runc run. (#3306)
  • runc features is no longer experimental. (#3861)
  • libcontainer users that create and kill containers from a daemon process
    (so that the container init is a child of that process) must now implement
    a proper child reaper in case a container does not have its own private PID
    namespace, as documented in container.Signal. (#3825)
  • Sum anon and file from memory.stat for cgroupv2 root usage,
    as the root does not have memory.current for cgroupv2.
    This aligns cgroupv2 root usage more closely with cgroupv1 reporting.
    Additionally, report root swap usage as sum of swap and memory usage,
    aligned with v1 and existing non-root v2 reporting. (#3933)
  • Add swapOnlyUsage in MemoryStats. This field reports swap-only usage.
    For cgroupv1, Usage and Failcnt are set by subtracting memory usage
    from memory+swap usage. For cgroupv2, Usage, Limit, and MaxUsage
    are set. (#4010)
  • libcontainer: container.Signal no longer takes an all argument. Whether
    or not it is necessary to kill all processes in the container individually
    is now determined automatically. (#3825, #3885)
  • seccomp: enable seccomp binary tree optimization. (#3405)
  • runc run/runc exec: ignore SIGURG. (#3368)
  • Remove tun/tap from the default device allowlist. (#3468)
  • runc --root non-existent-dir list now reports an error for non-existent
    root directory. (#3374)
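
The cgroupv2 root usage item in the list above amounts to summing two counters from memory.stat. The following Go sketch shows that calculation on the text of a memory.stat file. The function name `rootMemoryUsage` and the sample values are made up for illustration, and the code is a simplified stand-in rather than the libcontainer implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// rootMemoryUsage sums the "anon" and "file" counters found in the contents
// of a cgroup v2 memory.stat file. Other keys (file_mapped, anon_thp, ...)
// are ignored because only exact key matches are counted.
func rootMemoryUsage(memoryStat string) (uint64, error) {
	var total uint64
	seen := 0
	sc := bufio.NewScanner(strings.NewReader(memoryStat))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 || (fields[0] != "anon" && fields[0] != "file") {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", sc.Text(), err)
		}
		total += v
		seen++
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	if seen != 2 {
		return 0, fmt.Errorf("expected anon and file entries, found %d", seen)
	}
	return total, nil
}

func main() {
	// Sample values are invented for this example.
	sample := "anon 4096\nfile 8192\nkernel_stack 16384\n"
	usage, err := rootMemoryUsage(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(usage) // prints 12288
}
```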

Fixed

  • In case the runc binary resides on tmpfs, runc init no longer re-execs
    itself twice. (#3342)
  • Our seccomp -ENOSYS stub now correctly handles multiplexed syscalls on
    s390 and s390x. This solves the issue where syscalls the host kernel did not
    support would return -EPERM despite the existence of the -ENOSYS stub
    code (this was due to how s390x does syscall multiplexing). (#3474)
  • Remove tun/tap from the default device rules. (#3468)
  • specconv: avoid mapping "acl" to MS_POSIXACL. (#3739)
  • libcontainer: fix private PID namespace detection when killing the
    container. (#3866, #3825)
  • systemd socket notification: fix race where runc exited before systemd
    properly handled the READY notification. (#3291, #3293)
  • The -ENOSYS seccomp stub is now always generated for the native
    architecture that runc is running on. This is needed to work around some
    arguably specification-noncompliant behaviour from Docker on architectures
    such as ppc64le, where the allowed architecture list is set to null. This
    ensures that we always generate at least one -ENOSYS stub for the native
    architecture even with these weird configs. (#4219)

Removed

  • In order to fix performance issues in the "lightweight" bindfd protection
    against CVE-2019-5736, the temporary ro bind-mount of
    /proc/self/exe has been removed. runc now creates a binary copy in all
    cases. See the above notes about memfd-bind and runc-dmz as well as
    contrib/cmd/memfd-bind/README.md for more information about how this
    (minor) increase in memory usage can be reduced. (#3987, #3599, #2532,
    #3931)
  • libct/cg: Remove EnterPid (a function with no users). (#3797)
  • libcontainer: Remove {Pre,Post}MountCmds which were never used and are
    obsoleted by more generic container hooks. (#3350)

Static Linking Notices

The runc binary distributed with this release is statically linked with
the following GNU LGPL-2.1 licensed libraries, with runc acting
as a "work that uses the Library":

The versions of these libraries were not modified from their upstream versions,
but in order to comply with the LGPL-2.1 (§6(a)), we have attached the
complete source code for those libraries which (when combined with the attached
runc source code) may be used to exercise your rights under the LGPL-2.1.

However we strongly suggest that you make use of your distribution's packages
or download them from the authoritative upstream sources, especially since
these libraries are related to the security of your containers.


Thanks to the following contributors who made this release possible:

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>