Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime should WARN / ignore capabilities that cannot be granted #1094

Merged
merged 1 commit into from
Mar 26, 2021

Conversation

thaJeztah
Copy link
Member

(probably) supersedes #1071 "Proposal: define a "ALL_CAPS" pseudo-capability to grant all capabilities"
Relates to:

Proposal: runtime should ignore capabilities that cannot be granted

Currently, the specification requires runtimes to produce a (fatal) error if a
container configuration requests capabilities that cannot be granted (either
the capability is "unknown" to the runtime, not supported by the kernel version
in use, or not available in the environment that the runtime operates in).

This causes problems in situations where the runtime is running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

  • Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
    capabilities. Docker 20.10.0 ("higher level runtime") shipped with
    an updated list of capabilities, and when creating a "privileged" container,
    would determine what capabilities are known by the kernel in use, and request
    all those capabilities (by including them in the container config).
    However, runc did not yet have an updated list of capabilities, and therefore
    reject the container specification, producing an error because the new
    capabilities were "unknown".
  • When running nested containers, for example, when running docker-in-docker,
    the "inner" container may be using a more recent version of docker than the
    "outer" container. In this situation, the "outer" container may be missing
    capabilities that the inner container expects to be supported (based on
    kernel version). However, starting the container would fail, because the OCI
    runtime could not grant those capabilities (them not being available in the
    environment it's running in).

Workarounds, and motivation

In the current situation, responsibility of detection what capabilities are
supported is left to the "higher level" runtimes. As an example, containerd
recently added code to dynamically adjust the list of requested capabilities
by attempting to detect which capabilities are available in the environment
it's running. This is only a partial solution, as it will not address
mismatches between the list of capabilities known by the higher-level and
lower-level runtime (which cannot be detected).

Not only does this workaround only provide a partial fix, it also introduces
additional complexity in every higher-level runtime.

Proposal: WARN (but otherwise ignore) capabilities that cannot be granted

This patch changes the specification to have runtimes WARN (but otherwise
ignore) capabilities that are requested in the container config, but cannot
be granted.

Moving this responsibility to the lower-level (OCI) runtime makes more sense,
as the OCI runtime already is responsible for interacting with the kernel
(detecting what capabilities are supported, and performing conversion), and
only the lower-level runtime itself knows what capabilities it supports itself.
Making the lower-level runtime responsible for handling "unknown" or "unavailable"
capabilities keeps the logic central.

Impact on security

Given that capabilities is an "allow-list", ignoring unknown capabilities will
not impose a security risk; worst case, a container does not get all requested
capabilities granted and as a result, some actions may fail.

Backward-compatibility

Changing this behavior should be backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as capsh / libcap to get the list of actual capabilities in the
container.

Currently, the specification requires runtimes to produce a (fatal) error if a
container configuration requests capabilities that cannot be granted (either
the capability is "unknown" to the runtime, not supported by the kernel version
in use, or not available in the environment that the runtime operates in).

This causes problems in situations where the runtime is running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

Workarounds, and motivation
-------------------------------------

In the current situation, responsibility of detection what capabilities are
supported is left to the "higher level" runtimes. As an example, containerd
recently added code to dynamically adjust the list of requested capabilities
by attempting to detect which capabilities are available in the environment
it's running. This is only a partial solution, as it will not address
mismatches between the list of capabilities _known_ by the higher-level and
lower-level runtime (which cannot be detected).

Not only does this workaround only provide a *partial* fix, it also introduces
additional complexity in every higher-level runtime.

Proposal: WARN (but otherwise ignore) capabilities that cannot be granted
-------------------------------------

This patch changes the specification to have runtimes WARN (but otherwise
ignore) capabilities that are requested in the container config, but cannot
be granted.

Moving this responsibility to the lower-level (OCI) runtime makes more sense,
as the OCI runtime _already_ is responsible for interacting with the kernel
(detecting what capabilities are supported, and performing conversion), and
only the lower-level runtime itself knows what capabilities it supports itself.
Making the lower-level runtime responsible for handling "unknown" or "unavailable"
capabilities keeps the logic central.

Impact on security
-------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities will
not impose a security risk; worst case, a container does not get all requested
capabilities granted and as a result, some actions may fail.

Backward-compatibility
-------------------------------------

Changing this behavior should be backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Copy link
Member Author

@thaJeztah thaJeztah changed the title Proposal: runtime should ignore capabilities that cannot be granted Proposal: runtime should WARN / ignore capabilities that cannot be granted Mar 9, 2021
@giuseppe
Copy link
Member

giuseppe commented Mar 9, 2021

I think this is a good idea.

PS: container engines could use the numerical value for capabilities instead of their name, a safe approach would probably be to list both and allow duplicates.

@thaJeztah
Copy link
Member Author

container engines could use the numerical value for capabilities instead of their name, a safe approach would probably be to list both and allow duplicates.

Ah, yes, you mentioned that idea on #1071 (comment) as well. I think accepting "both" would not be a bad idea. It should not be "enforced" on the higher-level tools to MUST specify numeric values (IMO), as that would enforce such runtimes to "know" about how to translate, and I can envision many situations where it's desirable to keep some form of abstraction.
I think we could address that in a separate proposal.

@thaJeztah
Copy link
Member Author

FWIW; I've also been playing with the idea to propose treating seccomp/apparmor/selinux profiles in a similar fashion ("higher level runtime: here's some profiles in the config" -> "lower-level runtime: ACK; I'll apply whatever I support"), but that one is likely more "tricky" as "not supporting" means "less secure than requested". I can still open a proposal for that (mostly to discuss what would be an acceptable approach)

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (not sure why caps were MUST in the first place)

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (IANAM)

Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Not granting a capability that the user requested will be unexpected (hence the warning), but won't make their container any less secure (ironically the opposite, likely to their consternation), so I'm +1 on this change. 😄

@tianon
Copy link
Member

tianon commented Mar 13, 2021

(Do you have a runc PR in-progress so we can see what a concrete implementation of this change might look like? 😄)

@thaJeztah
Copy link
Member Author

Do you have a runc PR in-progress so we can see what a concrete implementation of this change might look like? 😄

No, I don't (I opened this proposal because it kept dropping off my list, and it came up in a chat 😁).

I can have a look if I have some time (but others feel free to beat me to it 😇)

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM(nb)

thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 15, 2021
…ities

This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 15, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 15, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 15, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Copy link
Member Author

Do you have a runc PR in-progress so we can see what a concrete implementation of this change might look like? 😄

No, I don't (I opened this proposal because it kept dropping off my list, and it came up in a chat 😁).

I can have a look if I have some time (but others feel free to beat me to it 😇)

@tianon opened opencontainers/runc#2854

Copy link

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 17, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 17, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 19, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 19, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 19, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Copy link
Member Author

@vbatts @crosbymichael @cyphar @mrunalp ptal; any concerns with this change? 🤗

thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 22, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@tianon
Copy link
Member

tianon commented Mar 26, 2021

I'm satisfied, thank you! Conceptually this makes sense, the security-related impact is low (as noted), and we've got a solid implementation showing an approach for implementing it. 👍

(I'm going to merge now, which I'm sure will bring out any dissenting votes 🙈 ❤️)

@tianon tianon merged commit 1c3f411 into opencontainers:master Mar 26, 2021
@thaJeztah thaJeztah deleted the warn_caps branch March 26, 2021 20:17
@thaJeztah
Copy link
Member Author

Thanks!

thaJeztah added a commit to thaJeztah/runc that referenced this pull request Mar 26, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
giuseppe added a commit to giuseppe/crun that referenced this pull request Mar 29, 2021
opencontainers/runtime-spec#1094 introduced a
change where unknown capabilities must be ignored.  The runtime MUST
log a warning instead of failing with an error.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/crun that referenced this pull request Mar 29, 2021
opencontainers/runtime-spec#1094 introduced a
change where unknown capabilities must be ignored.  The runtime MUST
log a warning instead of failing with an error.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
FrankR85 pushed a commit to FrankR85/runtime-spec that referenced this pull request Apr 11, 2021
Runtime should WARN / ignore capabilities that cannot be granted

Signed-off-by: FrankR85 <frank@internetschach.de>
FrankR85 pushed a commit to FrankR85/runtime-spec that referenced this pull request Apr 11, 2021
Runtime should WARN / ignore capabilities that cannot be granted
sign: FrankR85 <frank@internetschach.de>
AdamKorcz pushed a commit to AdamKorcz/runc that referenced this pull request May 22, 2021
This updates handling of capabilities to match the updated runtime specification,
in opencontainers/runtime-spec#1094.

Prior to that change, the specification required runtimes to produce a (fatal)
error if a container configuration requested capabilities that could not be
granted (either the capability is "unknown" to the runtime, not supported by the
kernel version in use, or not available in the environment that the runtime
operates in).

This caused problems in situations where the runtime was running in a restricted
environment (for example, docker-in-docker), or if there is a mismatch between
the list of capabilities known by higher-level runtimes and the OCI runtime.

Some examples:

- Kernel 5.8 introduced CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
  capabilities. Docker 20.10.0 ("higher level runtime") shipped with
  an updated list of capabilities, and when creating a "privileged" container,
  would determine what capabilities are known by the kernel in use, and request
  all those capabilities (by including them in the container config).
  However, runc did not yet have an updated list of capabilities, and therefore
  reject the container specification, producing an error because the new
  capabilities were "unknown".
- When running nested containers, for example, when running docker-in-docker,
  the "inner" container may be using a more recent version of docker than the
  "outer" container. In this situation, the "outer" container may be missing
  capabilities that the inner container expects to be supported (based on
  kernel version). However, starting the container would fail, because the OCI
  runtime could not grant those capabilities (them not being available in the
  environment it's running in).

WARN (but otherwise ignore) capabilities that cannot be granted
--------------------------------------------------------------------------------

This patch changes the handling to WARN (but otherwise ignore) capabilities that
are requested in the container config, but cannot be granted, alleviating higher
level runtimes to detect what capabilities are supported (by the kernel, and
in the current environment), as well as avoiding failures in situations where
the higher-level runtime is aware of capabilities that are not (yet) supported
by runc.

Impact on security
--------------------------------------------------------------------------------

Given that `capabilities` is an "allow-list", ignoring unknown capabilities does
not impose a security risk; worst case, a container does not get all requested
capabilities granted and, as a result, some actions may fail.

Backward-compatibility
--------------------------------------------------------------------------------

This change should be fully backward compatible. Higher-level runtimes that
already dynamically adjust the list of requested capabilities can continue to do
so. Runtimes that do not adjust will see an improvement (containers can start
even if some of the requested capabilities are not granted). Container processes
MAY fail (as described in "impact on security"), but users can debug this
situation either by looking at the warnings produces by the OCI runtime, or using
tools such as `capsh` / `libcap` to get the list of actual capabilities in the
container.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah changed the title Proposal: runtime should WARN / ignore capabilities that cannot be granted runtime should WARN / ignore capabilities that cannot be granted Oct 8, 2021
@AkihiroSuda AkihiroSuda mentioned this pull request Jan 24, 2023
@AkihiroSuda AkihiroSuda added this to the v1.1.0 milestone Feb 1, 2023
@AkihiroSuda AkihiroSuda mentioned this pull request Jun 26, 2023
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants