Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime-config-linux: Separate mknod from cgroups #298

Merged

Conversation

wking
Copy link
Contributor

@wking wking commented Jan 12, 2016

This is #99 rebased onto the current master, because the idea seems
to have new life
. It leaves mknod entries in linux.devices
and moves cgroups entries into linux.resources.devices. Quiet
thread on the issue here.

This makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't. Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes. This will also make it easy to drop either
portion (mknod or cgroups) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup (#101).

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux
. The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here. But I would also be fine leaving it with the other
mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.

@wking
Copy link
Contributor Author

wking commented Jan 12, 2016

@mrunalp, you were going to write up some more motivation for this? It's also possible that you intended to post that motivation to the thread, and that I'm jumping the gun with this reroll. If so, feel free to table/close this PR until the thread has reached a consensus.

@mrunalp mrunalp added this to the v0.3.0 milestone Jan 13, 2016
@mrunalp
Copy link
Contributor

mrunalp commented Jan 15, 2016

We can continue using this PR. i have responded to the mailing list discussion.

@mrunalp
Copy link
Contributor

mrunalp commented Jan 20, 2016

There are two use cases for this separation:

  1. The user doesn't want to specify any cgroup configuration. This is easy for the other values but the device cgroups are tied up the device configuration.
  2. Running containers under a non-host usernamespace is an issue, because one can't modify the device cgroups in such a scenario.

So, separation will be good for consistency as well as technical reasons.

@@ -77,93 +77,59 @@ There is a limit of 5 mappings which is the Linux kernel hard limit.

## Devices

`devices` is an array specifying the list of devices to be created in the container.
`devices` is an array specifying the list of devices that the MUST be available in the container.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can devices be setup using hooks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Jan 20, 2016 at 10:15:41AM -0800, Vish Kannan wrote:

-devices is an array specifying the list of devices to be created in the container.
+devices is an array specifying the list of devices that the MUST be available in the container.

Can devices be setup using hooks?

Yeah. Or with bind-mounts, or by putting the device nodes in the
rootfs (#98). My attempt to summarize the pushback for dropping the
mknod section is in 1.

@wking wking force-pushed the separate-device-cgroups-from-mknod branch from dfb5dad to 3f8105e Compare January 20, 2016 19:08
@wking
Copy link
Contributor Author

wking commented Jan 20, 2016

Rerolled dfb5dad3f8105e adding @vishh's ‘omitempty’s [1,2,3].

@wking
Copy link
Contributor Author

wking commented Jan 20, 2016

In today's meeting there was talk about where default-device creation
happens in the container-setup timeline 1, and whether/how config
authors should be able to opt-out of default-device creation 2.
Addressing either of those seems orthogonal to the split in this PR
3, but I'd thought I'd mention them here in case anyone else
disagrees about their orthogonality.

@mrunalp
Copy link
Contributor

mrunalp commented Jan 25, 2016

Needs rebase.

@wking
Copy link
Contributor Author

wking commented Jan 25, 2016

On Mon, Jan 25, 2016 at 11:57:44AM -0800, Mrunal Patel wrote:

Needs rebase.

I'll wait until #284 lands.

@wking wking force-pushed the separate-device-cgroups-from-mknod branch from 3f8105e to 81f726e Compare January 27, 2016 18:38
@wking
Copy link
Contributor Author

wking commented Jan 27, 2016

On Mon, Jan 25, 2016 at 12:27:43PM -0800, W. Trevor King wrote:

Mon, Jan 25, 2016 at 11:57:44AM -0800, Mrunal Patel:

Needs rebase.

I'll wait until #284 lands.

Rebased around #284 with 3f8105e81f726e.

$ diff -u <(git show 3f8105e) <(git show 81f726e)

shows no changes outside of diff context.

@@ -115,93 +107,59 @@ There is a limit of 5 mappings which is the Linux kernel hard limit.

## Devices

`devices` is an array specifying the list of devices to be created in the container.
`devices` is an array specifying the list of devices that the MUST be available in the container.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/that the/that/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Jan 27, 2016 at 10:47:11AM -0800, Mrunal Patel wrote:

-devices is an array specifying the list of devices to be created in the container.
+devices is an array specifying the list of devices that the MUST be available in the container.

s/that the/that/

Fixed in 81f726ea855883.

@wking wking force-pushed the separate-device-cgroups-from-mknod branch from 81f726e to a855883 Compare January 27, 2016 18:53
@@ -228,6 +186,46 @@ You can configure a container's cgroups via the `resources` field of the Linux c
Do not specify `resources` unless limits have to be updated.
For example, to run a new process in an existing container without updating limits, `resources` need not be specified.

#### Device whitelist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call it an access list instead of a whitelist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Jan 27, 2016 at 12:13:03PM -0800, Mrunal Patel wrote:

+#### Device whitelist

Should we call it an access list instead of a whitelist?

I'm just echoing the kernel docs [1](which moved! [2]). I've pushed
a855883ddd56d8 updating the link target.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitelist typically carries a positive connotation. In this case, this whitelist can be used to deny access to devices as well.

* **`path`** *(string, required)* - full path to device inside container.
* **`major, minor`** *(int64, required)* - [major, minor numbers][devices] for the device.
* **`fileMode`** *(uint32, required)* - file mode for the device.
You can also control access to devices [with cgroups](#device-whitelist).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the following fields required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Jan 27, 2016 at 01:13:14PM -0800, Vish Kannan wrote:

+* fileMode (uint32, required) - file mode for the device.

Are the following fields required?

I'm fine making fileMode, uid, and gid optional. Would they default
to the effective uid/gid/umask of the runtime process? Of the
container process as it will be when it execs any user-configured
arguments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults can also depend on it setgid bit is set on the parent dir I think.
But it shouldn't matter from the Spec level.

On Wed, Jan 27, 2016 at 1:31 PM W. Trevor King notifications@github.com
wrote:

In config-linux.md
#298 (comment):

-* permissions (string, optional) - cgroup permissions for device. A composition of r (read), w (write), and m (mknod).

-* fileMode (uint32, optional) - file mode for device file

-* uid (uint32, optional) - uid of device owner

-* gid (uint32, optional) - gid of device owner

-fileMode, uid and gid are required if path is given and are otherwise not allowed.
+* type (char, required) - type of device: c, b, u or p.

  • More info in [mknod(1)][mknod.1].
    +* path (string, required) - full path to device inside container.
    +* major, minor (int64, required) - [major, minor numbers][devices] for the device.
    +* fileMode (uint32, required) - file mode for the device.
  • You can also control access to devices with cgroups.

On Wed, Jan 27, 2016 at 01:13:14PM -0800, Vish Kannan wrote: > +*
fileMode (uint32, required) - file mode for the device. > + You can
also control access to devices with cgroups. Are the
following fields required?
I'm fine making fileMode, uid, and gid optional. Would they default to the
effective uid/gid/umask of the runtime process? Of the container process as
it will be when it execs any user-configured arguments?


Reply to this email directly or view it on GitHub
https://github.com/opencontainers/specs/pull/298/files#r51053200.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was saying was that, it is possible to use the struct without
specifying this field. Hence it is optional.
On Wed, Jan 27, 2016 at 1:35 PM Vishnu Kannan vishnuk@google.com wrote:

Defaults can also depend on it setgid bit is set on the parent dir I
think. But it shouldn't matter from the Spec level.

On Wed, Jan 27, 2016 at 1:31 PM W. Trevor King notifications@github.com
wrote:

In config-linux.md
#298 (comment):

-* permissions (string, optional) - cgroup permissions for device. A composition of r (read), w (write), and m (mknod).

-* fileMode (uint32, optional) - file mode for device file

-* uid (uint32, optional) - uid of device owner

-* gid (uint32, optional) - gid of device owner

-fileMode, uid and gid are required if path is given and are otherwise not allowed.
+* type (char, required) - type of device: c, b, u or p.

  • More info in [mknod(1)][mknod.1].
    +* path (string, required) - full path to device inside container.
    +* major, minor (int64, required) - [major, minor numbers][devices] for the device.
    +* fileMode (uint32, required) - file mode for the device.
  • You can also control access to devices with cgroups.

On Wed, Jan 27, 2016 at 01:13:14PM -0800, Vish Kannan wrote: > +*
fileMode (uint32, required) - file mode for the device. > + You can
also control access to devices with cgroups. Are the
following fields required?
I'm fine making fileMode, uid, and gid optional. Would they default to
the effective uid/gid/umask of the runtime process? Of the container
process as it will be when it execs any user-configured arguments?


Reply to this email directly or view it on GitHub
https://github.com/opencontainers/specs/pull/298/files#r51053200.

Copy link
Contributor Author

@wking wking Jan 27, 2016 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

fileMode, uid, and gid are optional, because mknod(2) doesn't need
them and specifies the handling when they aren't set [6,7].
Similarly, major/minor numbers are only required for S_IFCHR and
S_IFBLK [6].  I've left off wording about required runtime behavior
for unset values, because I'd rather address that with a blanket rule
[8].

For the cgroup, access is optional because the kernel docs show an
example that doesn't write an access field to the devices.deny file
[9].  The current kernel docs don't go into much detail on this
behavior (I expect unset and 'rwm' are equivalent), but if the kernel
doesn't need a value written, the spec should get out of the way and
allow users to not specify a value.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.  The cgroup link is new since 2016-01-13 [10].

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <20151005194655.GN28418@odin.tremily.us>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)
[6]: http://man7.org/linux/man-pages/man2/mknod.2.html#DESCRIPTION
[7]: https://github.com/opencontainers/specs/pull/298/files#r51053835
[8]: opencontainers#285 (comment)
[9]: https://kernel.org/doc/Documentation/cgroup-v1/devices.txt
[10]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking wking force-pushed the separate-device-cgroups-from-mknod branch from ddd56d8 to 7d5b027 Compare January 27, 2016 21:55
```json
"devices": [
{
"allow": false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this allow and access and how they work together. Can you explain better to me?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think allow is devices.allow vs devices.deny and rwm is read, write, mknod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Jan 27, 2016 at 03:30:42PM -0800, Tõnis Tiigi wrote:

+* allow (boolean, required) - whether the entry is allowed or denied.

I think allow is devices.allow vs devices.deny and 'rwm` is
read, write, mknod.

That's right. What wording should I add to make it more clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, i guess i get what would happen

@crosbymichael
Copy link
Member

LGTM

1 similar comment
@mrunalp
Copy link
Contributor

mrunalp commented Jan 28, 2016

LGTM

mrunalp pushed a commit that referenced this pull request Jan 28, 2016
runtime-config-linux: Separate mknod from cgroups
@mrunalp mrunalp merged commit 608cb7b into opencontainers:master Jan 28, 2016
@wking wking deleted the separate-device-cgroups-from-mknod branch January 28, 2016 04:09
wking added a commit to wking/nmbug-oci that referenced this pull request Jan 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants