Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

stage1: apply seccomp isolators #2753

Merged
merged 1 commit into from
Aug 2, 2016

Conversation

lucab
Copy link
Member

@lucab lucab commented Jun 7, 2016

This commit introduces support for appc seccomp isolators.
Two isolators are currently specified and implemented, a "retain-set"
mode (whitelist) and a "remove-set" one (blacklist).
If no seccomp isolators are specified, rkt will apply its own default
whitelist.


  • TODO: docs - seccomp guide

@lucab
Copy link
Member Author

lucab commented Jun 7, 2016

BTW, this tries to enable a "sane" seccomp whitelist by default. It looks like most of the testsuite is ok with that, except for some buggy interactions with non-root app (which so far I failed to trace down, thus the WIP).

@alban alban added this to the v1.8.0 milestone Jun 7, 2016
seccompIsolators++
// By appc spec, only one seccomp isolator per app is allowed
if seccompIsolators > 1 {
return nil, fmt.Errorf("too many seccomp isolators specified (%d)", seccompIsolators)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%d will always be 2 because we return from the function before incrementing to 3 or more.

@lucab lucab mentioned this pull request Jun 8, 2016
@lucab lucab modified the milestones: v1.9.0, v1.8.0 Jun 9, 2016
@lucab
Copy link
Member Author

lucab commented Jun 10, 2016

Probably reached the end of the tunnel: since we drop CAP_SYS_ADMIN, we need to have NO_NEW_PRIVS set. Systemd does that for us, but I was tricked by unversioned docs: this feature only appeared in v230.

@lucab lucab force-pushed the to-upstream/seccomp-isolators branch from 637dbd9 to 839d356 Compare June 10, 2016 11:07
"arch_prctl",
"modify_ldt",
"chroot",
"clone", // this is args-filtered by docker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does args-filtered mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seccomp filters (in kernel) allows to match on syscall arguments too, but seccomp filters (in systemd) do not. The note is there as we are dropping the args-filtering part originally present in docker.

@lucab lucab force-pushed the to-upstream/seccomp-isolators branch from 839d356 to b7633d2 Compare June 10, 2016 12:35
@lucab lucab changed the title [WIP] stage1: apply seccomp isolators stage1: apply seccomp isolators Jun 10, 2016
@lucab
Copy link
Member Author

lucab commented Jun 10, 2016

Still blocked on appc, but the non-root case is fine now and whitelisting by default doesn't trip anything in the testsuite.

@grantseltzer
Copy link

Would this apply the seccomp whitelist on a per-container basis? If so perhaps functionality to load a new default seccomp configuration at runtime bootup, instead of having to load the same one per container would be more useful.

@alban
Copy link
Member

alban commented Jun 20, 2016

@grantseltzer Yes, each container can have its own seccomp whitelist. Are you concerned about the time taken to setup the seccomp, or the memory used by the seccomp program? I assume that both should be very small. Also, rkt does not have a daemon where the containers are started from, so there is noruntime bootup.

@grantseltzer
Copy link

@alban I misspoke when I said runtime, I certainly am more familiar with docker's architecture than rkt's.

My concern was more about convenience. If someone wanted to start 15 containers, all with the same seccomp whitelist, it would be easier to set a default in a central location instead of loading the whitelist each time you start a container.

I still need to familiarize myself more with Rkt's architecture but could this be accomplished at the stage 1 level?

@alban
Copy link
Member

alban commented Jun 22, 2016

@grantseltzer with this PR, the whitelist cannot be given on the command line but is specified in the image manifest (in the image tarball). Users don't have to type the whitelist 15 times when they run 15 containers. So I think it is fine for convenience.

@grantseltzer
Copy link

@alban Ah, gotcha. Thanks for clearing that up. I'll do more research before I start making suggestions next time :)

@jonboulle
Copy link
Contributor

I landed the appc/spec change; what's next with this one?

@lucab
Copy link
Member Author

lucab commented Jun 22, 2016

@jonboulle a tagged bump on appc/spec, to be serialized (perhaps after) with conflicting #2735. I'm reworking this with appc changes as we speak.

@iaguis iaguis modified the milestones: v1.11.0, v1.10.0 Jul 6, 2016
@@ -273,6 +273,36 @@ func generatePodManifest(cfg PrepareConfig, dir string) ([]byte, error) {
ra.App.Isolators = append(ra.App.Isolators, *isolator)
}

// Override seccomp isolators via --seccomp CLI switch
if seccomp := app.SeccompFilter; seccomp != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not app.SeccompFilter != ""?

@iaguis
Copy link
Member

iaguis commented Jul 7, 2016

I was doing some tests and found that trying to run journalctl resulted on SIGSYS. Investigating a bit more it seemed that getrandom() was the syscall that wasn't working even though it's in the white list.

Running rkt with --debug shows this:

[/usr/lib64/systemd/system/debian.service:39] Failed to parse system call, ignoring: chown32
[/usr/lib64/systemd/system/debian.service:39] Failed to parse system call, ignoring: copy_file_range
[/usr/lib64/systemd/system/debian.service:39] Failed to parse system call, ignoring: execveat
[/usr/lib64/systemd/system/debian.service:39] Failed to parse system call, ignoring: fadvise64_64
[/usr/lib64/systemd/system/debian.service:39] Failed to parse system call, ignoring: fchown32
[/usr/lib64/systemd/system/debian.service:39] Failed to parse system call, ignoring: fcntl64
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: fstat64
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: fstatat64
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: fstatfs64
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: ftruncate64
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getegid32
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: geteuid32
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getgid32
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getgroups32
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getrandom
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getresgid32
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getresuid32
[/usr/lib64/systemd/system/debian.service:40] Failed to parse system call, ignoring: getuid32
[/usr/lib64/systemd/system/debian.service:41] Failed to parse system call, ignoring: ipc
[/usr/lib64/systemd/system/debian.service:41] Failed to parse system call, ignoring: lchown32
[/usr/lib64/systemd/system/debian.service:41] Failed to parse system call, ignoring: _llseek
[/usr/lib64/systemd/system/debian.service:41] Failed to parse system call, ignoring: lstat64
[/usr/lib64/systemd/system/debian.service:41] Failed to parse system call, ignoring: memfd_create
[/usr/lib64/systemd/system/debian.service:41] Failed to parse system call, ignoring: mmap2
[/usr/lib64/systemd/system/debian.service:42] Failed to parse system call, ignoring: _newselect
[/usr/lib64/systemd/system/debian.service:42] Failed to parse system call, ignoring: recv
[/usr/lib64/systemd/system/debian.service:42] Failed to parse system call, ignoring: renameat2
[/usr/lib64/systemd/system/debian.service:42] Failed to parse system call, ignoring: sched_getattr
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: sched_setattr
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: seccomp
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: send
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: sendfile64
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setfsgid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setfsuid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setgid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setgroups32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setregid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setresgid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setresuid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setreuid32
[/usr/lib64/systemd/system/debian.service:43] Failed to parse system call, ignoring: setuid32
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: sigreturn
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: socketcall
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: stat64
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: statfs64
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: truncate64
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: ugetrlimit
[/usr/lib64/systemd/system/debian.service:44] Failed to parse system call, ignoring: waitpid

@lucab guessed that libseccomp is too old in coreos-stage1 and it seems to be the cause, it's 2.1.1 which was released in October 2013 (!).

@lucab
Copy link
Member Author

lucab commented Jul 7, 2016

libseccomp 2.1.1 doesn't have getrandom support, thus this failure. The situation with systemd output is even more unfortunate though, as systemd doesn't distinguish between invalid and pseudo syscalls (both returning negative values): https://github.com/systemd/systemd/blob/master/src/core/load-fragment.c#L2432

@lucab
Copy link
Member Author

lucab commented Jul 7, 2016

I asked for an updated libseccomp for stage1-coreos coreos/bugs#1451

EDIT: already there in latest alpha

@alban alban modified the milestones: v1.12.0, v1.11.0 Jul 20, 2016
@lucab
Copy link
Member Author

lucab commented Jul 20, 2016

A fix for systemd seccomp behavior also landed in v231: systemd/systemd#3701

@iaguis
Copy link
Member

iaguis commented Jul 26, 2016

Any chances of getting this for v1.12.0 or do we want to wait for v231?

@lucab
Copy link
Member Author

lucab commented Aug 2, 2016

@iaguis rebased to address your review, PTAL.

This commit introduces support for appc seccomp isolators.
Two isolatore are currently specified and implemented: a "retain-set"
mode (whitelist) and a "remove-set" one (blacklist).
If no seccomp isolator is specified, rkt will apply its own default
whitelist.
@lucab lucab force-pushed the to-upstream/seccomp-isolators branch from 7619f9f to b641f38 Compare August 2, 2016 13:07
@iaguis
Copy link
Member

iaguis commented Aug 2, 2016

I wonder if we should not pass the (pseudo)syscalls not recognized by the libseccomp we use in the coreos flavor so users don't get warning messages with --debug:

[/usr/lib64/systemd/system/etcd.service:39] Failed to parse system call, ignoring: chown32
[/usr/lib64/systemd/system/etcd.service:39] Failed to parse system call, ignoring: fadvise64_64
[/usr/lib64/systemd/system/etcd.service:39] Failed to parse system call, ignoring: fchown32
[/usr/lib64/systemd/system/etcd.service:39] Failed to parse system call, ignoring: fcntl64
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: fstat64
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: fstatat64
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: fstatfs64
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: ftruncate64
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: getegid32
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: geteuid32
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: getgid32
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: getgroups32
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: getresgid32
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: getresuid32
[/usr/lib64/systemd/system/etcd.service:40] Failed to parse system call, ignoring: getuid32
[/usr/lib64/systemd/system/etcd.service:41] Failed to parse system call, ignoring: ipc
[/usr/lib64/systemd/system/etcd.service:41] Failed to parse system call, ignoring: lchown32
[/usr/lib64/systemd/system/etcd.service:41] Failed to parse system call, ignoring: _llseek
[/usr/lib64/systemd/system/etcd.service:41] Failed to parse system call, ignoring: lstat64
[/usr/lib64/systemd/system/etcd.service:41] Failed to parse system call, ignoring: mmap2
[/usr/lib64/systemd/system/etcd.service:42] Failed to parse system call, ignoring: _newselect
[/usr/lib64/systemd/system/etcd.service:42] Failed to parse system call, ignoring: recv
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: send
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: sendfile64
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setfsgid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setfsuid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setgid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setgroups32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setregid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setresgid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setresuid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setreuid32
[/usr/lib64/systemd/system/etcd.service:43] Failed to parse system call, ignoring: setuid32
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: sigreturn
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: socketcall
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: stat64
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: statfs64
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: truncate64
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: ugetrlimit
[/usr/lib64/systemd/system/etcd.service:44] Failed to parse system call, ignoring: waitpid

@iaguis
Copy link
Member

iaguis commented Aug 2, 2016

One question, LGTM otherwise.

Do we wanna include the docs here or in a new PR?

@lucab
Copy link
Member Author

lucab commented Aug 2, 2016

The debug noise should go away with the systemd patch above (ie. once we get v231), so I'd prefer not to touch the default set.

Regarding doc, I'll queue up a separate PR.

@iaguis
Copy link
Member

iaguis commented Aug 2, 2016

The debug noise should go away with the systemd patch above (ie. once we get v231), so I'd prefer not to touch the default set.

Fair enough

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants