Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsenter: correctly handle pidns orphaning #976

Closed
wants to merge 7 commits into from
Closed

nsenter: correctly handle pidns orphaning #976

wants to merge 7 commits into from

Commits on Oct 4, 2016

  1. nsenter: specify namespace type in setns()

    This avoids us from running into cases where libcontainer thinks that a
    particular namespace file is a different type, and makes it a fatal
    error rather than causing broken functionality.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 4, 2016
    Configuration menu
    Copy the full SHA
    ed053a7 View commit details
    Browse the repository at this point in the history
  2. nsenter: guarantee correct user namespace ordering

    Depending on your SELinux setup, the order in which you join namespaces
    can be important. In general, user namespaces should *always* be joined
    and unshared first because then the other namespaces are correctly
    pinned and you have the right priviliges within them. This also is very
    useful for rootless containers, as well as older kernels that had
    essentially broken unshare(2) and clone(2) implementations.
    
    This also includes huge refactorings in how we spawn processes for
    complicated reasons that I don't want to get into because it will make
    me spiral into a cloud of rage. The reasoning is in the giant comment in
    clone_parent. Have fun.
    
    In addition, because we now create multiple children with CLONE_PARENT,
    we cannot wait for them to SIGCHLD us in the case of a death. Thus, we
    have to resort to having a child kindly send us their exit code before
    they die. Hopefully this all works okay, but at this point there's not
    much more than we can do.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 4, 2016
    Configuration menu
    Copy the full SHA
    2cd9c31 View commit details
    Browse the repository at this point in the history
  3. nsenter: set {uid,gid} explicitly around namespace creation

    In general, it is a bad idea to be unmapped inside a user namespace at
    any point (especially when euid=[kuid 0]) as it can lead to security
    vulnerabilities. Also, in certain SELinux setups you must also be mapped
    in your user namespace when unsharing other namespaces.
    
    Deal with all of this by parsing the {uid,gid}maps and then setresuid(2)
    to the right user before and after the critical unshare(CLONE_NEWUSER)
    (as well as dealing with setns(2) by changing user to the owner of the
    namespace file we're joining).
    
    Fixes: CVE-2015-8709
    Reported-by: Andrey Vagin <avagin@virtuozzo.com>
    Reported-by: Mrunal Patel <mpatel@redhat.com>
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 4, 2016
    Configuration menu
    Copy the full SHA
    c87bc5f View commit details
    Browse the repository at this point in the history

Commits on Oct 12, 2016

  1. libcontainer: system: add GetProcessState and GetProcessParent

    Similar to the already existing proc functions, these just get different
    fields from the relevant place in /proc/<pid>/stat. This is part of the
    nsenter rewrite patchset (namely the pidns-orphaning part).
    
    A nice extension to this would be to use inotify and make
    libcontainer/system/proc.go entirely chan based.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 12, 2016
    Configuration menu
    Copy the full SHA
    a27fa6a View commit details
    Browse the repository at this point in the history
  2. nsenter: ensure exec'd processes are orphaned correctly

    Due to how parent processes are treated in relation to PID namespaces
    and zombie reaping[1], it is necessary for attaching (setns) processes
    to double-fork inside the PID namespace to orphan themselves. In
    addition, this also changes the PID passing code to use the PID
    namespace translation features of SCM_CREDENTIALS.
    
    [1]: https://lwn.net/Articles/532748/
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 12, 2016
    Configuration menu
    Copy the full SHA
    7b5349d View commit details
    Browse the repository at this point in the history
  3. [wip] libcontainer: handle exit of exec'd processes

    Because we are no longer the parents of exec'd processes, we need to
    handle exiting and such things quite differently. There are two main
    cases we need to deal with:
    
    1. Now that the exec'd process is no longer a child of runC, we cannot
       wait4 the process. This means that we lose some information (the exit
       code, and signals when it dies), which means that we have to
       explicitly handle the death by polling /proc/<pid>/stat.
    
    2. In addition, because of how the above solution works, we also have to
       include a wait4 goroutine to clean up all of the nsenter processes
       which would be cleaned up by a simple process.Wait() but are no
       longer that simple in the new setup.
    
    All in all, I'm very sorry. I hope that people debugging this in the
    future will forgive our hubris.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 12, 2016
    Configuration menu
    Copy the full SHA
    247d355 View commit details
    Browse the repository at this point in the history
  4. tests: add integration test to verify orphaning

    This ensures that we don't regress on the current setup, where we are
    correctly setting up processes using `runc exec` such that the process'
    parent is correctly reset to PID 1 inside the container.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    cyphar committed Oct 12, 2016
    Configuration menu
    Copy the full SHA
    ddc663e View commit details
    Browse the repository at this point in the history