Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add '-p' flag to get signalled when parent dies #114

Merged
merged 3 commits into from
Apr 21, 2018
Merged

Add '-p' flag to get signalled when parent dies #114

merged 3 commits into from
Apr 21, 2018

Conversation

pks-t
Copy link
Contributor

@pks-t pks-t commented Apr 8, 2018

Add a new flag '-p', which sets up the parent death signal to SIGKILL.
This will cause the kernel to send us a SIGKILL as soon as the direct
parent process dies. This is useful e.g. in combination with unshare(1)
from util-linux when using PID namespaces. When unshare forks the child,
which is about to become PID 1, killing the unshare parent will not
cause the child to exit. When executing the command

$ unshare --pid --fork tini -- <prog>

then killing unshare will not cause tini to be killed. Since util-linux
v2.32, unshare has an option "--kill-child=" that will set up
the parent death signal for the forked process. This does not help
though in case either SELinux or AppArmor are in use and credentials of
the forked process change (e.g. by changing its UID), as these LSMs will
clear the parent death signal again. The following example would trigger
that situation:

$ unshare --pid --fork setpriv --reuid user tini -s -- <prog>

The parent death signal will get reset by the LSMs as soon as setpriv
switchets its user ID to that of "user", and killing unshare will again
not result in tini being killed. The new '-p' flag helps that exact
scenario:

$ unshare --pid --fork setpriv --reuid user tini -s -p -- <prog>

As soon as unshare is getting killed, tini will get signalled SIGKILL
and exit as well, tearing down with it.


For now, this is only implemented as a flag. Intent is to first get to know whether the project is interested in that feature at all. If so and if you think the signal that is being sent by the kernel should be configurable, I can extend it further to accept an optional signal

@krallin
Copy link
Owner

krallin commented Apr 16, 2018

Does this need to be included in Tini to work?

In particular, would the following not work:

unshare --pid --fork setpriv --reuid user -- setdeathsignal -- tini -s -p -- <prog>

(where setdeathsignal is a hypothetical program that calls prctl(PR_SET_PDEATHSIG, SIGKILL) then exec's argv?)

@krallin
Copy link
Owner

krallin commented Apr 16, 2018

Ah, it looks like that might actually be the approach you're following there: util-linux/util-linux@23f54ce

(as an aside: sorry about the delay in getting back to you here, I was out on vacation this week)

@pks-t
Copy link
Contributor Author

pks-t commented Apr 16, 2018

No worries, it's fine.

The thing is, it's not really possible to implement this as part of another program here in all situations. While it is possible to use the approach I implement in setpriv(1) from util-linux in some situations, it gets unusable as soon as AppArmor or SELinux LSMs get involved. Switching profiles, in this case from setpriv to the executed program, will cause the parent death signal to be reset again when those LSMs are in use. So it's not a full solution to the problem I'm trying to solve (at least not as far as I know).

That being said, take this PR with a grain of salt. I could totally relate to you not wanting to accept that PR -- after all, it's debatable whether killing the init process should be part of an init manager itself. I'd obviously be happy to have it accepted as it solves a real issue I have, but yeah.

@krallin
Copy link
Owner

krallin commented Apr 17, 2018

Switching profiles, in this case from setpriv to the executed program, will cause the parent death signal to be reset again when those LSMs are in use. So it's not a full solution to the problem I'm trying to solve (at least not as far as I know).

So, just to be clear: what I'm suggesting above is that you can set the parent death signal after you drop privileges using setpriv, using a separate program. That program can then spawn Tini.

In other words, rather than have Tini set the parent death signal, add another process in your "call chain" that will do it, then exec Tini.

E.g. use this:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/prctl.h>

int main(int argc, char *argv[]) {
	if (prctl(PR_SET_PDEATHSIG, SIGTERM) != 0) {
		perror("prctl");
		return 1;
	}

	execvp(argv[1], &argv[1]);

	perror("execvp");
	return 1;
}

Then, if I run:

gcc setdeathsignal.c -o setdeathsignal &&  \
  sudo unshare --pid --fork -- \
  setpriv --reuid thomas -- \
  ./setdeathsignal \
  /tmp/tini-build/tini -vvv -s -- sleep 1000

Then, as expected, if I send a SIGKILL to the unshare process, Tini does receive a SIGTERM:

[DEBUG tini (1)] Passing signal: 'Terminated'

As long as the death signal is not discarded across the exec boundary from setdeathsignal to tini, this should work. So, from what you're describing, it sounds like as long as setdeathsignal and tini have the same LSM profile (?), this should work... but let me know if I'm misunderstanding here!

That being said, take this PR with a grain of salt. I could totally relate to you not wanting to accept that PR

My rule of thumb when reviewing a PR is generally to consider whether the thing being added can be done without Tini doing it, to ensure we keep Tini lean. For example, I wouldn't add switching users to Tini, because you can fairly easily chain Tini with su-exec, gosu, setpriv, but I did merge process-group-killing, because you can't really do this without doing signal forwarding and orphan reaping.

In this particular case... I don't really know the answer yet 😮, as I'm not super familiar with LSMs and when they might reset the parent death signal. If you have any reading to share on this, I'd be interested.

@krallin
Copy link
Owner

krallin commented Apr 17, 2018

Ok, so I did test this with AppArmor, and it works as long as setdeathsignal and tini use the same profile, using "inherit execute" in setdeathsignal's AppArmor profile. Glancing at the equivalent code for SELinux, it appears some similar mechanism might work (but I can't really test this right now).

I'm a little on the fence as to what the right approach here... My intuition is that this flag is only going to be relevant if you have a LSM (because if you don't have a LSM, you might as well use the program I pasted above), but I'm not entirely certain how complex (or feasible) implementing the AppArmor profile I described above is (or doing the same with SELinux).

@pks-t
Copy link
Contributor Author

pks-t commented Apr 17, 2018

Unfortunately, I can't really share the observation you've made with "inherit execute" retaining the parent death signal.

Simplest case with setpriv built from master, which includes my recently accepted "--pdeathsig" option:

$ setpriv --pdeathsig KILL setpriv --dump
uid: 0
euid: 0
gid: 0
egid: 0
Supplementary groups: 0,0,1,2,3,4,6,10,11,20,26,27
no_new_privs: 0
Inheritable capabilities: [none]
Ambient capabilities: [none]
Capability bounding set: chown,dac_override,dac_read_search,fowner,fsetid,kill,setgid,setuid,setpcap,linux_immutable,net_bind_service,net_broadcast,net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot,sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource,sys_time,sys_tty_config,mknod,lease,audit_write,audit_control,setfcap,mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read
Securebits: [none]
Parent death signal: [none]
AppArmor profile: /usr/bin/setpriv (enforce)

I've had the same results with all kinds of different profiles -- inherit execute is trivial to use here, as setpriv always uses the same profile. But still, the pdeathsig got reset. The profile I've basically used (trimmed down a bit to remove all kinds of capabilities, also no abstractions included):

#include <tunables/global>

/usr/bin/setpriv {
    #include <abstractions/base>
    #include <abstractions/users>
    #include <abstractions/terminal>

    /usr/bin/setpriv mr,

    # Inherent capabilities required by setpriv
    capability setpcap,
    capability setgid,
    capability setuid,

    # Capabilities granted to others
    capability chown,
   ...

    @{BINARIES} px,

    owner /proc/@{pid}/status       r,
    owner /proc/@{pid}/attr/current r,
    owner /proc/@{pid}/attr/exec    rw,
}

I've also tried to use setpriv's "--apparmor-profile /usr/bin/setpriv" with explicit "change_profile" transitions in different variations, but the pdeathsig always got reset.

I can't share any readings, as I didn't find any conclusive documentation on this. Some observations:

  • My newly added "--pdeathsig" option gets processed after all credential changes, just before the execve call. So no more credential changes happen, and by using --apparmor-profile, I'd have hoped to avoid the reset of pdeathsig due to the changing profile. So it should've worked like your described new program (but doesn't).

  • One place where the current processes pdeath signal gets reset is in apparmor_bprm_committing_creds, which is being done unconditionally whenever the processes credentials change. So nothing we can do here except trying to change all credentials before applying the pdeathsig, which is what I already tried to do

  • The second relevant place in the kernel is fs/exec.c:setup_new_exec, which resets the processes pdeathsig in case we want to do a "safe" execution. I honestly don't totally get when an excution is being considered "safe", but I think that is the place which resets the pdeathsig in my example above. I tried to use the "unsafe" label for execs, but it didn't help (except for one time where I thought it did work, but half an hour later it stopped working)

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

I think I've found the real issue here why code behaves differently for us. I've been doing these tests as root, which will additionally clear the parent death signal on execve in case LSMs are active:

  • security/commoncap.c:cap_bprm_set_creds will set bprm->cap_elevated = 1 in case you're using a setuid/setgid binary, are root or have some capabilities active.
  • fs/exec.c:setup_new_exec executes bprm->secureexec |= bprm->cap_elevated, so securexec is true in case one of the above conditions is true. Later, the same function sets current->pdeath_signal = 0 in case bprm->secureexec != 0.

So setting up the parent death signal cannot work in case you want to inherit it across an exec as soon as you're running with elevated privileges.

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

Thanks for digging into this @pks-t, that's helpful.

On the one hand, I feel like the Kernel cleaning out this flag systematically might not be 100% desirable (if your parent has permission to send you a signal, then I'd expect that their death has permission to send you that signal as well), since it essentially means that every program in existence has to have a "set the parent death signal" flag... which means I don't know if this really is something that should be fixed in Tini.

On the other hand, I understand there might be reasons why it's that way in the Kernel that I don't understand. Besides, if we add the "set the parent death signal" flag in Tini, then that essentially creates a new use case for Tini: you can use it to set the parent death signal when the Kernel won't let you, even if you're not going to use e.g. the zombie reaping "core" functionality (i.e. what you really want from Tini in this case is setting the parent death signal + forwarding said signal).

I'd be curious to better understand your use case here, but I think I'd be fine adding this flag in Tini. I'd like it to be configurable (at a minimum, with a signal number), though, since SIGKILL is probably not super helpful for cleanup (and it can't be forwarded!). I'm happy to actually work on that piece though, but let me know what you think!

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Yeah, I totally get and share your concerns here. The parent death signal is broken for a lot of use cases, with the reason being security. Accepting signals across process hierarchies can be a security risk in some scenarios, e.g. by causing your childs to dump core, having additional side channels across security boundaries, etc.

My use case here is rather simple: I use runit to manage system services and want to use PID namespaces to "containerize" all my services. My initial attempt was to simply use "setpriv --fork --pid", but this doesn't really work out in case I want to restart the service. Stopping a service in runit will simply cause runit to send SIGKILL to the immediate child, and as setpriv does not forward that signal to the process group the actual service stays running. So tini does have two purposes here in my use case: first, correctly handle child reaping for services in the PID namespace, and secondly forwarding the signal to the process group when runit sends SIGKILL.

So in case you want to accept this PR I'd be happy to further extend it by making the signal configurable, no problem at all. I'd have done so right from the beginning, but I was missing the mapping from signal name to its value (e.g. SIGKILL -> 9) to correctly parse the command line. Sure enough we could ask users to just use the actual value instead of name, but that's an awkward interface. I could use strsignal to loop over valid signal numbers to avoid hardcoding the array, though. Is there any way you'd prefer?

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Sorry, I obviously mean "nsenter --fork --pid", not "setpriv"

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

Yeah, I totally get and share your concerns here. The parent death signal is broken for a lot of use cases, with the reason being security. Accepting signals across process hierarchies can be a security risk in some scenarios, e.g. by causing your childs to dump core, having additional side channels across security boundaries, etc.

Oh I absolutely agree, but my comment here is mainly that if the parent can already signal you anyway (which is the reason why Tini would be able to forward a signal to you), then I'm not sure there is anything security gain by preventing the parent's death from signalling you.

In case there is actually elevation of privilege happening at exec, e.g. a suid binary then I agree that the parent death signal should absolutely be dropped (but in this case, the parent can't signal you either).

My use case here is rather simple: ...

Thanks for this valuable detail!

I could use strsignal to loop over valid signal numbers to avoid hardcoding the array, though. Is there any way you'd prefer?

I don't think that'd work. This returns a description of the signal, not the actual "symbolic" name. For example, strsignal(9) returns "Killed" (and I bet that's localized), not "SIGKILL" or "KILL". I think we need our own array of names and the associated signal numbers to make this work here, and then we can iterate over that (we may want to also accept signal numbers).

I can think we can stick with a basic list of signals here:

       Signal     Value     Action   Comment
       ──────────────────────────────────────────────────────────────────────
       SIGHUP        1       Term    Hangup detected on controlling terminal
                                     or death of controlling process
       SIGINT        2       Term    Interrupt from keyboard
       SIGQUIT       3       Core    Quit from keyboard
       SIGILL        4       Core    Illegal Instruction
       SIGABRT       6       Core    Abort signal from abort(3)
       SIGFPE        8       Core    Floating-point exception
       SIGKILL       9       Term    Kill signal
       SIGSEGV      11       Core    Invalid memory reference
       SIGPIPE      13       Term    Broken pipe: write to pipe with no
                                     readers; see pipe(7)
       SIGALRM      14       Term    Timer signal from alarm(2)
       SIGTERM      15       Term    Termination signal
       SIGUSR1   30,10,16    Term    User-defined signal 1
       SIGUSR2   31,12,17    Term    User-defined signal 2
       SIGCHLD   20,17,18    Ign     Child stopped or terminated
       SIGCONT   19,18,25    Cont    Continue if stopped
       SIGSTOP   17,19,23    Stop    Stop process
       SIGTSTP   18,20,24    Stop    Stop typed at terminal
       SIGTTIN   21,21,26    Stop    Terminal input for background process
       SIGTTOU   22,22,27    Stop    Terminal output for background process

src/tini.c Outdated
"SIGWINCH",
"SIGIO",
"SIGPWR",
"SIGSYS",
Copy link
Owner

@krallin krallin Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little more verbose... but I'd very much prefer having an array that doesn't actually depend on ordering to be safer. In other words, I'd rather have this be an array of [signame, signum] structs, rather than have the signum be implicitly provided by the string's position in the array.

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

I've pushed a fixup commit (which should be squashed first, but this makes review easier).

We should also understand signal names from POSIX.1-2001, I'd say, where the list already comes nearly complete. As there are only three non-POSIX calls here between 1 and 31, I just opted to have the complete list here as it makes the implementation simpler. I can remove those additional signals though, if you want.

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

Thanks @pks-t - I left a comment on there. I understand the simplicity of including all signals... but as commented there, I'm not a huge fan of implicitly providing the signal number through the position in the array — I'd be a little more comfortable with the name / number pairs being structs.

Notably, that approach also lets us add aliases, such that we could have both KILL and SIGKILL point to 9 😄

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Fair enough. Amended to use an array. For "KILL" vs "SIGKILL" -- I actually wouldn't handle that by adding more entries to the array, but we could just skip the "SIG" prefix and compare then. Otherwise, we'd have a minimum of two entries for each signal.

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

For "KILL" vs "SIGKILL" -- I actually wouldn't handle that by adding more entries to the array, but we could just skip the "SIG" prefix and compare then. Otherwise, we'd have a minimum of two entries for each signal.

Fair enough, and thanks! I think that, as is, this will do!

We probably want to add tests for this. The test suite is a little messy — would you like me to handle that part?

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Yeah, just wanted to query you about the test. Writing a test is easy enough, but writing the right test here is what I'm not sure about yet. We'd need to spawn tini as child of another process which by itself wouldn't kill tini if it got killed and then test that tini still gets the configured signal. Any ideas?

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

The tests right now spawn a bunch of Python processes that do things, I think we could essentially have the test script spawn a few of those like so:

  • A python process that spawns Tini with the following arguments: tini -p SIGUSR1 -- some other_python_process MYPID, where MYPID is that process's PID.
  • Tini that spawns that other_python_process.
  • Upon spawing, other_python_process sends a SIGKILL to the PID passed as argument.
  • Then we just need to check that other_python_process in the chain got a SIGUSR1 (we can even just have that process wait with a timeout and exit with 1 if it doesn't get SIGTERM, or exit with 0 when it does get it).

Thoughts?

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Ah, I only saw the smoke tests -- the actual tests do in fact look a bit messy. So feel free to pick writing tests up in case you can get something done quickly, otherwise I'll do so. Just let me know.

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

Ah, another small request: can you add this flag to the help:

tini/src/tini.c

Lines 190 to 226 in 5b117de

void print_usage(char* const name, FILE* const file) {
fprintf(file, "%s (%s)\n", basename(name), TINI_VERSION_STRING);
#if TINI_MINIMAL
fprintf(file, "Usage: %s PROGRAM [ARGS] | --version\n\n", basename(name));
#else
fprintf(file, "Usage: %s [OPTIONS] PROGRAM -- [ARGS] | --version\n\n", basename(name));
#endif
fprintf(file, "Execute a program under the supervision of a valid init process (%s)\n\n", basename(name));
fprintf(file, "Command line options:\n\n");
fprintf(file, " --version: Show version and exit.\n");
#if TINI_MINIMAL
#else
fprintf(file, " -h: Show this help message and exit.\n");
#if HAS_SUBREAPER
fprintf(file, " -s: Register as a process subreaper (requires Linux >= 3.4).\n");
#endif
fprintf(file, " -v: Generate more verbose output. Repeat up to 3 times.\n");
fprintf(file, " -w: Print a warning when processes are getting reaped.\n");
fprintf(file, " -g: Send signals to the child's process group.\n");
fprintf(file, " -e EXIT_CODE: Remap EXIT_CODE (from 0 to 255) to 0.\n");
fprintf(file, " -l: Show license and exit.\n");
#endif
fprintf(file, "\n");
fprintf(file, "Environment variables:\n\n");
#if HAS_SUBREAPER
fprintf(file, " %s: Register as a process subreaper (requires Linux >= 3.4)\n", SUBREAPER_ENV_VAR);
#endif
fprintf(file, " %s: Set the verbosity level (default: %d)\n", VERBOSITY_ENV_VAR, DEFAULT_VERBOSITY);
fprintf(file, "\n");
}

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

Nevermind, I'm an idiot, you already did!

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

We should probably add an example in the help to clarify that we expect e.g. SIGTERM or SIGINT as opposed to TERM or INT.

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

What about a simple wording like that:

diff --git a/src/tini.c b/src/tini.c
index fd72240..3753867 100644
--- a/src/tini.c
+++ b/src/tini.c
@@ -242,7 +242,7 @@ void print_usage(char* const name, FILE* const file) {
 #if HAS_SUBREAPER
 	fprintf(file, "  -s: Register as a process subreaper (requires Linux >= 3.4).\n");
 #endif
-	fprintf(file, "  -p SIGNAL: Trigger SIGNAL when parent dies.\n");
+	fprintf(file, "  -p SIGNAL: Trigger SIGNAL when parent dies, e.g. \"-p SIGKILL\".\n");
 	fprintf(file, "  -v: Generate more verbose output. Repeat up to 3 times.\n");
 	fprintf(file, "  -w: Print a warning when processes are getting reaped.\n");
 	fprintf(file, "  -g: Send signals to the child's process group.\n");

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

@pks-t I pushed tests to your branch.

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

@pks-t yes that's perfect

pks-t and others added 2 commits April 19, 2018 16:25
Add a new flag '-p', which sets up the parent death signal to `SIGKILL`.
This will cause the kernel to send us a `SIGKILL` as soon as the direct
parent process dies. This is useful e.g. in combination with unshare(1)
from util-linux when using PID namespaces. When unshare forks the child,
which is about to become PID 1, killing the unshare parent will not
cause the child to exit. When executing the command

    $ unshare --pid --fork tini -- <prog>

then killing unshare will not cause tini to be killed. Since util-linux
v2.32, unshare has an option "--kill-child=<SIGNAL>" that will set up
the parent death signal for the forked process. This does not help
though in case either SELinux or AppArmor are in use and credentials of
the forked process change (e.g. by changing its UID), as these LSMs will
clear the parent death signal again. The following example would trigger
that situation:

    $ unshare --pid --fork setpriv --reuid user tini -s -- <prog>

The parent death signal will get reset by the LSMs as soon as `setpriv`
switchets its user ID to that of "user", and killing unshare will again
not result in tini being killed. The new '-p' flag helps that exact
scenario:

    $ unshare --pid --fork setpriv --reuid user tini -s -p SIGKILL -- <prog>

As soon as unshare is getting killed, tini will get signalled SIGKILL
and exit as well, tearing down <prog> with it.
@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Thanks a lot for these tests! I've added the example and squashed in the fixup commits

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

Thanks! Finally, we should update the README. I'll push another commit to your branch if that works?

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

Sure, feel free to do so. Just remember to fetch first from my updated branch :)

@krallin
Copy link
Owner

krallin commented Apr 19, 2018

@pks-t updated the README. Let me know if you see anything off there (notably, I added you to the contributors list, but let me know if you don't want that).

Thanks!

@pks-t
Copy link
Contributor Author

pks-t commented Apr 19, 2018

I'm fine with being listed in the contributors list. The updated README looks fine to me, as well!

@krallin krallin merged commit e972ea5 into krallin:master Apr 21, 2018
krallin added a commit that referenced this pull request Apr 21, 2018
- Configure pdeathsignal (#114)
- Environment variable for -g (#101)
@krallin
Copy link
Owner

krallin commented Apr 21, 2018

@pks-t I released this today

@pks-t pks-t deleted the pks/pdeathsignal branch April 21, 2018 13:37
@pks-t
Copy link
Contributor Author

pks-t commented Apr 21, 2018

Thanks a lot for this discussion and collaboration!

solarkennedy pushed a commit to Netflix-Skunkworks/tini that referenced this pull request Sep 15, 2020
solarkennedy pushed a commit to Netflix-Skunkworks/tini that referenced this pull request Sep 15, 2020
- Configure pdeathsignal (krallin#114)
- Environment variable for -g (krallin#101)
@Dstoney
Copy link

Dstoney commented Dec 16, 2022

ENV TINI_VERSION v0.17.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini.asc /tini.asc
RUN gpg --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys bc1q29ntmxnjans44pxq482zdcqxchyehta3afrf6h
&& gpg --verify /tini.asc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants