
New version of the overlay, designed together with Gorka #10

Merged
merged 5 commits into from Mar 11, 2015

Conversation

@skarnet (Contributor) commented Mar 11, 2015

Here it is, as agreed.

skarnet added 5 commits March 7, 2015 10:24
 - Dependencies: latest execline/s6/s6-portable-utils at this time
 - Interactive CMDs should now work properly with or without terminals. If not, play with s6-setsid a bit more.
 - The scandir is now in /var/run/s6/service, because it gets written into at run-time
 - User service directories are now listed in /etc/services.d and will be *copied* to /var/run/s6/service during init-stage2
 - Yeah, I really hate writing to /, and you should hate it, too.
 - Init order:
   * our own services in /etc/s6/service are copied to /var/run/s6/service (stage 1)
   * s6-svscan is run
   * scripts in /etc/cont-init.d are run (stage 2)
   * services in /etc/services.d are copied to /var/run/s6/service and picked up by s6-svscan
   * If a CMD was given, it is run.
   i.e. the whole of stage 2 is executed whether a CMD was given or not, so CMDs always run in a properly initialized environment (a rough sketch of this ordering, in plain sh, follows this list).
 - Since all the user stuff is started in stage 2, we can guarantee all the dependencies for stage 1, so no fix-attrs in stage 1. Stage 1 is now extremely short, as it should be.
 - Can't escape fix-attrs in stage 2, though; a chunk of execline is provided to do the job. (Probably very slow.)
 - Plus a bunch of tiny fixes, mostly cosmetic.
 - modify init-stage2 to use it properly, by reading everything in /etc/fix-attrs.d
 - the original fix descriptions are now a file in /etc/fix-attrs.d
 - minor fixes/changes to init-stage2
 - make /usr/bin/with-* executable (!)
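
For reference, here is a rough sketch of that stage-2 ordering, written in plain sh for readability; the real init-stage2 in this PR is an execline script, and the paths and commands below are illustrative assumptions, not the actual code.

#!/bin/sh
# 1. run the one-shot init scripts (s6-svscan is already up at this point)
for f in /etc/cont-init.d/*; do
  [ -x "$f" ] && "$f"
done

# 2. copy the user service directories into the live scandir
cp -a /etc/services.d/. /var/run/s6/service/

# 3. tell s6-svscan to rescan and pick up the new services
s6-svscanctl -a /var/run/s6/service

# 4. if a CMD was given, exec into it in the now fully-initialized environment
[ "$#" -gt 0 ] && exec "$@"
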
jprjr added a commit that referenced this pull request Mar 11, 2015
New version of the overlay, designed together with Gorka
@jprjr merged commit 003d994 into just-containers:master Mar 11, 2015
@glerchundi (Member)

thanks!

@jprjr (Member) commented Mar 11, 2015

Just wanted to comment on how awesome this is! Can't wait to start trying it out in my containers

@glerchundi (Member)

Yes it is! I just acted as the executing arm. Great work, Laurent.

@dreamcat4

@skarnet Also very grateful. Thank you +1.

Once a new tarball is built, I shall re-test the previous SIGINT issue to see whether it is gone, as that part is different in your new code (sorry, I don't understand exactly how it works). I will also take a simple measurement of the extra startup initialisation delay incurred by using s6, which we can compare both to running without any s6 and to our previous version (Gorka's old fix-attrs way).

Again, many thanks.

@dreamcat4

Hmm. With this new version I get an error that didn't happen before:

time docker run dreamcat4/s6test
if: fatal: child crashed with signal 13

!!!!!
 init-stage2 failed.
 Logs are in /var/log/s6-uncaught-logs
!!!!!

# there are no logs
id@emachines-e520:~/docker-images$ docker exec -it trusting_ardinghelli ls -lsa /var/log/s6-uncaught-logs/
total 8
     4 drwx--S---    2 65534    65534         4096 Mar 11 12:29 .
     4 drwxr-xr-x    4 root     root          4096 Mar 11 18:26 ..

Dockerfile ([EDIT] now showing the lines that were missing from my paste but already in the Dockerfile):

FROM busybox:ubuntu-14.04
ADD https://github.com/glerchundi/container-s6-overlay-builder/releases/download/v1.7.0/s6-overlay-1.7.0-linux-amd64.tar.gz /s6-overlay.tar.gz
RUN gunzip -c /s6-overlay.tar.gz | tar -xf - -C / && rm /s6-overlay.tar.gz
ADD myprogram.sh /myprogram.sh
RUN chmod +x     /myprogram.sh
ENTRYPOINT ["/init","/myprogram.sh"]
CMD ["arg1","arg2","arg3"]

Where the myprogram.sh test script currently looks like this:

#!/bin/sh
exit 0

It just exits 0 immediately. Nothing else.

@glerchundi (Member)

It seems like you need to include myprogram.sh inside your container and give it execution perms.

@dreamcat4

@glerchundi Ah, sorry. I didn't copy-paste my Dockerfile here fully; it actually already had the missing statements to begin with.

ADD myprogram.sh /myprogram.sh
RUN chmod +x     /myprogram.sh

... so the error message must have some other cause.

@glerchundi (Member)

thanks for pointing this out @dreamcat4.

This is a bug: in busybox, nobody/nogroup use 99/99 as uid/gid instead of the common value, which is 65534 for both (http://www.linfo.org/uid.html).

@skarnet we need to fix the s6-uncaught-logs and s6-fdholderd log directories at runtime :(

Although, this doesn't happen with gliderlabs/alpine.

@jprjr (Member) commented Mar 11, 2015

I'm not so sure it's the UID and GID numbers. It looks like dreamcat's image is missing the nobody and daemon users entirely.

After adding them, I'm still getting a crash. As far as I can tell, s6-echo is being killed by a broken pipe, and that is what causes the crash.
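
For illustration, a minimal way to see a writer killed by SIGPIPE when its reader has already exited; shells report that as exit status 128 + 13 = 141 (bash shown here):

yes | head -n 1 > /dev/null; echo "${PIPESTATUS[0]}"   # prints 141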

Here's the /etc/passwd file I made:

root:x:0:0:root:/:/bin/sh
nobody:x:1:1:nogroup:/:/bin/sh
daemon:x:2:2:daemon:/:/bin/sh

And the /etc/group file I made:

root:x:0:
nogroup:x:1:
daemon:x:2:

Here's some output from strace. These process IDs are high because I've been running this a couple of times:

591   writev(1, [{"[fix-attrs.d] applying owners & "..., 53}, {NULL, 0}], 2) = -1 EPIPE (Broken pipe)
591   --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=591, si_uid=0} ---
591   +++ killed by SIGPIPE +++
589   <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0, NULL) = 591
589   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=591, si_status=SIGPIPE, si_utime=0, si_stime=0} ---
589   writev(2, [{"if: fatal: child crashed with si"..., 40}, {NULL, 0}], 2) = -1 EPIPE (Broken pipe)
589   --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=589, si_uid=0} ---
589   +++ killed by SIGPIPE +++
587   <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0, NULL) = 589
587   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=589, si_status=SIGPIPE, si_utime=0, si_stime=0} ---
587   writev(2, [{"if: fatal: child crashed with si"..., 40}, {NULL, 0}], 2) = 40
587   exit_group(141)                   = ?
587   +++ exited with 141 +++
584   <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 141}], 0, NULL) = 587
584   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=587, si_status=141, si_utime=0, si_stime=0} ---
584   execve("/usr/bin/s6-echo", ["s6-echo", "--", "\n!!!!!\n init-stage2 failed.\n Log"...], ["PATH=/usr/bin:/usr/sbin:/bin:/sb"...]) =
584   writev(1, [{"\n!!!!!\n init-stage2 failed.\n Log"..., 73}, {NULL, 0}], 2) = 73
584   exit_group(0)                     = ?

@jprjr (Member) commented Mar 11, 2015

Yeah, it looks like that's the problem. I just tried commenting out all the s6-echos from init-stage2 and it seemed to work.

@dreamcat4

I tried John's temporary disablement, to allow other testing in the meantime.

FYI, there seem to be quite a few s6 processes running at the time of the crash:

$ docker exec -it  trusting_ardinghelli ps -a
PID   USER     COMMAND
    1 root     s6-svscan -t0 /var/run/s6/service
   21 root     s6-supervise s6-fdholderd/log
   22 root     s6-supervise s6-fdholderd
   23 root     s6-supervise s6-svscan-log
12341 root     s6-envuidgid nobody s6-applyuidgid -Uz -- exec -c s6-log -bp -- t /var/log/s6-uncaught-logs
12342 root     s6-envuidgid daemon s6-fdholder-daemon -U -1 -i rules -- /var/run/s6/fdholderd-socket
12344 root     s6-notifywhenup s6-envuidgid daemon s6-fdholder-daemon -U -1 -i rules -- /var/run/s6/fdholderd-socket
12345 root     s6-envuidgid nobody s6-applyuidgid -Uz -- exec -c -- s6-log -- t /var/log/s6-fdholderd
12356 root     ps -a

I'm not saying that isn't normal / correct / justified. It just seems like quite a few things are going on.

@dreamcat4

OK. Like @jprjr said, I commented out all of the lines containing s6-echo, just in the file init-stage2, and re-ran:

time docker run dreamcat4/s6test
if: fatal: child crashed with signal 11


real    0m9.966s
user    0m0.020s
sys 0m0.004s
id@emachines-e520:~/docker-images$ 

@skarnet (Contributor, Author) commented Mar 12, 2015

I really dislike the idea of having to run fix-attrs in stage 1. Stage 1 should be absolutely minimal and independent from everything; if it depends on stuff the user can mount and modify, that's not right.

The catch-all logging directory is not supposed to be sticky: those logs are not supposed to live from one invocation to another. In a "normal" system, I mount a tmpfs in stage 1, and put the catch-all logging directory in it.

I guess the solution for Docker is to recreate /var/log/s6-uncaught-logs for every invocation, in stage 1.
And document that the logs there are temporary and will not survive the next start of the container.

  • remove the /var/log/s6-uncaught-logs directory from the overlay
  • in /etc/s6/init/init-stage1, right after (or before) creating /var/run/s6/uncaught-logs-fifo: add

if { s6-rmrf /var/log/s6-uncaught-logs }
if { s6-mkdir -p /var/log/s6-uncaught-logs }
if { s6-envuidgid nobody s6-chown -U /var/log/s6-uncaught-logs }
if { s6-chmod 2700 /var/log/s6-uncaught-logs }

If there's no guaranteed user "nobody" in the image's /etc/passwd, then find another user to run the logger as. (Same with "daemon" for s6-fdholderd.)
If there's no guaranteed user at all in the image, then screw it and run things as root.

As for the s6-fdholderd log, it can go entirely. s6-fdholderd is barely logging anything, so it can go into the catch-all logs without trouble.

  • remove the /var/log/s6-fdholderd and /etc/s6/service/s6-fdholderd/log directories from the overlay.

(edit: typo)

@glerchundi (Member)

I agree with you; I'm already modifying this, so expect a new version which solves @dreamcat4's issues soon.

@glerchundi (Member)

@dreamcat4 can you test against v1.7.1?

It should fix this issue and leave only these processes running:

root         1  1.2  0.0    192     4 ?        Ss+  11:26   0:00 s6-svscan -t0 /var/run/s6/service
root        15  0.0  0.0    196     4 ?        S+   11:26   0:00 s6-supervise s6-svscan-log
root        16  0.0  0.0    196     4 ?        S+   11:26   0:00 s6-supervise s6-fdholderd
nobody      17  0.0  0.0    256     4 ?        Ss   11:26   0:00 s6-log -bp -- t /var/log/s6-uncaught-logs

thanks.

@dreamcat4

@glerchundi I'm not entirely sure your 1.7.1 is OK yet, because there's a new error message now:

time docker run dreamcat4/s6test
s6-envuidgid: fatal: unknown user: nobody

real    0m1.957s
user    0m0.016s
sys 0m0.008s
id@emachines-e520:~/docker-images$ 

@skarnet (Contributor, Author) commented Mar 12, 2015

Honestly, images without even a "nobody" user should be fixed.
Unix is all about users. Assuming everything in a container will run as a single user is unnecessarily restrictive, and a major security flaw.

Is there a way we can modify existing files from an overlay? As in, I'd like to add lines to /etc/passwd to define s6-specific users, but without clobbering the existing image's /etc/passwd.

@dreamcat4

Well, is there any reason we can't just fall back to a hardcoded uid for the user nobody... which seems to be 65534 on Ubuntu and Alpine Linux? I mean, as long as user nobody has not been assigned a uid different from 65534.

It was my understanding that you don't strictly need usernames in unix, as they end up being translated into integer ids anyway.
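
For illustration, on a typical Ubuntu userland the nobody entry already carries 65534 (exact fields may vary by image):

$ docker run ubuntu:14.04 grep '^nobody:' /etc/passwd
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin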

@jprjr (Member) commented Mar 12, 2015

I wouldn't recommend using a uid if you can't find the user.

What's weird is the official buildroot-based busybox image has those users.

root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:100:sync:/bin:/bin/sync
mail:x:8:8:mail:/var/spool/mail:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
operator:x:37:37:Operator:/var:/bin/sh
haldaemon:x:68:68:hald:/:/bin/sh
dbus:x:81:81:dbus:/var/run/dbus:/bin/sh
ftp:x:83:83:ftp:/home/ftp:/bin/sh
nobody:x:99:99:nobody:/home:/bin/sh
sshd:x:103:99:Operator:/var:/bin/sh
default:x:1000:1000:Default non-root user:/home/default:/bin/sh

I think the right answer here is to open a bug against busybox:ubuntu-14.04 about missing important users.

@dreamcat4

@jprjr Yes, no doubt we should tell the busybox image maintainers about that issue in their image.

However, the underlying problem still remains if we don't fix it ourselves, even if busybox fixes their image, because some people want to build from scratch and run their own pre-compiled binary with nothing else in there. That may mean no /etc/passwd file, or anything else that can be relied upon.

Indeed, the whole reason I am using the busybox image to test s6 is that it is missing so many expected things: it shows these problems up and gives us a chance to find errors like this as early as we possibly can. It goes back to the idea of making sure the overlay does not depend on anything in the distro (and brings with it all the tools it needs).

I tried to solve it with just a minor change: replacing the if { nobody } with an ifelse { nobody } {} { uid } statement in the execline script.

If the first command fails, that means the specified user name does not exist, so the fallback would most often be safe to use inside a docker image that was already built, because users are usually added at build time rather than run time (although we cannot be sure of that). And a uid of 65534 is one of the highest numbers available, which keeps it in an informally 'unsafe' range of uid numbers that users should know better than to take for themselves.

So the code I tried to fix this issue with now looks like this:

/home/id/docker-images/s6test/s6/etc/s6/init/init-stage1:

ifelse { s6-envuidgid nobody s6-chown -U /var/log/s6-uncaught-logs } {}
{ s6-chown -u 65534 -g 65534 /var/log/s6-uncaught-logs }

But it still crashes, with this message:

time docker run dreamcat4/s6test
s6-envuidgid: fatal: unknown user: nobody
if: fatal: child crashed with signal 13

!!!!!
 init-stage2 failed.
 Logs are in /var/log/s6-uncaught-logs
!!!!!

Sorry, I don't understand why it does that. If the 1st {} fails, shouldn't it just execute the 3rd {} else block? Like cmd1 && no-op || cmd2?

@jprjr (Member) commented Mar 12, 2015

Well, the documentation of ifelse says that if prog1 crashes (which s6-envuidgid does, since the user doesn't exist), ifelse will crash, too.

You can pass the -X flag to all the if* programs to treat a crash as a failure exit instead of a crash. There's also a flag to negate the test, so if the s6-envuidgid crashes, it's treated like a "success" and prog2 executes; that avoids a no-op block. I think you'd want something like:

foreground
{ 
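  # -X: a crash of the block counts as a plain failure instead of crashing "if" itself
  # -n: negate the result, so the fallback chown below runs only when the block fails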
  if -X -n
  { s6-envuidgid nobody s6-chown -U /var/log/s6-uncaught-logs  }
  s6-chown -u 65534 -g 65534 /var/log/s6-uncaught-logs
}

@glerchundi (Member)

@dreamcat4 I don't think we need to take care of such basic things! Indeed, busybox HAS /etc/passwd and /etc/group with the common users & groups, but the tag you were using in your tests (ubuntu-14.04) doesn't. It's weird.

$> docker run busybox cat /etc/passwd | grep nobody
nobody:x:99:99:nobody:/home:/bin/sh
$> docker run busybox cat /etc/group | grep nogroup
nogroup:x:99:

So for now I wouldn't try to correct it, because I don't think it's a problem yet. Let's see if we get issues with this when we really publish to the real world! :-)

@skarnet (Contributor, Author) commented Mar 13, 2015

I understand that an image would want to be as small as possible and not come with its own complete POSIX environment, but missing a uid database is nonsensical. I've run boxes without /bin/sh - this very overlay runs without /bin/sh until the user stuff starts - but I can't run a box without /etc/passwd.

Users who want an s6-enabled image should understand that it's also about security, and security means one uid per service, and that means having an /etc/passwd, or whatever uid database they prefer, with enough uids.

To get things working ASAP, I just assumed "daemon" and "nobody" were present everywhere, and I very much hope that assumption is true, but the right solution would be to be able to add our own users to the image's /etc/passwd at image build time. Is it possible? Or is the "original image + overlay" mix performed at run-time?

@glerchundi (Member)

OK, I'm not going to be the blocker guy and backtrack. If all three of you feel that we need to deal with this problem, let's do it.

@skarnet I would like to do it at build time, but that would cause other issues: depending on the stage at which people include our overlay, it would override /etc/{passwd,group}, with the problems that would follow. As our overlay can only deploy static files into the destination image, the solution would be to perform these actions at runtime (see the sketch below):

  • check if nobody/nogroup exist
  • create them if they don't
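
A minimal sketch of that runtime check, assuming a POSIX sh is available at that point (the actual fix may differ):

grep -q '^nogroup:' /etc/group  || echo 'nogroup:x:65534:' >> /etc/group
grep -q '^nobody:'  /etc/passwd || echo 'nobody:x:65534:65534:nobody:/:/bin/false' >> /etc/passwd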

What do you think?

Moving this thread to a new issue: #15.
