Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not recording non-empty directories is problematic #103

Closed
maleadt opened this issue Apr 20, 2021 · 6 comments · Fixed by #106
Closed

Not recording non-empty directories is problematic #103

maleadt opened this issue Apr 20, 2021 · 6 comments · Fixed by #106

Comments

@maleadt
Copy link

maleadt commented Apr 20, 2021

I'm tarring a rootfs directory that looks as follows:

$ ls -la                                                                                                                                                         
total 100K
drwxr-xr-x  17 tim tim 4,0K apr 16 16:48 ./
drwxr-xr-x 320 tim tim  32K apr 16 19:42 ../
lrwxrwxrwx   1 tim tim    7 apr 16 16:48 bin -> usr/bin/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 boot/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 dev/
drwxr-xr-x  37 tim tim 4,0K apr 16 16:48 etc/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 home/
lrwxrwxrwx   1 tim tim    7 apr 16 16:48 lib -> usr/lib/
lrwxrwxrwx   1 tim tim    9 apr 16 16:48 lib32 -> usr/lib32/
lrwxrwxrwx   1 tim tim    9 apr 16 16:48 lib64 -> usr/lib64/
lrwxrwxrwx   1 tim tim   10 apr 16 16:48 libx32 -> usr/libx32/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 media/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 mnt/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 opt/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 proc/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 root/
drwxr-xr-x   4 tim tim 4,0K apr 16 16:48 run/
lrwxrwxrwx   1 tim tim    8 apr 16 16:48 sbin -> usr/sbin/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 srv/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 sys/
drwxr-xr-x   2 tim tim 4,0K apr 16 16:48 tmp/
drwxr-xr-x  13 tim tim 4,0K apr 16 16:48 usr/
drwxr-xr-x  11 tim tim 4,0K apr 16 16:48 var/

I'll be focusing on the usr directory, which contains more directories and files (i.e. it's not empty):

$ ls -la usr                                                                                                                                                     
total 60K
drwxr-xr-x 13 tim tim 4,0K apr 16 16:48 ./
drwxr-xr-x 17 tim tim 4,0K apr 16 16:48 ../
drwxr-xr-x  2 tim tim  12K apr 16 16:48 bin/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 games/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 include/
drwxr-xr-x 20 tim tim 4,0K apr 16 16:48 lib/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 lib32/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 lib64/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 libx32/
drwxr-xr-x 10 tim tim 4,0K apr 16 16:48 local/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 sbin/
drwxr-xr-x 42 tim tim 4,0K apr 16 16:48 share/
drwxr-xr-x  2 tim tim 4,0K apr 16 16:48 src/

If I create a tarball with Tar.jl and GNU tar, and then list the tarball's contents using tar -tvf, there's some differences. One difference is that Tar.jl's tarball doesn't list the usr and bin directories themselves, because it doesn't encode those directories' metadata.

GNU Tar:

drwxr-xr-x tim/tim           0 2021-04-16 16:48 ./usr/
drwxr-xr-x tim/tim           0 2021-04-16 16:48 ./usr/bin/
-r-xr-xr-x tim/tim        7786 2021-04-16 16:48 ./usr/bin/lcf

Tar.jl:

-rwxr-xr-x 0/0            7786 1970-01-01 01:00 usr/bin/lcf

As per the README, this is (currently) by design. That does introduce issues with some tools, though, so I figured I'd create an issue documenting those. One of those tools is Docker, whose docker import tool can read a rootfs tarball to create a layer. However, lacking an entry for those directories, the permissions on those directories are "wrong", essentially rendering the created image useless for non-root uses:

julia> open(`docker import -`, "w", stdout) do io
         Tar.create("/home/tim/Julia/depot/artifacts/e0c005e71308ffa5b7a858ce435c1083fbb8702d", io)
       end
sha256:5d377e83f9734b0e4f4558b301bce3b045952cada76ac4c145e779fee5e37765
Process(`docker import -`, ProcessExited(0))

shell> docker run 5d377e83f9734b0e4f4558b301bce3b045952cada76ac4c145e779fee5e37765 ls -la /
total 56
drwxr-xr-x   1 root root 4096 Apr 20 10:55 .
drwxr-xr-x   1 root root 4096 Apr 20 10:55 ..
-rwxr-xr-x   1 root root    0 Apr 20 10:55 .dockerenv
lrwxrwxrwx   1 root root    7 Jan  1  1970 bin -> usr/bin
drwxr-xr-x   2 root root 4096 Jan  1  1970 boot
drwxr-xr-x   5 root root  340 Apr 20 10:55 dev
drw-------   1 root root 4096 Apr 20 10:55 etc
drwxr-xr-x   2 root root 4096 Jan  1  1970 home
lrwxrwxrwx   1 root root    7 Jan  1  1970 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Jan  1  1970 lib32 -> usr/lib32
lrwxrwxrwx   1 root root    9 Jan  1  1970 lib64 -> usr/lib64
lrwxrwxrwx   1 root root   10 Jan  1  1970 libx32 -> usr/libx32
drwxr-xr-x   2 root root 4096 Jan  1  1970 media
drwxr-xr-x   2 root root 4096 Jan  1  1970 mnt
drwxr-xr-x   2 root root 4096 Jan  1  1970 opt
dr-xr-xr-x 319 root root    0 Apr 20 10:55 proc
drw-------   2 root root 4096 Apr 20 10:41 root
drw-------   4 root root 4096 Apr 20 10:41 run
lrwxrwxrwx   1 root root    8 Jan  1  1970 sbin -> usr/sbin
drwxr-xr-x   2 root root 4096 Jan  1  1970 srv
dr-xr-xr-x  13 root root    0 Apr 20 10:55 sys
drwxr-xr-x   2 root root 4096 Jan  1  1970 tmp
drw-------  13 root root 4096 Apr 20 10:41 usr
drw-------  11 root root 4096 Apr 20 10:41 var

shell> docker run 5d377e83f9734b0e4f4558b301bce3b045952cada76ac4c145e779fee5e37765 ls -la /usr
total 60
drw------- 13 root root  4096 Apr 20 10:41 .
drwxr-xr-x  1 root root  4096 Apr 20 10:55 ..
drw-------  2 root root 12288 Apr 20 10:41 bin
drwxr-xr-x  2 root root  4096 Jan  1  1970 games
drwxr-xr-x  2 root root  4096 Jan  1  1970 include
drw------- 20 root root  4096 Apr 20 10:41 lib
drwxr-xr-x  2 root root  4096 Jan  1  1970 lib32
drw-------  2 root root  4096 Apr 20 10:41 lib64
drwxr-xr-x  2 root root  4096 Jan  1  1970 libx32
drw------- 10 root root  4096 Apr 20 10:41 local
drw-------  2 root root  4096 Apr 20 10:41 sbin
drw------- 42 root root  4096 Apr 20 10:41 share
drwxr-xr-x  2 root root  4096 Jan  1  1970 src

As you can see, the permissions of usr and usr/bin are too restrictive. At the same time, usr/src is fine because it's an empty directory which is part of the tarball. Maybe the 'only if empty' rule should be extended to all directories so that tools like docker import don't stumble on Tar.jl's tarballs?

Note that docker import is probably to blame here too, as unpacking Tar.jl's tarballs using GNU Tar does result in proper permissions (but with weird timestamps as only usr/src was recorded by Tar.jl with a zero timestamp):

$ ls -la                                                                                                                                                                                                                   
total 0
drwxr-xr-x 17 tim  tim   460 apr 20 12:40 ./
drwxrwxrwt 28 root root  900 apr 20 13:01 ../
lrwxrwxrwx  1 tim  tim     7 jan  1  1970 bin -> usr/bin/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 boot/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 dev/
drwxr-xr-x 37 tim  tim  1,7K apr 20 12:40 etc/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 home/
lrwxrwxrwx  1 tim  tim     7 jan  1  1970 lib -> usr/lib/
lrwxrwxrwx  1 tim  tim     9 jan  1  1970 lib32 -> usr/lib32/
lrwxrwxrwx  1 tim  tim     9 jan  1  1970 lib64 -> usr/lib64/
lrwxrwxrwx  1 tim  tim    10 jan  1  1970 libx32 -> usr/libx32/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 media/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 mnt/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 opt/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 proc/
drwxr-xr-x  2 tim  tim    80 apr 20 12:40 root/
drwxr-xr-x  4 tim  tim   100 apr 20 12:40 run/
lrwxrwxrwx  1 tim  tim     8 jan  1  1970 sbin -> usr/sbin/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 srv/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 sys/
drwxr-xr-x  2 tim  tim    40 jan  1  1970 tmp/
drwxr-xr-x 13 tim  tim   260 apr 20 12:40 usr/
drwxr-xr-x 11 tim  tim   260 apr 20 12:40 var/

$ ls -la usr                                                                                                                                                                                                               
total 0
drwxr-xr-x 13 tim tim  260 apr 20 12:40 ./
drwxr-xr-x 17 tim tim  460 apr 20 12:40 ../
drwxr-xr-x  2 tim tim 8,5K apr 20 12:40 bin/
drwxr-xr-x  2 tim tim   40 jan  1  1970 games/
drwxr-xr-x  2 tim tim   40 jan  1  1970 include/
drwxr-xr-x 20 tim tim  440 apr 20 12:40 lib/
drwxr-xr-x  2 tim tim   40 jan  1  1970 lib32/
drwxr-xr-x  2 tim tim   60 apr 20 12:40 lib64/
drwxr-xr-x  2 tim tim   40 jan  1  1970 libx32/
drwxr-xr-x 10 tim tim  220 apr 20 12:40 local/
drwxr-xr-x  2 tim tim 2,5K apr 20 12:40 sbin/
drwxr-xr-x 42 tim tim  840 apr 20 12:40 share/
drwxr-xr-x  2 tim tim   40 jan  1  1970 src/
@StefanKarpinski
Copy link
Member

This could be changed at the cost of two things:

  1. Slightly bigger tarballs (not really a big deal)
  2. Breaking reproducibility with past versions of Tar.create

I don't know if we're relying the tarballs that we generate being reproducible anywhere. cc @staticfloat?

@StefanKarpinski
Copy link
Member

Oh, is there a umask set when you're doing this?

@maleadt
Copy link
Author

maleadt commented Apr 21, 2021

Nope, just the default 022.

@staticfloat
Copy link
Contributor

No, we don't need reproducibility yet. And if we did, we would likely check it through tree hashing, and not through sha hashing.

StefanKarpinski added a commit that referenced this issue Apr 22, 2021
Issue #103 points out that omitting directory entries for non-empty
directories can confuse some tools that consume tarballs, including
docker, which applies overly restrictive permissions to directories
which are not explicitly included in a tarball.

This commit changes `Tar.create` and `Tar.rewrite` to produce tarballs
which include explicit directory entries for all (non-root) directories.
This changes Tar.jl's "canonical format", which is, by design one-to-one
with git tree hashes. However, it does not seem like anyone currently
depends on that exact reproducibility and it seems worth making this
change in order to avoid confusing external consumers of our tarballs.

Closes #103.
@StefanKarpinski
Copy link
Member

Ok, I think we should go ahead and change this now, so I've made a PR which I'm going to merge.

@StefanKarpinski
Copy link
Member

Oh, in the process I discovered a set of obscure bugs which I fixed: #105. I discovered this by accidentally changing the tarballs we produce to include each directory after the contents of that directory, which none of the tarball consuming functions handled correctly, each broken in a slightly different way. So we inadvertently also made this package more robust.

StefanKarpinski added a commit that referenced this issue Apr 22, 2021
Issue #103 points out that omitting directory entries for non-empty
directories can confuse some tools that consume tarballs, including
docker, which applies overly restrictive permissions to directories
which are not explicitly included in a tarball.

This commit changes `Tar.create` and `Tar.rewrite` to produce tarballs
which include explicit directory entries for all (non-root) directories.
This changes Tar.jl's "canonical format", which is, by design one-to-one
with git tree hashes. However, it does not seem like anyone currently
depends on that exact reproducibility and it seems worth making this
change in order to avoid confusing external consumers of our tarballs.

Closes #103.
StefanKarpinski added a commit that referenced this issue Apr 22, 2021
Issue #103 points out that omitting directory entries for non-empty
directories can confuse some tools that consume tarballs, including
docker, which applies overly restrictive permissions to directories
which are not explicitly included in a tarball.

This commit changes `Tar.create` and `Tar.rewrite` to produce tarballs
which include explicit directory entries for all (non-root) directories.
This changes Tar.jl's "canonical format", which is, by design one-to-one
with git tree hashes. However, it does not seem like anyone currently
depends on that exact reproducibility and it seems worth making this
change in order to avoid confusing external consumers of our tarballs.

Closes #103.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants