Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dockerfile: eliminate dependency on dest directory for COPY #2414

Closed
tonistiigi opened this issue Oct 14, 2021 · 7 comments · Fixed by #2596
Closed

dockerfile: eliminate dependency on dest directory for COPY #2414

tonistiigi opened this issue Oct 14, 2021 · 7 comments · Fixed by #2596

Comments

@tonistiigi
Copy link
Member

In MergeOp #2335 we are adding capability that COPY layers can be rebased and reused via --cache-from even if cache for previous layers gets invalidated. All this works remotely with blobs in the registry. You can rebase an image on top of another image without the layers ever being downloaded or uploaded.

In dockerfile frontend every copy(src, dest) will change to merge(dest, copy(src, scratch()).

In order for a copy to work on remote objects only, it can not access any individual paths from the destination directory.

The problem with this is the behavior in the case when the destination directory does not exist. In that case, new dir is created currently with new properties but if it exists then nothing is changed about the directory.

Eg. when we have a Dockerfile

FROM alpine
COPY foo a/b/c/foo

and after change a new Dockerfile

FROM alpine
RUN mkdir -p a/b/c && chmod -Rf a/b/c 0600
COPY foo a/b/c/foo

If we rebase the copy layer blob directly it would be wrong as the layer already contains directory a/b/c with perm 0755 that would overwrite the previous layer. While if the second file runs directly then a/b/c would remain 0600.

Cases where we can solve this problem

When USER is root and no --chown/chmod is set we can fix this by never putting records for the implied parent dirs in the tarball that COPY created. The tarball will only contain one record a/b/c/foo. When the image is pulled, a container runtime like docker will fill in the missing directories for a/b/c with default configuration when they do not exist.

In order to make this work, we need to log the actual changes COPY made so we can exclude the implied parent directories when making a tarball. Started with that in tonistiigi/fsutil#113

Cases that can't be solved

When COPY contains --chown=username there is no way this copy can be rebased with remote objects only. The username to uid mapping is in the parent image and the only way to check if it has changed is to extract the image and read /etc/passwd. This is unfortunate as this mapping pretty much never changes but don't see any solutions.

Cases that could be solved with some additional syntax

COPY --chown=uid and COPY --chmod=non-default-perms would not work by default. We can't just exclude the implied parents as docker would only create these parents with default perms/user. While in Dockerfile, unfortunately, the rule is that implied parents also get these chown/chmod values (what doesn't really make any sense but we can't just break it and I don't want to create v2 just for this).

We could allow rebases with these COPY instructions if there would be some additional (opt-in) syntax(eg. new flag) where the user either confirms that COPY should not create implied parent directories or that it should always create them(up to a point). We need to eliminate the need to stat the destination directory in order to determine what the resulting state should be. From user's standpoint they almost always already know if the directory already exists or should be created. Ideally, it would be something that we could at least write a linter rule and suggest all users to always use this syntax.

Suggestions?

@sipsma @thaJeztah @crazy-max @AkihiroSuda @aaronlehmann

@thaJeztah
Copy link
Member

While in Dockerfile, unfortunately, the rule is that implied parents also get these chown/chmod values (what doesn't really make any sense but we can't just break it and I don't want to create v2 just for this).

I think this is only partially true. The classic builder did not do this (well, not consistently; only the final parent path would be chown'ed). The original proposal for that was to add a --parents option #396 (comment) / moby/moby#35639, moby/moby#38710 (comment)

But it looks like BuildKit and the classic builder differ in this respect (so already a breaking change?), see moby/moby#42819;

FROM busybox:latest
RUN mkdir -p /home/example-user/foo && chown -R 123:456 /home/example-user
COPY --chown=234:567 baz/. /home/example-user/foo/bar/baz
RUN find /home -exec printf '{} | ' \; -exec stat -c 'u:%u g:%g' {} \;
mkdir baz
echo bla > baz/something.txt

Without BuildKit (DOCKER_BUILDKIT=0 docker build --no-cache --progress=plain -f Dockerfile .)

/home | u:65534 g:65534
/home/example-user | u:123 g:456
/home/example-user/foo | u:123 g:456
/home/example-user/foo/bar | u:0 g:0
/home/example-user/foo/bar/baz | u:234 g:567
/home/example-user/foo/bar/baz/something.txt | u:234 g:567

With BuildKit (DOCKER_BUILDKIT=1 docker build --no-cache --progress=plain -f Dockerfile .)

/home | u:65534 g:65534
/home/example-user | u:123 g:456
/home/example-user/foo | u:123 g:456
/home/example-user/foo/bar | u:234 g:567
/home/example-user/foo/bar/baz | u:234 g:567
/home/example-user/foo/bar/baz/something.txt | u:234 g:567

Note that the classic builder isn't correct here either (if we follow the rules outlined in docker cp), as;

  • COPY --chown=foo:bar /foo/bar/baz/. /foo/bar/baz should create parent path (/foo/bar/baz) if missing, then copy . into it. chown should only be applied to the copied content(?) so the content of /foo/bar/baz
  • COPY --chown=foo:bar /foo/bar/baz /foo/bar/baz should create parent path (/foo/bar) if missing, then copy baz (and content) into it. chown should only be applied to the copied content(?) so baz and it's content.

@sipsma
Copy link
Collaborator

sipsma commented Oct 14, 2021

Just to double check, I'm assuming in your examples you mean to be specifying:

COPY --from=bar foo a/b/c/foo

Otherwise (without --from=bar) I don't see that this issue would exist since the cache would get invalidated between the runs due to the source of the copy changing (it's alpine in the first example and alpine.Run(...) in the second).

But with --from=bar I agree that we will need your idea of creating copy blobs w/out parent directories.

In terms of when --chown=username is specified, I agree there's no way to do this that doesn't involve unlazying the merge when the image holding /etc/passwd changes.

I think there also might be a really awful corner case with something like this:

FROM base
COPY --from=foo /etc/* /etc/
COPY --from=bar --chown=user /a /b

The first copy there might override the /etc/passwd of base and thus impact what user is in the second copy. So I feel like we may need to just disable the merge-optimizations for COPY statements that have --chown=username entirely for now.

I don't have any great ideas, but will throw out a not-great one in case it spurs something better from anyone else. One way to get at least some of the benefits of merge might be to model this:

FROM base
COPY --from=foo --chown=user1 /a /b
COPY --from=bar --chown=user2 /c /d

as separate steps for the merge and the chowns:

layer1 := Merge(base, Scratch().File(Copy(foo, /a, /b)), Scratch.File(Copy(bar, /c, /d)))
layer2 := layer1.File(Chown(/b, user1)).File(Chown(/d, user2))

I don't think Chown exists as a standalone FileOp today, but imagine it's a file op that just changes the owner of files. If we did this, then we'd still have to unlazy layers if one of the merge inputs changed in order to re-run the chowns, but at least you get the other benefits of merge in terms of those inputs not invalidating each other (which results in wasted time repeating copies, new blobs being created and having to be re-exported, etc.).

On the other hand, I believe calling chown on those files would duplicate them in the blobs as tar doesn't have the equivalent of overlay's metacopy (right?). It should be possible to optimize this in buildkitd by doing something similar to the lazy FileOp we discussed previously @tonistiigi, but as we found that's a huge rabbit hole of it's own... So overall, this route might lead to some benefits but would be pretty complex.

@tonistiigi
Copy link
Member Author

@thaJeztah Interesting. Not even sure which makes more sense. Legacy builder leaks less info but logically it is even weirder that only some parts of the path are created with passed permissions.

For this issue though it doesn't make a difference as there is still issue with the last part of the path.

FROM busybox:latest
RUN mkdir -p /home/example-user && chown -R 123:456 /home/example-user
COPY --chown=234:567 baz/. /home/example-user/
COPY --chown=234:567 baz/. /home/another-user/

RUN find /home -exec printf '{} | ' \; -exec stat -c 'u:%u g:%g' {} \;
#8 [5/5] RUN find /home -exec printf '{} | ' ; -exec stat -c 'u:%u g:%g' {} ;
#8 sha256:aaee8642f9ea3db7c987d5f3701db84ca621fb72d5dcbbcd225505b9640dd92e
#8 0.057 /home | u:65534 g:65534
#8 0.058 /home/another-user | u:234 g:567
#8 0.060 /home/another-user/foo | u:234 g:567
#8 0.061 /home/example-user | u:123 g:456
#8 0.063 /home/example-user/foo | u:234 g:567
 ---> Running in 7c621a11dc5c
/home | u:65534 g:65534
/home/another-user | u:234 g:567
/home/another-user/foo | u:234 g:567
/home/example-user | u:123 g:456
/home/example-user/foo | u:234 g:567

The issue in here is that although the two copies were identical the owner of the dest path depends on if the path existed or not.

@sipsma

Otherwise (without --from=bar) I don't see that this issue would exist since the cache would get invalidated between the runs due to the source of the copy changing (it's alpine in the first example and alpine.Run(...) in the second).

We are copying from "build context" without --from, not from alpine or alpine.Run. So if context has not changed it should behave the same.

I think there also might be a really awful corner case with something like this:

😢 This is really bad. This isn't even just disabling lazy pull but removing all possible merge optimizations as the merge would need to be mounted locally between merges.

I think this is such a contrived case that I'm willing to ignore it. I can't imagine any actual use case where it would make sense to use a copy to change the mapping of the username used in next copy. Or we can check if the destination of copy is /etc or /etc/passwd (I know it doesn't protect against symlinks). Or maybe like a #reset-user directive in the comment for COPY if someone really wants the COPY operation to change the behavior of the next --chown flag.

layer2 := layer1.File(Chown(/b, user1)).File(Chown(/d, user2))

I'm not sure I understand this. These Chown calls would duplicate files into another layer and they would still require everything to be mounted that is even worse than when just requiring the dest directory to be mounted.

@sipsma
Copy link
Collaborator

sipsma commented Oct 15, 2021

We are copying from "build context" without --from, not from alpine or alpine.Run. So if context has not changed it should behave the same.

Oh duh... I forgot how that syntax works. Makes sense it would be an issue in that case too.

I think this is such a contrived case that I'm willing to ignore it. I can't imagine any actual use case where it would make sense to use a copy to change the mapping of the username used in next copy. Or we can check if the destination of copy is /etc or /etc/passwd (I know it doesn't protect against symlinks). Or maybe like a #reset-user directive in the comment for COPY if someone really wants the COPY operation to change the behavior of the next --chown flag.

I looked around and found a few real world examples. Here's a couple:

  1. https://sourcegraph.com/github.com/Shopify/kubeaudit/-/blob/Dockerfile
  2. https://sourcegraph.com/github.com/cds-snc/covid-alert-server/-/blob/Dockerfile
  3. https://sourcegraph.com/github.com/argoproj-labs/argocd-autopilot/-/blob/Dockerfile
    • This one is the most interesting because it's used on an image that isn't based on scratch

So yeah it's obscure but not completely unheard of.

Another variation on your suggestions would be a flag for COPY like --userfrom=<stage> that allows you to specifically say where to read /etc/passwd from. No good options though.

I'm not sure I understand this. These Chown calls would duplicate files into another layer and they would still require everything to be mounted that is even worse than when just requiring the dest directory to be mounted.

We'd need FileOp to be lazily implemented for both Copy and the hypothetical Chown. Then I think you could optimize it so that when you unlazy the top-level Chown you'd see that you can "compile" the stack of copies+merge+chown operation into one operation. Like I said, not a great idea due to being extremely complicated, just the only thing I could think of to salvage this case right now. We're probably better off just leaving it unoptimized for now IMO.

@tonistiigi
Copy link
Member Author

So yeah it's obscure but not completely unheard of.

Luckily at least for all of these cases, the /etc/passwd case can be determined from COPY arguments.

@thaJeztah
Copy link
Member

@thaJeztah Interesting. Not even sure which makes more sense. Legacy builder leaks less info but logically it is even weirder that only some parts of the path are created with passed permissions.

Yes, I expect the legacy behavior to be a combination of "intentional" and "things we overlooked" (I think changing ownership of the "last parent dir" is likely a bug where we just chown "last path element that is a dir").

The TL;DR (and reason I posted that info) is that the behavior is currently under-defined, and already inconsistent, so (IMO) it's ok to change behavior (and properly define it), even if that could result in "breaking" changes (within reason of course, but I think most cases would be quite corner-case).

For this issue though it doesn't make a difference as there is still issue with the last part of the path.

Yes, it's an interesting problem,

Really random comment; can we learn from git? The situation feels (somewhat) similar; IIUC, git only stores files, not directories; how does it handle metadata of directories?

I think there also might be a really awful corner case with something like this:

😢 This is really bad. This isn't even just disabling lazy pull but removing all possible merge optimizations as the merge would need to be mounted locally between merges.

Yes, so IIUC, the problem here is "any case where a non-numeric user/group is used" will be problematic (as those will need to be resolved based on the current state?)

General question (apologies if it's a silly question!)

Say, we have:

FROM foo
COPY one            /some-dir/    # adds "one"
COPY one two        /some-dir/    # adds "two"
COPY one two three  /some-dir/    # adds "three"

What would a "rebase" of the last COPY result in?

FROM foo
COPY one two three  /some-dir/    # should add "one two three"; what does it do on a "rebase"?

@tonistiigi
Copy link
Member Author

Really random comment; can we learn from git? The situation feels (somewhat) similar; IIUC, git only stores files, not directories; how does it handle metadata of directories?

You need to explain more, I don't get the connection instantly. If we compare this to other cp logic then the issues start from the behavior that Dockerfile does not copy a directory but files inside it. If it would copy the directory then it would copy the metadata of the directory as well. As it copies files inside it then there is a problem with the parent directory owner/permissions breaking lots of cases and therefore historically someone decided to do a hacky patch and still copy the parent metadata, but only in certain conditions.

What would a "rebase" of the last COPY result in?

Are you asking if flattens/deduplicates the files copied in previous layers for the first example then no. Currently, this is snapshotter specific with optimized drivers not doing deduplication but for example vfs doing it due to side-effect of the diffing algorithm. With this change, this would be normalized. When user wants to flatten layers and remove duplicates they can either use RUN --mount with any other copy algorithm(like rsync) or add an extra COPY --from stage that flattens files together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants