-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dockerfile: eliminate dependency on dest directory for COPY #2414
Comments
I think this is only partially true. The classic builder did not do this (well, not consistently; only the final parent path would be chown'ed). The original proposal for that was to add a But it looks like BuildKit and the classic builder differ in this respect (so already a breaking change?), see moby/moby#42819; FROM busybox:latest
RUN mkdir -p /home/example-user/foo && chown -R 123:456 /home/example-user
COPY --chown=234:567 baz/. /home/example-user/foo/bar/baz
RUN find /home -exec printf '{} | ' \; -exec stat -c 'u:%u g:%g' {} \;
Without BuildKit (
With BuildKit (
Note that the classic builder isn't correct here either (if we follow the rules outlined in
|
Just to double check, I'm assuming in your examples you mean to be specifying:
Otherwise (without But with In terms of when I think there also might be a really awful corner case with something like this:
The first copy there might override the /etc/passwd of I don't have any great ideas, but will throw out a not-great one in case it spurs something better from anyone else. One way to get at least some of the benefits of merge might be to model this:
as separate steps for the merge and the chowns:
I don't think On the other hand, I believe calling chown on those files would duplicate them in the blobs as tar doesn't have the equivalent of overlay's |
@thaJeztah Interesting. Not even sure which makes more sense. Legacy builder leaks less info but logically it is even weirder that only some parts of the path are created with passed permissions. For this issue though it doesn't make a difference as there is still issue with the last part of the path.
The issue in here is that although the two copies were identical the owner of the dest path depends on if the path existed or not.
We are copying from "build context" without
😢 This is really bad. This isn't even just disabling lazy pull but removing all possible merge optimizations as the merge would need to be mounted locally between merges. I think this is such a contrived case that I'm willing to ignore it. I can't imagine any actual use case where it would make sense to use a copy to change the mapping of the username used in next copy. Or we can check if the destination of copy is
I'm not sure I understand this. These |
Oh duh... I forgot how that syntax works. Makes sense it would be an issue in that case too.
I looked around and found a few real world examples. Here's a couple:
So yeah it's obscure but not completely unheard of. Another variation on your suggestions would be a flag for
We'd need FileOp to be lazily implemented for both Copy and the hypothetical Chown. Then I think you could optimize it so that when you unlazy the top-level Chown you'd see that you can "compile" the stack of copies+merge+chown operation into one operation. Like I said, not a great idea due to being extremely complicated, just the only thing I could think of to salvage this case right now. We're probably better off just leaving it unoptimized for now IMO. |
Luckily at least for all of these cases, the |
Yes, I expect the legacy behavior to be a combination of "intentional" and "things we overlooked" (I think changing ownership of the "last parent dir" is likely a bug where we just chown "last path element that is a dir"). The TL;DR (and reason I posted that info) is that the behavior is currently under-defined, and already inconsistent, so (IMO) it's ok to change behavior (and properly define it), even if that could result in "breaking" changes (within reason of course, but I think most cases would be quite corner-case).
Yes, it's an interesting problem, Really random comment; can we learn from
Yes, so IIUC, the problem here is "any case where a non-numeric user/group is used" will be problematic (as those will need to be resolved based on the current state?) General question (apologies if it's a silly question!) Say, we have: FROM foo
COPY one /some-dir/ # adds "one"
COPY one two /some-dir/ # adds "two"
COPY one two three /some-dir/ # adds "three" What would a "rebase" of the last FROM foo
COPY one two three /some-dir/ # should add "one two three"; what does it do on a "rebase"? |
You need to explain more, I don't get the connection instantly. If we compare this to other cp logic then the issues start from the behavior that Dockerfile does not copy a directory but files inside it. If it would copy the directory then it would copy the metadata of the directory as well. As it copies files inside it then there is a problem with the parent directory owner/permissions breaking lots of cases and therefore historically someone decided to do a hacky patch and still copy the parent metadata, but only in certain conditions.
Are you asking if flattens/deduplicates the files copied in previous layers for the first example then no. Currently, this is snapshotter specific with optimized drivers not doing deduplication but for example vfs doing it due to side-effect of the diffing algorithm. With this change, this would be normalized. When user wants to flatten layers and remove duplicates they can either use |
In
MergeOp
#2335 we are adding capability thatCOPY
layers can be rebased and reused via--cache-from
even if cache for previous layers gets invalidated. All this works remotely with blobs in the registry. You can rebase an image on top of another image without the layers ever being downloaded or uploaded.In dockerfile frontend every
copy(src, dest)
will change tomerge(dest, copy(src, scratch())
.In order for a copy to work on remote objects only, it can not access any individual paths from the destination directory.
The problem with this is the behavior in the case when the destination directory does not exist. In that case, new dir is created currently with new properties but if it exists then nothing is changed about the directory.
Eg. when we have a Dockerfile
and after change a new Dockerfile
If we rebase the copy layer blob directly it would be wrong as the layer already contains directory
a/b/c
with perm 0755 that would overwrite the previous layer. While if the second file runs directly thena/b/c
would remain 0600.Cases where we can solve this problem
When
USER
is root and no--chown/chmod
is set we can fix this by never putting records for the implied parent dirs in the tarball thatCOPY
created. The tarball will only contain one recorda/b/c/foo
. When the image is pulled, a container runtime like docker will fill in the missing directories fora/b/c
with default configuration when they do not exist.In order to make this work, we need to log the actual changes
COPY
made so we can exclude the implied parent directories when making a tarball. Started with that in tonistiigi/fsutil#113Cases that can't be solved
When
COPY
contains--chown=username
there is no way this copy can be rebased with remote objects only. The username to uid mapping is in the parent image and the only way to check if it has changed is to extract the image and read/etc/passwd
. This is unfortunate as this mapping pretty much never changes but don't see any solutions.Cases that could be solved with some additional syntax
COPY --chown=uid
andCOPY --chmod=non-default-perms
would not work by default. We can't just exclude the implied parents asdocker
would only create these parents with default perms/user. While in Dockerfile, unfortunately, the rule is that implied parents also get these chown/chmod values (what doesn't really make any sense but we can't just break it and I don't want to create v2 just for this).We could allow rebases with these
COPY
instructions if there would be some additional (opt-in) syntax(eg. new flag) where the user either confirms thatCOPY
should not create implied parent directories or that it should always create them(up to a point). We need to eliminate the need to stat the destination directory in order to determine what the resulting state should be. From user's standpoint they almost always already know if the directory already exists or should be created. Ideally, it would be something that we could at least write a linter rule and suggest all users to always use this syntax.Suggestions?
@sipsma @thaJeztah @crazy-max @AkihiroSuda @aaronlehmann
The text was updated successfully, but these errors were encountered: