-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore build args after optimize. Fixes #1910, #1912. #1915
Restore build args after optimize. Fixes #1910, #1912. #1915
Conversation
0c3b97d
to
731068f
Compare
731068f
to
1dfda2b
Compare
@imjasonh Any chance you could approve me so I can get the tests running here (I'd also love a review since it fixes a few ugly caching bugs, but being able to run the tests would already be great). |
Absolutely! Thanks for working on this, tests running now, I'll review shortly🙏 |
Lovely, thanks! If you have any questions on why I did things the way I did please let me know. I am a noob in GO and finding that slice issue in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this change! Is there any way we can test this, in such a way that before this change the test would fail?
This would help demonstrate how this change helps fix an issue, and help guard against future regressions.
@@ -227,6 +227,11 @@ func (s *stageBuilder) optimize(compositeKey CompositeCache, cfg v1.Config) erro | |||
if !s.opts.Cache { | |||
return nil | |||
} | |||
var buildArgs = s.args.Clone() | |||
// Restore build args back to their original values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to restore them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to to illustrate it with an example.
Take
ARG ARG1=val1
ARG ARG2=val2
as an example. Going through those two stages optimize
will generate the following cache keys:
'' -- starting empty
ARG ARG1=val1 results in roughly 'ARG ARG1=val1'
ARG ARG2=val2 results now in 'ARG1=val1 ARG ARG2=val2'
As you can see the ARG1
will become an env variable and as such is part of the following cache keys (this is defined in the dockerfile spec that args become envs for the subsequent calls).
Now after optimize()
the s.args
really contains ARG1
& ARG2
(if you look further down in optimize
the meta commands are actually executed and now change the state).
Now if we do not restore the build args to the state when optimize started, the actual build steps will start with ARG1
& ARG2
prepopulated. This has two issues, taking the above Dockerfiles the cache keys will now roughly look like this:
'ARG1=val1 ARG2=val2' -- not starting empty because build args already defined
ARG ARG1=val1 results in roughly 'ARG1=val1 ARG2=val2 ARG ARG1=val1'
ARG ARG2=val2 results now in 'ARG1=val1 ARG2=val2 ARG ARG2=val2'
as you can see the cache key is now completely different, so this cached layer will never be found (because optimize looked for a different layer due to the different cache key). -- This can be seen in #1910 (comment) ; during the actual build TEST is always set even if not executed yet -- which is also the root cause of #1912
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjasonh, as I understand it, the optimize
function actually executes the meta-only commands in order to properly determine the environment state for the purposes of generating a meaningful cache key (build.go, line 283.
I assume this was leaving the s.args
member in a dirty state for the rest of the execution of the build
method, and therefore was the cause of the first problem fixed by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct.
pkg/dockerfile/buildargs.go
Outdated
filtered := b.FilterAllowed(envs) | ||
return append(envs, filtered...) | ||
resultEnv = append(resultEnv, filtered...) | ||
sort.Strings(resultEnv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I'm not totally convinced that it's safe to sort env vars. I understand why we'd want that for cache key generation, but FOO=x FOO=b echo $FOO
has demonstrably different results if you sort the env vars. The output of those should produce different cached results as well.
Am I misunderstanding the problem that this solves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are asking a good question, one that I already feared and I do not have a perfect answer. I do agree that this might break somehow somewhere. The reason why I need it sorted (at least for the cache key) is that I need a stable order.
If you look into FilterAllowed
it iterates over GetAllAllowed()
which is a map[string]string
and as such at least the resulting filtered
variable is not ordered. So as a compromise we can call the sort on filtered
only to be stable there. I think this is partially the behavior that @vinman1 is seeing in #1910 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjasonh, @apollo13, instead of sorting the result list here which would affect all callers of the ReplacementEnvs
function (for which there seems to be many), couldn't the sort be done in the populateCompositeKey
function in pkg/executor/build.go
just before line 184 when the result is used next?
That way, we know that the sorting will only be applied in the use cases of a cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that before, but I think just sorting filtered
in here should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to look at how Docker itself uses FilterAllowed
: https://github.com/moby/moby/blob/459d0dfbbb51fb2423a43655e6c62368ec0f36c9/builder/dockerfile/dispatchers.go#L363
prependEnvOnCmd
also seems to sort the filtered envs:
https://github.com/moby/moby/blob/459d0dfbbb51fb2423a43655e6c62368ec0f36c9/builder/dockerfile/dispatchers.go#L436
If that's the case, I feel quite a bit more comfortable with this approach, since, whether it results in incorrect behavior or not, it matches Docker's behavior.
+1 to @vinman1's comment about moving this behavior to as narrow and isolated a place as possible, to avoid unexpected effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a new commit moving the sort into populateCompositeKey
I wish, I was hoping that you could tell me where similar tests can be found so I can dig around there. I have a few minimal Dockerfiles that show the issues; but not sure how to rewrite those to tests. |
To provide a few examples; executing kaniko like this (without my patch):
with the following Dockerfile:
always outputs hi -> no caching of the run layer. Running it with:
will echo ups when running with a cache enabled, even though $TESTNOTSET should not be set. I'd love to somehow have those two files as test. |
I'm afraid it's the blind leading the blind here, I'm not very familiar with all the workings of this project either 😕 In the absence of new CI tests (and assuming existing tests pass) I'm fine taking your word that this is the more correct behavior. If anybody is able to figure out where to add a test for this though I'll owe you a beer. 🍻 |
That is great, I promise to add tests once I know how / where :) Beer would be nice if we ever manage to meet in person someday somewhere. |
Fixes #1910
Fixes #1912
Description
After executing metadata commands
optimize()
needs to restore the build args sobuild()
does not start with build args that it should not yet know.I do not have found the correct location for the tests, hints welcome, thank you!
Submitter Checklist
See the contribution guide for more details.
Reviewer Notes
Release Notes
Describe any changes here so maintainer can include it in the release notes, or delete this block.