Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore build args after optimize. Fixes #1910, #1912. #1915

Merged
merged 2 commits into from
Feb 9, 2022

Conversation

apollo13
Copy link
Contributor

@apollo13 apollo13 commented Feb 2, 2022

Fixes #1910
Fixes #1912

Description

After executing metadata commands optimize() needs to restore the build args so build() does not start with build args that it should not yet know.

I do not have found the correct location for the tests, hints welcome, thank you!

Submitter Checklist

  • Includes unit tests
  • Adds integration tests if needed.

See the contribution guide for more details.

Reviewer Notes

  • The code flow looks good.
  • Unit tests and or integration tests added.

Release Notes

Describe any changes here so maintainer can include it in the release notes, or delete this block.

- Fix cache key calculation involving ARG and ENV

@apollo13
Copy link
Contributor Author

apollo13 commented Feb 8, 2022

@imjasonh Any chance you could approve me so I can get the tests running here (I'd also love a review since it fixes a few ugly caching bugs, but being able to run the tests would already be great).

@apollo13 apollo13 marked this pull request as ready for review February 8, 2022 17:39
@imjasonh
Copy link
Collaborator

imjasonh commented Feb 8, 2022

Absolutely! Thanks for working on this, tests running now, I'll review shortly🙏

@apollo13
Copy link
Contributor Author

apollo13 commented Feb 8, 2022

Lovely, thanks! If you have any questions on why I did things the way I did please let me know. I am a noob in GO and finding that slice issue in ReplacementEnvs cost me quite a bit :) I still haven't figured out how and where to add tests for this, any existing test that I could copy and start from would be great!

Copy link
Collaborator

@imjasonh imjasonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change! Is there any way we can test this, in such a way that before this change the test would fail?

This would help demonstrate how this change helps fix an issue, and help guard against future regressions.

@@ -227,6 +227,11 @@ func (s *stageBuilder) optimize(compositeKey CompositeCache, cfg v1.Config) erro
if !s.opts.Cache {
return nil
}
var buildArgs = s.args.Clone()
// Restore build args back to their original values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to restore them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to to illustrate it with an example.

Take

ARG ARG1=val1
ARG ARG2=val2

as an example. Going through those two stages optimize will generate the following cache keys:

'' -- starting empty
ARG ARG1=val1 results in roughly 'ARG ARG1=val1'
ARG ARG2=val2 results now in 'ARG1=val1 ARG ARG2=val2'

As you can see the ARG1 will become an env variable and as such is part of the following cache keys (this is defined in the dockerfile spec that args become envs for the subsequent calls).

Now after optimize() the s.args really contains ARG1 & ARG2 (if you look further down in optimize the meta commands are actually executed and now change the state).

Now if we do not restore the build args to the state when optimize started, the actual build steps will start with ARG1 & ARG2 prepopulated. This has two issues, taking the above Dockerfiles the cache keys will now roughly look like this:

'ARG1=val1 ARG2=val2' -- not starting empty because build args already defined
ARG ARG1=val1 results in roughly 'ARG1=val1 ARG2=val2 ARG ARG1=val1'
ARG ARG2=val2 results now in 'ARG1=val1 ARG2=val2 ARG ARG2=val2'

as you can see the cache key is now completely different, so this cached layer will never be found (because optimize looked for a different layer due to the different cache key). -- This can be seen in #1910 (comment) ; during the actual build TEST is always set even if not executed yet -- which is also the root cause of #1912

This comment was marked as outdated.

This comment was marked as outdated.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imjasonh, as I understand it, the optimize function actually executes the meta-only commands in order to properly determine the environment state for the purposes of generating a meaningful cache key (build.go, line 283.

I assume this was leaving the s.args member in a dirty state for the rest of the execution of the build method, and therefore was the cause of the first problem fixed by this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is correct.

filtered := b.FilterAllowed(envs)
return append(envs, filtered...)
resultEnv = append(resultEnv, filtered...)
sort.Strings(resultEnv)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm not totally convinced that it's safe to sort env vars. I understand why we'd want that for cache key generation, but FOO=x FOO=b echo $FOO has demonstrably different results if you sort the env vars. The output of those should produce different cached results as well.

Am I misunderstanding the problem that this solves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are asking a good question, one that I already feared and I do not have a perfect answer. I do agree that this might break somehow somewhere. The reason why I need it sorted (at least for the cache key) is that I need a stable order.

If you look into FilterAllowed it iterates over GetAllAllowed() which is a map[string]string and as such at least the resulting filtered variable is not ordered. So as a compromise we can call the sort on filtered only to be stable there. I think this is partially the behavior that @vinman1 is seeing in #1910 (comment)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imjasonh, @apollo13, instead of sorting the result list here which would affect all callers of the ReplacementEnvs function (for which there seems to be many), couldn't the sort be done in the populateCompositeKey function in pkg/executor/build.go just before line 184 when the result is used next?

That way, we know that the sorting will only be applied in the use cases of a cache key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that before, but I think just sorting filtered in here should be fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to look at how Docker itself uses FilterAllowed: https://github.com/moby/moby/blob/459d0dfbbb51fb2423a43655e6c62368ec0f36c9/builder/dockerfile/dispatchers.go#L363

prependEnvOnCmd also seems to sort the filtered envs:
https://github.com/moby/moby/blob/459d0dfbbb51fb2423a43655e6c62368ec0f36c9/builder/dockerfile/dispatchers.go#L436

If that's the case, I feel quite a bit more comfortable with this approach, since, whether it results in incorrect behavior or not, it matches Docker's behavior.

+1 to @vinman1's comment about moving this behavior to as narrow and isolated a place as possible, to avoid unexpected effects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a new commit moving the sort into populateCompositeKey

@apollo13
Copy link
Contributor Author

apollo13 commented Feb 8, 2022

Is there any way we can test this, in such a way that before this change the test would fail?

I wish, I was hoping that you could tell me where similar tests can be found so I can dig around there. I have a few minimal Dockerfiles that show the issues; but not sure how to rewrite those to tests.

@apollo13
Copy link
Contributor Author

apollo13 commented Feb 8, 2022

To provide a few examples; executing kaniko like this (without my patch):

docker run -v `pwd`:/workspace gcr.io/kaniko-project/executor:bde904349e9b7797d14f12c7a2657bf79145d18e-debug --dockerfile /workspace/Dockerfile --context dir:///workspace/ --destination 10.0.0.6:5000/test --cache=true --cache-copy-layers=true

with the following Dockerfile:

FROM alpine as build
ARG TEST=123
RUN echo hi

always outputs hi -> no caching of the run layer.

Running it with:

FROM alpine as build
ARG TEST=123
RUN echo $TESTNOTSET
ARG TESTNOTSET=ups

will echo ups when running with a cache enabled, even though $TESTNOTSET should not be set.

I'd love to somehow have those two files as test.

@imjasonh
Copy link
Collaborator

imjasonh commented Feb 9, 2022

I wish, I was hoping that you could tell me where similar tests can be found so I can dig around there. I have a few minimal Dockerfiles that show the issues; but not sure how to rewrite those to tests.

I'm afraid it's the blind leading the blind here, I'm not very familiar with all the workings of this project either 😕

In the absence of new CI tests (and assuming existing tests pass) I'm fine taking your word that this is the more correct behavior. If anybody is able to figure out where to add a test for this though I'll owe you a beer. 🍻

@apollo13
Copy link
Contributor Author

apollo13 commented Feb 9, 2022

That is great, I promise to add tests once I know how / where :) Beer would be nice if we ever manage to meet in person someday somewhere.

@imjasonh imjasonh merged commit ef97636 into GoogleContainerTools:main Feb 9, 2022
@apollo13 apollo13 deleted the restore-build-args branch February 9, 2022 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants