
Don't recreate cached layers #715

Closed

Conversation

eyusupov

This caching optimization reuses the existing layer from a cache image for a RUN command instead of unpacking it and recreating it.
It also helps with multi-stage build caching issues like #575: when cached layers are recreated, some attributes such as timestamps are not preserved, so the digest of the new first-stage image differs, breaking caching for subsequent stages.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@eyusupov force-pushed the improve-caching branch 2 times, most recently from 09538bc to b64f234 on July 16, 2019
@eyusupov
Author

I signed it!

@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.


@eyusupov force-pushed the improve-caching branch 5 times, most recently from ff453ac to 463c64c on July 17, 2019
pkg/commands/workdir.go — review thread (outdated, resolved)
@eyusupov force-pushed the improve-caching branch 5 times, most recently from 0ec0374 to 89bba94 on July 29, 2019
@eyusupov
Author

There is an integration test failing for the CMD command because the ArgsEscaped flag in the metadata has an incorrect value. It doesn't seem related to my changes, and it fails on master locally for me too. Am I missing something, or is it failing for others as well?
Here's the output I'm getting on master (I commented out all tests other than Dockerfile_test_cmd, fwiw):

--- FAIL: TestRun (0.00s)
    --- FAIL: TestRun/test_Dockerfile_test_cmd (0.92s)
        integration_test.go:224: diff = [
              {
                "Image1": "172.17.0.2:5000/docker-dockerfile_test_cmd",
                "Image2": "172.17.0.2:5000/kaniko-dockerfile_test_cmd",
                "DiffType": "File",
                "Diff": {
                  "Adds": null,
                  "Dels": null,
                  "Mods": null
                }
              },
              {
                "Image1": "172.17.0.2:5000/docker-dockerfile_test_cmd",
                "Image2": "172.17.0.2:5000/kaniko-dockerfile_test_cmd",
                "DiffType": "Metadata",
                "Diff": {
                  "Adds": [
                    "ArgsEscaped: true"
                  ],
                  "Dels": [
                    "ArgsEscaped: false"
                  ]
                }
              }
            ]
        integration_test.go:227: json: cannot unmarshal string into Go struct field .Adds of type integration.fileDiff
        integration_test.go:227: []integration.diffOutput differ (-got, +want): {[]integration.diffOutput}[1].Diff.Adds[0->?]:
                -: integration.fileDiff{}
                +: <non-existent>
            {[]integration.diffOutput}[1].Diff.Dels[0->?]:
                -: integration.fileDiff{}
                +: <non-existent>

@tejal29
Member

tejal29 commented Aug 9, 2019

@eyusupov Thanks for the PR. Can you please add some tests?

@eyusupov
Author

Will do soon!

This caching optimization reuses the existing layer from a cache image for a
RUN command instead of unpacking and recreating it. It also helps with
multi-stage build caching issues: when cached layers are recreated, some
attributes such as timestamps are not preserved, so the digest of the new
first-stage image differs, breaking caching for subsequent stages.

Defer FS state setup until the first cache miss. Cached commands are now
handled uniformly by extracting the layer from the cached image, which
allows layer unpacking to be skipped when it is not required for the
stage build.

The WORKDIR command needs the unpacked FS to check whether the workdir
exists, and it can also break cache keys if the workdir did not exist.
@tejal29
Member

tejal29 commented Sep 27, 2019

@eyusupov This looks great! Please add tests and we will try to get this in!

@eyusupov
Author

eyusupov commented Sep 27, 2019 via email

@strax

strax commented Nov 7, 2019

@eyusupov Is there anything I could do to help with this? We too are affected by #575 and it would be great to get these changes shipped.

@raijinsetsu

This is huge for my team. Recreating an archive from exactly the same content as the prior archive causes a significant slow-down in our build pipeline due to the number of files (this is a LARGE NodeJS project).

@cvgw
Contributor

cvgw commented Feb 7, 2020

I'm going to close this; #882 will be merged in its favor.

@cvgw cvgw closed this Feb 7, 2020