
Cache docker.getDependencies and skip inspecting remote images with old manifest #4896

Merged · 7 commits · Nov 10, 2020

Conversation

tejal29 (Contributor) commented Oct 10, 2020

Unblocks #3367 by:

  • warning users that any files mentioned in ONBUILD [COPY/ADD] triggers defined in the base image will not be watched, and
  • continuing to process local dependencies for COPY/ADD commands.

In this PR, for the iterative watch modes (dev, debug), we show a warning to the user if they depend on an image pushed with the older v1 manifest.
In watch mode, Skaffold relies on go-containerregistry to retrieve a remote image, check if ONBUILD triggers exist, and parse ONBUILD COPY/ADD commands.
However, go-containerregistry does not support retrieving images with old manifests and throws an unsupported media type error, as reported by the user in #3367 (see google/go-containerregistry#377 for more context).
Note: docker pull can still pull images with the older v1 manifest, so deploy as well as build could succeed if the ONBUILD triggers just run executables in the base image.

In this PR, we:

  1. Show a warning to the user when we encounter an error retrieving an image with an old manifest.
  2. Provide them with an actionable item: quit the current watch mode, run docker pull to retrieve the image locally, and re-run.

WARN[0002] Could not retrieve image retrieving image "library/ruby:2.3.0" pushed with the deprecated manifest v1. Ignoring files dependencies for all ONBUILD triggers. To avoid, hit Cntrl-C and run `docker pull` to fetch the specified image and retry. 

  3. Proceed with registering local file dependencies.

In non-watch modes like build, deploy, and run, we don't show the warning to the user.
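
To make the control flow concrete, here is a minimal sketch of the warn-and-continue behavior. It is not the PR's actual code: the helper name retrieveOnbuildImage, the watchMode flag, and matching on the error string are all assumptions; only name.ParseReference and remote.Image are real go-containerregistry calls.

```go
package docker

import (
	"fmt"
	"strings"

	"github.com/google/go-containerregistry/pkg/name"
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/remote"
	"github.com/sirupsen/logrus"
)

// retrieveOnbuildImage is a hypothetical helper illustrating the behavior:
// fetch the base image to inspect its ONBUILD triggers; if the registry
// returns the unsupported-media-type error for a v1 manifest, warn (in watch
// mode only) and tell the caller to skip ONBUILD parsing instead of failing.
func retrieveOnbuildImage(image string, watchMode bool) (v1.Image, error) {
	ref, err := name.ParseReference(image)
	if err != nil {
		return nil, fmt.Errorf("parsing reference %q: %w", image, err)
	}
	img, err := remote.Image(ref)
	if err == nil {
		return img, nil
	}
	// go-containerregistry cannot fetch images pushed with the deprecated
	// schema 1 manifest (google/go-containerregistry#377). Matching on the
	// error text here is an assumption made for this sketch.
	if strings.Contains(err.Error(), "unsupported MediaType") {
		if watchMode {
			logrus.Warnf("Could not retrieve image %s pushed with the deprecated manifest v1. "+
				"Ignoring files dependencies for all ONBUILD triggers. "+
				"To avoid, hit Ctrl-C and run `docker pull` to fetch the specified image and retry.", image)
		}
		// Signal "no image, no error" so dependency computation continues
		// with local COPY/ADD dependencies only.
		return nil, nil
	}
	return nil, err
}
```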

Impact of skipping and continuing with the skaffold build vs. failing fast

In build, deploy, and run, Skaffold calls GetDependencies to create a tar context, which in turn parses ONBUILD triggers.
There is a chance that file dependencies in a base image's ONBUILD triggers might fail the build,
e.g. consider a base image foo pushed with a v1 manifest:

ONBUILD COPY foo ./src

The above base image defines an ONBUILD trigger that copies a file foo, present in the derived artifact's workspace, to ./src.
Consider a project workspace that uses the foo base image, laid out as follows:

|- foo
|- Dockerfile
|- src/
   |- bar

The Dockerfile contents are:

FROM foo
COPY src/* src/

In this case, because Skaffold skipped parsing the ONBUILD COPY command, the file foo will not be watched or included in context.tar.
Running skaffold build for this setup will fail:

Building [leeroy-web]...
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM golang:1.12.9-alpine3.10 as builder
 ---> e0d646523991
Step 2/3: FROM gcr.io/tejal-test/foo:latest
latest: Pulling from tejal-test/foo
Digest: sha256:0804397ce1bc7aaaad11dc3a4914601975b1367b36ea199b6c81f94f1ed70320
Status: Downloaded newer image for gcr.io/tejal-test/foo:latest
# Executing 1 build trigger
unable to stream build output: COPY failed: stat /var/lib/docker/tmp/docker-builder006024094/foo: no such file or directory
➜  microservices git:(fix_onbuild_parsing) ✗

So the impact of this change is that we now fail later (when attempting to perform the build) rather than failing fast. However, by failing fast, we would also fail for cases where the ONBUILD triggers just RUN executables in the base image rather than COPY a file from the derived artifact's workspace.

If we do a good job of translating this build error to give users more help, e.g.:

unable to stream build output: COPY failed: stat /var/lib/docker/tmp/docker-builder006024094/foo: no such file or directory
Skaffold encountered a warning earlier while retrieving `ONBUILD` triggers for "gcr.io/tejal-test/foo:latest" pushed with an older manifest. Run `docker pull gcr.io/tejal-test/foo:latest` and try again. 

skaffold build succeeds after a docker pull:

microservices git:(fix_onbuild_parsing) ✗ docker pull gcr.io/tejal-test/foo && ../../out/skaffold build
Successfully tagged leeroy-web:v1.15.0-31-g80ad6813b-dirty

Future Work

  1. Keep track of such warnings and, on command failure, show them again at the end.

Testing this out


  1. Modify any example's Dockerfile to use this image: FROM library/ruby:2.3.0

  2. Run skaffold dev

../../out/skaffold dev
Listing files to watch...
 - skaffold-example
WARN[0002] Could not retrieve image library/ruby:2.3.0 pushed with the deprecated manifest v1. Ignoring files dependencies for all ONBUILD triggers. To avoid, hit Cntrl-C and run `docker pull` to fetch the specified image and retry. 
Generating tags...
 - skaffold-example -> skaffold-example:v1.16.0-16-g222a0cf0c-dirty
Checking cache...
 - skaffold-example: Not found. Building
Found [minikube] context, using local docker daemon.
Building [skaffold-example]...
Sending build context to Docker daemon  3.072kB
Step 1/9 : FROM golang:1.12.9-alpine3.10 as builder
...
Step 5/9 : FROM library/ruby:2.3.0
....
Starting deploy...
 - pod/getting-started created
Waiting for deployments to stabilize...
Deployments stabilized in 39.550276ms
Press Ctrl+C to exit

Generating tags...
 - skaffold-example -> skaffold-example:v1.16.0-16-g222a0cf0c-dirty
Checking cache...
 - skaffold-example: Not found. Building
Found [minikube] context, using local docker daemon.
Building [skaffold-example]...
Sending build context to Docker daemon  3.072kB
Step 1/9 : FROM golang:1.12.9-alpine3.10 as builder
....
Successfully built d5fe1c8d506a
Successfully tagged skaffold-example:v1.16.0-16-g222a0cf0c-dirty
Tags used in deployment:
 - skaffold-example -> skaffold-example:d5fe1c8d506a4e643f7d3c0c0b08b11135452749a2d92f25da5868c3f4c66eec
Starting deploy...


For skaffold build, no warning is shown. In my case, the build passes since the ONBUILD trigger for the ruby base image just runs a command.

Note:

  1. You see a warning indicating there was an error fetching an image with an older manifest.
  2. Caching docker.getDependencies means the warning is only printed once per dev loop.

tejal29 requested a review from a team as a code owner on October 10, 2020 03:06
tejal29 requested a review from ilya-zuyev on October 10, 2020 03:06
google-cla bot added the cla: yes label on Oct 10, 2020
nkubala (Contributor) commented Oct 21, 2020

I don't think we should be throwing the same error three times, even if it's only on the first cycle of the dev loop - our output is already getting cluttered enough as it is.

also, this error doesn't seem particularly useful...what should a user do when they see this? if skaffold is resolving this itself by pulling the image, it seems to me like we should knock the severity of this log message to INFO or DEBUG, and restart the dev loop to see if the issue has fixed itself.

tejal29 (Contributor, Author) commented Oct 21, 2020

also, this error doesn't seem particularly useful...what should a user do when they see this? if skaffold is resolving this itself by pulling the image, it seems to me like we should knock the severity of this log message to INFO or DEBUG, and restart the dev loop to see if the issue has fixed itself.

Skaffold does nothing to resolve it. The first iteration of skaffold dev currently fails.

It self-recovers because when the application is deployed, the image with the old manifest gets retrieved.

Another approach would be to:

  1. When we see an image with an old manifest, explicitly pull the image using the docker pull command and retry computing file dependencies (see the sketch after this list).
  2. Warn the user but let them know it's OK and Skaffold will recover by itself in the next iteration,
    e.g. change the current log message

WARN[0002] could not retrieve ONBUILD image library/ruby:2.3.0. Will ignore files dependencies for all ONBUILD triggers 

to

WARN[0002] could not retrieve ONBUILD image library/ruby:2.3.0. Will ignore files dependencies for all ONBUILD triggers. These files will be watched in the next dev iteration. 

And follow up with a code change to re-compute build dependencies for the artifact.
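
A rough sketch of option 1, assuming a hypothetical pullAndRetry helper and a computeDeps callback; it uses the Docker client's ImagePull, since the daemon still understands schema 1 manifests:

```go
package docker

import (
	"context"
	"fmt"
	"io"
	"io/ioutil"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// pullAndRetry sketches option 1: when inspecting a remote image with an old
// manifest fails, pull it through the Docker daemon (which can still fetch
// v1-manifest images) and retry the dependency computation against the local copy.
func pullAndRetry(ctx context.Context, image string, computeDeps func() ([]string, error)) ([]string, error) {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return nil, err
	}
	rc, err := cli.ImagePull(ctx, image, types.ImagePullOptions{})
	if err != nil {
		return nil, fmt.Errorf("pulling %s: %w", image, err)
	}
	defer rc.Close()
	// The pull only completes once the JSON progress stream is drained.
	if _, err := io.Copy(ioutil.Discard, rc); err != nil {
		return nil, err
	}
	return computeDeps()
}
```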

WDYT?

tejal29 (Contributor, Author) commented Oct 21, 2020

I don't think we should be throwing the same error three times, even if it's only on the first cycle of the dev loop - our output is already getting cluttered enough as it is.

yeah it sucks. Do you suggest we refactor the code and cache the results of GetDependencies so we don't recompute the same thing thrice?

tejal29 (Contributor, Author) commented Oct 22, 2020

And follow up with a code change to re-compute build dependencies for artifact.

We covered this in today's standup. Currently, we don't re-compute build or sync dependencies at the end of a dev loop. If a user adds a new file which matches sync rules or is part of the build, it never gets watched.

tejal29 force-pushed the fix_onbuild_parsing branch from c23d49a to 80ad681 on October 22, 2020 18:39
codecov bot commented Oct 22, 2020

Codecov Report

Merging #4896 (00557b0) into master (0bbcd9a) will increase coverage by 0.01%.
The diff coverage is 84.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4896      +/-   ##
==========================================
+ Coverage   72.15%   72.17%   +0.01%     
==========================================
  Files         363      363              
  Lines       12777    12811      +34     
==========================================
+ Hits         9219     9246      +27     
- Misses       2874     2877       +3     
- Partials      684      688       +4     
Impacted Files Coverage Δ
pkg/skaffold/docker/parse.go 84.06% <33.33%> (-1.41%) ⬇️
pkg/skaffold/docker/dependencies.go 75.00% <81.25%> (+2.69%) ⬆️
pkg/skaffold/errors/err_map.go 90.56% <90.19%> (+3.38%) ⬆️
pkg/skaffold/errors/errors.go 97.36% <100.00%> (+0.49%) ⬆️
pkg/skaffold/docker/image.go 80.61% <0.00%> (-1.33%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

tejal29 (Contributor, Author) commented Oct 22, 2020

TODO: Don't merge. I am seeing some race conditions.

tejal29 force-pushed the fix_onbuild_parsing branch from 80ad681 to d8d3b56 on November 3, 2020 19:12
tejal29 changed the title from "[skaffold dev] Warn and continue when ONBUILD image could not be retrieved" to "Cache docker.getDependencies and skip inspecting remote docker images if they can" on Nov 3, 2020
tejal29 changed the title from "Cache docker.getDependencies and skip inspecting remote docker images if they can" to "Cache docker.getDependencies and skip inspecting remote images with old manifest" on Nov 3, 2020
tejal29 force-pushed the fix_onbuild_parsing branch from cbe3ee2 to 222a0cf on November 3, 2020 22:33
@@ -42,30 +54,48 @@ func NormalizeDockerfilePath(context, dockerfile string) (string, error) {
return filepath.Abs(rel)
}

// GetDependencies finds the sources dependencies for the given docker artifact.
// GetDependencies finds the sources dependency for the given docker artifact.
// All paths are relative to the workspace.
func GetDependencies(ctx context.Context, workspace string, dockerfilePath string, buildArgs map[string]*string, cfg Config) ([]string, error) {
gsquared94 (Contributor) commented Nov 4, 2020

Since this is doing exactly what SyncStore provides, should we use that instead of adding new package dependencies?

Or modify the SyncStore implementation to this, since this looks to be using fewer underlying constructs.

gsquared94 (Contributor) left a comment

LGTM, but maybe we should change the implementation of SyncStore to the way you did, with a single singleflight.Group{} and sync.Map, and use that struct. Would help the next person wanting something similar.
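
For the next person wanting something similar, here is a minimal, self-contained sketch of the pattern being discussed: one singleflight.Group plus one sync.Map memoizing results per key. The names (depsCache, result) are illustrative, not Skaffold's actual SyncStore.

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

type result struct {
	deps []string
	err  error
}

// depsCache memoizes an expensive per-artifact dependency computation so it
// runs (and logs its warning) at most once per key, even under concurrency.
type depsCache struct {
	sf    singleflight.Group
	store sync.Map // key -> result
}

func (c *depsCache) get(key string, compute func() ([]string, error)) ([]string, error) {
	if v, ok := c.store.Load(key); ok {
		r := v.(result)
		return r.deps, r.err
	}
	// singleflight collapses concurrent calls for the same key into one computation.
	v, _, _ := c.sf.Do(key, func() (interface{}, error) {
		deps, err := compute()
		r := result{deps: deps, err: err}
		c.store.Store(key, r) // cache errors too, so the warning prints only once
		return r, nil
	})
	r := v.(result)
	return r.deps, r.err
}

func main() {
	var c depsCache
	for i := 0; i < 3; i++ {
		deps, err := c.get("Dockerfile", func() ([]string, error) {
			fmt.Println("computing dependencies (prints once)")
			return []string{"Dockerfile", "src/main.go"}, nil
		})
		fmt.Println(deps, err)
	}
}
```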

tejal29 force-pushed the fix_onbuild_parsing branch from 222a0cf to 00557b0 on November 6, 2020 19:53
tejal29 (Contributor, Author) commented Nov 6, 2020

LGTM, but maybe we should change the implementation of SyncStore to the way you did with a single singleflight.Group{} and sync.Map and use that struct. Would help the next person wanting something similar.

I started making changes to SyncStore and they ended up being a lot more than I thought.
I will address those in a follow-up PR.
