
Support dependencies between build artifacts - 1 #4828

Merged · 27 commits · Oct 9, 2020

Conversation

Contributor

@gsquared94 gsquared94 commented Sep 28, 2020

Related: #4713
Part 1 of several PRs.

Implements sections of the design:

  • Config schema
  • Config validation
  • Build controller (only parity with existing code)
  • Build logs reporting

The current set of changes does not modify the output behavior.

Couldn't reopen the previous PR as I force-pushed the branch.

@gsquared94 gsquared94 requested a review from a team as a code owner September 28, 2020 15:08
@googlebot

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).


@gsquared94 gsquared94 changed the title Support build artifact dependencies - 1 dependencies between build artifacts - 1 Sep 28, 2020
@gsquared94 gsquared94 changed the title dependencies between build artifacts - 1 Support dependencies between build artifacts - 1 Sep 28, 2020
@googlebot

CLAs look good, thanks!


@codecov

codecov bot commented Sep 28, 2020

Codecov Report

Merging #4828 into master will increase coverage by 0.10%.
The diff coverage is 86.33%.


@@            Coverage Diff             @@
##           master    #4828      +/-   ##
==========================================
+ Coverage   71.85%   71.96%   +0.10%     
==========================================
  Files         356      357       +1     
  Lines       12215    12327     +112     
==========================================
+ Hits         8777     8871      +94     
- Misses       2786     2799      +13     
- Partials      652      657       +5     
Impacted Files Coverage Δ
pkg/skaffold/build/cluster/cluster.go 26.66% <0.00%> (ø)
pkg/skaffold/build/gcb/cloud_build.go 0.00% <0.00%> (ø)
pkg/skaffold/build/local/prune.go 73.49% <0.00%> (-2.76%) ⬇️
pkg/skaffold/schema/latest/config.go 100.00% <ø> (ø)
cmd/skaffold/app/cmd/dev.go 72.97% <20.00%> (-8.28%) ⬇️
cmd/skaffold/app/cmd/build.go 80.00% <44.44%> (-9.19%) ⬇️
pkg/skaffold/build/result.go 85.36% <85.36%> (ø)
pkg/skaffold/build/model.go 90.47% <90.47%> (ø)
pkg/skaffold/build/scheduler.go 94.23% <94.23%> (ø)
pkg/skaffold/build/local/local.go 60.41% <100.00%> (ø)
... and 11 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 68eab61...8ac4e67.

Contributor

@tejal29 tejal29 left a comment

This is great. A few thoughts:

  1. Add a test case for requires to make sure this works, or instructions for reviewers to check it out.
  2. I feel the waitForDependencies should be called in a forever loop. Not sure how it is monitoring whether all dependent artifacts are built.


func (c *scheduler) run(ctx context.Context, a *latest.Artifact, build func() (string, error), onSuccess func(tag string), onError func(err error)) {
dag := c.dagMap[a.ImageName]
err := dag.waitForDependencies(ctx)
Contributor

This should happen in a forever loop until all dependent artifacts are built, right?

Contributor Author

PTAL at the InOrder function, as I removed this class. Also see my explanation below.

@gsquared94
Contributor Author

  1. Add a test case for requires to make sure this works, or instructions for reviewers to check it out.

There are success and failure test cases in scheduler_test.go.

  2. I feel the waitForDependencies should be called in a forever loop. Not sure how it is monitoring whether all dependent artifacts are built.

It tries reading from every channel representing one of its dependencies. The single for loop over the dependencies slice completes only when all of these channels are closed, so it doesn't need a forever loop.

gsquared94 and others added 2 commits October 7, 2020 00:34
Co-authored-by: Brian de Alwis <bsd@acm.org>
@gsquared94
Contributor Author

Integration test added. PTAL @briandealwis, @tejal29

}

func (l *logAggregatorImpl) GetWriter() (io.WriteCloser, error) {
if err := l.checkCapacity(); err != nil {
Contributor

This condition would never happen, as the channel capacity is the number of artifacts. Can we drop this condition to make things simpler?

Contributor Author

I'd like to keep it. I think logAggregator could be reused elsewhere if we ever need concurrent runs for something else, so I wanted to keep it safe; otherwise there would be no error message if writer requests exceeded the buffer capacity, and the code would deadlock.

@tejal29
Contributor

tejal29 commented Oct 7, 2020

Integration test added. PTAL @briandealwis, @tejal29

Looks really great. Some nit changes to get rid of if-else branches.

Looking forward to trying this out now.

@tejal29
Contributor

tejal29 commented Oct 7, 2020

Testing notes:
Example dependency graph:

 image3 -> image2 -> image1
 image4
  1. Checkout & make
hub pr checkout 4828
make
cd testdata/integration/build-dependencies
  2. Run normal case
time ../../../out/skaffold build -d gcr.io/tejal-test

Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Checking cache...
 - image1: Not found. Building
 - image2: Not found. Building
 - image3: Not found. Building
 - image4: Not found. Building
<Followed by logs for docker build.>
12.74s user 7.35s system 30% cpu 1:06.72 total

Observations:

  • No information on whether the build dependency graph is being used and builds are running in parallel.
  • No indication if they are running concurrently.
    Can we have something like "Building 4 artifacts in parallel"?
  • I can infer from the logs that image3 got built first. However, I did have to scroll up.
  • Sequence is correct: image3, image4, image2, image1
  3. Run with concurrency 3.
time ../../../out/skaffold build -d gcr.io/tejal-test -p concurrency-3 --cache-artifacts=false

13.07s user 7.83s system 43% cpu 47.798 total

Observations:

  • Definitely saw the build complete faster.
  • Sequence is correct: image3, image4, image2, image1
  4. Run with a build failure in image 4, which should cancel dependent artifacts image2 and image3 builds as well as the already scheduled image3.
time ../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p concurrency-3
Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Building [image3]...

Building [image4]...
Sending build context to Docker daemon 

docker build: error during connect: Post "https://127.0.0.1:32774/v1.40/build?buildargs=%7B%22FAIL%22%3A%220%22%2C%22SLEEP%22%3A%229%22%7D&cachefrom=null&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&forcerm=1&labels=null&memory=0&memswap=0&networkmode=&nocache=1&rm=0&shmsize=0&t=gcr.io%2Ftejal-test%2Fimage4%3Alatest&target=&ulimits=null&version=": creating docker context: getting relative tar paths: file pattern [foo1] must match at least one file

6.09s user 3.75s system 152% cpu 6.443 total

Observations:

  • All builds were stopped and the failure for the corresponding artifact is seen.
  • There is a "Building [image3]..." line for an artifact which did not complete due to the error in image 4. Instead of closing the channel for in-progress builds, should we first write something like
    "cancelling because one or more builds failed"?
Building [image3]...
Build cancelled due to other failures seen in other artifacts
Building [image4]...
Sending build context to Docker daemon 

docker build: error during connect: Post "https://127.0.0.1:32774/v1.40/build?buildargs=%7B%22FAIL%22%3A%220%22%2C%22SLEEP%22%3A%229%22%7D&cachefrom=null&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&forcerm=1&labels=null&memory=0&memswap=0&networkmode=&nocache=1&rm=0&shmsize=0&t=gcr.io%2Ftejal-test%2Fimage4%3Alatest&target=&ulimits=null&version=": creating docker context: getting relative tar paths: file pattern [foo1] must match at least one file
  5. Run with a build failure in image 1.
time ../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p concurrency-3
Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Building [image4]...
Sending build context to Docker daemon   2.56kB
...
Building [image3]...
Sending build context to Docker daemon   2.56kB
....
Successfully tagged gcr.io/tejal-test/image3:latest

Building [image2]...
...
Building [image1]...
Sending build context to Docker daemon 

docker build: error during connect: Post "https://127.0.0.1:32774/v1.40/build?buildargs=%7B%22FAIL%22%3A%220%22%2C%22SLEEP%22%3A%221%22%7D&cachefrom=null&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&forcerm=1&labels=null&memory=0&memswap=0&networkmode=&nocache=1&rm=0&shmsize=0&t=gcr.io%2Ftejal-test%2Fimage1%3Alatest&target=&ulimits=null&version=": creating docker context: getting relative tar paths: file pattern [foo1] must match at least one file
../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p   16.34s user 4.53s system 46% cpu 44.447 total

Observations:

  • Looks good. You see build logs for completed builds.
  6. Hitting control-C:
../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p concurrency-3
Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Building [image3]...
^CTraceback (most recent call last):
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 104, in <module>
Traceback (most recent call last):
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 104, in <module>
    main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 76, in main
    main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 76, in main
    gcloud_main = _import_gcloud_main()
    gcloud_main = _import_gcloud_main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 56, in _import_gcloud_main
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 56, in _import_gcloud_main
    import googlecloudsdk.gcloud_main
    import googlecloudsdk.gcloud_main
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/gcloud_main.py", line 33, in <module>
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/gcloud_main.py", line 33, in <module>
    from googlecloudsdk.api_lib.iamcredentials import util as iamcred_util
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/iamcredentials/util.py", line 25, in <module>
    from googlecloudsdk.api_lib.iamcredentials import util as iamcred_util
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/iamcredentials/util.py", line 25, in <module>
    from googlecloudsdk.api_lib.util import apis_internal
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/util/apis_internal.py", line 28, in <module>

Building [image4]...
    from googlecloudsdk.api_lib.util import apis_internal
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/util/apis_internal.py", line 28, in <module>
    from googlecloudsdk.core import properties
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/properties.py", line 28, in <module>

    from googlecloudsdk.core import config
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/config.py", line 29, in <module>
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image1%22%3Atrue%7D%7D": context canceled 
    from googlecloudsdk.core.util import files as file_utils
    from googlecloudsdk.core import properties
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/util/files.py", line 28, in <module>
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/properties.py", line 28, in <module>
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image2%22%3Atrue%7D%7D": context canceled 
    import shutil
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/shutil.py", line 10, in <module>
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image3%22%3Atrue%7D%7D": context canceled 
    from googlecloudsdk.core import config
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/config.py", line 29, in <module>
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image4%22%3Atrue%7D%7D": context canceled 
    import fnmatch
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    from googlecloudsdk.core.util import files as file_utils
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/util/files.py", line 28, in <module>
    import shutil
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/shutil.py", line 22, in <module>
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load_unlocked
    import bz2
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 914, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 779, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1342, in find_spec
  File "<frozen importlib._bootstrap_external>", line 911, in get_code
  File "<frozen importlib._bootstrap_external>", line 580, in _compile_bytecode
  File "<frozen importlib._bootstrap_external>", line 1314, in _get_spec
KeyboardInterrupt
  File "<frozen importlib._bootstrap_external>", line 1439, in find_spec
  File "<frozen importlib._bootstrap_external>", line 87, in _path_stat
KeyboardInterrupt

Observations:
1. Same error trace for multiple builds. In this case, both image3 and image4 show the same stack trace. Would it be a big change to just print "Build interrupted / cancelled"?
2. This error log is confusing. These warning messages indicate:

WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image3%22%3Atrue%7D%7D": context canceled 
    from googlecloudsdk.core import config
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/config.py", line 29, in <module>
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image4%22%3Atrue%7D%7D": context canceled 

We are still trying to list images from the docker daemon. This should have been aborted.

On master, I don't see these (of course, with sequential builds):

➜  microservices git:(fix-4713) ✗ skaffold build -d gcr.io/tejal-test --cache-artifacts=false
Generating tags...
 - leeroy-web -> gcr.io/tejal-test/leeroy-web:v1.15.0-26-gb1f9e2de2
 - leeroy-app -> gcr.io/tejal-test/leeroy-app:v1.15.0-26-gb1f9e2de2
Building [leeroy-web]...
^CTraceback (most recent call last):
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 104, in <module>
    main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 76, in main
    gcloud_main = _import_gcloud_main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 56, in _import_gcloud_main
    import googlecloudsdk.gcloud_main
There is a new version (1.15.0) of Skaffold available. Download it from:
  https://github.com/GoogleContainerTools/skaffold/releases/tag/v1.15.0

  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/gcloud_main.py", line 33, in <module>
    from googlecloudsdk.api_lib.iamcredentials import util as iamcred_util
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/iamcredentials/util.py", line 31, in <module>
    from googlecloudsdk.core.credentials import transports
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/credentials/transports.py", line 25, in <module>
    from googlecloudsdk.core.credentials import requests
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/credentials/requests.py", line 24, in <module>
    from googlecloudsdk.core import requests
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/requests.py", line 31, in <module>
    import requests
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/third_party/requests/__init__.py", line 114, in <module>
    from .models import Request, Response, PreparedRequest
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/third_party/requests/models.py", line 16, in <module>
    import encodings.idna
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/encodings/idna.py", line 3, in <module>
    import stringprep, re, codecs
  File "<frozen importlib._bootstrap>", line 988, in _find_and_load
  File "<frozen importlib._bootstrap>", line 148, in __enter__
  File "<frozen importlib._bootstrap>", line 176, in _get_module_lock
KeyboardInterrupt
➜  microservices git:(fix-4713) ✗ 

@gsquared94
Contributor Author

run normal case

  • No indication if they are running concurrently.
    Can we have something like Building 4 artifacts in parallel

Added message:

	if concurrency > 1 {
		color.Default.Fprintf(out, "Building %d artifacts in parallel\n", concurrency)
	}

Run with a build failure in image 4 which should cancel dependent artifacts image2 and image 3 build as well as already scheduled image3

  • There is a line "Building [image 3]" artifact which did not complete due to error in image 4. Instead of closing the channel for in progress builds, should we first write something similar to
    " cancelling due to one or more builds failed"

It shouldn't output partial logs for any build, actually. If it starts outputting logs for image3, it'll wait until all of its logs are printed. So the actual builder needs to handle context cancellation to output such a message.

Hitting control C:
Observations:
1. Same error trace for multiple builds. In this case, both image3 and image4 show the same stack trace. Would it be a big change to just print "Build interrupted / cancelled"?

We do allow redirecting build logs into files, so I think it's better to show the full stack trace for both failures. The actual stack trace seems to be a failure of the individual builders to handle context cancellation properly; I can file a bug for that but don't want to complicate this PR with those fixes.

  2. This error log is confusing. These warning messages indicate:
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image3%22%3Atrue%7D%7D": context canceled 
    from googlecloudsdk.core import config
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/core/config.py", line 29, in <module>
WARN[0001] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image4%22%3Atrue%7D%7D": context canceled 

We are still trying to list images from docker daemon. This should have been aborted.

On master, i dont see these (ofcourse with sequential builds)

This was because the image pruner wasn't handling context cancellation correctly. I've made this one fix. 🙂

@gsquared94
Contributor Author

Thanks @tejal29 for such detailed test notes. 👏🏽

@tejal29
Contributor

tejal29 commented Oct 8, 2020

Added message:

	if concurrency > 1 {
		color.Default.Fprintf(out, "Building %d artifacts in parallel\n", concurrency)
	}

sounds good.

Hitting control C:
Observations:
1. Same error trace for multiple builds. In this case, both image3 and image4 show the same stack trace. Would it be a big change to just print "Build interrupted / cancelled"?

We do allow redirecting build logs into files so I think it's better to show the full stack trace for both failures. The actual stack trace seems to be a failure of the individual builders to handle context cancellation properly, I can add a bug for that but don't want to complicate this PR with these fixes.

sounds good.

This was because image pruner wasn't handling context cancellation correctly. I've made this one fix. 🙂

+1

Member

@briandealwis briandealwis left a comment

This is now always running builders with their output going to a pipe, so we lose colour in the build logs (since it's not os.Stdout, a TTY). If we're at concurrency=1, it would be better to send the provided writer down and skip the logAggregator.

When running with concurrency=0, it seems an image is chosen at random to follow live. It might be better to just hide the build logs and instead provide feedback on the builds as they happen and finish. This can be done as a separate PR.

}

// InOrder builds a list of artifacts in dependency order.
func InOrder(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact, artifactBuilder ArtifactBuilder, concurrency int) ([]Artifact, error) {
Member

Re: colour output: out is normally os.Stdout here.

@gsquared94 gsquared94 requested a review from tejal29 October 9, 2020 14:01
@tejal29
Contributor

tejal29 commented Oct 9, 2020

Another round.

  1. skaffold dev
../../../out/skaffold dev -d gcr.io/tejal-test
defining dependencies between artifacts is not yet supported for `skaffold dev` and `skaffold debug`
  2. skaffold build with cache on
../../../out/skaffold build -d gcr.io/tejal-test
defining dependencies between artifacts is not yet supported for `skaffold build` with cache enabled. Run with `--cache-artifacts=false` flag
➜  build-dependencies git:(fix-4713) ✗ 
  3. skaffold build with cache off and concurrency on.
../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p=concurrency-3
Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Building 3 artifacts in parallel
Building [image4]...

  4. Hit control-C
➜  build-dependencies git:(fix-4713) ✗ ../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p=concurrency-3
Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Building 3 artifacts in parallel
Building [image4]...
^CTraceback (most recent call last):
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 104, in <module>
Traceback (most recent call last):
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 104, in <module>
    main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 76, in main
    main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 76, in main
    gcloud_main = _import_gcloud_main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 56, in _import_gcloud_main
    gcloud_main = _import_gcloud_main()
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/gcloud.py", line 56, in _import_gcloud_main
    import googlecloudsdk.gcloud_main
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/gcloud_main.py", line 33, in <module>
    import googlecloudsdk.gcloud_main
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/gcloud_main.py", line 33, in <module>

Building [image3]...

WARN[0003] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image1%22%3Atrue%7D%7D": context canceled 
WARN[0003] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image2%22%3Atrue%7D%7D": context canceled 
    from googlecloudsdk.api_lib.iamcredentials import util as iamcred_util
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/iamcredentials/util.py", line 31, in <module>
    from googlecloudsdk.api_lib.iamcredentials import util as iamcred_util
.....
WARN[0003] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image3%22%3Atrue%7D%7D": context canceled 
  File "/Users/tejaldesai/Downloads/google-cloud-sdk2/lib/googlecloudsdk/api_lib/iamcredentials/util.py", line 31, in <module>
WARN[0003] failed to list images: error during connect: Get "https://127.0.0.1:32774/v1.40/images/json?filters=%7B%22reference%22%3A%7B%22image4%22%3Atrue%7D%7D": context canceled
➜  build-dependencies git:(fix-4713) ✗ 

I still see the warning when pruning. I tried using errors.Is(err, context.Canceled) to make sure all errors get unwrapped and compared; however, it is not working.
Using this diff:

+++ b/pkg/skaffold/build/local/prune.go
@@ -19,6 +19,7 @@ package local
 import (
        "context"
        "sort"
+        "strings"
        "sync"
        "time"
 
@@ -158,8 +159,7 @@ func (p *pruner) collectImagesToPrune(ctx context.Context, artifacts []*latest.A
 
                imgs, err := p.listImages(ctx, a.ImageName)
                if err != nil {
-                       switch err {
-                       case context.Canceled, context.DeadlineExceeded:
+                       if strings.Contains(err.Error(), "context canceled") {
                                return nil
                        }
                        logrus.Warnf("failed to list images: %v", err)

I see the following:

➜  build-dependencies git:(fix-4713) ✗ ../../../out/skaffold build -d gcr.io/tejal-test --cache-artifacts=false -p=concurrency-3
Generating tags...
 - image1 -> gcr.io/tejal-test/image1:latest
 - image2 -> gcr.io/tejal-test/image2:latest
 - image3 -> gcr.io/tejal-test/image3:latest
 - image4 -> gcr.io/tejal-test/image4:latest
Building 3 artifacts in parallel
Building [image3]...
^C
Building [image4]...

➜  build-dependencies git:(fix-4713) ✗ 

Contributor

@tejal29 tejal29 left a comment

Just one round of fixing the Warnf message in the pruner.

Contributor

@tejal29 tejal29 left a comment

Merge after creating issues for the TODOs and adding them in the TODO comments.

6 participants