
ci: Don't use Travis caches for docker images #49284

Merged
merged 1 commit into master from use-our-own-cache on Mar 25, 2018

Conversation

alexcrichton
Member

This commit moves away from caching on Travis to our own caching on S3 for
docker layers between builds. Unfortunately, the Travis caches have had a few
critical pain points over time:

  • Caches are only updated for successful builds, meaning that if a build times
    out or fails elsewhere, the successfully-created docker image isn't always
    cached. While this makes sense as a general rule for caches, it hurts our
    use case.

  • Caches are per-branch and per-builder, which means we don't have a separate
    cache for each release channel. All our merges go through the auto branch,
    which means they all share the same cache, even those merging to master/beta.
    This means that PRs which switch between master and beta will keep rebuilding
    and hitting cache misses.

  • Caches have historically been invalidated somewhat regularly, a little more
    aggressively than we'd want (I think).

  • We don't always need to update the contents of the cache if the Docker image
    didn't change at all, and saving off the docker layers can sometimes be quite
    expensive.

For all these reasons, this commit drops the usage of Travis's built-in caching
support. Instead, our own caching is implemented by storing blobs on S3. Normally
this would be a very risky endeavour, but we're basically priming a cache for a
cache (docker), so if we get this wrong the failure mode is longer builds, not
stale caches. We'll notice that pretty quickly and hopefully fix it!

The logic here is inserted directly into the `src/ci/docker/run.sh` script to
download an image based on a shasum of the `Dockerfile` and other assorted files.
This blob, if found, is loaded into docker and we record which layers were
inserted. After docker finishes the build (hopefully quickly, with lots of cache
hits) we then check the sha of the final image. If it's one of the layers we
loaded, there's no need to update the cache. Otherwise we upload our layers
to the global cache, possibly overwriting what we previously just downloaded.
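
To make that flow concrete, here is a minimal sketch of the kind of logic such a
script could use. This is not the actual `run.sh`: the image name, the
`rust-ci-cache` bucket, the `rust-ci/` tag prefix, and the `/tmp` paths below are
placeholders I picked for illustration, and the real script differs in details.

```sh
#!/usr/bin/env bash
# Hedged sketch of an S3-backed docker layer cache, as described above.
# Bucket name, image name, tag prefix, and paths are placeholders.
set -euo pipefail

IMAGE=x86_64-gnu                 # hypothetical builder name
DOCKER_DIR=src/ci/docker
BUCKET=rust-ci-cache             # hypothetical S3 bucket

# 1. Cache key: a hash over the Dockerfile and everything else feeding the image.
cksum=$(find "$DOCKER_DIR/$IMAGE" "$DOCKER_DIR/scripts" -type f | sort |
        xargs cat | sha256sum | cut -d' ' -f1)
url="https://$BUCKET.s3.amazonaws.com/docker/$cksum.tar"

# 2. Try to prime docker's layer cache from S3; a miss only means a slower build.
loaded_layers=""
if curl -sSfL -o /tmp/docker-cache.tar "$url"; then
    docker load -i /tmp/docker-cache.tar
    loaded_layers=$(docker history -q --no-trunc "rust-ci/$IMAGE" 2>/dev/null || true)
fi

# 3. Build as usual; with the layers preloaded this should be mostly cache hits.
docker build -t "rust-ci/$IMAGE" "$DOCKER_DIR/$IMAGE"

# 4. Only re-upload when the final image is not one of the layers we just loaded.
final=$(docker inspect --format '{{.Id}}' "rust-ci/$IMAGE")
if ! echo "$loaded_layers" | grep -q "$final"; then
    docker save -o /tmp/docker-cache.tar "rust-ci/$IMAGE"
    aws s3 cp /tmp/docker-cache.tar "s3://$BUCKET/docker/$cksum.tar"
fi
```

The important property is the one noted above: a missing or stale blob only costs
build time, because docker still rebuilds everything correctly from the Dockerfile.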

This is hopefully a step towards mitigating #49278, although it doesn't
completely fix it, since we'll still probably have to retry builds that
bust the cache.

@alexcrichton alexcrichton requested a review from kennytm March 22, 2018 21:32
@alexcrichton alexcrichton force-pushed the use-our-own-cache branch 3 times, most recently from 368aad3 to 739afa9 on March 22, 2018 21:58
@kennytm
Member

kennytm commented Mar 22, 2018

Oh well.

[00:04:41] tidy error: /checkout/src/ci/docker/run.sh:31: line longer than 100 chars
[00:04:42] some tidy checks failed

Let's do two bors tries after tidy is fixed. The first try will likely fail, but if everything goes smoothly, the second try will be successful.

@alexcrichton
Member Author

@bors: try

@bors
Contributor

bors commented Mar 22, 2018

⌛ Trying commit f5df04c with merge 28101e3...

bors added a commit that referenced this pull request Mar 22, 2018
ci: Don't use Travis caches for docker images

@bors
Contributor

bors commented Mar 23, 2018

☀️ Test successful - status-travis
State: approved= try=True

@alexcrichton
Member Author

@bors: try

Both previous builds succeeded and stored their caches globally.

@bors
Contributor

bors commented Mar 23, 2018

⌛ Trying commit a09e9e9 with merge 3802c45...

bors added a commit that referenced this pull request Mar 23, 2018
ci: Don't use Travis caches for docker images

@bors
Contributor

bors commented Mar 23, 2018

💔 Test failed - status-travis

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 23, 2018
@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 23, 2018
@kennytm kennytm self-assigned this Mar 23, 2018
@kennytm kennytm removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 23, 2018
@alexcrichton
Member Author

Looks like both builds had a cache hit.

One took about 2 minutes to download and load into docker, and the other took about a minute. This seems comparable with the last successful build, which also took about a minute loading into docker and ~20s doing cache work. We'd still save ~2 minutes at the end of the build, however, by avoiding the docker history and cache-storing steps.

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 23, 2018
@kennytm
Member

kennytm commented Mar 23, 2018

@bors r+ p=1

(p=1 to avoid rollup)

@bors
Contributor

bors commented Mar 23, 2018

📌 Commit a09e9e9 has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2018
@alexcrichton
Member Author

@bors: p=0

I think it's fine not to land this immediately; we've got a ton of other fixes in the queue.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Mar 23, 2018
…ennytm

ci: Don't use Travis caches for docker images

bors added a commit that referenced this pull request Mar 25, 2018
ci: Don't use Travis caches for docker images

@bors
Contributor

bors commented Mar 25, 2018

⌛ Testing commit a09e9e9 with merge e5bf042...

@bors
Contributor

bors commented Mar 25, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: kennytm
Pushing e5bf042 to master...
