ci: Don't use Travis caches for docker images #49284
Conversation
force-pushed from 368aad3 to 739afa9
Oh well.
Let's do two `bors try` runs after tidy is fixed. The first try will likely fail, but if everything goes smoothly, the second try will be successful.
force-pushed from 739afa9 to f5df04c
@bors: try
☀️ Test successful - status-travis
force-pushed from f5df04c to a09e9e9
@bors: try
Both previous builds succeeded and stored their cache globally.
💔 Test failed - status-travis
Looks like both builds had a cache hit:
One took about 2 minutes to download/load into docker and the other took about a minute. This seems comparable with the last successful build, which also spent about a minute loading into docker and ~20s doing cache work. We'd still save ~2 minutes at the end of the build, however, by avoiding …
@bors r+ p=1 (p=1 to avoid rollup)
📌 Commit a09e9e9 has been approved by kennytm
@bors: p=0
I think it's fine not to land this immediately; we've got a ton of other fixes in the queue.
☀️ Test successful - status-appveyor, status-travis
This commit moves away from caching on Travis to our own caching on S3 for docker layers between builds. Unfortunately the Travis caches have over time had a few critical pain points:

* Caches are only updated for successful builds, meaning that if a build times out or fails in a different location, the successfully-created docker images aren't always cached. While this makes sense as a general rule for caches, it hurts our use cases.
* Caches are per-branch and per-builder, which means that we don't have a separate cache on each release channel. All our merges go through the `auto` branch, which means they're all sharing the same cache, even those for merging to master/beta. This means that PRs which switch between master and beta will keep rebuilding and having cache misses.
* Caches have historically been invalidated somewhat regularly, a little more aggressively than we'd want (I think).
* We don't always need to update the contents of the cache if the Docker image didn't change at all, and saving off the docker layers can sometimes be quite expensive.
For all these reasons this commit drops the usage of Travis's built-in caching support. Instead, our own caching is implemented by storing blobs in S3. Normally this would be a very risky endeavour, but we're basically priming a cache for a cache (docker), so if we get this wrong the failure mode is longer builds, not stale caches. We'll notice that pretty quickly and hopefully fix it!
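To make that failure mode concrete, here is a minimal sketch of a blob-per-key S3 cache. The bucket name, key layout, and use of the AWS CLI are assumptions for illustration, not necessarily what `run.sh` does:

```bash
# Hypothetical S3 blob cache: a get/put keyed by a hash of the inputs.
cache_key=deadbeef                                     # placeholder key
blob="s3://rust-lang-ci-cache/docker/$cache_key.tar"   # assumed bucket/layout

# A failed download is just a cache miss: the build still runs, only slower.
aws s3 cp "$blob" /tmp/layers.tar || echo "cache miss for $cache_key"

# ... docker build runs here ...

# Overwrite the blob for this key. A bad upload only costs build time later,
# never correctness, since docker rebuilds anything it can't reuse.
aws s3 cp /tmp/layers.tar "$blob"
```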
The logic here is inserted directly into the `src/ci/docker/run.sh` script: we download an image based on a shasum of the `Dockerfile` and other assorted files. This blob, if found, is loaded into docker and we record what layers were inserted. After docker finishes the build (hopefully quickly, with lots of cache hits) we then see the sha of the final image. If it's one of the layers we loaded, there's no need to update the cache. Otherwise we upload our layers to the global cache, possibly overwriting what we previously just downloaded.
This is hopefully a step towards mitigating #49278, although it doesn't completely fix it: we'll still probably have to retry builds that bust the cache.
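For reference, the whole flow described above might be sketched as follows. The image name, cache URL, and exact docker invocations are illustrative rather than a copy of the real `run.sh`:

```bash
#!/usr/bin/env bash
# Illustrative sketch of the layer-caching flow; names and URLs are made up.
set -euo pipefail

image=x86_64-gnu                      # hypothetical builder image name
docker_dir="src/ci/docker/$image"

# Key the cache on the Dockerfile and the rest of the build context.
cache_key=$(find "$docker_dir" -type f | sort | xargs cat | sha256sum | cut -d' ' -f1)
url="https://ci-caches.example.com/docker/$cache_key"   # hypothetical bucket URL

# 1. Try to seed docker with previously uploaded layers.
loaded_ids=""
if curl -sSfL "$url" -o /tmp/layers.tar; then
  docker load -i /tmp/layers.tar
  # Record which image IDs are available before the build starts.
  loaded_ids=$(docker images -q --no-trunc)
fi

# 2. Build; with a warm cache most steps should be layer hits.
docker build -t "$image" "$docker_dir"

# 3. Only re-upload when the build produced an image the cache didn't have.
built_id=$(docker images -q --no-trunc "$image")
if echo "$loaded_ids" | grep -qx "$built_id"; then
  echo "image unchanged; leaving the global cache alone"
else
  docker save -o /tmp/layers.tar "$image"
  aws s3 cp /tmp/layers.tar "s3://rust-lang-ci-cache/docker/$cache_key"  # hypothetical
fi
```

Comparing the built image's ID against the set recorded after `docker load` is what lets an unchanged `Dockerfile` skip the (sometimes expensive) `docker save` and upload entirely.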