Commit e5bf042

Auto merge of #49284 - alexcrichton:use-our-own-cache, r=kennytm
ci: Don't use Travis caches for docker images

This commit moves away from caching on Travis to our own caching on S3 for caching docker layers between builds. Unfortunately the Travis caches have over time had a few critical pain points:

* Caches are only updated for successful builds, meaning that if a build times out or fails in a different location, the successfully created docker image isn't always cached. While this makes sense as a general rule of caches, it hurts our use cases.

* Caches are per-branch and per-builder, which means that we don't have a separate cache for each release channel. All our merges go through the `auto` branch, which means that they're all sharing the same cache, even those for merging to master/beta. This means that PRs which switch between master/beta will keep rebuilding and having cache misses.

* Caches have historically been invalidated a little more aggressively than we'd want (I think).

* We don't always need to update the contents of the cache if the Docker image didn't change at all, and saving off the docker layers can sometimes be quite expensive.

For all these reasons this commit drops the usage of Travis's built-in caching support. Instead our own caching is used by storing blobs to S3. Normally this would be a very risky endeavour, but we're basically priming a cache for a cache (docker), so if we get this wrong the failure mode is longer builds, not stale caches. We'll notice that pretty quickly and hopefully fix it!

The logic here is inserted directly into the `src/ci/docker/run.sh` script to download an image based on a shasum of the `Dockerfile` and other assorted files. This blob, if found, is loaded into docker and we record what layers were inserted. After docker finishes the build (hopefully quickly with lots of cache hits) we then see the sha of the final image. If it's one of the layers we loaded then there's no need to update the cache. Otherwise we upload our layers to the global cache, possibly overwriting what we previously just downloaded.

This is hopefully a step towards mitigating #49278, although it doesn't completely fix it, as it means we'll still probably have to retry builds that bust the cache.
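The cache key described above (a shasum over the `Dockerfile` and other assorted files) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `cache_key` is hypothetical and does not appear in the commit, and the NUL-delimited `find`/`sort`/`xargs` handling is an addition here for paths containing whitespace, which assumes GNU tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): derive a cache key from the
# contents of every regular file under the given directories.
# Files are sorted by path so the key does not depend on traversal order.
cache_key() {
  find "$@" -type f -print0 |
    LC_ALL=C sort -z |        # stable, locale-independent ordering
    xargs -0 cat |            # concatenate file contents in that order
    sha512sum |               # hash the concatenated stream
    awk '{print $1}'          # keep only the hex digest
}
```

A call such as `cache_key "$docker_dir/$image" "$docker_dir/scripts"` would then yield the same key for unchanged inputs, and a different key once any of those files changes, so an edited `Dockerfile` simply misses the cache rather than hitting a stale entry.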
2 parents 77e2bcb + a09e9e9 commit e5bf042

2 files changed, +38 -16 lines changed

.travis.yml (+6 -16)

@@ -183,7 +183,6 @@ matrix:
       if: branch = master AND type = push
       before_install: []
       install: []
-      cache: false
       sudo: false
       script:
         MESSAGE_FILE=$(mktemp -t msg.XXXXXX);
@@ -201,7 +200,12 @@ env:
     - secure: "cFh8thThqEJLC98XKI5pfqflUzOlxsYPRW20AWRaYOOgYHPTiGWypTXiPbGSKaeAXTZoOA+DpQtEmefc0U6lt9dHc7a/MIaK6isFurjlnKYiLOeTruzyu1z7PWCeZ/jKXsU2RK/88DBtlNwfMdaMIeuKj14IVfpepPPL71ETbuk="
 
 before_install:
-  - zcat $HOME/docker/rust-ci.tar.gz | docker load || true
+  # We'll use the AWS cli to download/upload cached docker layers, so install
+  # that here.
+  - if [ "$TRAVIS_OS_NAME" = linux ]; then
+      pip install --user awscli;
+      export PATH=$PATH:$HOME/.local/bin;
+    fi
   - mkdir -p $HOME/rustsrc
   # FIXME(#46924): these two commands are required to enable IPv6,
   # they shouldn't exist, please revert once more official solutions appeared.
@@ -286,23 +290,9 @@ after_failure:
   # it happened
   - dmesg | grep -i kill
 
-# Save tagged docker images we created and load them if they're available
-# Travis saves caches whether the build failed or not, nuke rustsrc if
-# the failure was while updating it (as it may be in a bad state)
-# https://github.com/travis-ci/travis-ci/issues/4472
-before_cache:
-  - docker history -q rust-ci |
-    grep -v missing |
-    xargs docker save |
-    gzip > $HOME/docker/rust-ci.tar.gz
-
 notifications:
   email: false
 
-cache:
-  directories:
-    - $HOME/docker
-
 before_deploy:
   - mkdir -p deploy/$TRAVIS_COMMIT
   - >

src/ci/docker/run.sh (+32)

@@ -27,6 +27,21 @@ travis_fold start build_docker
 travis_time_start
 
 if [ -f "$docker_dir/$image/Dockerfile" ]; then
+  if [ "$CI" != "" ]; then
+    cksum=$(find $docker_dir/$image $docker_dir/scripts -type f | \
+      sort | \
+      xargs cat | \
+      sha512sum | \
+      awk '{print $1}')
+    s3url="s3://$SCCACHE_BUCKET/docker/$cksum"
+    url="https://s3-us-west-1.amazonaws.com/$SCCACHE_BUCKET/docker/$cksum"
+    echo "Attempting to download $s3url"
+    set +e
+    loaded_images=$(curl $url | docker load | sed 's/.* sha/sha/')
+    set -e
+    echo "Downloaded containers:\n$loaded_images"
+  fi
+
   dockerfile="$docker_dir/$image/Dockerfile"
   if [ -x /usr/bin/cygpath ]; then
     context="`cygpath -w $docker_dir`"
@@ -40,6 +55,23 @@ if [ -f "$docker_dir/$image/Dockerfile" ]; then
     -t rust-ci \
     -f "$dockerfile" \
     "$context"
+
+  if [ "$s3url" != "" ]; then
+    digest=$(docker inspect rust-ci --format '{{.Id}}')
+    echo "Built container $digest"
+    if ! grep -q "$digest" <(echo "$loaded_images"); then
+      echo "Uploading finished image to $s3url"
+      set +e
+      docker history -q rust-ci | \
+        grep -v missing | \
+        xargs docker save | \
+        gzip | \
+        aws s3 cp - $s3url
+      set -e
+    else
+      echo "Looks like docker image is the same as before, not uploading"
+    fi
+  fi
 elif [ -f "$docker_dir/disabled/$image/Dockerfile" ]; then
   if [ -n "$TRAVIS_OS_NAME" ]; then
     echo Cannot run disabled images on travis!
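
The upload decision in the second `run.sh` hunk (skip the upload when the built image ID is among the IDs that `docker load` reported) can be isolated into a small function that is testable without docker. This is a sketch, not the commit's code: the name `needs_upload` is hypothetical, and it uses a fixed-string match (`grep -F`) where the diff uses a plain `grep -q`, which is an assumption about the intended behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): exit 0 when the freshly built
# image ID is absent from the newline-separated list of loaded image IDs,
# meaning the cached blob is out of date and should be re-uploaded.
needs_upload() {
  local digest="$1" loaded_images="$2"
  # -F treats the digest as a literal string rather than a regex;
  # the leading ! inverts grep's status so "not found" means "upload".
  ! grep -qF -- "$digest" <<<"$loaded_images"
}
```

With this shape, an empty `loaded_images` (for example after a failed download) always results in an upload, which matches the failure mode the commit message describes: a longer build rather than a stale cache.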
