
ci: Don't use Travis caches for docker images #49284

Merged
merged 1 commit into master from use-our-own-cache on Mar 25, 2018

Conversation

alexcrichton
Member

This commit moves away from caching on Travis to our own caching on S3 for
docker layers between builds. Unfortunately, the Travis caches have had a few
critical pain points over time:

  • Caches are only updated for successful builds, meaning that if a build times
    out or fails elsewhere, the successfully-created docker image isn't always
    cached. While this makes sense as a general rule for caches, it hurts our
    use case.

  • Caches are per-branch and per-builder, which means we don't have a separate
    cache for each release channel. All our merges go through the auto branch,
    which means they all share the same cache, even those merging to master/beta.
    This means that PRs which switch between master and beta will keep rebuilding
    and hitting cache misses.

  • Caches have historically been invalidated somewhat regularly, a little more
    aggressively than we'd want (I think).

  • We don't always need to update the contents of the cache if the Docker image
    didn't change at all, and saving off the docker layers can sometimes be quite
    expensive.

For all these reasons, this commit drops the usage of Travis's built-in caching
support. Instead, our own caching is implemented by storing blobs on S3. Normally
this would be a very risky endeavour, but we're basically priming a cache for a
cache (docker), so if we get this wrong the failure mode is longer builds, not
stale caches. We'll notice that pretty quickly and hopefully fix it!

The logic here is inserted directly into the `src/ci/docker/run.sh` script to
download an image based on a shasum of the `Dockerfile` and other assorted files.
This blob, if found, is loaded into docker and we record which layers were
inserted. After docker finishes the build (hopefully quickly, with lots of cache
hits) we then check the sha of the final image. If it's one of the layers we
loaded, there's no need to update the cache. Otherwise we upload our layers
to the global cache, possibly overwriting what we previously just downloaded.
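
To make that flow concrete, here is a minimal sketch of the kind of logic such a
script could use. This is not the actual `run.sh`: the image name, the
`rust-ci-cache` bucket, the `rust-ci/` tag prefix, and the `/tmp` paths below are
placeholders I picked for illustration, and the real script differs in details.

```sh
#!/usr/bin/env bash
# Hedged sketch of an S3-backed docker layer cache, as described above.
# Bucket name, image name, tag prefix, and paths are placeholders.
set -euo pipefail

IMAGE=x86_64-gnu                 # hypothetical builder name
DOCKER_DIR=src/ci/docker
BUCKET=rust-ci-cache             # hypothetical S3 bucket

# 1. Cache key: a hash over the Dockerfile and everything else feeding the image.
cksum=$(find "$DOCKER_DIR/$IMAGE" "$DOCKER_DIR/scripts" -type f | sort |
        xargs cat | sha256sum | cut -d' ' -f1)
url="https://$BUCKET.s3.amazonaws.com/docker/$cksum.tar"

# 2. Try to prime docker's layer cache from S3; a miss only means a slower build.
loaded_layers=""
if curl -sSfL -o /tmp/docker-cache.tar "$url"; then
    docker load -i /tmp/docker-cache.tar
    loaded_layers=$(docker history -q --no-trunc "rust-ci/$IMAGE" 2>/dev/null || true)
fi

# 3. Build as usual; with the layers preloaded this should be mostly cache hits.
docker build -t "rust-ci/$IMAGE" "$DOCKER_DIR/$IMAGE"

# 4. Only re-upload when the final image is not one of the layers we just loaded.
final=$(docker inspect --format '{{.Id}}' "rust-ci/$IMAGE")
if ! echo "$loaded_layers" | grep -q "$final"; then
    docker save -o /tmp/docker-cache.tar "rust-ci/$IMAGE"
    aws s3 cp /tmp/docker-cache.tar "s3://$BUCKET/docker/$cksum.tar"
fi
```

The important property is the one noted above: a missing or stale blob only costs
build time, because docker still rebuilds everything correctly from the Dockerfile.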

This is hopefully a step towards mitigating #49278, although it doesn't
completely fix it, since we'll still probably have to retry builds that
bust the cache.

@alexcrichton alexcrichton requested a review from kennytm March 22, 2018 21:32
@alexcrichton alexcrichton force-pushed the use-our-own-cache branch 3 times, most recently from 368aad3 to 739afa9 on March 22, 2018 21:58
@kennytm
Member

kennytm commented Mar 22, 2018

Oh well.

[00:04:41] tidy error: /checkout/src/ci/docker/run.sh:31: line longer than 100 chars
[00:04:42] some tidy checks failed

Let's do two bors tries after tidy is fixed. The first try will likely fail, but if everything goes smoothly, the second try will be successful.

@alexcrichton
Member Author

@bors: try

@bors
Contributor

bors commented Mar 22, 2018

⌛ Trying commit f5df04c with merge 28101e3...

bors added a commit that referenced this pull request Mar 22, 2018
ci: Don't use Travis caches for docker images

@bors
Contributor

bors commented Mar 23, 2018

☀️ Test successful - status-travis
State: approved= try=True

@alexcrichton
Member Author

@bors: try

Both previous builds succeeded and stored their caches globally.

@bors
Contributor

bors commented Mar 23, 2018

⌛ Trying commit a09e9e9 with merge 3802c45...

bors added a commit that referenced this pull request Mar 23, 2018
ci: Don't use Travis caches for docker images

@bors
Contributor

bors commented Mar 23, 2018

💔 Test failed - status-travis

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 23, 2018
@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 23, 2018
@kennytm kennytm self-assigned this Mar 23, 2018
@kennytm kennytm removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 23, 2018
@alexcrichton
Member Author

Looks like both builds had a cache hit.

One took about 2 minutes to download and load into docker, and the other took about a minute. This seems comparable with the last successful build, which also took about a minute loading into docker and ~20s doing cache work. We'd still save ~2 minutes at the end of the build, however, by avoiding the docker history and cache-storing steps.

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 23, 2018
@kennytm
Member

kennytm commented Mar 23, 2018

@bors r+ p=1

(p=1 to avoid rollup)

@bors
Contributor

bors commented Mar 23, 2018

📌 Commit a09e9e9 has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2018
@alexcrichton
Member Author

@bors: p=0

I think it's fine not to land this immediately; we've got a ton of other fixes in the queue.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Mar 23, 2018
…ennytm

ci: Don't use Travis caches for docker images

bors added a commit that referenced this pull request Mar 25, 2018
ci: Don't use Travis caches for docker images

@bors
Contributor

bors commented Mar 25, 2018

⌛ Testing commit a09e9e9 with merge e5bf042...

@bors
Contributor

bors commented Mar 25, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: kennytm
Pushing e5bf042 to master...
