
Move macOS CI to GitHub Actions #1460

Merged: 5 commits into master from gh-actions on Jan 10, 2021

Conversation

mboes
Member

@mboes mboes commented Dec 27, 2020

GitHub Actions has macOS runners available now, so we no longer need
to pay for an expensive macOS plan from CircleCI. The new setup using
GitHub Actions is also simpler, because we don't need to save and
restore a disk cache anymore; instead, we have BuildBuddy as the
remote cache. Where the old CircleCI jobs were clocking in at 1h15 using
the disk cache, we're now at 16 minutes using the remote cache.

cc @siggisim

@mboes
Member Author

mboes commented Dec 27, 2020

The CircleCI error is expected, since we are removing the configuration file for it in this PR.

@siggisim

This looks great, let me know if I can be helpful!

@mboes
Member Author

mboes commented Dec 29, 2020

oh hey @siggisim! Thanks for stopping by. I was wondering about one thing. BuildBuddy has a cache view, which is very helpful, e.g. here. But does it also provide any tools to diagnose cache misses? The ideal would be to be able to download a list of actions, identified by their outputs, for which there was a cache miss.

@siggisim

Hey @mboes! At the moment we only have the high level cache overview stats. We're working on a couple of different methods to help make debugging cache hits easier.

In the meantime, this guide shows you how to get the execution logs from two runs, sort them, and compare which inputs changed.
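The comparison that guide performs boils down to: collect each run's actions, sort them by output, and diff the input digests. A toy illustration of that idea in Python (the log structure here is invented for illustration, not Bazel's actual execution-log schema):

```python
# Toy model of an execution log: each action maps an output path to a
# dict of {input path: content digest}. Real Bazel execution logs are
# richer; this only illustrates the sort-and-diff idea.
def changed_inputs(log_a, log_b):
    """Return {output: [inputs whose digest differs]} for shared outputs."""
    diffs = {}
    for output in sorted(log_a.keys() & log_b.keys()):
        a, b = log_a[output], log_b[output]
        changed = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
        if changed:
            diffs[output] = changed
    return diffs

run1 = {"bin/app": {"src/Main.hs": "digest1", "flags": "digest2"}}
run2 = {"bin/app": {"src/Main.hs": "digest1", "flags": "digest9"}}
print(changed_inputs(run1, run2))  # {'bin/app': ['flags']}
```

Each output that missed the cache shows up with the inputs whose digests differ between the two runs, which is exactly what you want when hunting down a cache miss.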

Member

@aherrmann aherrmann left a comment

Awesome, thank you for implementing this!

Just a few comments.

@aherrmann
Member

I've also pushed an update to the mergify config. I tested it in the mergify config editor.

@aherrmann
Member

One of the new GH actions failed on a rate limit:

ERROR: Traceback (most recent call last):
	File "/private/var/tmp/_bazel_runner/d1e3bdaa0c37ce8d00ecff62b05849c3/external/rules_haskell/haskell/cabal.bzl", line 1670
		_pin_packages(repository_ctx, <1 more arguments>)
	File "/private/var/tmp/_bazel_runner/d1e3bdaa0c37ce8d00ecff62b05849c3/external/rules_haskell/haskell/cabal.bzl", line 1191, in _pin_packages
		repository_ctx.download(<3 more arguments>)
java.io.IOException: Error downloading [https://api.github.com/repos/commercialhaskell/all-cabal-hashes/git/ref/heads/hackage] to /private/var/tmp/_bazel_runner/d1e3bdaa0c37ce8d00ecff62b05849c3/external/stackage-pinning-test-unpinned/all-cabal-hashes-hackage.json: GET returned 403 rate limit exceeded

I don't think I've seen this on CircleCI before. Is there a special limit for downloading from GitHub through GitHub Actions?

Enabling GH actions disk cache together with Bazel repository cache may help avoid this.
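A sketch of that combination, assuming actions/cache plus Bazel's --repository_cache flag (the paths and key names here are illustrative):

```yaml
# Illustrative workflow step: persist Bazel's repository cache across runs.
- uses: actions/cache@v1
  with:
    path: ~/repo-cache
    key: repo-cache-v1

# Then point Bazel at the same directory, e.g. in .bazelrc:
#   build --repository_cache=~/repo-cache
```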

@mboes
Member Author

mboes commented Jan 5, 2021

That's unfortunate. But I'm guessing that rate limiting applies to any machine. The error you quote is in the unpinning test, which by design does network requests, right?

@aherrmann
Member

The error you quote is in the unpinning test, which by design does network requests, right?

Yes, that's right. Though the pinned ones also download the Cabal files from GH. Presumably this contributes to the access count and might push it over the limit together with the unpinned test. With a repository cache enabled those pinned package Cabal files would be cached already in most cases.

@mboes
Member Author

mboes commented Jan 8, 2021

Is a repository cache something it makes sense for BuildBuddy to also provide (now or in the future)? The alternative would be to use GitHub's cache restore functionality, with all the downsides that has (manually keying an immutable cache correctly is a source of much brittleness).

@aherrmann
Member

Is a repository cache something it makes sense for BuildBuddy to also provide (now or in the future)? The alternative would be to use GitHub's cache restore functionality, with all the downsides that has (manually keying an immutable cache correctly is a source of much brittleness).

That would be a great feature for BuildBuddy or in general any remote cache to provide. However, the Bazel feature request for a remote repository cache is still open. It looks like the flag --experimental_remote_downloader enables this, though I'm not sure if this requires dedicated support from the remote cache.

Re GH cache, a cache setup similar to what we had with CircleCI can be modeled using restore-keys. An example of such a config is here.
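For reference, a restore-keys setup along those lines might look like this (the key scheme is illustrative):

```yaml
- uses: actions/cache@v1
  with:
    path: ~/repo-cache
    # No run matches the commit-specific key exactly, so each run falls
    # through to the most recent cache matching the prefix and saves an
    # updated cache under its own key afterwards, CircleCI-style.
    key: repo-cache-${{ github.sha }}
    restore-keys: |
      repo-cache-
```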

@siggisim

siggisim commented Jan 8, 2021

We have "alpha" support for --experimental_remote_downloader, but its current implementation in bazel has quite a few limitations - I wouldn't recommend using it for anything mission critical.

Ultimately in order for the repo cache to be effective, it should be as close to the build machine as possible. Its main benefit comes from reducing the network bottleneck - so using a "remote" repository cache kind of defeats the purpose unless the remote cache is co-located or close to your build machines.

In our CI, we use GitHub's caching functionality:
https://github.com/buildbuddy-io/buildbuddy/blob/master/.github/workflows/main.yaml#L19

And then specify a repo cache directory in our .bazelrc:
https://github.com/buildbuddy-io/buildbuddy/blob/master/.bazelrc#L43

Just using a local repository cache is less brittle than using a full disk cache, because it's designed to be used across workspaces and will only be pulled from if the download request has a SHA256 specified.
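That SHA256 condition is worth spelling out: only pinned downloads can be served from the repository cache. A WORKSPACE sketch (Starlark; the name, URL, and sha256 value are placeholders):

```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "some_dep",
    urls = ["https://example.com/some_dep-1.0.tar.gz"],
    # With sha256 set, Bazel can answer this download from the
    # --repository_cache without touching the network; without it,
    # the download always goes out to the network.
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
)
```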

@mboes
Member Author

mboes commented Jan 10, 2021

@siggisim Interesting. Like the CircleCI cache, AFAIU GitHub's cache is immutable. Unlike the CircleCI cache, which appears to live forever, the GitHub cache is short-lived (evicted after 7 days without a cache hit). So I guess the assumption is that the cache gets dropped every once in a while, which both bounds the growth of the repository cache and keeps it effective: since the cache is immutable, if it were never dropped, new repositories would never make it into it.

mboes and others added 5 commits January 10, 2021 10:20

- Xcode 12.0 sets -Werror=implicit-function-declaration, which makes the zlib build fail on recent macOS. We pass a flag to turn the error back into a warning.
- GitHub Actions runners, including macOS, are free for open source. This is a far cry from the very expensive CircleCI macOS subscription. Even with a disk cache, the CircleCI build took 1h15. Thanks to BuildBuddy, we're now down to 16 minutes, with no need for a disk cache.
- Stored in the cache of GitHub actions.
@mboes mboes merged commit ebb4adf into master Jan 10, 2021
@mboes mboes deleted the gh-actions branch January 10, 2021 12:20
uses: actions/cache@v1
with:
  path: ~/repo-cache
  key: repo-cache-${{ secrets.REPO_CACHE_VERSION }}
Member

Why a secret instead of a suffix configured in the workflow.yaml? IIUC this will need to be updated manually whenever the external dependencies are changed, so might just as well update it in the corresponding PR.

Also, why rely on a manual version bump, that is likely to be forgotten, instead of using something like hashFiles or commit sha with fall-through?
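The alternative being suggested could be sketched like this (the hashFiles argument list is illustrative; it would need to cover whatever files determine the external dependencies):

```yaml
- uses: actions/cache@v1
  with:
    path: ~/repo-cache
    # Re-key automatically whenever the dependency set changes, with a
    # prefix fall-through instead of a manually bumped secret.
    key: repo-cache-${{ hashFiles('WORKSPACE') }}
    restore-keys: |
      repo-cache-
```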

Member Author

Because this is what is recommended by GitHub staff: https://github.community/t/how-to-clear-cache-in-github-actions/129038/5.

We could spend brain cycles working out exactly which files should go into hashFiles(), but even if we got that right now, we would have to make sure the list stays correct later, so a manual mechanism to purge the cache would still be needed. Setting a cache version number is expedient, and I submit sufficient given the short-lived nature of caches on GitHub anyway (7 days).

Member

Fair enough, though the way I read that thread, it's not a direct recommendation of this approach so much as a mention that it exists. I'm all on board for minimizing brain cycles on this aspect, but I think that speaks in favor of a simple suffix in the code instead of requiring an out-of-band change through GH secrets. Case in point: I myself don't see the corresponding secret in the repository's secrets tab, so there's already some permissions issue to debug, and external contributors certainly won't be able to access it. I've opened a PR here: #1472

Member Author

FWIW you do have access to that secret. It just doesn't exist right now. I added it then removed it as an experiment.
