
Enable --remote_download_minimal #209

Merged 1 commit on Mar 23, 2023

Conversation

@jwnimmer-tri (Contributor) commented Mar 23, 2023

Baseline from master:

Canary using this pull request:

This risks bazelbuild/bazel#8250 if our cache TTL is too low, but I am thinking we're probably safe.
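For context, the change under review amounts to turning on minimal remote downloading in the CI Bazel configuration. A sketch of the relevant flags follows; the file layout, cache URL, and surrounding options are assumptions for illustration, not the actual drake-ci configuration:

```
# Hypothetical .bazelrc fragment; the real CI config lives elsewhere.
# Use the remote cache, and upload locally built results to it.
build --remote_cache=https://example-cache.invalid/
build --remote_upload_local_results=yes
# Only download top-level outputs and metadata rather than every
# intermediate artifact. This is the flag being enabled here, and it is
# what risks bazelbuild/bazel#8250 if cached blobs are evicted while
# action-cache entries still reference them.
build --remote_download_minimal
```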



@jwnimmer-tri jwnimmer-tri marked this pull request as ready for review March 23, 2023 02:46
@svenevs svenevs self-assigned this Mar 23, 2023
@svenevs (Contributor) left a comment

:lgtm: I added the equivalent mac-arm jobs since those are cached. This PR's build already finished in 3 minutes and 43 seconds, while baseline master takes 15 minutes and 33 seconds. I wasn't sure where you were looking to determine how much time goes to bazel versus prep / cleanup, but it seems like we should let this run and revert if we start encountering problems.

I didn't fully understand some of the issues contributors on the bazel issue were describing, but it seemed like you can only run into problems if things start getting deleted from the cache somehow. I don't think we will encounter that, since nightly and continuous just overwrite everything? On the upload side we just use --remote_upload_local_results=yes; there doesn't appear to be any "remote upload minimal" counterpart.

If the buildcops on Slack start seeing errors like java.lang.IllegalStateException: Unexpected inconsistency: ActionLookupData or java.io.IOException: Failed to fetch file with hash, then we'll probably want to revert this.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jwnimmer-tri)

@BetsyMcPhail (Contributor) left a comment

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jwnimmer-tri)

@BetsyMcPhail BetsyMcPhail merged commit fa06c03 into RobotLocomotion:main Mar 23, 2023
@jwnimmer-tri (Contributor, Author) commented

... it seemed like you can only run into problems if things start getting deleted from the cache somehow. I don't think we will encounter that since nightly and continuous just overwrite everything?

Maybe. It's not so much "overwrite" as "repopulate" though.

I think the risk is that "access time" is no longer a sound measure of what is still relevant in the cache: with minimal downloads, our "evict after a few days without access" cleanup rule might delete a build output, but not delete the action-cache (ac) entry that refers to that output.
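To make that hazard concrete, here is a sketch of what such an access-time-based eviction could look like. The function name, cache directory layout, and retention window are all hypothetical; the actual cleanup script is not part of this thread:

```shell
# Hypothetical sketch of a time-based remote-cache cleanup; the real
# eviction script is not shown in this thread.
evict_stale_cas() {
  cache_root="$1"   # assumed layout: $cache_root/cas and $cache_root/ac
  days="$2"
  # Delete content-addressed blobs not accessed within the last $days days.
  # With --remote_download_minimal, outputs are no longer re-downloaded,
  # so their access times stop updating and they look stale.
  find "$cache_root/cas" -type f -atime +"$days" -delete
  # Note: nothing here removes the matching action-cache entries under
  # "$cache_root/ac", so an ac hit can now reference a blob that no
  # longer exists -- the inconsistency described in bazelbuild/bazel#8250.
}
```

Running it against such a layout would delete an old blob while leaving the dangling ac entry behind, which is exactly the failure mode that would force reverting this patch.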

I also realized -- if we see trouble here, we might need to not only revert this patch but also bump the cache key version from v2 => v3 to force a re-population.

set(DASHBOARD_REMOTE_CACHE_KEY_VERSION "v2")
