remote execution fails with a garbage collecting remote cache #3452
Comments
That is ... not ideal. Any ideas?
I don't know the cause yet. I am debugging it currently. The good thing is that I can reproduce it 100% of the time.
We somehow seem to upload failed action executions to the action cache. I managed to get the CAS hash of the I then logged on to the remote cache machine and disassembled the blob (which is of type
Tag 2 is output files, 6 is stdout digest, 8 is stderr digest.
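As an illustration of how such tag numbers can be read off a raw blob without the schema, here is a minimal sketch that walks the top-level fields of a serialized protobuf message. It assumes only the standard protobuf wire format; the function names `read_varint` and `top_level_fields` are hypothetical, and the message type itself is not stated in this thread.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def top_level_fields(buf):
    """Return (field_number, wire_type, payload) for each top-level field."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, bytes, sub-messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

Calling `top_level_fields(blob)` on the bytes fetched from the cache lists which tags are present; mapping a tag to a meaning (2, 6, 8 above) still requires the message definition.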
So we seem to upload failed action executions. That means we have two bugs:
I fixed the upload of failed action results, and now things are better but we are still failing in weird ways. Looking...
We seem to have a problem with output files of cached actions having been deleted from the cache. So the action is still cached, but one or more of its output files have been deleted.
OK, we found the problem. If the remote download of an output file fails, it still leaves an empty output file behind, and the local fallback execution then fails because the output file already exists (we are not using sandboxing for the fallback). The solution seems to be to delete all output files if the remote download fails.
Right, I thought we should delete the partially created file as part of the failure cleanup, but Ulf just explained that this will not be enough for some tools. You're right, deleting everything is the way to go.
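To make the proposed cleanup concrete, here is a minimal Python sketch of the idea, not Bazel's actual implementation. The names `download_outputs` and `fetch` are hypothetical; `fetch(digest, path)` stands in for whatever writes a cached blob to disk and raises on a cache miss.

```python
import os


def download_outputs(outputs, fetch):
    """Fetch every (digest, path) pair; on any failure remove all declared outputs.

    `fetch` is a hypothetical callable that writes the blob for `digest`
    to `path` and raises if the blob is missing or the transfer fails.
    """
    try:
        for digest, path in outputs:
            fetch(digest, path)
    except Exception:
        # Remove every declared output, not just the one that failed:
        # files downloaded earlier and the empty or partial file left by
        # the failed fetch would both confuse a local re-execution.
        for _, path in outputs:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # this output was never created
        raise
```

The exception is re-raised so the caller can still fall back to local execution, which then starts with none of its output files present.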
OK, got no more failures, even with aggressive purging ... working on fixes ...
If a remote download fails, delete any output files that might have already been created. Otherwise, they might interfere with a subsequent locally executed action that expects none of its output files to exist. See #3452. Change-Id: I467a97d05606c586aa257326213940a37dad9dd5 PiperOrigin-RevId: 163336093
Using Bazel 0.5.3rc3 for local execution. The remote cache is nginx. We have a Python script running that enforces an upper size bound and deletes files based on their atime.
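The cleanup script itself is not shown here. As a rough sketch of that kind of eviction (the function name `evict_by_atime` and its parameters are assumptions, not the script we run), something like this deletes the least recently accessed files until the directory fits under a size bound:

```python
import os


def evict_by_atime(cache_dir, max_bytes):
    """Delete least-recently-accessed files until the cache fits in max_bytes."""
    entries = []
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))
            total += st.st_size
    removed = []
    for _atime, size, path in sorted(entries):  # oldest access time first
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Eviction like this works per file and knows nothing about which blobs a cached action result refers to, so an action entry can outlive one of its output blobs.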
We get errors like
The problem being that the file is empty.
cc: @damienmg @ulfjack @ola-rozenfeld