--experimental_remote_cache_compression causes incomplete writes #14654
Actually, the reason for this error seems to be --experimental_remote_cache_compression rather than --experimental_remote_cache_async. (The issue was retitled from "--experimental_remote_cache_async causes incomplete writes" to "--experimental_remote_cache_compression causes incomplete writes".)
This is where those errors are thrown: bazel/src/main/java/com/google/devtools/build/lib/remote/ByteStreamUploader.java, lines 466 to 476 (at commit 72d5255).
Highlighting the relevant parts of the error:
I am not super familiar with the byte stream uploader (attempting to grok it now), but is it possible that the server is treating the blob's uncompressed size as the committed size? Asking because those error messages look as though the server is returning uncompressed blob sizes as the committed size (e.g. a 721 byte chunk compressing to 374 bytes looks about right, and a 2 byte chunk compressing to 14 bytes looks accurate as well, since the compression overhead probably requires more space than the bytes themselves).
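The size pattern in those messages is easy to reproduce in miniature. Here is a hedged sketch using Python's zlib as a stand-in for zstd (the codec Bazel's remote cache compression actually uses): tiny payloads grow under compression because framing overhead dominates, while larger repetitive payloads shrink, so the byte count on the wire diverges from the digest's uncompressed size in both directions.

```python
import zlib

# Tiny payload: compression overhead exceeds any savings, so the
# compressed form is LARGER than the input (cf. the "2 byte chunk
# compressing to 14 bytes" observation above).
tiny = b"ab"                      # 2 bytes
tiny_z = zlib.compress(tiny)

# Larger, repetitive payload: compresses well, so the compressed
# form is much SMALLER than the input (cf. 721 bytes -> 374 bytes).
big = b"hello world " * 60        # 720 bytes
big_z = zlib.compress(big)

print(len(tiny), "->", len(tiny_z))   # compressed size grows
print(len(big), "->", len(big_z))     # compressed size shrinks
```

Either way, a server reporting the uncompressed size will not match the byte count the client actually sent.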
CC @coeuvre, @benjaminp, @buchgr, @Wyverald
CC @bazelbuild/remote-execution
I started a draft PR which has a failing test illustrating the issue, and I am fairly certain that the fake ByteStream impl in the test adheres to the protocol, but please correct me if I'm wrong: #14655. I think there is an ambiguity in the current way that the ByteStream protocol specifies committed_size for compressed uploads.
In summary, one of two things can happen in response to clients sending their final chunk: the server either rejects the write with an error, or accepts it and responds with a committed_size equal to the "full size of the uploaded file".
But, what does "full size of the uploaded file" mean for a compressed upload: the number of compressed bytes sent over the wire, or the uncompressed size of the blob identified by the digest?
Given that the former interpretation would be useless to clients, I imagine the latter interpretation is what was originally intended. If so, I think it implies that the current implementation of the check in Bazel, which compares committed_size against the compressed bytes it sent, is incorrect.
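The two readings of committed_size produce exactly the error quoted in this issue. A hedged Python sketch (client_upload and the lambda "servers" are hypothetical illustrations, not Bazel's or any server's actual code):

```python
import zlib

def client_upload(blob, server):
    """Sketch of the client-side verification: stream compressed bytes,
    then compare the server's committed_size against bytes sent."""
    sent = zlib.compress(blob)                 # what goes on the wire
    committed = server(blob, sent)             # server's WriteResponse
    if committed != len(sent):
        raise IOError("write incomplete: committed_size %d != bytes sent %d"
                      % (committed, len(sent)))
    return committed

blob = b"x" * 1000

# Reading 1: committed_size = compressed bytes received -> check passes.
client_upload(blob, lambda b, sent: len(sent))

# Reading 2: committed_size = uncompressed digest size -> the client
# raises the same kind of "incomplete write" error reported here.
try:
    client_upload(blob, lambda b, sent: len(b))
except IOError as exc:
    print(exc)
```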
CC @mostynb, who may be able to help clarify the spec.
You're right that the spec is ambiguous about what committed_size should be for compressed uploads.
@mostynb Thanks for your input! I would be happy to send a PR so that Bazel stops checking the committed_size for compressed uploads. Separately, I am trying to figure out whether the early-exit scenario (the server responding as soon as it sees a digest it already has, before the client finishes sending the stream) can be handled correctly on the cache side.
Firstly, is this scenario even possible? (Not sure if client-streaming gRPC works the same way in Java as it does in Go, but IIUC the client does not need to wait for the server to receive and respond to each message before proceeding to send the next message.) Secondly, is the only workaround on the cache side just to wait for Bazel to upload the entire stream, regardless of whether the digest already exists in the cache or not, and then return the total length of the compressed stream so that it satisfies Bazel's committed_size expectation?
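The early-exit question above can be simulated. A hedged sketch (all names and message shapes are hypothetical stand-ins for ByteStream WriteRequest/WriteResponse): the server answers from the first chunk when it already has the digest, never draining the rest of the stream, and its committed_size then reflects the uncompressed size rather than the bytes the client sent.

```python
def server_write(requests, existing_digests):
    """Simulated cache server handling a client-streaming Write."""
    first = next(requests)
    if first["digest"] in existing_digests:
        # Early exit: respond immediately with the blob's uncompressed
        # size; the remaining chunks are never consumed.
        return {"committed_size": first["uncompressed_size"]}
    # Otherwise drain the whole stream and report compressed bytes seen.
    total = len(first["data"]) + sum(len(r["data"]) for r in requests)
    return {"committed_size": total}

chunks = iter([
    {"digest": "d1", "uncompressed_size": 721, "data": b"z" * 374},
    {"digest": "d1", "uncompressed_size": 721, "data": b"z" * 40},
])
resp = server_write(chunks, existing_digests={"d1"})
print(resp["committed_size"])  # 721: uncompressed size, not the 414 bytes sent
```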
@mostynb FYI, I think this issue might affect bazel-remote as well. It looks like on the first WriteRequest, if the digest already exists in cache, a WriteResponse is returned immediately without reading the rest of the stream.
Maybe the best we can do is update the REAPI spec to advise clients to ignore committed_size for compressed uploads. I'm not sure how the early-exit mechanism is useful in practice, actually. As you mentioned, the client calls FindMissingBlobs before uploading, so it should rarely try to write a blob the server already has.
This is exactly the workaround that I'm currently implementing 👍
I think it's good advice for servers that want to be compatible with Bazel's current behavior, so it's probably worth mentioning this approach in the REAPI spec as a notable client quirk. I'm not sure whether it'd be the best advice though if/when Bazel removes the committed_size check.
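For reference, a compression-aware version of the client-side check might look like the following hedged sketch. The function name is hypothetical, and treating committed_size of -1 as "blob already exists" for compressed uploads is my reading of the remote-apis#213 spec update, not a verified quote of the merged text.

```python
def committed_size_ok(committed, bytes_sent, digest_size, compressed):
    """Sketch of a compression-aware committed_size check."""
    if not compressed:
        # Uncompressed uploads: committed_size must equal the digest size.
        return committed == digest_size
    # Compressed uploads: the uncompressed digest size is not a
    # meaningful expectation. Accept the compressed byte count the
    # client actually sent, or -1 as an "already exists" early-exit
    # signal (assumption based on remote-apis#213).
    return committed in (bytes_sent, -1)

print(committed_size_ok(374, 374, 721, compressed=True))   # normal write
print(committed_size_ok(-1, 374, 721, compressed=True))    # early exit
print(committed_size_ok(721, 374, 721, compressed=True))   # this issue's bug
```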
Let's open a REAPI issue to get more input. Can you do that, or would you like me to?
I'll go ahead and open one.
This is an implementation of this REAPI spec update: bazelbuild/remote-apis#213
Which is part of the solution to this issue: bazelbuild/bazel#14654
This is an implementation of this REAPI spec update: bazelbuild/remote-apis#213
Here's a bazel-remote build that can be used to test this change: buchgr/bazel-remote#527
Fixes bazelbuild#14654
@bazel-io fork 5.1
This is an implementation of this REAPI spec update: bazelbuild/remote-apis#213
Here's a bazel-remote build that can be used to test this change: buchgr/bazel-remote#527
Fixes #14654. Closes #14870.
PiperOrigin-RevId: 430167812 (cherry picked from commit d184e48)
Co-authored-by: Mostyn Bramley-Moore <mostyn@antipode.se>
Still seeing this; removing the flag works around it.
@keith: your remote cache might require an update. Is it something open source? If so, which one and which version?
Using Google's 🙃
cc @bergsieker
This will be fixed in Bazel 5.3 😍.
Description of the problem / feature request:
We upgraded to Bazel 5.0.0 and are currently trying out the --experimental_remote_cache_async flag. During our build, I see over 1000 messages printed to stdout that look like this, possibly caused by BulkTransferExceptions:

Feature requests: what underlying problem are you trying to solve with this feature?
I'm testing the new --experimental_remote_cache_compression flag to speed up artifact download and upload.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I couldn't reproduce it yet. This is a pretty big project with a lot of flags. Maybe this flag is conflicting with one of these as well:
What operating system are you running Bazel on?
macOS 12.1
What's the output of bazel info release?
release 5.0.0