-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compression: Further specify ByteStream WriteResponse committed_size field for compressed blobs #212
Comments
I dug into this a bit, and early-return can work if this happens (referring to generated go bindings): In the server's
On the client side:
Note that in step (3) of the server, if we return a non-nil error like the gRPC Then I think we should require the server to set a placeholder |
…mitted_size -1 We require that uncompressed bytestream uploads specify committed_size set to the size of the blob when returning early (if the blob already exists on the server). We also require that for compressed bytestream uploads committed_size refers to the initial write offset plus the number of compressed bytes uploaded. But if the server wants to return early in this case it doesn't know how many compressed bytes would have been uploaded (the client might not know this ahead of time either). So let's require that the server set committed_size to -1 in this case. For early return to work, we also need to ensure that the server does not return an error code. Resolves bazelbuild#212.
…mitted_size -1 (#213) We require that uncompressed bytestream uploads specify committed_size set to the size of the blob when returning early (if the blob already exists on the server). We also require that for compressed bytestream uploads committed_size refers to the initial write offset plus the number of compressed bytes uploaded. But if the server wants to return early in this case it doesn't know how many compressed bytes would have been uploaded (the client might not know this ahead of time either). So let's require that the server set committed_size to -1 in this case. For early return to work, we also need to ensure that the server does not return an error code. Resolves #212.
Context: bazelbuild/bazel#14654
There is an ambiguity with the current way that the ByteStream.Write protocol is specified for compressed blobs. The relevant part of the spec with the ambiguity is here:
remote-apis/build/bazel/remote/execution/v2/remote_execution.proto
Lines 256 to 264 in 636121a
The ambiguity is: what does "full size of the uploaded file" mean? Does it mean "compressed size" or "uncompressed size"? If it's compressed size, then it's not useful info to be returned in the response, since compressed size can vary depending on the compression level used by the other client that uploaded the file.
Note also that Bazel 5.0.0 effectively expects this to match the compressed size that it uploaded. Whether or not this is a bug, it means that servers have to wait for Bazel 5.0.0 to upload all chunks before they can reply with a committed_size that Bazel 5.0.0 will be happy with, effectively preventing the "early-exit" strategy when an object already exists in the cache.
From @mostynb on that thread:
This workaround of discarding all received data will work, but it's unclear how significant of a performance impact this will have in practice, compared to early-exit. Hoping that some folks with REAPI experience can chime in.
The text was updated successfully, but these errors were encountered: