Missing outputs when update action result #25106

Open
sluongng opened this issue Jan 28, 2025 · 3 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@sluongng
Contributor

Description of the bug:

One of our users ran an experiment: they built invocation_1 on machine A, waited for the build to finish, and then built invocation_2 on machine B at the same git commit. Both invocations used local execution with a disk cache and a gRPC remote cache.

In invocation_2, we saw a lot of AC hits and one AC miss. Digging into the miss, we saw that it was a cacheable action with a relatively large output directory (~180MB, >6k files, >200 dirs).

Inspecting the remote_grpc_log and the compact execution log for both invocations, we found that this action was executed locally in both invocations and that the UpdateActionResult RPC was called on our server to upload the result. However, some of the files in the output directory were never uploaded to our cache.

> cat grpc_1.log | jq 'select(.metadata.targetId == "//my:target") | select(.methodName == "build.bazel.remote.execution.v2.ContentAddressableStorage/FindMissingBlobs") | .details.findMissingBlobs | {request: {blobDigests: [.request.blobDigests[] | select(.hash == "92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4")]}, response: {missingBlobDigests: [.response.missingBlobDigests[] | select(.hash == "92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4")]}}'
{
  "request": {
    "blobDigests": [
      {
        "hash": "92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4",
        "sizeBytes": "1239"
      }
    ]
  },
  "response": {
    "missingBlobDigests": [
      {
        "hash": "92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4",
        "sizeBytes": "1239"
      }
    ]
  }
}

> rg 92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4 bb_grpc_1.log
542912:    "message": "ActionResult (hash:\"9dee32e38302bd3a4fcec117cd81c77682828ccadab87e728edf284e65cca803\" size_bytes:148) not found: rpc error: code = NotFound desc = ActionResult output file: 'hash:\"92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4\" size_bytes:1239' not found in cache"
6943288:            "hash": "92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4",
6953277:            "hash": "92ff80c9ee2912abb0c5476c4ce78d124ff0d1e9a2fe627e794710a78abe1df4",

In this case, 9dee.. is the digest of the Action and 92ff.. is one of the files inside the output directory. In the log, we can also find many successful CAS writes (ByteStream.Write) from the same target to our cache, but for some reason, not all the blobs in "missingBlobDigests" were uploaded.
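The manual jq/rg check above can be generalized to every blob of a target. The following is only a rough sketch, assuming the gRPC log has already been converted to JSON entries shaped like the jq output above. The `details.write.resourceNames` field and the `google.bytestream.ByteStream/Write` method name are assumptions about the log schema, and the helper names are hypothetical. It only looks at ByteStream writes, so blobs uploaded some other way would be reported as false positives.

```python
import re

FIND_MISSING = "build.bazel.remote.execution.v2.ContentAddressableStorage/FindMissingBlobs"
# Assumption: ByteStream uploads are logged under this method name.
BYTESTREAM_WRITE = "google.bytestream.ByteStream/Write"

# Upload resource names look like
#   [instance/]uploads/<uuid>/blobs/<hash>/<size>
#   [instance/]uploads/<uuid>/compressed-blobs/<compressor>/<hash>/<size>
_UPLOAD_RE = re.compile(
    r"uploads/[^/]+/(?:blobs|compressed-blobs/[^/]+)/([0-9a-f]+)/\d+"
)


def reported_missing(entries, target_id):
    """Hashes the server reported as missing for the given target."""
    hashes = set()
    for entry in entries:
        if entry.get("metadata", {}).get("targetId") != target_id:
            continue
        if entry.get("methodName") != FIND_MISSING:
            continue
        response = entry.get("details", {}).get("findMissingBlobs", {}).get("response", {})
        for digest in response.get("missingBlobDigests", []):
            hashes.add(digest["hash"])
    return hashes


def written(entries):
    """Hashes that appear in any logged ByteStream write, regardless of target."""
    hashes = set()
    for entry in entries:
        if entry.get("methodName") != BYTESTREAM_WRITE:
            continue
        # Assumption: write details expose the resource names under this key.
        names = entry.get("details", {}).get("write", {}).get("resourceNames", [])
        for name in names:
            match = _UPLOAD_RE.search(name)
            if match:
                hashes.add(match.group(1))
    return hashes


def never_uploaded(entries, target_id):
    """Blobs reported missing for the target that no logged write ever covered."""
    return reported_missing(entries, target_id) - written(entries)
```

One way to feed it would be to emit one JSON object per line with `jq -c .` and then call `json.loads` on each line to build `entries`.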

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

7.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?


If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

Reviewing the current implementation of CombinedCache in master as well as in 7.2.1, I cannot see a good reason why the blob was not uploaded. The missing blobs from the remote cache are combined with the missing blobs from the disk cache to create a union set of blobs to be uploaded. But the upload never happened?

We expected the entire output directory to be uploaded successfully before UpdateActionResult is called. Is that a good assumption to make?
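To make that expectation concrete, here is a toy model of the ordering we assumed. It is not Bazel's CombinedCache code: `InMemoryCache` and `upload_action` are hypothetical names, digests are plain strings, and the lookup only mimics the server behavior seen in the log above, where an ActionResult that references an absent blob is reported as not found.

```python
class InMemoryCache:
    """Toy stand-in for a disk or remote cache (hypothetical, for illustration)."""

    def __init__(self):
        self.cas = {}  # digest -> blob bytes
        self.ac = {}   # action digest -> set of output digests

    def find_missing(self, digests):
        return {d for d in digests if d not in self.cas}

    def put_blob(self, digest, data):
        self.cas[digest] = data

    def update_action_result(self, action_digest, output_digests):
        self.ac[action_digest] = frozenset(output_digests)

    def get_action_result(self, action_digest):
        # Treat a result with any absent output as a miss, like the
        # NotFound response seen in the server log.
        outputs = self.ac.get(action_digest)
        if outputs is None or self.find_missing(outputs):
            return None
        return outputs


def upload_action(action_digest, outputs, disk, remote):
    """Upload the union of blobs missing from either cache, then publish the result."""
    digests = set(outputs)
    to_upload = disk.find_missing(digests) | remote.find_missing(digests)
    for digest in to_upload:
        disk.put_blob(digest, outputs[digest])
        remote.put_blob(digest, outputs[digest])
    for cache in (disk, remote):
        # The assumed invariant: nothing is missing when the AC entry is written.
        if cache.find_missing(digests):
            raise RuntimeError("outputs still missing before UpdateActionResult")
        cache.update_action_result(action_digest, digests)
```

Under this model, a second machine that calls `get_action_result` on the remote cache after `upload_action` has returned always gets a hit. The behavior reported here corresponds to the AC entry being written while `find_missing` would still return some output digests.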

Any other information, logs, or outputs that you want to share?

No response

@tjgq
Contributor

tjgq commented Jan 28, 2025

I'd suspect #20296 which was likely fixed by 8683da9, but never backported to 7.x.

sgowroji added the team-Remote-Exec label Jan 28, 2025
@sluongng
Contributor Author

Our affected users confirmed that upgrading to Bazel 8 solved their issue.

@tjgq @coeuvre I will let you decide whether this is worth the effort of cherry-picking back to Bazel 7.5.
I think most of our users are still on 7.x, and 7.2.1 is relatively popular among them.

@coeuvre
Member

coeuvre commented Jan 29, 2025

7.5 is already on RC3, so it's pretty risky to cherry-pick more features. If we ever have 7.6, we can consider backporting.
