Experimentally support remote persistent workers. #16362

benjaminp · 2022-09-30T17:16:15Z

Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request.

This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x.

We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag.

Fixes #10091.

Co-authored-by: Ulf Adams ulf@engflow.com

Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value (this is just a boolean tag). Implements #10091. Change-Id: Iccb36081fee399855be7c487c2d4091cb36f8df3

coeuvre · 2022-10-04T08:35:56Z

cc @larsrc-google

larsrc-google · 2022-10-04T13:59:17Z

src/main/java/com/google/devtools/build/lib/remote/RemoteExecutionService.java

      throws IOException, ForbiddenActionInputException {
    // Add output directories to inputs so that they are created as empty directories by the
    // executor. The spec only requires the executor to create the parent directory of an output
    // directory, which differs from the behavior of both local and sandboxed execution.
    SortedMap<PathFragment, ActionInput> outputDirMap = buildOutputDirMap(spawn);
-    if (remoteOptions.remoteMerkleTreeCache) {
+    if (remoteOptions.remoteMerkleTreeCache && toolSignature == null) {


So this means with workers we can't use the remote Merkle tree cache? Is that a problem performance-wise? Is that something that can/should be fixed?

This change was developed before the merkle tree cache was merged. Performance benefits were realized for the intended usecases. So, when the merkle tree cache came along, it didn't seem necessary to integrate tightly with it. I think our benchmarks didn't benefit from the merkle tree cache anyway because, for example, Javac actions normally don't have massive input trees.

No doubt this could be improved but one has to do the thinking to make sure if inputs move in and out of the tool set, the merkle tree is invalidated properly.

Ok, that can be a follow-up item. Just mark it as such here, otherwise this if is very confusing.

I have now expanded the logic here and also added a comment.

wiwa · 2022-10-13T21:03:16Z

src/main/java/com/google/devtools/build/lib/remote/RemoteExecutionService.java

+   * A simple value class combining a hash of the tool inputs (and their digests) as well as a set
+   * of the relative paths of all tool inputs.
+   */
+  private static final class ToolSignature {


Hi! I've been experimenting with remote persistent workers in a setting where the remote execution backend has zero knowledge of the tools. I ran into the issue of the backend not knowing how to actually run the tool. To overcome this, I added the actual command (with --persistent_worker) called by Bazel to ToolSignature. Implementation here: wiwa@07d40ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR1548

I then pass it to the backend via another entry in platformAdditionalProperties called persistentWorkerCommand: wiwa@07d40ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR487

Do you think this PR could also support this functionality? Or would it have to be a different change?

I propose that we not risk delaying this change by complicating it further. Let's file additional issues or pull requests for enhancements.

Yep, I agree. Was not sure if it would particularly delay it, but the answer seems like "yes it would".

@wiwa That sounds dangerous. If your backend doesn't know if it's running workers or not, it could end up with a lot of sometimes large processes hanging around.

It's not that the backend doesn't know there are workers. It's that it doesn't need to know how to start specific worker processes. The backend is agnostic to the action which spins up a worker. I don't believe that is possible without addtional modification beyond the change in this PR. This is because Bazel persistent workers have more info to spawn the process than what is provided by a persistentWorkerKey.

Example: a Scalac action calling scalac.sh <various startup flags> --persistent_worker to initialize the worker process before passing the "actual" compile args <compilation flags> -cp <classpath> <files to compile>. The remote execution backend would need to know the inputs/hash of scalac.sh and also what those startup flags are.

Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <ulf@engflow.com>

Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <ulf@engflow.com> Closes bazelbuild#16362. PiperOrigin-RevId: 482433908 Change-Id: I2a80834731fd0169c08d4beea348f61a323ca028

benjaminp requested a review from a team as a code owner September 30, 2022 17:16

ShreeM01 added team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Sep 30, 2022

larsrc-google reviewed Oct 4, 2022

View reviewed changes

benjaminp force-pushed the remote-persistent-worker branch from c55a27b to f58c5e7 Compare October 4, 2022 15:05

wiwa reviewed Oct 13, 2022

View reviewed changes

wiwa mentioned this pull request Oct 14, 2022

Introduce Persistent Workers buildfarm/buildfarm#1195

Merged

benjaminp force-pushed the remote-persistent-worker branch from f58c5e7 to bdac5e3 Compare October 18, 2022 15:57

copybara-service bot closed this in 72b481a Oct 20, 2022

benjaminp deleted the remote-persistent-worker branch October 20, 2022 14:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experimentally support remote persistent workers. #16362

Experimentally support remote persistent workers. #16362

benjaminp commented Sep 30, 2022

coeuvre commented Oct 4, 2022

larsrc-google Oct 4, 2022

benjaminp Oct 4, 2022

larsrc-google Oct 4, 2022

benjaminp Oct 4, 2022

wiwa Oct 13, 2022

benjaminp Oct 13, 2022

wiwa Oct 14, 2022

larsrc-google Oct 19, 2022

wiwa Oct 19, 2022

wiwa Oct 20, 2022

Experimentally support remote persistent workers. #16362

Experimentally support remote persistent workers. #16362

Conversation

benjaminp commented Sep 30, 2022

coeuvre commented Oct 4, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment