Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimentally support remote persistent workers. #16362

Closed

Conversation

benjaminp
Copy link
Collaborator

Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request.

This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x.

We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag.

Fixes #10091.

Co-authored-by: Ulf Adams ulf@engflow.com

@benjaminp benjaminp requested a review from a team as a code owner September 30, 2022 17:16
brentleyjones referenced this pull request Sep 30, 2022
Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag
tool inputs when executing actions remotely, and also adds a tools input key
to the platform proto sent as part of the remote execution request.

This allows a remote execution system to implement persistent workers, i.e.,
to keep worker processes around and reuse them for subsequent actions. In a
trivial example, this improves build performance by ~3x.

We use "persistentWorkerKey" for the platform property, with the value being
a hash of the tool inputs, and "bazel_tool_input" as the node property name,
with an empty string as value (this is just a boolean tag).

Implements #10091.

Change-Id: Iccb36081fee399855be7c487c2d4091cb36f8df3
@ShreeM01 ShreeM01 added team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Sep 30, 2022
@coeuvre
Copy link
Member

coeuvre commented Oct 4, 2022

cc @larsrc-google

throws IOException, ForbiddenActionInputException {
// Add output directories to inputs so that they are created as empty directories by the
// executor. The spec only requires the executor to create the parent directory of an output
// directory, which differs from the behavior of both local and sandboxed execution.
SortedMap<PathFragment, ActionInput> outputDirMap = buildOutputDirMap(spawn);
if (remoteOptions.remoteMerkleTreeCache) {
if (remoteOptions.remoteMerkleTreeCache && toolSignature == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means with workers we can't use the remote Merkle tree cache? Is that a problem performance-wise? Is that something that can/should be fixed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was developed before the merkle tree cache was merged. Performance benefits were realized for the intended usecases. So, when the merkle tree cache came along, it didn't seem necessary to integrate tightly with it. I think our benchmarks didn't benefit from the merkle tree cache anyway because, for example, Javac actions normally don't have massive input trees.

No doubt this could be improved but one has to do the thinking to make sure if inputs move in and out of the tool set, the merkle tree is invalidated properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that can be a follow-up item. Just mark it as such here, otherwise this if is very confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now expanded the logic here and also added a comment.

* A simple value class combining a hash of the tool inputs (and their digests) as well as a set
* of the relative paths of all tool inputs.
*/
private static final class ToolSignature {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I've been experimenting with remote persistent workers in a setting where the remote execution backend has zero knowledge of the tools. I ran into the issue of the backend not knowing how to actually run the tool. To overcome this, I added the actual command (with --persistent_worker) called by Bazel to ToolSignature. Implementation here: wiwa@07d40ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR1548

I then pass it to the backend via another entry in platformAdditionalProperties called persistentWorkerCommand: wiwa@07d40ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR487

Do you think this PR could also support this functionality? Or would it have to be a different change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose that we not risk delaying this change by complicating it further. Let's file additional issues or pull requests for enhancements.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree. Was not sure if it would particularly delay it, but the answer seems like "yes it would".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wiwa That sounds dangerous. If your backend doesn't know if it's running workers or not, it could end up with a lot of sometimes large processes hanging around.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that the backend doesn't know there are workers. It's that it doesn't need to know how to start specific worker processes. The backend is agnostic to the action which spins up a worker. I don't believe that is possible without addtional modification beyond the change in this PR. This is because Bazel persistent workers have more info to spawn the process than what is provided by a persistentWorkerKey.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example: a Scalac action calling scalac.sh <various startup flags> --persistent_worker to initialize the worker process before passing the "actual" compile args <compilation flags> -cp <classpath> <files to compile>. The remote execution backend would need to know the inputs/hash of scalac.sh and also what those startup flags are.

Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request.

This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x.

We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag.

Fixes bazelbuild#10091.

Co-authored-by: Ulf Adams <ulf@engflow.com>
@benjaminp benjaminp deleted the remote-persistent-worker branch October 20, 2022 14:56
dan-huenecke pushed a commit to gemini/bazel that referenced this pull request Jun 5, 2024
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request.

This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x.

We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag.

Fixes bazelbuild#10091.

Co-authored-by: Ulf Adams <ulf@engflow.com>

Closes bazelbuild#16362.

PiperOrigin-RevId: 482433908
Change-Id: I2a80834731fd0169c08d4beea348f61a323ca028
dan-huenecke pushed a commit to gemini/bazel that referenced this pull request Jun 5, 2024
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request.

This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x.

We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag.

Fixes bazelbuild#10091.

Co-authored-by: Ulf Adams <ulf@engflow.com>

Closes bazelbuild#16362.

PiperOrigin-RevId: 482433908
Change-Id: I2a80834731fd0169c08d4beea348f61a323ca028
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support running persistent workers remotely
5 participants