Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for wrapping system streams in WorkRequestHandler #14201

Conversation

Bencodes
Copy link
Contributor

@Bencodes Bencodes commented Nov 1, 2021

There are often places where persistent workers need to swap out the standard system streams to avoid tools poisoning the worker communication streams by writing logs/exceptions to it.

This pull request extracts that pattern into an optional WorkerIO wrapper can be used to swap in and out the standard streams without the added boilerplate.

@Bencodes
Copy link
Contributor Author

@larsrc-google mind taking a look at this?


private final InputStream originalInputStream;
private final PrintStream originalOutputStream;
private final PrintStream originalErrorStream;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping System.err isn't actually be necessary for a wrapper worker io.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Also System.in is already handled properly. Only System.out needs to be wrapped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also System.in is already handled properly. Only System.out needs to be wrapped.

The motivation behind wrapping all of the system streams in a WorkerIO class is that there is a clear and consistent place to access the root system streams while setting up the work request handler. For System.in and System.err we can just return the original streams when calling getOriginalInputStream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further reflection, what you do makes sense, but not in the way you use it in the example. Instead of wrapping the whole WorkRequestHandler, wrapping the inside of the WorkRequestCallback would be clearer. You wouldn't even need to have the accessors to the original streams. That prevents the tool from attempting to read from stdin and corrupting the communication.

That change would also mean that the WorkerIO doesn't need to reset the buffer, as it'll be thrown away once a request is done. Plus in case processRequest in your example throws an exception the WorkerIO will be closed properly.

A definite benefit of either approach is that stdout and stderr get merged into a single stream, that will be easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapping the inside of the WorkRequestCallback would be clearer

By this do you mean calling capture from within the implementation of WorkRequestHandler, or in the lambda that gets passed into the WorkRequestHandler? I like idea of this being hidden inside the WorkRequestCallback because we hide WorkerIO#readCapturedAsUtf8String entirely which is kind of an awkward API to need to expose.

That change would also mean that the WorkerIO doesn't need to reset the

Would each work request call WorkerIO#capture swapping in it's own buffer?

A definite benefit of either approach is that stdout and stderr get merged into a single stream, that will be easier to understand.

Wrapping the stderr could be a bit error-prone because if there's ever an unrecoverable error that causes the worker to not send a work response, you just won't see an exception come out of the worker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does your current solution deal with multiplex workers?

@larsrc-google I took another pass at building this into the WorkRequestHandler so that it's a bit more transparent, and with the multiplex worker cases in mind. The new approach uses the existing WorkerIO implementation, but wraps the system streams when you call WorkRequestHandler#processRequests.

The WorkerIO instance gets passed down into respondToRequest so that there is a single source of truth for the wrapped streams, and so that we don't have to deal with multiple WorkerIO instances being created inside of respondToRequest which doesn't really make sense in a multiplex-world where respondToRequest may get called concurrently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I've been so focused on the implementation details that I forgot the larger problem: Redirecting the stderr messages into the output field of the response changes semantics. The output field is for shorter messages to be displayed to the user, stderr goes into the logs. See https://docs.bazel.build/versions/main/creating-workers.html#work-responses. It's known that multiplex workers only have one log between them, but the messages shown to the user should not interleave.

And of course, since the multiplex threads still share one System.err instance regardless of what we set, we can't possibly disentangle that. But we can still fix the main issue of stdout getting corrupted by setting it to the same stream as stderr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure i'm understanding next steps here - You'd prefer that we avoid capturing the tool output streams and instead just swap the System.err in for System.out to not pollute the work response output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Having System.err and System.out go to the same place (the logs) is good. Changing the work response output would cause a lot of noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the changes for swapping System.err in for System.out and updated the tests.

@larsrc-google
Copy link
Contributor

Sorry for not reacting earlier, need to update mail filters.

This is a good thing to do, something I have considered but not gotten around to yet. One question: Why an optional wrapper? Why not just make it part of WorkRequestHandler? Can you imagine a case where a user of WorkRequestHandler would not want to redirect stdout?

@Bencodes
Copy link
Contributor Author

Why an optional wrapper? Why not just make it part of WorkRequestHandler?

No reason other than it felt easier to reason about merging this into Bazel as an isolated tool first, instead of making changes to WorkRequestHandler directly.

Can you imagine a case where a user of WorkRequestHandler would not want to redirect stdout?

I can't, although the java workers seem to be managing fine without wrapped system streams.

@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch from 04ac34c to cc18fc2 Compare November 22, 2021 19:56
@Bencodes
Copy link
Contributor Author

Bencodes commented Dec 4, 2021

@larsrc-google is there a path forward for merging some form of this?

@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch from 901c960 to 2856b3b Compare January 10, 2022 04:33
@google-cla google-cla bot added cla: no and removed cla: yes labels Jan 10, 2022
@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch from 2856b3b to 008280a Compare January 10, 2022 04:39
@google-cla google-cla bot added cla: yes and removed cla: no labels Jan 10, 2022
@lberki lberki removed request for tetromino and a team January 10, 2022 08:21

// Redirect System.out to System.err
System.setOut(originalErrorStream);
System.setErr(originalErrorStream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot simpler. I don't see any reason to do anything to System.err, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch 2 times, most recently from 9a11aa4 to 911ea81 Compare January 21, 2022 22:34
@@ -311,6 +315,8 @@ public WorkRequestHandler build() {
* returns. If {@code in} reaches EOF, it also returns.
*/
public void processRequests() throws IOException {
WorkerIO workerIO = WorkerIO.redirectSystemStreams();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at #14623, you're not the only one with this idea. You might be right in making it optional, as some workers do their own handling. Fortunately that could easily be done by adding a method to the builder.

@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch from 911ea81 to fdc6e07 Compare January 24, 2022 23:42
@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch from 3da7148 to 6e5c7cb Compare February 6, 2022 17:34
@nkoroste
Copy link
Contributor

nkoroste commented May 3, 2022

Any updates on this?

@larsrc-google
Copy link
Contributor

Sorry for the silence, I was out sick for quite a while. I believe being optional is not as big a deal as I indicated, so if you can fix the conflicts, we should be able to merge this.

@Bencodes Bencodes force-pushed the add-support-for-wrapping-system-streams-in-workrequesthandler branch from 6e5c7cb to 2d3689f Compare August 25, 2022 15:56
@Bencodes
Copy link
Contributor Author

@larsrc-google I have this a rebase. Please give it another once over to make sure that the conflicts got resolved correctly against the new changes that were in master.

@sgowroji sgowroji added team-Performance Issues for Performance teams awaiting-review PR is awaiting review from an assigned reviewer labels Oct 19, 2022
@copybara-service copybara-service bot closed this in e05345d Jan 2, 2023
hvadehra pushed a commit that referenced this pull request Feb 14, 2023
There are often [places](https://github.com/bazelbuild/bazel/blob/ea19c17075478092eb77580e6d3825d480126d3a/src/tools/android/java/com/google/devtools/build/android/ResourceProcessorBusyBox.java#L188) where persistent workers need to swap out the standard system streams to avoid tools poisoning the worker communication streams by writing logs/exceptions to it.

This pull request extracts that pattern into an optional WorkerIO wrapper can be used to swap in and out the standard streams without the added boilerplate.

Closes #14201.

PiperOrigin-RevId: 498983983
Change-Id: Iefb956d38a5887d9e5bbf0821551eb0efa14fce9
@brentleyjones
Copy link
Contributor

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Feb 24, 2023
@brentleyjones
Copy link
Contributor

brentleyjones commented Feb 24, 2023

(Above was from a discussion with @larsrc-google: #17448 (comment))

@ShreeM01
Copy link
Contributor

@bazel-io fork 6.1.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Feb 24, 2023
@Bencodes Bencodes deleted the add-support-for-wrapping-system-streams-in-workrequesthandler branch March 2, 2023 01:32
copybara-service bot pushed a commit that referenced this pull request Aug 23, 2023
With the baked in `WorkerIO` added here #14201 we can remove the custom system stream wrapping and instead rely on `WorkRequestHandler` to manage that for us.

Closes #17654.

PiperOrigin-RevId: 559454874
Change-Id: I683ca6580bb7d7da3b7712e0033e1b903010de5b
copybara-service bot pushed a commit that referenced this pull request Aug 28, 2023
This pull request provides a custom `DiagnosticsHandler` implementation that redirects info and warning messages to the original `System#err` stream so that the noisy D8 messages are written to the worker log file instead of directly to the console.

In addition this PR also removes the custom system stream redirection, and instead uses the baked in `WorkerIO` implementation that automatically wraps the system stream for us that was added here #14201.

#17647

Closes #17648.

PiperOrigin-RevId: 560600558
Change-Id: I9d2d162b62fe5dfd00e8614ba73caf1cac5f02d7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer cla: yes team-Performance Issues for Performance teams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants