Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Jetty/gRPC shutdown #3123

Merged
merged 8 commits into from
Dec 5, 2022
Merged

Conversation

niloc132
Copy link
Member

A series of four commits to address #2985 and other adjacent issues. I expect it will make more sense to merge these together, but providing a single branch might help review and testing.

These commits seek to ensure that the gRPC server shuts down in a predictable way from the server application's perspective and from any gRPC client's perspective.

The first commit addresses a bug where a client doesn't return early enough when reporting an error the first time, and will report a second error.

The second commit reverts some earlier cleanup around the HealthCheck service and offers an alternative solution to cleanly shut down gRPC's own threadpool only once shutdown is complete, instead of right away. (This has been discussed with James, who originally wrote that patch)

The third commit correctly ends any autocomplete stream when a session ends - among other things, this will enable the server to halt those streams during server shutdown.

Finally, the biggest commit, which likely should land separately, modifies both our "multiplexed" websocket format to support a "GO_AWAY" message (attempting to reflect h2 semantics), and reorders Jetty shutdown to give client's a change to handle a message. Normal h2 streams in Jetty are notified that the socket will end soon, and are given a chance to finish their work, but websockets are immediately closed - this change avoids closing those websockets until later in the shutdown process, giving browser clients a chance to finish their work.

This includes a change to the multiplexed websocket format to indicate
to the client that the websocket itself will soon stop, so no more
streams should be started on it, and existing calls should be finished.
This reflects the same basic semantics that http2 uses for GO_AWAY.
if (closed != ClosedState.OPEN) {
return closingFuture;
}
closed = ClosedState.CLOSING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should closed technically be marked as volatile? Even though we expect closed to go OPEN -> CLOSING -> CLOSED with happens-before/happens-after semantics, I don't think the java memory model guarantees other threads will see the change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought I had this set to only be possible from one thread, but I will double check and get back to you, or at least try to tell a convincing story...

Copy link
Member Author

@niloc132 niloc132 Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that could be helpful, but I think not strictly necessary - stopAcceptingNewStreams() can only be called once, and the other tests require a full round trip to the client after that to make sense. I'm not sure what the threading "guarantees" there would be, except that by the time we've sent a message and received a corresponding reply, that change should have propagated.

The exception to this is a newly opened socket, but I think that onOpen is already racing WebsocketFactory's loop in that case. Will investigate.

Open to adding it anyway, but in that case we have many more fields that should have it.

@niloc132 niloc132 added the ReleaseNotesNeeded Release notes are needed label Dec 1, 2022
devinrsmith
devinrsmith previously approved these changes Dec 1, 2022
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Were you able to get a testing procedure that makes you relatively confident in these changes? Not sure if you preferred to separate out the commits or not. I'm fine either way.

@niloc132
Copy link
Member Author

niloc132 commented Dec 1, 2022

Looks good. Were you able to get a testing procedure that makes you relatively confident in these changes?

The testing process for the original bug was to open more than one browser/tab/window, and then shutdown the server, ideally with kill -SIGTERM (ctrl-c to gradle tends to kill it without giving it time to shutdown cleanly?), ensuring that all received the shutdown message properly.

To ensure that GO_AWAY behaved as expected, I would put a breakpoint in the client implementation of the websocket, and prevent it from acknowledging the message, to ensure that the server would stay up until its timeout. It appears that in some cases the browser will hold open other connections (likely keep-alive for plain http/1.1 calls), and the server waits for those to all terminate as well - if the client has been running for more than a few minutes or has been closed, this doesn't happen.

Not sure if you preferred to separate out the commits or not. I'm fine either way.

If you don't see an issue with landing them together, I can at least keep the branch in my fork so we can roll back individual commits if it comes to that, else I can submit several small PRs to land one at a time. For the sake of simplicity, given how long until we release I lean towards merging this PR instead, but it isn't a strong preference.

nbauernfeind
nbauernfeind previously approved these changes Dec 2, 2022
@devinrsmith devinrsmith self-requested a review December 2, 2022 22:03
devinrsmith
devinrsmith previously approved these changes Dec 2, 2022
@niloc132 niloc132 merged commit 5d4b79b into deephaven:main Dec 5, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Dec 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants