-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Jetty/gRPC shutdown #3123
Conversation
This includes a change to the multiplexed websocket format to indicate to the client that the websocket itself will soon stop, so no more streams should be started on it, and existing calls should be finished. This reflects the same basic semantics that http2 uses for GO_AWAY.
server/src/main/java/io/deephaven/server/console/ConsoleServiceGrpcImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/console/ConsoleServiceGrpcImpl.java
Outdated
Show resolved
Hide resolved
...et-jakarta/src/main/java/io/grpc/servlet/web/websocket/MultiplexedWebSocketServerStream.java
Outdated
Show resolved
Hide resolved
...et-jakarta/src/main/java/io/grpc/servlet/web/websocket/MultiplexedWebSocketServerStream.java
Outdated
Show resolved
Hide resolved
if (closed != ClosedState.OPEN) { | ||
return closingFuture; | ||
} | ||
closed = ClosedState.CLOSING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should closed
technically be marked as volatile
? Even though we expect closed
to go OPEN -> CLOSING -> CLOSED with happens-before/happens-after semantics, I don't think the java memory model guarantees other threads will see the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought I had this set to only be possible from one thread, but I will double check and get back to you, or at least try to tell a convincing story...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that could be helpful, but I think not strictly necessary - stopAcceptingNewStreams()
can only be called once, and the other tests require a full round trip to the client after that to make sense. I'm not sure what the threading "guarantees" there would be, except that by the time we've sent a message and received a corresponding reply, that change should have propagated.
The exception to this is a newly opened socket, but I think that onOpen is already racing WebsocketFactory's loop in that case. Will investigate.
Open to adding it anyway, but in that case we have many more fields that should have it.
...client-api/src/main/java/io/deephaven/web/client/api/grpc/MultiplexedWebsocketTransport.java
Show resolved
Hide resolved
...et-jakarta/src/main/java/io/grpc/servlet/web/websocket/MultiplexedWebSocketServerStream.java
Outdated
Show resolved
Hide resolved
...et-jakarta/src/main/java/io/grpc/servlet/web/websocket/MultiplexedWebSocketServerStream.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Were you able to get a testing procedure that makes you relatively confident in these changes? Not sure if you preferred to separate out the commits or not. I'm fine either way.
The testing process for the original bug was to open more than one browser/tab/window, and then shutdown the server, ideally with To ensure that GO_AWAY behaved as expected, I would put a breakpoint in the client implementation of the websocket, and prevent it from acknowledging the message, to ensure that the server would stay up until its timeout. It appears that in some cases the browser will hold open other connections (likely keep-alive for plain http/1.1 calls), and the server waits for those to all terminate as well - if the client has been running for more than a few minutes or has been closed, this doesn't happen.
If you don't see an issue with landing them together, I can at least keep the branch in my fork so we can roll back individual commits if it comes to that, else I can submit several small PRs to land one at a time. For the sake of simplicity, given how long until we release I lean towards merging this PR instead, but it isn't a strong preference. |
1ecd509
A series of four commits to address #2985 and other adjacent issues. I expect it will make more sense to merge these together, but providing a single branch might help review and testing.
These commits seek to ensure that the gRPC server shuts down in a predictable way from the server application's perspective and from any gRPC client's perspective.
The first commit addresses a bug where a client doesn't return early enough when reporting an error the first time, and will report a second error.
The second commit reverts some earlier cleanup around the HealthCheck service and offers an alternative solution to cleanly shut down gRPC's own threadpool only once shutdown is complete, instead of right away. (This has been discussed with James, who originally wrote that patch)
The third commit correctly ends any autocomplete stream when a session ends - among other things, this will enable the server to halt those streams during server shutdown.
Finally, the biggest commit, which likely should land separately, modifies both our "multiplexed" websocket format to support a "GO_AWAY" message (attempting to reflect h2 semantics), and reorders Jetty shutdown to give client's a change to handle a message. Normal h2 streams in Jetty are notified that the socket will end soon, and are given a chance to finish their work, but websockets are immediately closed - this change avoids closing those websockets until later in the shutdown process, giving browser clients a chance to finish their work.