Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistencies in Bidirectional Streaming Error Handling #7558

Open
GuyLewin opened this issue Oct 27, 2020 · 11 comments
Open

Inconsistencies in Bidirectional Streaming Error Handling #7558

GuyLewin opened this issue Oct 27, 2020 · 11 comments
Assignees
Labels
Milestone

Comments

@GuyLewin
Copy link

What version of gRPC-Java are you using?

1.33.0

What is your environment?

macOS 10.15 (but also tested on Ubuntu 18.04), JDK version 11

What did you expect to see?

Documentation on when and what onError() callbacks are called when receiving errors from either side (server / client), and a consistent behavior in both.
Also - a consistent error (and documentation) of using a stream after an error has been raised on it.

What did you see instead?

Multiple inconsistencies in Exception classes and whether they've even been thrown.
I performed some tests and summarized the results on my blog post: https://lewin.co.il/inconsistencies-in-grpc-java-error-handling-with-bidirectional-streams

Steps to reproduce the bug

A project reproducing all the above bugs can be found here: https://github.com/GuyLewin/grpc-bidirectional-streaming-error-handling

@voidzcy
Copy link
Contributor

voidzcy commented Oct 29, 2020

Thanks for your behavior report and easy-to-reproduce example project. We recently had some discussion about how users should handle exceptions on StreamObserver, #7172 (and the merged #7173) is open trying to make such things clearer (note that PR may still have some pending changes). I believe we do have some vague definitions on how errors should be propagated/bounced/handled, they may be inconsistent on the two sides. It's confusing and not well-documented. We will bring up this for discussion and get back to you once we had some progressive outcome.

@GuyLewin
Copy link
Author

Thanks @voidzcy , let me know if I can help in any way 😄

@voidzcy
Copy link
Contributor

voidzcy commented Nov 10, 2020

Most of the observations described are working as intended, and the way how it works is just the nature of gRPC and HTTP/2 protocol. You can see the gRPC protocol design at gRPC over HTTP2.

Client ends RPC by cancellation (sends a RST_STREAM frame) while server ends RPC by finishing the stream (encodes status to stream trailer and mark END_STREAM). If you look at the interfaces for ClientCall and ServerCall (and their Listeners), you will see the difference. From StreamObserver API's perspective, the outbound onError() semantics for client and server are still the same, which terminates the call/stream.

Information passed to client's outbound onError() is never transmitted to the server, server side only sees the cancellation. The "client cancelled" description is just the canonical message attached by server implementation. After the RST_STREAM frame is sent, the client side call is considered to be closed without waiting for future server messages. The ClientCall.Listener is immediately notified with CANCELLED status including the original cause.

The Throwable passed to server's outbound onError() should be StatusException or StatusRuntimeException as mentioned in its Javadoc. Otherwise, UNKNOWN status code will be used. Its description will be encoded to the stream trailer, but the cause (which may contain server application's information) will not. After the stream trailer with END_STREAM is sent, the server side call is considered to be closed. The ServerCall.Listener is immediately notified. Note this is considered to be a normal completion of the RPC (imagine in HTTP client-server model, server always responds with some status code even if it is 500 Internal Server Error). That means, ServerCall.Listener is actually called with onComplete().

However, ServerCall.Listener's onComplete() does not call anything on server's inbound StreamObserver in this case. (Note for client cancel or half-close, server's inbound StreamObserver is notified correctly by ServerCall.Listener's onCancel() or onHalfClose()). This is something we are trying to discuss. Calling onError() might be a little strange and we need to think about what status code and description to deliver, while calling onCompleted() may make it more confusing for server applications.

@GuyLewin
Copy link
Author

@voidzcy sorry for the delayed response.
It all makes sense, I think it would have been useful to have an example bidirectional streaming client/server (like in my repo) that handles all the errors in the right way, so developers can use it as a reference.
Even though everything works as expected, I still think it's not intuitive for developers to understand how to be resilient to errors.

@suztomo
Copy link
Contributor

suztomo commented May 18, 2021

@voidzcy In #8174, I encountered a situation where client's inbound onError was sometimes not called after client's outbound onError call. This observation defies your explanation below (I assume client's inbound onError is a ClientCall.Listener)

Information passed to client's outbound onError() is never transmitted to the server, ... The ClientCall.Listener is immediately notified with CANCELLED status including the original cause.

Do you happen to know a case where ClientCall.Listener is not called upon client's outbound onError()?

@sanjaypujare
Copy link
Contributor

@voidzcy In #8174, I encountered a situation where client's inbound onError was sometimes not called after client's outbound onError call. This observation defies your explanation below (I assume client's inbound onError is a ClientCall.Listener)

There is actually a ClientCall.Listener class and that's the reference here (I suspect). So that's different from the ResponseObserver's onError callback and I suspect the onError doesn't get called in this case - but let's see what @voidzcy says

Information passed to client's outbound onError() is never transmitted to the server, ... The ClientCall.Listener is immediately notified with CANCELLED status including the original cause.

Do you happen to know a case where ClientCall.Listener is not called upon client's outbound onError()?

@voidzcy
Copy link
Contributor

voidzcy commented May 21, 2021

Do you happen to know a case where ClientCall.Listener is not called upon client's outbound onError()?

One thing I can think of: there is an extremely small time window for the race between outbound onError() and deadline-exceeded cancellation:

boolean delegateToRealCall = true;
Listener<RespT> listenerToClose = null;
synchronized (this) {
// If realCall != null, then either setCall() or cancel() has been called
if (realCall == null) {
@SuppressWarnings("unchecked")
ClientCall<ReqT, RespT> noopCall = (ClientCall<ReqT, RespT>) NOOP_CALL;
setRealCall(noopCall);
delegateToRealCall = false;
// If listener == null, then start() will later call listener with 'error'
listenerToClose = listener;
error = status;
} else if (onlyCancelPendingCall) {
return;
}
}
if (delegateToRealCall) {
delayOrExecute(new Runnable() {
@Override
public void run() {
realCall.cancel(status.getDescription(), status.getCause());
}
});
} else {
if (listenerToClose != null) {
callExecutor.execute(new CloseListenerRunnable(listenerToClose, status));
}
drainPendingCalls();
}

This happens when a real call has not been created (e.g., name resolution has not completed) and the outbound onError() is called to cancel the call. At the same time the deadline-exceeded timer fires to cancel the call. The first candidate entering the critical section sets the real call to no-op and delegateToRealCall to false while the second turns it back to true. Then it goes to cancel the no-op call with listener notification discarded.

/cc @dapengzhang0 @ejona86

@suztomo
Copy link
Contributor

suztomo commented May 21, 2021

@voidzcy Thank you for response. In my case, I didn't find "deadline exceeded" in the log. The problem was (mysteriously) resolved by changing the use of CountDownLatch. #8174 (comment) Thank you for your attention!

@ejona86
Copy link
Member

ejona86 commented May 21, 2021

The code that @voidzcy linked to doesn't appear to be a problem to me, as when it uses noopCall it also uses CloseListenerRunnable which will call the listener.

@suztomo, it sounds like some application code was interacting with the RPC from multiple threads simultaneously. The call object isn't thread-safe, so if that happens all bets are off.

@voidzcy
Copy link
Contributor

voidzcy commented May 21, 2021

Oops, my bad. I overlooked that delegateToRealCall is actual on stack, it's not a shared state between threads. Right, the code doesn't seem to have issues.

@ejona86
Copy link
Member

ejona86 commented Aug 2, 2021

From an API review meeting in January:

  • Documenting the problem ("However, ServerCall.Listener's onComplete() does not call anything on server's inbound StreamObserver in this case.") will cement the current behavior. But changing the behavior introduces pain for users. Let’s punt until someone is actually suffering; we won’t improve the documentation at this point
  • However, there is a legitimate issue understanding how the StreamObservers relate to each other and even what the Throwable implies. We should better document the basics for how the stub works. We can do that in ServerCalls and then @see to it from the generated code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants