Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor #8120

Closed
JackOfMostTrades opened this issue Apr 27, 2021 · 5 comments

Comments

@JackOfMostTrades
Copy link
Contributor

JackOfMostTrades commented Apr 27, 2021

What version of gRPC-Java are you using?

1.37.0

What is your environment?

Linux, OpenJDK 1.8
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

What did you expect to see?

In a bidirectional streaming service, I'm observing different a status passed to ServerCall.close(Status, Metadata) versus the status passed to ServerStreamTracer.streamClosed(Status) when a misbehaving client is abruptly terminating the connection (without sending any sort of cancel or reset).

What did you see instead?

The server has a logging interceptor which intercepts the ServerCall.close(Status, Metadata) method and logs the status. The status seen here is Status{code=CANCELLED, description=Cancelling request because of error from client., cause=null}. We also have a ServerStreamTracer installed on the server, but the Status passed to ServerStreamTracer.streamClosed(Status) is Status{code=UNAVAILABLE, description=connection terminated for unknown reason, cause=null}.

Steps to reproduce the bug

I have a minimal reproducer with some instructions here.

Comments

If the client is terminating the connection then I suppose there really is no true status (either from the server or from the client). So I could be convinced that either of these Status codes would be valid. What really threw off our metrics and sent a couple of engineers down a debugging rabbit hole is that we couldn't correlate any request logs on the server (being logged from the ServerInterceptor) with the status metrics being emitted from the ServerStreamTracer. So I think regardless of what Status is ultimately reported for a abruptly closed connection, it should at least be consistent between these two sources.

This may also be related to #7558 .

@JackOfMostTrades JackOfMostTrades changed the title Inconsistent server-side error in ServerStreamTracer vs StreamObserver Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor Apr 27, 2021
@JackOfMostTrades
Copy link
Contributor Author

Edit, updated ticket to reflect inconsistency between ServerStreamTracer and ServerInterceptor (rather than StreamObserver as previously mentioned).

@voidzcy
Copy link
Contributor

voidzcy commented Apr 27, 2021

On the server call layer, it does not preserve the error/reason of the call being terminated. The call is either finished normally (client side half-close) or cancelled (anything other than normal half-close is considered as cancellation). Server call canceling does not propagate the error/reason (ServerCall.Listener.onCancel()).

gRPC is based on HTTP/2, which is designed for client-server communication model (instead of peer-to-peer). It is inherently asymmetric on the two sides. From a server application's perspective, anything happens in the middle of a HTTP request that causes the request to be terminated is a cancellation, no matter if it is caused by timeout, explicit cancellation by the client, network errors, etc. Like HTTP servers, server applications usually do not care the reason of requests being cancelled.

ServerStreamTracer works in a lower level than the server call. It directly interacts with the stream and HTTP traffic. So it knows what's happening on the network and the status seen by ServerStreamTracer tells the real state of the network.

What really threw off our metrics and sent a couple of engineers down a debugging rabbit hole is that we couldn't correlate any request logs on the server (being logged from the ServerInterceptor) with the status metrics being emitted from the ServerStreamTracer.

Seen by consumers from different perspective (server application vs. traffic monitoring), the two layers present different levels of state information. Server calls and streams are of a one-on-one mapping, but the status they deliver is minimally correlated (as server calls throw away the status information and anything other than normal termination is a cancellation).

@JackOfMostTrades
Copy link
Contributor Author

JackOfMostTrades commented Apr 27, 2021

Thanks for the insights @voidzcy . I think I'm understanding the different sources of the different Status values now. If I'm understanding it correctly, I think I have a follow up question/request, which is that it would be nice inside ServerStreamTracer.streamClosed() to be able to discern if the Status actually from the server itself (i.e. a result of the server calling streamObserver.onError(Status.fromCode(Status.Code.UNAVAILABLE).asException())) in which case the status is actually assed verbatim through to ServerStreamTracer.streamClosed() versus a Status that's derived from a underlying transport issue (in the case of the example I posted, it's coming from NettyServerHandler.channelInactive()). We can obviously inspect the description on the status, but that feels brittle.

@voidzcy
Copy link
Contributor

voidzcy commented Apr 28, 2021

The Status that pops through ServerStreamTracer.streamClosed() intends to keep the original cause that leads to the stream being terminated. So it should be reasonable to carry the Status from server's outbound StreamObserver.onError().

Status is being used general-purposely for carrying end-of-life information. The Status code itself doesn't mean to indicate anything about whether something is happening locally or from the network. The description/cause of Status does play a significant role and should not be ignore. There is a convention that we follow and also suggested to users: never use a Status without description/cause.

@voidzcy
Copy link
Contributor

voidzcy commented May 16, 2021

Closing, no action is needed for the current work-as-intended behaviors. We would be improving some doc according to the discussion after #7558.

@voidzcy voidzcy closed this as completed May 16, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants