-
Notifications
You must be signed in to change notification settings - Fork 788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Http2StreamException ENHANCE_YOUR_CALM during bidirectional streaming stress test after ca. 200 server calls #2010
Comments
From what i can deduce, it seems grpc-dotnet depends on the client calling RequestStream.CompleteAsync(); before the server call returns? This is pretty weird to rely on the client to tell the server to free its resources, before the server call returns. If the server call returns without being told by the client that it should free its resources, it does not (at least not immediately). The server knows when the call returns (better than anyone?), and it could just immediately free the resources then, without relying on the client calling calling RequestStream.CompleteAsync(), I would think? As it is made now, it may be possible to use this behavior to craft a DOS attack? |
I'm pretty sure this is an expected error. What you're doing is creating a lot of gRPC calls that aren't gracefully complete. Read more here: https://learn.microsoft.com/en-us/aspnet/core/grpc/client?view=aspnetcore-7.0#bi-directional-streaming-call Because the server is ending the request before the client, it has to keep some state about the request active for a time. This is required because it is a connection error for an HTTP/2 connection to send data to an unknown stream ID. Kestrel keeps up to 200ish streams active in the background, eventually cleaning them up after a grace period to allow in incoming data that might still be in flight to be drained. If the server exceeds that limit, which is happening in your stress test, the server decides the connection is being abused and closes it with ENHANCE_YOUR_CALM. That signals to the client that it is doing something wrong and should change how it makes HTTP requests. |
Yes, if it was not for the native server working correctly, I would have bought stat story. |
The server must remember the stream until the client indicates it is no longer sending data because it is an HTTP/2 connection protocol error to receive data to an unknown or closed stream ID.
You don't. The server method completes gracefully when it exits after the request stream MoveNext returns false. Not only is this the cause of this error, but you'll also get worse performance. The state of gracefully completed streams is reused while reset streams are thrown away. If you turn on logging I expect you'll see lots of log messages saying "the application completed without reading the entire request body." That means the request wasn't completely gracefully.
I don't know how the native server is implemented, but a server can't keep tracking an unlimited number of ungracefully closed (i.e. reset) streams. If it does, then that's an attack vector that a malicious client could use to use up all the memory on the server. Or maybe the native server doesn't track them and is ok receiving DATA frames to unknown or closed stream IDs. |
But why should the server track a stream after the server method has returned? In case the client suddenly make a call\write to it? But why? The server method has returned and there is no one there to read from the stream anyways, so why keep it open? And how long after the server method has returned does the server expect that a client would write to this stream that no one will be reading from anyways? 1 second? 1 hour? I don't get it. If a client would happen to write to a stream after a stream is closed, this would off course fail, but that would in no way exhaust the servers resources like it does now. It would simply be a reply from the server: sorry, that stream does exist. |
But since this is a radical difference between native server and grpc-dotnet, maybe there is possible to contact someone in grpc.io to ensure both servers work the same (eg. either have native grpc fail the same way or make grpc-dotnet not fail). |
"then that's an attack vector that a malicious client could use to use up all the memory on the server." So for the native implementation, calling RequestStream.CompleteAsync() is purely optional. Maybe this call only exist so that the client can end the stream before the server returns, as a micro optimization, or something. So yes, I see that it is possible to workaround this problem for me with having the server tell the client to call RequestStream.CompleteAsync(). Even so, I think this is a too big difference between native and grpc-dotnet to just ignore. |
You didn't consider the possibility that the native gRPC server doesn't track reset streams and silently ignores frames sent to non-open streams. I'm not familiar with the native gRPC's HTTP/2 layer, but it seems to me that it is ignoring DATA frames when it can't match them to a stream. Kestrel's priorities are security and adherence to HTTP specifications. Those priorities are causing you to get an error, but it's in a stress test that isn't gracefully completing streams. Kestrel isn't going to be changed to reduce security or ignore the HTTP/2 spec for this situation. |
But maybe it could be a setting (opt-in) or something that can be done via ServerCallContext.GetHttpContext() to make it close the stream immediately so it can behave the same? |
I tried to read the http2 spec. But when the server call return, it should write a END_STREAM frame\flag to the client and initiate the close. I don't see why this should not work with a server initiated close. The reason for the stream ending up in this 200 streams list is because it is waiting for the END_STREAM frame\flag from the client? Maybe there is some logic missing within the client in the ResponseStream.MoveNext() that call, so it is not returning with false? I don't know:-) "Because the server is ending the request before the client, it has to keep some state about the request active for a time. This is required because it is a connection error for an HTTP/2 connection to send data to an unknown stream ID. Kestrel keeps up to 200ish streams active in the background, eventually cleaning them up after a grace period to allow in incoming data that might still be in flight to be drained." I will try to test some more with the native client <-> dotnet server and vice versa to see if this changes anything. Until now I have only tested native agains native and dotnet agains dotnet. Found a related issue: https://stackoverflow.com/questions/67806047/best-way-to-return-bidirectional-grpc-method-net-5-c-sharp |
Native client -> dotnet server give a different error at almost the same place: |
When the server completes a response without completely reading the request it sends RST_STREAM to the client, not END_STREAM. It doesn't enter a half-closed state, it goes immediately to close. The client state also goes immediately to close and doesn't send any more frames to the server. |
I tried a ninja hack, for fun. In the server handler I added this at the end:
This works well with the native client. Fast, no handled exception in the client (such as cancellations). Testet up to 300k echoes, no issues. |
I updated the github example and added a native client and I added what I think is supposed to be the correct workaround for this problem (server write to the client that the client should "hang up" (call RequestStream.CompleteAsync() before server return). This seems to work for the dotnet client but the native client still seems to fail with this after approx 239 echoes: So I still feel this is a bug. At least there seem to be no way to make the native client work with the grpc server without the server failing (without using ninja hacks). Writing to the client from the server and telling the client to "hang up" seems to work for the dotnet client but this feels weird and not logical too. |
Updated with Python client and server examples. python client -> dotnet server give similar problem. In this case the client does break in response to "quit", but still fail (similar to native client -> dotnet server, that also fail even if client "hang up" first
Python example losely based on RouteChat in https://github.com/grpc/grpc/tree/v1.50.0/examples/python/route_guide |
I don't see anything actionable here. Closing because of lack of activity. |
So by your logic it is probably all the other grpc implementations that are wrong. Even if that is true, which seems strange, interoperability should have been the main goal IMO (it does not help to be "right" if it does not work). |
Hello! Was it resolved in the end? I had the same problem here. Status(StatusCode="ResourceExhausted", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. Http2StreamException: The HTTP/2 server reset the stream. HTTP/2 error code 'ENHANCE_YOUR_CALM' (0xb).", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.IO.IOException: The request was aborted. ---> System.Net.Http.Http2StreamException: The HTTP/2 server reset the stream. HTTP/2 error code 'ENHANCE_YOUR_CALM' (0xb). --- End of inner exception stack trace --- at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException) at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState() at System.Net.Http.Http2Connection.Http2Stream.TryEnsureHeaders() at System.Net.Http.Http2Connection.Http2Stream.ReadResponseHeadersAsync(CancellationToken cancellationToken) at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken) at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at Grpc.Net.Client.Internal.GrpcCall |
@JamesNK sorry for bothering you with this but I ran into this error when trying to abort a client streaming call, like so
I can't really keep reading the request stream when I want to abort the call. So I'm thinking here that I will have to change this into a bidirectional streaming call so that I can coordinate with the client because there's no way to return a response and at the same time keeping reading and waiting for the client to end the request stream. If it helps, this is a file upload. The server can sometimes quickly check that it already has the file and thus abort the file upload since it is redundant. In this case I thought it would be sufficient to have a client streaming call but given what's been said here the expectation is that the client is supposed to complete the request stream first. |
Yes. A disadvantage of server streaming calls is they can only be stopped from the client by aborting. |
Status update. Still fails: No longer failing: All other combinations works fine. So I guess it is status Q. |
Minimal code to repro here: https://github.com/osexpert/GrpcDotNetProblem
Can not always repro in debug run via Visual Studio. Can most often repro when run in debug from command line. Fails the same way in NETCORE3.0, NET50, NET60.
Removing the logging by calling:
builder.Logging.ClearProviders();
can sometime be needed because else the problem may not happen (the logging can slow it down enough so it still accidentally works)
The error you should get:
(String[] args)Message from server: Via server : hello 200
Unhandled exception. Grpc.Core.RpcException: Status(StatusCode="ResourceExhausted", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. Http2StreamException: The HTTP/2 server reset the stream. HTTP/2 error code 'ENHANCE_YOUR_CALM' (0xb).", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.IO.IOException: The request was aborted.
---> System.Net.Http.Http2StreamException: The HTTP/2 server reset the stream. HTTP/2 error code 'ENHANCE_YOUR_CALM' (0xb).
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
at System.Net.Http.Http2Connection.Http2Stream.TryEnsureHeaders()
at System.Net.Http.Http2Connection.Http2Stream.ReadResponseHeadersAsync(CancellationToken cancellationToken)
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Grpc.Net.Client.Internal.GrpcCall
2.RunCall(HttpRequestMessage request, Nullable
1 timeout)")at Grpc.Net.Client.Internal.HttpContentClientStreamReader`2.MoveNextCore(CancellationToken cancellationToken)
at GrpcClient.Program.Main(String[] args) in E:\git\GrpcDotNetProblem\GrpcClient\Program.cs:line 24
at GrpcClient.Program.
When it happens in debug run via Visual Studio i can see in the Paralell Stacks viewer:
1 Process, 200 Async Logical Stack, Scheduled, Http2OutputProducer.ProcessDataWrites()
I am guessing it is related to bidirectional streaming somehow.
I also see this in log after every method call is returned, but not sure if related to the ENHANCE_YOUR_CALM problem:
info: Microsoft.AspNetCore.Server.Kestrel[32]
Connection id "{someId}", Request id "{someId}": the application completed without reading the entire request body.
The native Grpc libraries does not have this problem, they can run forever.
I can not switch from the native Grpc libraries and to grpc-dotnet the way it currently works.
The text was updated successfully, but these errors were encountered: