-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grpc transport closing issues #1462
Comments
There a wiiiiiide range of reasons explaining why the transport is closing:
Do you have any reason to suspect that this is a problem in our libraries, specifically? |
We're using this Google Cloud Library in Google App Engine Flexible app with no special configuration or anything. Everything is standard in this deployment. The application is running on the default GAE network and I don't believe there is a proxy involved in network connections to the Cloud services... I can't think of anything we're doing that's special that would cause it. I don't know if it's specific to the GCloud libraries or gRPC, but I've also been discussing on this gRPC issue: grpc/grpc-go#2663. I don't know how to diagnose this. |
Oh right, good points. Hmmm... I'm also not sure how to diagnose this. Let me check with some folks and see if I can find how to proceed, or who can help you out. |
Hi @mikesun, Do you have some code to help reproduce or help us figure out what's happening? It looks like your context is being canceled - where is that happening? This would be somewhere in your code (the spanner/tasks library doesn't/shouldn't cancel any contexts, but just pass yours right through). The "transport is closing" messages are normal operation. From the logs you pasted, it doesn't look like they correlate to actual failed RPCs. |
Hi @broady, I should provide a more detailed explanation on the issues we're encountering. On 5/26/19, from 1:20AM to 1:44AM EST we saw a large number of errors relating to requests to Spanner and Cloud Tasks. They were all context cancellation related, e.g.: Spanner errors:
Cloud Task errors:
We are not doing any explicit context cancellations in our HTTP server app. We are using the the Go
We do configure a max idle connections on the http transport:
My thought was that the context cancellations are coming from HTTP request timeouts due to blocking/hanging RPC calls. Also, this interval on 5/26/19 is the only extended period of time in which we had a burst of these errors. Perhaps there was a network issue during that time? I checked the Google Cloud service status dashboard and did not see any reported incidents during that time though. Other than that period on 5/26/19, we see infrequent Cloud Task errors (one every couple of days) that are seemingly gRPC related:
Digging into those errors is how I arrived at the gRPC issue grpc/grpc-go#2663. I enabled gRPC logging and saw the GRPC transport disconnect messages I posted earlier:
I'm not sure if those transport disconnect/reconnection messages indicate any issue or they can be safely ignored, but this comment by a gRPC developer gave me pause: grpc/grpc-go#2663 (comment) Maybe there isn't a gRPC transport issue but we just had an isolated network issue on 5/26/19? I'm still surprised at the frequency of gRPC transport connection resets on our GAE Flexible app as I don't see those occur as frequently when running from my local machine. The codebase is large, but I could provide a sample app based on it if it would help. Any help would be greatly appreciated. |
cc @olavloite |
I can reproduce this in a situation with very low load (in this case: 1 query per 5 minutes). And my log shows exactly the same as @mikesun's, i.e. a TRANSIENT_FAILURE and then CONNECT, READY. The timing is also similar. The problem occurs at intervals of whole minutes, i.e. ending on :38 seconds every time. My hunch is that the log entries are cause by the session pool maintainer that runs exactly once per minute. My log:
(The COL1, code:INT64 string_value:"1" etc. is from my test query). |
It seems that this is cause by Cloud Spanner closing the network connection serverside if there has been no activity for a certain amount of time. If I run the same test application against a local test server, I do not get the same messages. According to the code where the message is logged, it should also not be considered an error, hence also the fact that it is logged at INFO level. See https://github.com/grpc/grpc-go/blob/73b304d882a0822aaeb3c982c747563777e79586/internal/transport/controlbuf.go#L430 for the place where the error is logged. Closing this issue as this is not something that is actionable in the client library. |
Client
Spanner
Cloud Tasks
Describe Your Environment
Google App Engine Flexible.
Pinned dependencies:
cloud.google.com/go v0.40.0
google.golang.org/grpc v1.21.1
Expected Behavior
No connection errors and GRPC disconnect errors.
Actual Behavior
Seeing both Spanner connection and Cloud Tasks connection errors that seem grpc related:
I enabled grpc logging and am observing several connections dying every few minutes:
At first it seems related to grpc/grpc-go#2663, but the that retry backoff issue is fixed in the grpc version we are using.
The text was updated successfully, but these errors were encountered: