GoogleAsyncClient should shutdown gracefully #15072
Comments
Stack frames: please ignore the dispatcher behavior.
|
This is the lucky sequence, in which the deferred deletion happens before the dispatcher shuts down. See
|
I'll see if I can take a look at the info above soon after your changes in #14954 are merged. Based on your mention of deferred delete vs. other operations, I think it may be helpful to make changes similar to #14293 so that these races become more likely while we attempt to fix the underlying issues. Sequencing shutdown so that the async clients are shut down before the main thread may help. Thanks for the sample test runs that illustrate the problem. |
Further changes that we may want to consider:
|
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
Ownership and reference chains that result in this crash:
During shutdown, ThreadLocal is destroyed before the filter chain configs, which leaves the configs holding a dangling reference. |
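To illustrate the ordering problem described above, here is a minimal C++ sketch. It is not Envoy code: `FakeThreadLocal`, `FakeFilterConfig`, `FakeServer`, and `destructionOrder` are hypothetical names invented for the example. It relies only on the language rule that class members are destroyed in reverse declaration order, so declaring the thread-local state before the config that references it guarantees the config is torn down first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the thread-local state (not Envoy's ThreadLocal).
struct FakeThreadLocal {
  explicit FakeThreadLocal(std::vector<std::string>& log) : log_(log) {}
  ~FakeThreadLocal() {
    alive = false;
    log_.push_back("thread_local");
  }
  bool alive = true;
  std::vector<std::string>& log_;
};

// Hypothetical stand-in for a filter chain config that references the
// thread-local state without owning it.
struct FakeFilterConfig {
  FakeFilterConfig(FakeThreadLocal& tls, std::vector<std::string>& log)
      : tls_(tls), log_(log) {}
  ~FakeFilterConfig() {
    // Safe only because tls_ outlives this object (see FakeServer below).
    assert(tls_.alive);
    log_.push_back("filter_config");
  }
  FakeThreadLocal& tls_;
  std::vector<std::string>& log_;
};

// Members are destroyed in reverse declaration order, so `config` goes
// first and never observes a destroyed `tls`. Swapping the two
// declarations would recreate the dangling reference.
struct FakeServer {
  explicit FakeServer(std::vector<std::string>& log)
      : tls(log), config(tls, log) {}
  FakeThreadLocal tls;     // declared first, destroyed last
  FakeFilterConfig config; // declared last, destroyed first
};

// Builds and tears down a FakeServer, returning the destruction order.
std::vector<std::string> destructionOrder() {
  std::vector<std::string> log;
  {
    FakeServer server(log);
  }
  return log;
}
```

Calling `destructionOrder()` yields the config entry before the thread-local entry. The crash in this issue corresponds to the opposite order.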
Possible fix: b1e38d7 |
Destroying GoogleAsyncClientThreadLocal resets the streams and puts them on the deferred-delete list. What's more, it expects the stream objects not to have been destroyed already. If these streams are destroyed, the GrpcStream, which abstracts over Google gRPC and Envoy gRPC, holds a dangling pointer to the destroyed stream. The xDS client may later call sendMessage and crash Envoy.
Essentially, the dispatcher should be able to clean up the deferred-delete list, because a deferred-deleted object may hold references to the SSL context object, and we want those released during shutdown. This also makes it easier for the dispatcher to detect unexpected behavior after shutdown.
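The proposal above can be sketched roughly as follows. This is a standalone illustration, not Envoy's dispatcher: `SketchDispatcher`, `FakeStream`, `rejectedCount`, and `destroyedAfterShutdown` are hypothetical, and the method names only loosely mirror the idea of a deferred-delete list. The assumption is that `shutdown()` drains the list so nothing survives it, and that any deferred delete requested afterwards is counted as unexpected rather than silently queued.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base for objects whose destruction is deferred.
class DeferredDeletable {
public:
  virtual ~DeferredDeletable() = default;
};

// Hypothetical stream that counts its own destruction.
class FakeStream : public DeferredDeletable {
public:
  explicit FakeStream(int& destroyed) : destroyed_(destroyed) {}
  ~FakeStream() override { ++destroyed_; }

private:
  int& destroyed_;
};

class SketchDispatcher {
public:
  // Queues obj for later destruction. After shutdown the request is
  // recorded as unexpected and obj is destroyed immediately on return.
  bool deferredDelete(std::unique_ptr<DeferredDeletable> obj) {
    if (shutdown_) {
      ++rejected_;
      return false;
    }
    to_delete_.push_back(std::move(obj));
    return true;
  }

  // Destroys everything queued so far. A destructor may queue more
  // objects, so keep draining until the list stays empty.
  void clearDeferredDeleteList() {
    while (!to_delete_.empty()) {
      std::vector<std::unique_ptr<DeferredDeletable>> batch;
      batch.swap(to_delete_);
      batch.clear();
    }
    assert(to_delete_.empty());
  }

  // Releases all deferred objects (and whatever they reference, such as
  // an SSL context) before marking the dispatcher as shut down.
  void shutdown() {
    clearDeferredDeleteList();
    shutdown_ = true;
  }

  int rejectedCount() const { return rejected_; }

private:
  std::vector<std::unique_ptr<DeferredDeletable>> to_delete_;
  bool shutdown_ = false;
  int rejected_ = 0;
};

// Queues two streams, shuts down, and reports how many were destroyed.
int destroyedAfterShutdown() {
  int destroyed = 0;
  SketchDispatcher dispatcher;
  dispatcher.deferredDelete(std::make_unique<FakeStream>(destroyed));
  dispatcher.deferredDelete(std::make_unique<FakeStream>(destroyed));
  dispatcher.shutdown();
  return destroyed;
}
```

In this sketch both queued streams are destroyed by the time `shutdown()` returns, and a later `deferredDelete` call increments `rejectedCount()`, which is one way the dispatcher could surface unexpected post-shutdown activity.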