GoogleAsyncClient should shutdown gracefully #15072

Open
lambdai opened this issue Feb 17, 2021 · 7 comments
Labels
enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue

Comments

@lambdai
Contributor

lambdai commented Feb 17, 2021

Destroying GoogleAsyncClientThreadLocal puts the reset streams on the deferred-delete list. What's more, it expects the stream objects not to be destroyed. If these streams are destroyed, the GrpcStream, which abstracts over Google gRPC and Envoy gRPC, holds a dangling pointer to the destroyed stream. The xDS client may call sendMessage later and crash Envoy.

Essentially, the dispatcher should be able to clean up the deferred-delete list, because a deferred-deleted object may hold references to an SSL context object, and we want to clean that up during shutdown. This also makes it easier for the dispatcher to detect unexpected behavior after shutdown.
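
To make the hazard concrete, here is a minimal sketch of one way a wrapper can avoid dereferencing a destroyed stream. It is not Envoy code and not the fix proposed in this issue: `RawStream`, `StreamHandle`, and `sendAfterOwnerDestroyed` are hypothetical names, and observing the stream through `std::weak_ptr` is an assumption made purely for illustration (the wrapper discussed here holds a raw pointer).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a raw gRPC stream; not Envoy's RawAsyncStream.
class RawStream {
public:
  void sendMessage(const std::string& msg) { sent_.push_back(msg); }
  const std::vector<std::string>& sent() const { return sent_; }

private:
  std::vector<std::string> sent_;
};

// Hypothetical wrapper that observes the stream through a weak_ptr
// instead of a raw, non-owning pointer.
class StreamHandle {
public:
  explicit StreamHandle(std::weak_ptr<RawStream> stream) : stream_(std::move(stream)) {}

  // Returns false instead of touching a stream that was already destroyed.
  bool sendMessage(const std::string& msg) {
    if (auto stream = stream_.lock()) {
      stream->sendMessage(msg);
      return true;
    }
    return false;
  }

private:
  std::weak_ptr<RawStream> stream_;
};

// Simulates the sequence from the report: the owner destroys the stream,
// then a late caller still tries to send on it.
inline bool sendAfterOwnerDestroyed() {
  auto owner = std::make_shared<RawStream>();
  StreamHandle handle(owner);
  const bool sent_while_alive = handle.sendMessage("first");
  owner.reset(); // The owner (e.g. thread-local state) goes away first.
  const bool sent_after_destroy = handle.sendMessage("second");
  return sent_while_alive && !sent_after_destroy;
}
```

With a raw pointer in place of the `weak_ptr`, the second `sendMessage` call would be undefined behavior, which is what the sanitizer trace in this issue reports.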

@lambdai lambdai added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Feb 17, 2021
@lambdai lambdai changed the title from "allow google grpc local shutdown" to "GoogleAsyncClient should shutdown gracefully" Feb 17, 2021
@antoniovicente antoniovicente self-assigned this Feb 17, 2021
@junr03 junr03 removed the triage Issue requires triage label Feb 17, 2021
@lambdai
Contributor Author

lambdai commented Feb 17, 2021

Stack frames: please ignore the dispatcher behavior.

[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:233] resetStream
  [20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:401] Stream cleanup with 1 in-flight tags
  [24][debug][grpc] [source/common/grpc/google_async_client_impl.cc:73] completionThread exiting
  [20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:40] Joining completionThread
  [20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:42] Joined completionThread
  [20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:390] Deferred delete
  [20][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1065] shutting down thread local cluster manager
  [20][info][main] [source/server/server.cc:801] exiting
  [20][debug][init] [source/common/init/watcher_impl.cc:31] RunHelper destroyed
  [20][debug][main] [source/server/server.cc:139] destroying listener manager
  [20][debug][init] [source/common/init/target_impl.cc:34] target LDS destroyed
  [20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher worker_0
  [20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info ads_cluster
  [20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info dummy_cluster
  [20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info eds_cluster
  [20][debug][main] [source/server/server.cc:141] destroyed listener manager
  [20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:162] GoogleAsyncStreamImpl destruct
  [20][debug][main] [source/common/event/dispatcher_impl.cc:345] Round 1: destroyed 1 deferred deletables, 1 post callbacks and 4 thread local deletables in shutdown
  [20][debug][main] [source/common/event/dispatcher_impl.cc:345] Round 2: destroyed 0 deferred deletables, 0 post callbacks and 0 thread local deletables in shutdown
  [20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher workers_guarddog_thread
  [20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher main_thread_guarddog_thread
  [20][debug][main] [source/common/access_log/access_log_manager_impl.cc:16] destroying access logger /dev/null
  [20][debug][main] [source/common/access_log/access_log_manager_impl.cc:19] destroyed access loggers
  [20][debug][init] [source/common/init/target_impl.cc:34] target RTDS ads_rtds_layer destroyed
  source/common/grpc/typed_async_client.cc:15:11: runtime error: member call on address 0x618000105480 which does not point to an object of type 'Envoy::Grpc::RawAsyncStream'
  0x618000105480: note: object has invalid vptr
   d5 01 80 10  1d 02 80 71 00 00 00 00  78 16 b1 1e 00 00 00 00  70 1b 2d 00 30 60 00 00  00 be be be
                ^~~~~~~~~~~~~~~~~~~~~~~
                invalid vptr
      #0 0x198b5605 in Envoy::Grpc::Internal::sendMessageUntyped(Envoy::Grpc::RawAsyncStream*, google::protobuf::Message const&, bool) /proc/self/cwd/source/common/grpc/typed_async_client.cc:15:11
      #1 0x1980737b in Envoy::Grpc::AsyncStream<envoy::service::discovery::v3::DiscoveryRequest>::sendMessage(google::protobuf::Message const&, bool) /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/grpc/_virtual_includes/typed_async_client_lib/common/grpc/typed_async_client.h:41:5
      #2 0x197f478b in Envoy::Config::GrpcStream<envoy::service::discovery::v3::DiscoveryRequest, envoy::service::discovery::v3::DiscoveryResponse>::sendMessage(envoy::service::discovery::v3::DiscoveryRequest const&) /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:76:16
      #3 0x197dcc41 in Envoy::Config::GrpcMuxImpl::sendDiscoveryRequest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /proc/self/cwd/source/common/config/grpc_mux_impl.cc:61:16
      #4 0x197e5d4a in Envoy::Config::GrpcMuxImpl::drainRequests() /proc/self/cwd/source/common/config/grpc_mux_impl.cc:344:5
      #5 0x197df39f in Envoy::Config::GrpcMuxImpl::queueDiscoveryRequest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /proc/self/cwd/source/common/config/grpc_mux_impl.cc:300:3
      #6 0x198083bf in Envoy::Config::GrpcMuxImpl::GrpcMuxWatchImpl::~GrpcMuxWatchImpl() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/config/_virtual_includes/grpc_mux_lib/common/config/grpc_mux_impl.h:96:17
      #7 0x198085bf in Envoy::Config::GrpcMuxImpl::GrpcMuxWatchImpl::~GrpcMuxWatchImpl() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/config/_virtual_includes/grpc_mux_lib/common/config/grpc_mux_impl.h:93:34
      #8 0x175a4bfe in std::__1::default_delete<Envoy::Config::GrpcMuxWatch>::operator()(Envoy::Config::GrpcMuxWatch*) const /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2262:5
      #9 0x175a4af5 in std::__1::unique_ptr<Envoy::Config::GrpcMuxWatch, std::__1::default_delete<Envoy::Config::GrpcMuxWatch> >::reset(Envoy::Config::GrpcMuxWatch*) /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2517:7
      #10 0x175a22f1 in std::__1::unique_ptr<Envoy::Config::GrpcMuxWatch, std::__1::default_delete<Envoy::Config::GrpcMuxWatch> >::~unique_ptr() /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2471:19
      #11 0x197da1ee in Envoy::Config::GrpcSubscriptionImpl::~GrpcSubscriptionImpl() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/config/_virtual_includes/grpc_subscription_lib/common/config/grpc_subscription_impl.h:20:7
      #12 0x197da25f in Envoy::Config::GrpcSubscriptionImpl::~GrpcSubscriptionImpl() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/config/_virtual_includes/grpc_subscription_lib/common/config/grpc_subscription_impl.h:20:7
      #13 0x15d8f56e in std::__1::default_delete<Envoy::Config::Subscription>::operator()(Envoy::Config::Subscription*) const /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2262:5
      #14 0x15d8f385 in std::__1::unique_ptr<Envoy::Config::Subscription, std::__1::default_delete<Envoy::Config::Subscription> >::reset(Envoy::Config::Subscription*) /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2517:7
      #15 0x15d8bb61 in std::__1::unique_ptr<Envoy::Config::Subscription, std::__1::default_delete<Envoy::Config::Subscription> >::~unique_ptr() /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2471:19
      #16 0x1a97c585 in Envoy::Runtime::RtdsSubscription::~RtdsSubscription() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/runtime/_virtual_includes/runtime_lib/common/runtime/runtime_impl.h:191:8
      #17 0x1a97c5ef in Envoy::Runtime::RtdsSubscription::~RtdsSubscription() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/runtime/_virtual_includes/runtime_lib/common/runtime/runtime_impl.h:191:8
      #18 0x1a98f35e in std::__1::default_delete<Envoy::Runtime::RtdsSubscription>::operator()(Envoy::Runtime::RtdsSubscription*) const /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2262:5
      #19 0x1a98f175 in std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >::reset(Envoy::Runtime::RtdsSubscription*) /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2517:7
      #20 0x1a979a71 in std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >::~unique_ptr() /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2471:19
      #21 0x1a98e41f in std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > >::destroy(std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >*) /usr/lib/llvm-my11/bin/../include/c++/v1/memory:1811:92
      #22 0x1a98e3f0 in void std::__1::allocator_traits<std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > > >::__destroy<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > >(std::__1::integral_con
      #23 0x1a98e3d0 in void std::__1::allocator_traits<std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > > >::destroy<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > >(std::__1::allocator<std:
      #24 0x1a98e351 in std::__1::__vector_base<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >, std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > > >::__destruct_at_end(std::__1::unique_ptr<En
      #25 0x1a98e212 in std::__1::__vector_base<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >, std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > > >::clear() /usr/lib/llvm-my11/bin/../include
      #26 0x1a98ddcc in std::__1::__vector_base<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >, std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > > >::~__vector_base() /usr/lib/llvm-my11/bin/.
      #27 0x1a979ca7 in std::__1::vector<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> >, std::__1::allocator<std::__1::unique_ptr<Envoy::Runtime::RtdsSubscription, std::__1::default_delete<Envoy::Runtime::RtdsSubscription> > > >::~vector() /usr/lib/llvm-my11/bin/../include/c++/
      #28 0x1a97c455 in Envoy::Runtime::LoaderImpl::~LoaderImpl() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/runtime/_virtual_includes/runtime_lib/common/runtime/runtime_impl.h:230:7
      #29 0x1a97c4ff in Envoy::Runtime::LoaderImpl::~LoaderImpl() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/runtime/_virtual_includes/runtime_lib/common/runtime/runtime_impl.h:230:7
      #30 0x1597dd2e in std::__1::default_delete<Envoy::Runtime::Loader>::operator()(Envoy::Runtime::Loader*) const /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2262:5
      #31 0x1597db45 in std::__1::unique_ptr<Envoy::Runtime::Loader, std::__1::default_delete<Envoy::Runtime::Loader> >::reset(Envoy::Runtime::Loader*) /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2517:7
      #32 0x15921be1 in std::__1::unique_ptr<Envoy::Runtime::Loader, std::__1::default_delete<Envoy::Runtime::Loader> >::~unique_ptr() /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2471:19
      #33 0x15942ef4 in Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader>::~ScopedInjectableLoader() /proc/self/cwd/bazel-out/k8-dbg/bin/source/common/singleton/_virtual_includes/threadsafe_singleton/common/singleton/threadsafe_singleton.h:81:64
      #34 0x15942def in std::__1::default_delete<Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader> >::operator()(Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader>*) const /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2262:5
      #35 0x15942c85 in std::__1::unique_ptr<Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader>, std::__1::default_delete<Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader> > >::reset(Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader>*) /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2517:7
      #36 0x15915f81 in std::__1::unique_ptr<Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader>, std::__1::default_delete<Envoy::ScopedInjectableLoader<Envoy::Runtime::Loader> > >::~unique_ptr() /usr/lib/llvm-my11/bin/../include/c++/v1/memory:2471:19
      #37 0x158a35d1 in Envoy::Server::InstanceImpl::~InstanceImpl() /proc/self/cwd/source/server/server.cc:143:1
      #38 0x1557c28e in Envoy::IntegrationTestServerImpl::createAndRunEnvoyServer(Envoy::OptionsImpl&, Envoy::Event::TimeSystem&, std::__1::shared_ptr<Envoy::Network::Address::Instance const>, Envoy::ListenerHooks&, Envoy::Thread::BasicLockable&, Envoy::Server::ComponentFactory&, std::__1::unique_ptr<Envoy::Random::RandomGenerator, std::__1::def
      #39 0x1557b134 in Envoy::IntegrationTestServer::threadRoutine(Envoy::Network::Address::IpVersion, bool, std::__1::optional<std::__1::reference_wrapper<Envoy::ProcessObject> >, Envoy::Server::FieldValidationConfig, unsigned int, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1l> >, Envoy::Server::DrainStrategy, std::__1::shared_p
      #40 0x15581f11 in Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::__1::function<void ()>, bool, bool, std::__1::optional<std::__1::reference_wrapper<Envoy::ProcessObject> >, Envoy::Server::FieldValidationConfig, unsigned int, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1l> >, Envoy::Server::DrainS
      #41 0x15581c0d in decltype(std::__1::forward<Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::__1::function<void ()>, bool, bool, std::__1::optional<std::__1::reference_wrapper<Envoy::ProcessObject> >, Envoy::Server::FieldValidationConfig, unsigned int, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1
      #42 0x15581b6d in void std::__1::__invoke_void_return_wrapper<void>::__call<Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::__1::function<void ()>, bool, bool, std::__1::optional<std::__1::reference_wrapper<Envoy::ProcessObject> >, Envoy::Server::FieldValidationConfig, unsigned int, std::__1::chrono::duration<l
      #43 0x15581b21 in std::__1::__function::__alloc_func<Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::__1::function<void ()>, bool, bool, std::__1::optional<std::__1::reference_wrapper<Envoy::ProcessObject> >, Envoy::Server::FieldValidationConfig, unsigned int, std::__1::chrono::duration<long long, std::__1::rat
      #44 0x1557f3c7 in std::__1::__function::__func<Envoy::IntegrationTestServer::start(Envoy::Network::Address::IpVersion, std::__1::function<void ()>, bool, bool, std::__1::optional<std::__1::reference_wrapper<Envoy::ProcessObject> >, Envoy::Server::FieldValidationConfig, unsigned int, std::__1::chrono::duration<long long, std::__1::ratio<1l,
      #45 0x14f24ad7 in std::__1::__function::__value_func<void ()>::operator()() const /usr/lib/llvm-my11/bin/../include/c++/v1/functional:1884:16
      #46 0x14f232b9 in std::__1::function<void ()>::operator()() const /usr/lib/llvm-my11/bin/../include/c++/v1/functional:2556:12
      #47 0x1d825dec in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::'lambda'(void*)::operator()(void*) const /proc/self/cwd/source/common/common/posix/thread_impl.cc:49:11
      #48 0x1d825d63 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::'lambda'(void*)::__invoke(void*) /proc/self/cwd/source/common/common/posix/thread_impl.cc:48:9
      #49 0x7f7c768a4608 in start_thread /build/glibc-ZN95T4/glibc-2.31/nptl/pthread_create.c:477:8
      #50 0x7f7c767ae292 in clone /build/glibc-ZN95T4/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior source/common/grpc/typed_async_client.cc:15:11 in

@lambdai
Contributor Author

lambdai commented Feb 17, 2021

This is the lucky sequence, in which the deferred delete happens before dispatcher shutdown. Note that "Deferred delete" (google_async_client_impl.cc:390) is immediately followed by "GoogleAsyncStreamImpl destruct".

[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:401] Stream cleanup with 0 in-flight tags
[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:390] Deferred delete
[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:162] GoogleAsyncStreamImpl destruct
[12][info][testing] [test/integration/server.cc:238] stopping integration test server
[20][info][main] [source/server/server.cc:809] shutting down server instance
[20][info][main] [source/server/server.cc:755] main dispatch loop exited
[20][debug][main] [source/server/server.cc:203] flushing stats
[20][debug][main] [source/server/server.cc:213] Envoy is not fully initialized, skipping histogram merge and flushing stats
[20][debug][init] [source/common/init/watcher_impl.cc:31] ClusterImplBase destroyed
[20][debug][init] [source/common/init/watcher_impl.cc:31] init manager Cluster ads_cluster destroyed
[20][debug][config] [source/common/config/grpc_mux_impl.cc:290] No stream available to queueDiscoveryRequest for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
[20][debug][init] [source/common/init/watcher_impl.cc:31] ClusterImplBase destroyed
[20][debug][init] [source/common/init/watcher_impl.cc:31] init manager Cluster eds_cluster destroyed
[20][debug][init] [source/common/init/watcher_impl.cc:31] ClusterImplBase destroyed
[20][debug][init] [source/common/init/watcher_impl.cc:31] init manager Cluster dummy_cluster destroyed
[20][debug][config] [source/common/config/grpc_mux_impl.cc:290] No stream available to queueDiscoveryRequest for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
[20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info cluster_0
[20][debug][init] [source/common/init/watcher_impl.cc:31] ClusterImplBase destroyed
[20][debug][init] [source/common/init/watcher_impl.cc:31] init manager Cluster cluster_0 destroyed
[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:40] Joining completionThread
[24][debug][grpc] [source/common/grpc/google_async_client_impl.cc:73] completionThread exiting
[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:42] Joined completionThread
[20][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1065] shutting down thread local cluster manager
[20][info][main] [source/server/server.cc:801] exiting
[20][debug][init] [source/common/init/watcher_impl.cc:31] RunHelper destroyed
[20][debug][main] [source/server/server.cc:139] destroying listener manager
[20][debug][init] [source/common/init/target_impl.cc:34] target LDS destroyed
[20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher worker_0
[20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info ads_cluster
[20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info dummy_cluster
[20][debug][upstream] [source/common/upstream/upstream_impl.cc:928] Schedule destroy cluster info eds_cluster
[20][debug][main] [source/server/server.cc:141] destroyed listener manager
[20][debug][main] [source/common/event/dispatcher_impl.cc:345] Round 1: destroyed 0 deferred deletables, 0 post callbacks and 4 thread local deletables in shutdown
[20][debug][main] [source/common/event/dispatcher_impl.cc:345] Round 2: destroyed 0 deferred deletables, 0 post callbacks and 0 thread local deletables in shutdown
[20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher workers_guarddog_thread
[20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher main_thread_guarddog_thread
[20][debug][main] [source/common/access_log/access_log_manager_impl.cc:16] destroying access logger /dev/null
[20][debug][main] [source/common/access_log/access_log_manager_impl.cc:19] destroyed access loggers
[20][debug][init] [source/common/init/target_impl.cc:34] target RTDS ads_rtds_layer destroyed
[20][debug][config] [source/common/config/grpc_mux_impl.cc:290] No stream available to queueDiscoveryRequest for type.googleapis.com/envoy.service.runtime.v3.Runtime
[20][debug][grpc] [source/common/grpc/google_async_client_impl.cc:111] Client teardown, resetting streams
[20][debug][init] [source/common/init/watcher_impl.cc:31] init manager RTDS destroyed
[20][debug][init] [source/common/init/watcher_impl.cc:31] RTDS destroyed
[20][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher main_thread
[20][debug][init] [source/common/init/watcher_impl.cc:31] init manager Server destroyed
[12][debug][main] [source/common/event/dispatcher_impl.cc:81] destroying dispatcher test_thread
[       OK ] IpVersionsClientTypeDelta/AdsIntegrationTestWithRtdsAndSecondaryClusters.Basic/2 (437 ms)
[----------] 1 test from IpVersionsClientTypeDelta/AdsIntegrationTestWithRtdsAndSecondaryClusters (437 ms total)

@antoniovicente
Contributor

I'll see if I can take a look at the info above soon after your changes in #14954 are merged in.

Based on your mention of deferred delete vs other operations, I think that it may be helpful to make changes similar to #14293 to make these races more likely as we attempt to fix the underlying issues.

Sequencing shutdown so that the async clients are shut down before the main thread may help. Thanks for the sample test runs that illustrate the problem.

@antoniovicente
Contributor

Further changes that we may want to consider:

  • Discourage calls to deleteInDispatcherThread from the dispatcher thread, if possible, to avoid the possibility of call chaining.
  • Set "shutdown_called_ = true" as soon as we enter the Dispatcher shutdown method.
  • Disallow chaining of deferred deletions.
  • Disallow deleteInDispatcherThread from scheduling deferred deletions.
  • Disallow the scheduling of deferred deletions or deleteInDispatcherThread when destroying post callbacks.
  • Perform cross-thread deletes before executing post callbacks, to provide some determinism to operations from the two lists of deferred work.
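
A few of these ideas can be sketched in isolation. The following is a toy model, not Envoy's `DispatcherImpl`: `SketchDispatcher`, `Deletable`, and `Chainer` are hypothetical names, and the choice to reject late work by returning `false` is an assumption. It shows the shutdown flag being set on entry so that a deferred deletion scheduled from inside a destructor is refused rather than chained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base type for objects whose destruction is deferred.
struct Deletable {
  virtual ~Deletable() = default;
};

// Toy dispatcher; a sketch of the proposal, not Envoy's implementation.
class SketchDispatcher {
public:
  // Takes ownership only when accepted; on rejection the caller keeps the object.
  bool deferredDelete(std::unique_ptr<Deletable>&& obj) {
    if (shutdown_called_) {
      return false; // No new deferred deletions once shutdown has begun.
    }
    to_delete_.push_back(std::move(obj));
    return true;
  }

  bool post(std::function<void()> callback) {
    if (shutdown_called_) {
      return false;
    }
    post_callbacks_.push_back(std::move(callback));
    return true;
  }

  // Returns the number of deferred objects destroyed.
  std::size_t shutdown() {
    shutdown_called_ = true; // Set first, before any destructor can run.

    // Move the list to a local so destructors cannot mutate it mid-iteration.
    auto doomed = std::move(to_delete_);
    to_delete_.clear();
    const std::size_t destroyed = doomed.size();
    doomed.clear(); // Destructors run here; chained scheduling is rejected.

    // Post callbacks are destroyed without being run, after the deletions.
    auto callbacks = std::move(post_callbacks_);
    post_callbacks_.clear();
    callbacks.clear();
    return destroyed;
  }

private:
  bool shutdown_called_ = false;
  std::vector<std::unique_ptr<Deletable>> to_delete_;
  std::vector<std::function<void()>> post_callbacks_;
};

// Test helper: its destructor tries to chain another deferred deletion.
class Chainer : public Deletable {
public:
  Chainer(SketchDispatcher& dispatcher, bool& accepted)
      : dispatcher_(dispatcher), accepted_(accepted) {}
  ~Chainer() override {
    auto next = std::make_unique<Deletable>();
    accepted_ = dispatcher_.deferredDelete(std::move(next));
  }

private:
  SketchDispatcher& dispatcher_;
  bool& accepted_;
};
```

In this sketch a rejected caller keeps ownership of its object, so nothing is destroyed implicitly; a real dispatcher might instead assert or log when work is scheduled after shutdown.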

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 21, 2021
@lambdai lambdai added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels Mar 22, 2021
@ggreenway
Contributor

Ownership and reference chains that result in this crash:

GoogleAsyncStreamImpl (RawAsyncStream)

  • non-owning referenced by AsyncStream (typed_async_client.h) RawAsyncStream* stream_
  • owned by GrpcStream Grpc::AsyncStream<RequestProto> stream_{}
  • owned by GrpcMuxImpl GrpcStream<envoy::service::discovery::v3::DiscoveryRequest, envoy::service::discovery::v3::DiscoveryResponse> grpc_stream_
  • owned by config subscription
  • owned by filter chain config

GoogleAsyncStreamImpl

  • owned by GoogleAsyncClientThreadLocal (absl::node_hash_set<GoogleAsyncStreamImpl*> streams_)
  • owned by ThreadLocal.

During shutdown, ThreadLocal is destroyed before the filter chain configs, which leaves a dangling reference.
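
The ordering problem can be reproduced without any undefined behavior by recording what each destructor observes. This is a minimal sketch under stated assumptions: `Trace`, `StreamOwner`, `Subscription`, `SafeServer`, and `UnsafeServer` are hypothetical stand-ins for the thread-local owner and the config subscription, and the only language rule relied on is that C++ destroys class members in reverse order of declaration.

```cpp
#include <cassert>

// Shared record of what happened during destruction (outlives both objects).
struct Trace {
  bool owner_alive = false;
  bool subscription_saw_live_owner = false;
};

// Hypothetical stand-in for the thread-local state that owns the stream.
class StreamOwner {
public:
  explicit StreamOwner(Trace& trace) : trace_(trace) { trace_.owner_alive = true; }
  ~StreamOwner() { trace_.owner_alive = false; }

private:
  Trace& trace_;
};

// Hypothetical stand-in for a subscription whose destructor ends up
// using the stream (the sendMessage call in the trace above).
class Subscription {
public:
  explicit Subscription(Trace& trace) : trace_(trace) {}
  ~Subscription() { trace_.subscription_saw_live_owner = trace_.owner_alive; }

private:
  Trace& trace_;
};

// Owner declared first, so it is destroyed last: the subscription's
// destructor still sees a live owner.
struct SafeServer {
  explicit SafeServer(Trace& trace) : owner(trace), subscription(trace) {}
  StreamOwner owner;
  Subscription subscription;
};

// Owner declared last, so it is destroyed first: this mirrors the
// ordering described above, where the owner goes away too early.
struct UnsafeServer {
  explicit UnsafeServer(Trace& trace) : subscription(trace), owner(trace) {}
  Subscription subscription;
  StreamOwner owner;
};

inline bool subscriptionSeesLiveOwner(bool safe_order) {
  Trace trace;
  if (safe_order) {
    SafeServer server(trace);
  } else {
    UnsafeServer server(trace);
  }
  return trace.subscription_saw_live_owner;
}
```

Here `subscriptionSeesLiveOwner(true)` reports a live owner and `subscriptionSeesLiveOwner(false)` does not; in the real code the second case is where the dangling `RawAsyncStream*` gets dereferenced.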

@ggreenway
Contributor

Possible fix: b1e38d7

4 participants