Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: destroy on main thread #14954

Merged
merged 43 commits into from
Feb 23, 2021
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
b474d23
destroy hosts on master
lambdai Nov 12, 2020
7b933b1
tobetrypost
lambdai Nov 15, 2020
65ed52e
to tryPost
lambdai Nov 19, 2020
df9beaa
revert hosts guard
lambdai Nov 20, 2020
3af5aac
Merge branch 'master' into completedestroyonmaster
lambdai Nov 20, 2020
b647d0c
fix cluster test
lambdai Nov 20, 2020
8c6266d
fix server fuzz test
lambdai Nov 20, 2020
d8d639c
Merge branch 'main' into clusterdestory
lambdai Feb 3, 2021
564b17a
cleanup
lambdai Feb 5, 2021
8e380ce
remove extra runPostCallbacks() in run
lambdai Feb 5, 2021
bcad62e
remove exit flag in dispatcher
lambdai Feb 5, 2021
50ec141
rename to movePost returning void, tests failing
lambdai Feb 5, 2021
4746d53
relax ssl manager required empty context
lambdai Feb 5, 2021
0ca27ec
format
lambdai Feb 5, 2021
ec6eb4a
revert accidentally touched files
lambdai Feb 5, 2021
2f26bf4
add preShutdown
lambdai Feb 6, 2021
63b063a
fix typo
lambdai Feb 6, 2021
75c257a
fixing server tests
lambdai Feb 6, 2021
dd4b151
clang-tidy
lambdai Feb 6, 2021
968a611
clang-tidy another try
lambdai Feb 8, 2021
d4dc8f8
fix format
lambdai Feb 8, 2021
ea86eb1
rename shutdown, add integration test
lambdai Feb 9, 2021
187179b
clean up movePost
lambdai Feb 9, 2021
7782047
fix ambigious
lambdai Feb 9, 2021
16e1aca
Revert "fix ambigious"
lambdai Feb 10, 2021
cf64576
Revert "clean up movePost"
lambdai Feb 10, 2021
40f2530
introduce DispatcherThreadDeletablePtr and replace movePost
lambdai Feb 10, 2021
a5c2c3c
add mising file
lambdai Feb 10, 2021
30c9751
remove assert dispatcher-thread-deletable, add comment, fix sds test …
lambdai Feb 10, 2021
13379da
use independent thread_local_delete_cb_ in dispatcher
lambdai Feb 11, 2021
cc0f944
revert server.h
lambdai Feb 11, 2021
8680fa7
check shutdown_called_
lambdai Feb 16, 2021
5af6d37
call worker dispatcher shutdown at worker thread
lambdai Feb 16, 2021
93e0b0f
fail to fix dispatcher shutdown
lambdai Feb 17, 2021
31fc603
DispatcherShutdownTest
lambdai Feb 17, 2021
f9cd68f
test, log cleanup
lambdai Feb 17, 2021
01a6557
revert source/common/config/grpc_mux_impl.h
lambdai Feb 17, 2021
eb01989
revert source/common/grpc/typed_async_client.h
lambdai Feb 17, 2021
32381e6
revert assert on nullptr
lambdai Feb 18, 2021
19309cb
call reserve() to massage clangtidy
lambdai Feb 18, 2021
2d603b8
Merge branch 'main' into clusterdestroy
lambdai Feb 18, 2021
171b6d4
trace level and delete assert nullptr before use
lambdai Feb 23, 2021
d1bb2d5
fix format
lambdai Feb 23, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions include/envoy/event/dispatcher.h
Original file line number Diff line number Diff line change
Expand Up @@ -260,6 +260,12 @@ class Dispatcher : public DispatcherBase {
*/
virtual void post(PostCb callback) PURE;

/**
lambdai marked this conversation as resolved.
Show resolved Hide resolved
* Similar to `post()` but will destroy passed callback. This simulates posting move only
* function.
lambdai marked this conversation as resolved.
Show resolved Hide resolved
*/
virtual void movePost(PostCb&& callback) PURE;
lambdai marked this conversation as resolved.
Show resolved Hide resolved

/**
* Runs the event loop. This will not return until exit() is called either from within a callback
* or from a different thread.
Expand Down
16 changes: 15 additions & 1 deletion source/common/event/dispatcher_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -265,9 +265,23 @@ void DispatcherImpl::post(std::function<void()> callback) {
}
}

void DispatcherImpl::movePost(std::function<void()>&& callback) {
bool do_post;
{
Thread::LockGuard lock(post_lock_);
do_post = post_callbacks_.empty();
post_callbacks_.push_back(std::move(callback));
// Poor man's move only function.
callback = nullptr;
}

if (do_post) {
post_cb_->scheduleCallbackCurrentIteration();
lambdai marked this conversation as resolved.
Show resolved Hide resolved
}
}

void DispatcherImpl::run(RunType type) {
run_tid_ = api_.threadFactory().currentThreadId();

// Flush all post callbacks before we run the event loop. We do this because there are post
// callbacks that have to get run before the initial event loop starts running. libevent does
// not guarantee that events are run in any particular order. So even if we post() and call
Expand Down
1 change: 1 addition & 0 deletions source/common/event/dispatcher_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ class DispatcherImpl : Logger::Loggable<Logger::Id::main>,
void exit() override;
SignalEventPtr listenForSignal(signal_t signal_num, SignalCb cb) override;
void post(std::function<void()> callback) override;
void movePost(std::function<void()>&& callback) override;
void run(RunType type) override;
Buffer::WatermarkFactory& getWatermarkFactory() override { return *buffer_factory_; }
void pushTrackedObject(const ScopeTrackedObject* object) override;
Expand Down
16 changes: 13 additions & 3 deletions source/common/upstream/upstream_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -919,9 +919,19 @@ ClusterImplBase::ClusterImplBase(

auto socket_matcher = std::make_unique<TransportSocketMatcherImpl>(
cluster.transport_socket_matches(), factory_context, socket_factory, *stats_scope);
info_ = std::make_unique<ClusterInfoImpl>(cluster, factory_context.clusterManager().bindConfig(),
runtime, std::move(socket_matcher),
std::move(stats_scope), added_via_api, factory_context);
auto& dispatcher = factory_context.dispatcher();
info_ = std::shared_ptr<const ClusterInfoImpl>(
lambdai marked this conversation as resolved.
Show resolved Hide resolved
new ClusterInfoImpl(cluster, factory_context.clusterManager().bindConfig(), runtime,
std::move(socket_matcher), std::move(stats_scope), added_via_api,
factory_context),
[&dispatcher](const ClusterInfoImpl* self) {
const auto name = self->name();
ENVOY_LOG(debug, "Schedule destroy cluster info {}", self->name());
lambdai marked this conversation as resolved.
Show resolved Hide resolved
dispatcher.movePost([raii_cluster = std::shared_ptr<const ClusterInfoImpl>(self)]() {
lambdai marked this conversation as resolved.
Show resolved Hide resolved
ENVOY_LOG(debug, "Destroying cluster info {}. This thread should be master thread.",
raii_cluster->name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to do an explicit:

  raii_cluster.reset();

Would it be possible to have some tests that verify that repo the original crash and thus verify that this change fixes the issue of objects being deleted from the wrong thread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the explicit reset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test reproduces the crash after the isThreadSafe() is added in Envoy::Grpc::AsyncStreamImpl::sendMessageRaw(). Thank you for the suggestion! Without thse assert(s), it is hard to prove it fixed.

});
});

if ((info_->features() & ClusterInfoImpl::Features::USE_ALPN) &&
!raw_factory_pointer->supportsAlpn()) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@ namespace Tls {

ContextManagerImpl::~ContextManagerImpl() {
removeEmptyContexts();
KNOWN_ISSUE_ASSERT(contexts_.empty(), "https://github.com/envoyproxy/envoy/issues/10030");
// Contexts could be referenced by cluster to be destroyed. The cluster destroy is driven by the
// master thread dispatcher, but the dispatcher ceases running for good.
}

void ContextManagerImpl::removeEmptyContexts() {
Expand Down
2 changes: 2 additions & 0 deletions test/common/upstream/cluster_manager_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -874,8 +874,10 @@ TEST_F(ClusterManagerImplTest, HttpHealthChecker) {
createClientConnection_(
PointeesEq(Network::Utility::resolveUrl("tcp://127.0.0.1:11001")), _, _, _))
.WillOnce(Return(connection));
EXPECT_CALL(factory_.dispatcher_, movePost(_));
create(parseBootstrapFromV3Yaml(yaml));
factory_.tls_.shutdownThread();
factory_.dispatcher_.to_delete_.clear();
}

TEST_F(ClusterManagerImplTest, UnknownCluster) {
Expand Down
2 changes: 1 addition & 1 deletion test/common/upstream/logical_dns_cluster_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -204,11 +204,11 @@ class LogicalDnsClusterTest : public Event::TestUsingSimulatedTime, public testi
Network::DnsResolver::ResolveCb dns_callback_;
NiceMock<ThreadLocal::MockInstance> tls_;
Event::MockTimer* resolve_timer_;
std::shared_ptr<LogicalDnsCluster> cluster_;
ReadyWatcher membership_updated_;
ReadyWatcher initialized_;
NiceMock<Runtime::MockLoader> runtime_;
NiceMock<Event::MockDispatcher> dispatcher_;
std::shared_ptr<LogicalDnsCluster> cluster_;
antoniovicente marked this conversation as resolved.
Show resolved Hide resolved
NiceMock<LocalInfo::MockLocalInfo> local_info_;
NiceMock<Server::MockAdmin> admin_;
Singleton::ManagerImpl singleton_manager_{Thread::threadFactoryForTest()};
Expand Down
2 changes: 1 addition & 1 deletion test/common/upstream/original_dst_cluster_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -94,11 +94,11 @@ class OriginalDstClusterTest : public Event::TestUsingSimulatedTime, public test

Stats::TestUtil::TestStore stats_store_;
Ssl::MockContextManager ssl_context_manager_;
NiceMock<Event::MockDispatcher> dispatcher_;
lambdai marked this conversation as resolved.
Show resolved Hide resolved
OriginalDstClusterSharedPtr cluster_;
ReadyWatcher membership_updated_;
ReadyWatcher initialized_;
NiceMock<Runtime::MockLoader> runtime_;
NiceMock<Event::MockDispatcher> dispatcher_;
Event::MockTimer* cleanup_timer_;
NiceMock<Random::MockRandomGenerator> random_;
NiceMock<LocalInfo::MockLocalInfo> local_info_;
Expand Down
1 change: 1 addition & 0 deletions test/mocks/event/mocks.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ class MockDispatcher : public Dispatcher {
MOCK_METHOD(void, exit, ());
MOCK_METHOD(SignalEvent*, listenForSignal_, (signal_t signal_num, SignalCb cb));
MOCK_METHOD(void, post, (std::function<void()> callback));
MOCK_METHOD(void, movePost, (std::function<void()> && callback));
MOCK_METHOD(void, run, (RunType type));
MOCK_METHOD(void, pushTrackedObject, (const ScopeTrackedObject* object));
MOCK_METHOD(void, popTrackedObject, (const ScopeTrackedObject* expected_object));
Expand Down
2 changes: 2 additions & 0 deletions test/mocks/event/wrapped_dispatcher.h
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,8 @@ class WrappedDispatcher : public Dispatcher {

void post(std::function<void()> callback) override { impl_.post(std::move(callback)); }

void movePost(std::function<void()>&& callback) override { impl_.movePost(std::move(callback)); }

void run(RunType type) override { impl_.run(type); }

Buffer::WatermarkFactory& getWatermarkFactory() override { return impl_.getWatermarkFactory(); }
Expand Down