We noticed that after upgrading a cluster to 2.1.6.0-b9, two of the yb-master processes never finished the initialization process. We found a couple of threads that were stuck:
Thread 20 (Thread 0x7f30353ab700 (LWP 1088)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f303c8eb85c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew/lib/libstdc++.so.6
#2 0x00007f30409f2133 in wait<yb::Synchronizer::WaitUntil(const time_point&)::<lambda()> > (__p=..., __lock=..., this=0x7f30353aa828) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/condition_variable:98
#3 yb::Synchronizer::WaitUntil (this=this@entry=0x7f30353aa800, time=...) at ../../src/yb/util/async_util.cc:72
#4 0x00007f3047d1e42a in Wait (this=0x7f30353aa800) at ../../src/yb/util/async_util.h:83
#5 yb::client::YBClient::Data::SetMasterServerProxy (this=0x1292580, deadline=..., skip_resolution=<optimized out>, wait_for_leader_election=<optimized out>) at ../../src/yb/client/client-internal.cc:1663
#6 0x00007f3047d0374b in yb::client::YBClientBuilder::DoBuild (this=this@entry=0x1173180, messenger=<optimized out>, client=client@entry=0x7f30353aa9c0) at ../../src/yb/client/client.cc:378
#7 0x00007f3047d03e83 in yb::client::YBClientBuilder::Build (this=this@entry=0x1173180, messenger=<optimized out>) at ../../src/yb/client/client.cc:406
#8 0x00007f3047cecb8b in yb::client::AsyncClientInitialiser::InitClient (this=0x1173180) at ../../src/yb/client/async_initializer.cc:76
#9 0x00007f303c8f07a0 in ?? () from /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew/lib/libstdc++.so.6
#10 0x00007f303c105694 in start_thread (arg=0x7f30353ab700) at pthread_create.c:333
#11 0x00007f303b84241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 13 (Thread 0x7f3031ba4700 (LWP 1095)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f303c8ee748 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew/lib/libstdc++.so.6
#2 0x00007f3049ec80f8 in _M_load_and_test_until (__ns=..., __s=..., __has_timeout=<optimized out>, __mo=<optimized out>, __equal=<optimized out>, __operand=<optimized out>, __assumed=<optimized out>, this=<optimized out>)
at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/atomic_futex.h:104
#3 _M_load_and_test (__mo=<optimized out>, __equal=<optimized out>, __operand=<optimized out>, __assumed=<optimized out>, this=<optimized out>) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/atomic_futex.h:122
#4 _M_load_when_equal (__mo=std::memory_order_acquire, __val=1, this=0x128da90) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/atomic_futex.h:162
#5 wait (this=0x128da80) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/future:322
#6 _M_get_result (this=<optimized out>) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/future:681
#7 get (this=<optimized out>) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/future:889
#8 yb::tablet::TransactionStatusResolver::Impl::Execute (this=this@entry=0x1ab1040) at ../../src/yb/tablet/transaction_status_resolver.cc:113
#9 0x00007f3049ec68e5 in Start (deadline=..., this=0x1ab1040) at ../../src/yb/tablet/transaction_status_resolver.cc:60
#10 yb::tablet::TransactionStatusResolver::Start (this=this@entry=0x1bf43b8, deadline=deadline@entry=...) at ../../src/yb/tablet/transaction_status_resolver.cc:233
#11 0x00007f3049eb1bc7 in TryStartCheckLoadedTransactionsStatus<std::atomic<bool>, bool> (flag_to_set=0x1bf4422, flag_to_check=0x1bf4420, this=0x1bf4000) at ../../src/yb/tablet/transaction_participant.cc:1159
#12 Start (this=0x1bf4000) at ../../src/yb/tablet/transaction_participant.cc:167
#13 yb::tablet::TransactionParticipant::Start (this=<optimized out>) at ../../src/yb/tablet/transaction_participant.cc:1327
#14 0x00007f3049e99db9 in yb::tablet::TabletPeer::InitTabletPeer (this=0x1292850, tablet=..., client_future=..., server_mem_tracker=..., messenger=messenger@entry=0x1294600, proxy_cache=0xf6d340, log=..., metric_entity=..., raft_pool=0x12c5200, tablet_prepare_pool=0x12c5800, retryable_requests=0x0, split_op_id=...)
at ../../src/yb/tablet/tablet_peer.cc:295
#15 0x00007f304ac50a46 in yb::master::SysCatalogTable::OpenTablet (this=this@entry=0xf5e6c0, metadata=...) at ../../src/yb/master/sys_catalog.cc:535
#16 0x00007f304ac518e0 in yb::master::SysCatalogTable::SetupTablet (this=this@entry=0xf5e6c0, metadata=...) at ../../src/yb/master/sys_catalog.cc:486
#17 0x00007f304ac51fb2 in yb::master::SysCatalogTable::Load (this=0xf5e6c0, fs_manager=0x104c780) at ../../src/yb/master/sys_catalog.cc:266
#18 0x00007f304ab5cede in yb::master::CatalogManager::InitSysCatalogAsync (this=this@entry=0x12c0000, is_first_run=is_first_run@entry=false) at ../../src/yb/master/catalog_manager.cc:1288
#19 0x00007f304ab6a6bd in yb::master::CatalogManager::Init (this=0x12c0000, is_first_run=<optimized out>) at ../../src/yb/master/catalog_manager.cc:546
#20 0x00007f304ac0b91a in yb::master::Master::InitCatalogManager (this=this@entry=0x7ffdc92f1140) at ../../src/yb/master/master.cc:275
#21 0x00007f304ac0ba26 in yb::master::Master::InitCatalogManagerTask (this=0x7ffdc92f1140) at ../../src/yb/master/master.cc:264
#22 0x00007f30409c7654 in yb::ThreadPool::DispatchThread (this=0x12c8a00, permanent=false) at ../../src/yb/util/threadpool.cc:608
#23 0x00007f30409c3fdf in operator() (this=0xf6fb58) at /home/yugabyte/yb-software/yugabyte-2.1.6.0-b9-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#24 yb::Thread::SuperviseThread (arg=0xf6fb00) at ../../src/yb/util/thread.cc:744
#25 0x00007f303c105694 in start_thread (arg=0x7f3031ba4700) at pthread_create.c:333
#26 0x00007f303b84241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
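Thread 20 is stuck in YBClient::Data::SetMasterServerProxy. In other words, it is waiting for a leader election to happen.
Thread 13 is stuck in TransactionStatusResolver::Impl::Execute. More specifically, it is stuck waiting for the client_future.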
But the client future gets set here:
void AsyncClientInitialiser::InitClient() {
  LOG(INFO) << "Starting to init ybclient";
  while (!stopping_) {
    auto result = client_builder_.Build(messenger_);
    if (result.ok()) {
      LOG(INFO) << "Successfully built ybclient";
      client_holder_.reset(result->release());
      client_promise_.set_value(client_holder_.get());
      return;
    }
    LOG(ERROR) << "Failed to initialize client: " << result.status();
    std::this_thread::sleep_for(1s);
  }
  client_promise_.set_value(nullptr);
}
That happens only after SetMasterServerProxy has returned. But no leader election will happen, because the election is waiting for the peer to be started (TabletPeer::Start), which won't happen until the TabletPeer constructor returns.
So in other words, TransactionStatusResolver is waiting for the client future to be set, the client future won't get set until an election happens, an election cannot happen because it needs TabletPeer to be initialized first, and TabletPeer cannot be initialized because the thread is stuck in the constructor waiting for TransactionStatusResolver to return.
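The cycle described above can be reproduced in miniature with a pair of promises. This is only a sketch, not YugabyteDB code: `SimulateStartup`, `peer_started`, and the timeouts are hypothetical stand-ins, and `wait_for` is used instead of an unbounded wait so the example terminates and can report the deadlock rather than hang.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

using namespace std::chrono_literals;

// Hypothetical miniature of the startup cycle. Returns true if neither side
// made progress, i.e. the real code (which waits without a timeout) would hang.
bool SimulateStartup() {
  std::promise<void> peer_started;   // stand-in for "TabletPeer is started"
  std::promise<int> client_promise;  // stand-in for client_promise_
  std::shared_future<void> peer_started_future =
      peer_started.get_future().share();
  std::shared_future<int> client_future = client_promise.get_future().share();

  bool client_built = false;
  // Stand-in for InitClient: the client can only be built once a leader is
  // elected, and an election needs the peer to be started.
  std::thread init_client([&] {
    if (peer_started_future.wait_for(200ms) == std::future_status::ready) {
      client_promise.set_value(1);
      client_built = true;
    }
  });

  // Stand-in for the peer initialization path: status resolution blocks on
  // the client future before initialization can return.
  bool resolver_got_client =
      client_future.wait_for(100ms) == std::future_status::ready;
  if (resolver_got_client) {
    peer_started.set_value();  // never reached in this scenario
  }

  init_client.join();
  return !resolver_got_client && !client_built;
}
```

Calling `SimulateStartup()` returns `true`: each side times out waiting for the other, which corresponds to the two stuck threads in the traces.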
Summary:
When a transactional tablet is started, it loads active transactions and resolves their status.
Status resolution is initiated after the transaction participant is started and all transactions have been loaded.
When the number of active transactions is small, they are loaded before the transaction participant starts.
In this case status resolution is initiated from TransactionParticipant::Start, which is called by Tablet::Start.
But status resolution requires a client, so it waits until the client is able to resolve the master leader.
So when the above scenario happens with the master tablet, we end up with a deadlock:
tablet start waits for client construction, but the client cannot complete its construction because it requires a master leader.
Test Plan: ybd --gtest_filter PgMiniTest.DDLWithRestart
Reviewers: hector, bogdan
Reviewed By: bogdan
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D8450
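One general way to break a cycle like the one in the summary is to run the work that needs the client off the startup path, so that tablet start can return first. The sketch below illustrates that idea only. It is an assumption, not a description of what the change in D8450 actually does, and `SimulateDeferredStartup` and all names inside it are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical miniature in which status resolution is deferred to a
// background task. Returns the value the resolver reads from the client future.
int SimulateDeferredStartup() {
  std::promise<void> peer_started;   // stand-in for "TabletPeer is started"
  std::promise<int> client_promise;  // stand-in for client_promise_
  std::shared_future<void> peer_started_future =
      peer_started.get_future().share();
  std::shared_future<int> client_future = client_promise.get_future().share();

  // Stand-in for InitClient: blocks until the peer is started (so a leader
  // can be elected), then publishes the client.
  std::thread init_client([&] {
    peer_started_future.wait();
    client_promise.set_value(1);
  });

  // Status resolution runs asynchronously and waits for the client there,
  // instead of blocking the startup path.
  std::future<int> resolved = std::async(
      std::launch::async, [client_future] { return client_future.get(); });

  // The startup path no longer waits on the client, so the peer can start.
  peer_started.set_value();

  int client = resolved.get();
  init_client.join();
  return client;
}
```

Here `SimulateDeferredStartup()` returns `1`: the peer starts, the client gets built, and the deferred resolver then obtains it without any thread waiting on its own completion.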