[docdb] Potential long wait / deadlock between DeleteTablet calls and paused compactions #2100

Closed
bmatican opened this issue Aug 19, 2019 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug

Comments

@bmatican
Contributor

RocksDB shutdown stuck:

rpc_tp_TabletServer_18-21726    14.92    2.13    6.6    
    @     0x7f8a49ab711e  (unknown)
    @     0x7f8a4a4323b7  __pthread_cond_timedwait
    @     0x7f8a52fe14e0  rocksdb::port::CondVar::TimedWait()
    @     0x7f8a53059992  rocksdb::InstrumentedCondVar::TimedWait()
    @     0x7f8a52f3984f  rocksdb::DBImpl::~DBImpl()
    @     0x7f8a52f3a20f  rocksdb::DBImpl::~DBImpl()
    @     0x7f8a564ff050  yb::tablet::Tablet::Shutdown()
    @     0x7f8a56532dfe  yb::tablet::TabletPeer::CompleteShutdown()
    @     0x7f8a56fb5916  yb::tserver::TSTabletManager::DeleteTablet()
    @     0x7f8a56f923c0  yb::tserver::TabletServiceAdminImpl::DeleteTablet()
    @     0x7f8a547c80b6  yb::tserver::TabletServerAdminServiceIf::Handle()
    @     0x7f8a5132922f  yb::rpc::ServicePoolImpl::Handle()
    @     0x7f8a512d6292  yb::rpc::InboundCall::InboundCallTask::Run()
    @     0x7f8a51334be6  yb::rpc::(anonymous namespace)::Worker::Execute()
    @     0x7f8a4f923ed7  yb::Thread::SuperviseThread()
    @     0x7f8a4a42d692  start_thread

Paused compaction:

worker-20533    13892.3    268.85    210.92    
    @     0x7f8a49ab711e  (unknown)
    @     0x7f8a4a43200c  __pthread_cond_wait
    @     0x7f8a4ac1385a  std::condition_variable::wait()
    @     0x7f8a4f8ffa31  yb::PriorityThreadPool::Impl::PauseIfNecessary()
    @     0x7f8a4f8ffeef  yb::(anonymous namespace)::PriorityThreadPoolWorker::PauseIfNecessary()
    @     0x7f8a530558d3  rocksdb::WritableFileWriter::RequestToken()
    @     0x7f8a53055960  rocksdb::WritableFileWriter::WriteBuffered()
    @     0x7f8a530561d8  rocksdb::WritableFileWriter::Flush()
    @     0x7f8a52fe68a1  rocksdb::BlockBasedTableBuilder::FlushDataBlock()
    @     0x7f8a52fe6e01  rocksdb::BlockBasedTableBuilder::Add()
    @     0x7f8a52f0141d  rocksdb::CompactionJob::ProcessKeyValueCompaction()
    @     0x7f8a52f02c47  rocksdb::CompactionJob::Run()
    @     0x7f8a52f3cc1e  rocksdb::DBImpl::BackgroundCompaction()
    @     0x7f8a52f3e86d  rocksdb::DBImpl::BackgroundCallCompaction()
    @     0x7f8a52f4ed78  rocksdb::DBImpl::CompactionTask::DoRun()
    @     0x7f8a52f4466f  rocksdb::DBImpl::ThreadPoolTask::Run()

The symptom that actually exposed the issue was that the master heartbeater thread was stuck, because it also needs to get info from all the tablets:

heartbeat-20182    84.98    660.8    0.02    
    @     0x7f8a49ab711f  (unknown)
    @     0x7f8a4a43546c  __GI___nanosleep
    @     0x7f8a564fbd19  yb::tablet::Tablet::GetCurrentVersionSstFilesSize()
    @     0x7f8a56f5109c  yb::tserver::Heartbeater::Thread::TryHeartbeat()
    @     0x7f8a56f5262a  yb::tserver::Heartbeater::Thread::DoHeartbeat()
    @     0x7f8a56f529c4  yb::tserver::Heartbeater::Thread::RunThread()
    @     0x7f8a4f923ed8  yb::Thread::SuperviseThread()
    @     0x7f8a4a42d693  start_thread
    @     0x7f8a49b6a41c  __clone
    @ 0xffffffffffffffff  (unknown)
@bmatican bmatican added kind/bug This issue is a bug area/docdb YugabyteDB core features labels Aug 19, 2019
@bmatican bmatican changed the title [docdb] Deadlock between DeleteTablet calls and paused compactions [docdb] Potential long wait / deadlock between DeleteTablet calls and paused compactions Aug 19, 2019
spolitov added a commit that referenced this issue Aug 20, 2019
Summary:
During shutdown we set the shutting-down flag and then abort all compactions
associated with this RocksDB instance.
However, scheduling a new compaction does not check the shutting-down flag,
so a new compaction can be scheduled in parallel with shutdown,
which makes RocksDB wait until that compaction completes.

Added a shutdown check when scheduling a compaction.
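
The race described in this summary can be sketched in a few lines. This is a minimal illustration, not the RocksDB or YugabyteDB code: the class `CompactionScheduler` and all of its member names are hypothetical. It shows the general pattern of checking the shutdown flag under the same mutex that guards the count of scheduled compactions, so that nothing new can be registered once shutdown has started.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a DB's compaction bookkeeping. Not RocksDB code.
class CompactionScheduler {
 public:
  // Registers a new compaction unless shutdown has already started.
  bool TrySchedule() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (shutting_down_) {
      return false;  // The kind of check the summary describes adding.
    }
    ++pending_;
    return true;
  }

  // Called when a compaction completes or is aborted.
  void Finish() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(pending_ > 0);
    if (--pending_ == 0) {
      drained_.notify_all();
    }
  }

  // Sets the flag; from here on TrySchedule() refuses new work.
  void StartShutdown() {
    std::lock_guard<std::mutex> lock(mutex_);
    shutting_down_ = true;
  }

  // Blocks until every registered compaction has called Finish().
  void WaitForCompactions() {
    std::unique_lock<std::mutex> lock(mutex_);
    drained_.wait(lock, [this] { return pending_ == 0; });
  }

  // Number of compactions registered and not yet finished.
  int pending() {
    std::lock_guard<std::mutex> lock(mutex_);
    return pending_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable drained_;
  bool shutting_down_ = false;
  int pending_ = 0;
};
```

Without the `shutting_down_` check in `TrySchedule()`, a compaction registered after `StartShutdown()` would keep `WaitForCompactions()` blocked until it ran to completion, which matches the long wait seen in the `~DBImpl()` stack above.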

Test Plan: Jenkins

Reviewers: mikhail, timur, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7094
spolitov added a commit that referenced this issue Sep 11, 2019
…ons to false

Summary:
The issue described in #2100 was fixed by D7094, but RocksDB shutdown can still take a long time when its compaction task has been paused by other tasks with higher priority.

So it is better to set the gflag `use_priority_thread_pool_for_compactions` to `false` until dynamic priorities are implemented.
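
To make the remaining problem concrete, here is a rough sketch of how a paused task can hold up shutdown. It is not the `yb::PriorityThreadPool` implementation and not what this commit changes (the commit only flips a flag default). `PauseGate` and its methods are hypothetical names. The sketch assumes a low-priority task calls `PauseIfNecessary()` at safe points and blocks on a condition variable while paused; if the wait predicate looked only at `paused_`, a shutdown could not get the task to exit until the higher-priority work finished.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical pause gate for a low-priority task. Illustrative names only.
class PauseGate {
 public:
  // Set by the pool when higher-priority work starts or finishes.
  void SetPaused(bool paused) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      paused_ = paused;
    }
    cv_.notify_all();
  }

  // Assumed escape hatch: lets a shutdown wake a paused task.
  void Abort() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      aborted_ = true;
    }
    cv_.notify_all();
  }

  // Called by the task at safe points, for example before a buffered write.
  // Blocks while paused. Returns false if the task should stop.
  bool PauseIfNecessary() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !paused_ || aborted_; });
    return !aborted_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool paused_ = false;
  bool aborted_ = false;
};
```

With a gate like this, a task that is paused when `Abort()` is called returns `false` from `PauseIfNecessary()` right away instead of staying parked, so the caller can unwind and release the DB.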

Test Plan: Jenkins

Reviewers: amitanand, kannan

Reviewed By: kannan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7184
spolitov added a commit that referenced this issue Sep 14, 2019
Summary:
Change default values for flags:
use_priority_thread_pool_for_compactions - true
allow_preempting_compactions - false

Test Plan: Jenkins

Reviewers: timur, mikhail, kannan

Reviewed By: kannan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7213
spolitov added a commit that referenced this issue Nov 11, 2019
Summary:
Long-running tests of a write-heavy workload showed that we perform better with compaction preemption,
so this diff changes allow_preempting_compactions to true.

Test Plan: Jenkins

Reviewers: rao, mikhail, bogdan, kannan

Reviewed By: bogdan, kannan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7547