
Update build not to be hard-coded to require brew on linux #1

Closed

hengestone opened this issue Nov 6, 2017 · 4 comments

Use cmake to detect library dependencies and fail if not found.

mbautin (Contributor) commented Nov 6, 2017

@hengestone Great point, we'll definitely look into this! The reason to introduce Linuxbrew was that CentOS is our primary target production platform, but the default toolchain on CentOS 7 is quite outdated. And it turns out that Linuxbrew has a pretty recent Boost version as well. Building without Linuxbrew on CentOS might still be challenging, but we might be able to build without it e.g. on a recent version of Ubuntu. What platform are you trying to build on, and what does your toolchain look like, by the way?

hengestone (Author) commented Nov 7, 2017

Totally understood - there should just be an alternative for environments that are more up to date and able to install newer packages. I'm on Ubuntu (17.10), which is widely used on cloud platforms; see e.g. zdnet or G+.

I should add: congratulations on the public Beta, wish you lots of success!

yugabyte-ci pushed a commit that referenced this issue Feb 2, 2018
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```
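
As a rough illustration of the fix described in the summary (not the actual YugaByte code; the `MemTableLike`/`Frontiers` types and names below are hypothetical), the flush filter can tolerate a memtable whose frontiers were never set by logging an error and allowing the flush instead of crashing:

```
#include <functional>
#include <iostream>
#include <optional>

// Hypothetical stand-ins for the real RocksDB/YugaByte types.
struct Frontiers {};  // per-memtable frontier metadata
struct MemTableLike {
  std::optional<Frontiers> frontiers;  // may be unset for an empty memtable
  bool IsEmpty() const { return is_empty; }
  bool is_empty = true;
};

using MemTableFilter = std::function<bool(const MemTableLike&)>;

// Sketch of the corrected filter passed to PickMemtablesToFlush: when
// Frontiers() was never set, log an error and let the memtable flush
// instead of crashing.
MemTableFilter MakeMemTableFlushFilter() {
  return [](const MemTableLike& mt) {
    if (!mt.frontiers.has_value()) {
      std::cerr << "Memtable has no frontiers set (empty=" << mt.IsEmpty()
                << "); flushing it anyway instead of failing" << std::endl;
      return true;  // allow the flush
    }
    // ... the real filter would inspect the frontiers here ...
    return true;
  };
}
```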

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
mbautin (Contributor) commented Mar 19, 2018

@hengestone We have made changes to our build script and YugaByte DB should now build on Ubuntu 17.10. Please give it a try when you have a chance and let us know how it goes! By the way, at this point, we're still doing all of our official testing and releases on CentOS, but the resulting Linux package works on both CentOS and Ubuntu.

hengestone (Author) commented

Thanks!

The build process gets a lot further now. I'll file a separate bug for the compile failure.

yugabyte-ci pushed a commit that referenced this issue Nov 30, 2018
…EANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.
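
As a sketch of that pattern, assuming simplified stand-in types rather than YB's actual `Result<>`/`TabletPeer` classes, a lookup that returns a complete result makes it impossible to use the peer without handling the error case first:

```
#include <string>
#include <variant>

// Hypothetical stand-ins for the real YB types.
struct TabletPeerPtr {};
struct RpcError { std::string message; };

// The lookup returns either a usable peer or an error in one value.
using LookupResult = std::variant<TabletPeerPtr, RpcError>;

LookupResult LookupTabletPeer(const std::string& tablet_id) {
  if (tablet_id.empty()) {
    return RpcError{"tablet not found"};
  }
  return TabletPeerPtr{};
}

void HandleCleanup(const std::string& tablet_id) {
  LookupResult result = LookupTabletPeer(tablet_id);
  if (std::holds_alternative<RpcError>(result)) {
    // Respond with the error; the peer is never touched on this path.
    return;
  }
  TabletPeerPtr& peer = std::get<TabletPeerPtr>(result);
  // ... proceed with CLEANUP handling using `peer` ...
  (void)peer;
}
```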

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
ajcaldera1 added a commit that referenced this issue Apr 19, 2019
ajcaldera1 added a commit that referenced this issue Apr 19, 2019
mbautin added a commit to yugabyte/yugabyte-db-thirdparty that referenced this issue Apr 29, 2019
Summary:
Enable building on gcc 7 on Ubuntu 17.10
- Extending the FALLTHROUGH_INTENDED macro appropriately (a generic sketch of such a macro follows this summary)
- Removing some unused code that gcc 7 complains about
- Adding "#include <functional>" to a bunch of files because otherwise gcc 7 does not find
  `std::function`.
- Add the thirdparty library directory to the rpath of libraries in that directory. Without this, glog fails to find gflags. It is not clear why this only happens on Ubuntu 17.10 and not on CentOS with Linuxbrew.
- Fixing the clean_thirdparty.sh script that apparently has been incorrect since the Python rewrite
  of the thirdparty framework.

This will fix yugabyte/yugabyte-db#1.
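
For context, a generic way to extend a fall-through macro of this kind so that gcc 7's `-Wimplicit-fallthrough` warning is satisfied looks roughly like this (a sketch, not necessarily the exact definition used in the YugaByte tree):

```
// Generic sketch of a fall-through annotation macro that also covers gcc 7.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::fallthrough)
#    define FALLTHROUGH_INTENDED [[clang::fallthrough]]
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 7
#  define FALLTHROUGH_INTENDED __attribute__((fallthrough))
#endif
#ifndef FALLTHROUGH_INTENDED
#  define FALLTHROUGH_INTENDED do {} while (0)
#endif

// Usage: marks an intentional fall-through so the compiler does not warn.
int Classify(int x) {
  switch (x) {
    case 0:
      x += 1;
      FALLTHROUGH_INTENDED;
    case 1:
      return x;
    default:
      return -1;
  }
}
```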

Test Plan: Jenkins

Reviewers: hector, bharat, bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4395
mbautin pushed a commit that referenced this issue Jun 20, 2019
Restyle docs to match web site
mbautin pushed a commit that referenced this issue Jun 20, 2019
updated the architecture section
yugabyte-ci pushed a commit that referenced this issue Jun 27, 2019
…data

Summary:
The issue was originally discovered as a random `RaftConsensusITest.TestAddRemoveVoter` test failure in TSAN mode due to a data race.
```
WARNING: ThreadSanitizer: data race (pid=11050)
1663	[ts-2]	   Write of size 8 at 0x7b4c000603a8 by thread T51 (mutexes: write M3613):
...
1674	[ts-2]	     #10 yb::tablet::KvStoreInfo::LoadTablesFromPB(google::protobuf::RepeatedPtrField<yb::tablet::TableInfoPB>, string) src/yb/tablet/tablet_metadata.cc:170
1675	[ts-2]	     #11 yb::tablet::KvStoreInfo::LoadFromPB(yb::tablet::KvStoreInfoPB const&, string) src/yb/tablet/tablet_metadata.cc:189:10
1676	[ts-2]	     #12 yb::tablet::RaftGroupMetadata::LoadFromSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:508:5
1677	[ts-2]	     #13 yb::tablet::RaftGroupMetadata::ReplaceSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:545:3
1678	[ts-2]	     #14 yb::tserver::RemoteBootstrapClient::Finish() src/yb/tserver/remote_bootstrap_client.cc:486:3
...
   Previous read of size 4 at 0x7b4c000603a8 by thread T16:
1697	[ts-2]	     #0 yb::tablet::RaftGroupMetadata::schema_version() const src/yb/tablet/tablet_metadata.h:251:34
1698	[ts-2]	     #1 yb::tserver::TSTabletManager::CreateReportedTabletPB(std::__1::shared_ptr<yb::tablet::TabletPeer> const&, yb::master::ReportedTabletPB*) src/yb/tserver/ts_tablet_manager.cc:1323:71
1699	[ts-2]	     #2 yb::tserver::TSTabletManager::GenerateIncrementalTabletReport(yb::master::TabletReportPB*) src/yb/tserver/ts_tablet_manager.cc:1359:5
1700	[ts-2]	     #3 yb::tserver::Heartbeater::Thread::TryHeartbeat() src/yb/tserver/heartbeater.cc:371:32
1701	[ts-2]	     #4 yb::tserver::Heartbeater::Thread::DoHeartbeat() src/yb/tserver/heartbeater.cc:531:19
```

The reason is that `RaftGroupMetadata::schema_version()` gets the `TableInfo` pointer from `primary_table_info()` under a mutex lock, but then accesses its field without holding the lock.

Added a private `RaftGroupMetadata::primary_table_info_guarded()` method, which returns a pair of `TableInfo*` and `std::unique_lock`, and used it in `RaftGroupMetadata::schema_version()` and other `RaftGroupMetadata` functions that access primary table info fields.
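
A minimal sketch of that pattern, with hypothetical member names rather than the real `RaftGroupMetadata` internals, returning the pointer together with the lock that protects it:

```
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-in for RaftGroupMetadata.
struct TableInfo {
  uint32_t schema_version = 0;
};

class RaftGroupMetadataSketch {
 public:
  uint32_t schema_version() const {
    // Hold the lock for the whole read of the primary table info's field,
    // not just for fetching the pointer.
    auto [info, lock] = primary_table_info_guarded();
    return info->schema_version;
  }

 private:
  // Returns the primary table info together with the lock guarding it, so
  // callers cannot accidentally read fields after releasing the mutex.
  std::pair<const TableInfo*, std::unique_lock<std::mutex>>
  primary_table_info_guarded() const {
    std::unique_lock<std::mutex> lock(data_mutex_);
    return {&primary_table_info_, std::move(lock)};
  }

  mutable std::mutex data_mutex_;
  TableInfo primary_table_info_;
};
```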

Test Plan: `ybd tsan --sj --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestAddRemoveVoter -n 1000`

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6813
mbautin added a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 864e72b

Original commit message:

ENG-2793 Do not fail when deciding if we can flush an empty immutable memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
mbautin pushed a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 566d6d2

Original commit message:

ENG-4240: #613: Fix checking of tablet presence during transaction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044

Note:
This commit provides additional functionality that is logically related to
the earlier commit yugabyte@864e72b
and supersedes the commit yugabyte@2932b0a
mbautin pushed a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
…ction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772

Note:
This commit provides additional functionality that is logically related to
the earlier commit yugabyte@566d6d2
and supersedes the commit yugabyte@63bae60
svarnau pushed a commit that referenced this issue May 25, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796
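
A minimal sketch of the approach, with illustrative names rather than the actual YugaByte webserver API: set a flag in the signal handler, have handlers return 503 once shutdown has begun, and destroy the server explicitly before exiting so its destructor runs while static data is still alive.

```
#include <atomic>
#include <chrono>
#include <csignal>
#include <memory>
#include <thread>

// Illustrative stand-in for the embedded webserver object; its destructor
// stops accepting connections and joins worker threads.
struct Webserver {
  ~Webserver() { /* stop listener, join worker threads */ }
};

std::atomic<bool> shutting_down{false};

// Handler sketch: once termination has started, answer 503 instead of
// touching static registries that may already be destroyed.
int HandlePrometheusMetricsRequest() {
  if (shutting_down.load(std::memory_order_acquire)) {
    return 503;  // SERVICE_UNAVAILABLE
  }
  // ... serve /prometheus-metrics from the still-valid static state ...
  return 200;
}

void HandleSigterm(int /*signo*/) {
  // Only set a flag here; the actual teardown happens on the main path.
  shutting_down.store(true, std::memory_order_release);
}

int main() {
  auto webserver = std::make_unique<Webserver>();
  std::signal(SIGTERM, HandleSigterm);
  while (!shutting_down.load(std::memory_order_acquire)) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  // Explicitly destroy the webserver before exiting, so its destructor runs
  // while namespace-scoped variables are still alive, then exit normally.
  webserver.reset();
  return 0;
}
```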

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
svarnau pushed a commit that referenced this issue May 29, 2024
… SIGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35169
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35171
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35170
jasonyb pushed a commit that referenced this issue Jun 11, 2024
The pg_stat_monitor is based on PostgreSQL-11's pg_stat_statement.
To keep track of the changes, this is the base code of
PostgreSQL's pg_stat_statement.

(commit = d898edf4f233a3ffe6a0da64179fc268a1d46200).
dr0pdb pushed a commit to dr0pdb/yugabyte-db that referenced this issue Jul 24, 2024
ddhodge added a commit that referenced this issue Jul 30, 2024
…23065)

* initial commit for logical replication docs

* title changes

* changes to view table

* fixed line break

* fixed line break

* added content for delete and update

* added more content

* replaced hyperlink todos with reminders

* added snapshot metrics

* added more content

* added more config properties to docs

* added more config properties to docs

* added more config properties to docs

* replaced postgresql instances with yugabytedb

* added properties

* added complete properties

* changed postgresql to yugabytedb

* added example for all record types

* fixed highlighting of table header

* added type representations

* added type representations

* full content in now;

* full content in now;

* changed postgres references appropriately

* added a missing keyword

* changed name

* self review comments

* self review comments

* added section for logical replication

* added section for logical replication

* modified content for monitor page

* added content for monitoring

* rebased to master;

* CDC logical replication overview (#3)


Co-authored-by: Vaibhav Kushwaha <34186745+vaibhav-yb@users.noreply.github.com>

* advanced-topic (#5)


Co-authored-by: Vaibhav Kushwaha <34186745+vaibhav-yb@users.noreply.github.com>

* removed references to incremental and ad-hoc snapshots

* replaced index page with an empty one

* addressed review comments

* added getting started section

* added section for get started

* self review comments

* self review comments

* group review comments

* added hstore and domain type docs

* Advance configurations for CDC using logical replication (#2)

* Fix overview section (#7)

* Monitor section (#4)


Co-authored-by: Vaibhav Kushwaha <34186745+vaibhav-yb@users.noreply.github.com>

* Initial Snapshot content (#6)

* Add getting started (#1)

* Fix for broken note (#9)

* Fix the issue yaml parsing

Summary:
Fixes the yaml parsing issue. We changed the formatting of the yaml list; this diff fixes the corresponding usage.

Test Plan:
Prepared alma9 node using ynp.
Verified universe creation.

Reviewers: vbansal, asharma

Reviewed By: asharma

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36711

* [PLAT-14534] Add regex match for GCP Instance template

Summary:
Added regex match for gcp instance template.
Regex taken from gcp documentation [[https://cloud.google.com/compute/docs/reference/rest/v1/instanceTemplates | here]].

Test Plan: Tested manually that validation fails with invalid characters.

Reviewers: #yba-api-review!, svarshney

Reviewed By: svarshney

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36543

* update diagram (#23245)

* [/PLAT-14708] Fix JSON field name in TaskInfo query

Summary: This was missed when task params were moved out from details field.

Test Plan: Trivial - existing tests should succeed.

Reviewers: vbansal, cwang

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36705

* [#23173] DocDB: Allow large bytes to be passed to RateLimiter

Summary:
RateLimiter has a debug assert that you cannot `Request` more than `GetSingleBurstBytes`. In release mode this check is not performed, and any such call gets stuck forever. This change allows large byte counts to be requested from RateLimiter by breaking requests larger than `GetSingleBurstBytes` into multiple smaller requests.

This change is a temporary fix to allow xCluster to operate without any issues. RocksDB RateLimiter has multiple enhancements over the years that would help avoid this and more starvation issues. Ex: facebook/rocksdb@cb2476a. We should consider pulling in those changes.

Fixes #23173
Jira: DB-12112
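
As a rough sketch of the chunking approach described above (an illustrative class, not the real RocksDB RateLimiter API):

```
#include <algorithm>
#include <cstdint>

// Illustrative interface; the real RocksDB RateLimiter API differs in details.
class RateLimiterSketch {
 public:
  explicit RateLimiterSketch(int64_t single_burst_bytes)
      : single_burst_bytes_(single_burst_bytes) {}

  int64_t GetSingleBurstBytes() const { return single_burst_bytes_; }

  // Original behavior: asserts (debug) or stalls forever (release) when
  // bytes > GetSingleBurstBytes().
  void RequestInternal(int64_t /*bytes*/) { /* block until tokens available */ }

  // Fix sketch: split an oversized request into burst-sized chunks so that
  // no single call exceeds the burst limit.
  void Request(int64_t bytes) {
    while (bytes > 0) {
      const int64_t chunk = std::min(bytes, GetSingleBurstBytes());
      RequestInternal(chunk);
      bytes -= chunk;
    }
  }

 private:
  int64_t single_burst_bytes_;
};
```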

Test Plan: RateLimiterTest.LargeRequests

Reviewers: slingam

Reviewed By: slingam

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D36703

* [#23179] CDCSDK: Support data types with dynamically alloted oids in CDC

Summary:
This diff adds support for data types with dynamically allotted oids in CDC (e.g. hstore, enum arrays, etc.). Such types have an invalid pg_type_oid for the corresponding columns in the docdb schema.

In the current implementation, in `ybc_pggate`, while decoding the cdc records we look at the `type_map_` to obtain the YBCPgTypeEntity, which is then used for decoding. However, the `type_map_` does not contain any entries for data types with dynamically allotted oids. As a result, this causes a segmentation fault. To prevent such crashes, CDC currently refuses to add tables with such columns to the stream.

This diff removes that filtering logic and adds the tables to the stream even if they have such a column. A function pointer is now passed to `YBCPgGetCDCConsistentChanges`; it takes the attribute number and the table_oid and returns the appropriate type entity by querying the `pg_type` catalog table. While decoding, if a column with an invalid pg_type_oid is encountered, the passed function is invoked and the type entity is obtained for decoding.
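
A rough sketch of the callback shape described above, with hypothetical simplified types rather than the real `YBCPgTypeEntity`/pggate signatures:

```
#include <cstdint>

// Hypothetical, trimmed-down stand-in for YBCPgTypeEntity.
struct TypeEntitySketch {
  uint32_t type_oid;
  // ... decode/format callbacks for the type would live here ...
};

// Callback shape described in the summary: given the attribute number and
// table OID, look the type up in pg_type and return the matching entity.
using GetTypeEntityFn = const TypeEntitySketch* (*)(int attr_num,
                                                    uint32_t table_oid);

// Decoding sketch: fall back to the callback only when the docdb schema
// carries an invalid pg_type_oid (e.g. hstore, enum arrays).
const TypeEntitySketch* ResolveTypeEntity(int attr_num,
                                          uint32_t table_oid,
                                          const TypeEntitySketch* from_type_map,
                                          GetTypeEntityFn lookup_via_pg_type) {
  if (from_type_map != nullptr) {
    return from_type_map;  // static type_map_ entry found
  }
  return lookup_via_pg_type(attr_num, table_oid);
}
```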

**Upgrade/Rollback safety:**
This diff adds a field `optional int32 attr_num` to DatumMessagePB. These changes are protected by the autoflag `ysql_yb_enable_replication_slot_consumption` which already exists but has not yet been released.
Jira: DB-12118

Test Plan:
Jenkins: urgent

All the existing cdc tests

./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot#replicationConnectionConsumptionAllDataTypesWithYbOutput'

Reviewers: skumar, stiwary, asrinivasan, dmitry

Reviewed By: stiwary, dmitry

Subscribers: steve.varnau, skarri, yql, ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36689

* [PLAT-14710] Do not return apiToken in response to getSessionInfo

Summary:
**Context**
The GET /session_info YBA API returns:
{
    "authToken": "…",
    "apiToken": "….",
    "apiTokenVersion": "….",
    "customerUUID": "uuid1",
    "userUUID": "useruuid1"
}

The apiToken and apiTokenVersion are supposed to be the last generated token (and its version) that is still valid. We had the following sequence of changes to this API.

https://yugabyte.atlassian.net/browse/PLAT-8028 - Do not store YBA token in YBA.

After the above fix, YBA does not store the apiToken anymore. So it cannot return it as part of the /session_info. The change for this ticket returned the hashed apiToken instead.

https://yugabyte.atlassian.net/browse/PLAT-14672 - getSessionInfo should generate and return api key in response

Since the hashed apiToken value is not useful to any client, and it broke YBM create cluster (https://yugabyte.atlassian.net/browse/CLOUDGA-22117), the first change for this ticket returned a new apiToken instead.

Note that GET /session_info is meant to get customer and user information for the currently authenticated session. This is useful for automation starting off an authenticated session from an existing/cached API token. It is not necessary for the /session_info API to return the authToken and apiToken: the client already has one of them, since it used it to invoke the /session_info API. In fact, generating a new apiToken whenever /session_info is called invalidates the previous apiToken, which the client would not expect. There is a separate API, /api_token, to regenerate the apiToken explicitly.

**Fix in this change**
So the right behaviour is for /session_info to stop sending the apiToken in the response. In fact, the current behaviour of generating a new apiToken every time will break clients such as node-agent's usage of /session_info (https://github.com/yugabyte/yugabyte-db/blob/4ca56cfe27d1cae64e0e61a1bde22406e003ec04/managed/node-agent/app/server/handler.go#L19).

**Client impact of not returning apiToken in response of /session_info**

This should not impact any normal client that was using /session_info only to get the user uuid and customer uuid.

However, there might be a few clients (like YBM for example) that invoked /session_info to get the last generated apiToken from YBA. Unfortunately, this was a misuse of this API. YBA generates the apiToken in response to a few entry point APIs like /register, /api_login and /api_token. The apiToken is long-lived. YBA could choose to expire these apiTokens after a fixed amount of (long) time, but for now there is no expiration. The clients are expected to store the apiToken at their end and use the token to reestablish a session with YBA whenever needed. After establishing a new session, clients would call GET /session_info to get the user uuid and customer uuid. This is getting fixed in YBM with https://yugabyte.atlassian.net/browse/CLOUDGA-22117. So this PLAT change should be taken up by YBM only after CLOUDGA-22117 is fixed.

Test Plan:
* Manually verified that session_info does not return authToken
* Shubham verified that node-agent works with this fix. Thanks Shubham!

Reviewers: svarshney, dkumar, tbedi, #yba-api-review!

Reviewed By: svarshney

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36712

* [docs] updates to CVE table status column (#23225)

* updates to status column

* review comment

* format

---------

Co-authored-by: Dwight Hodge <ghodge@yugabyte.com>

* [docs] Fix load balance keyword in drivers page (#23253)

[docs] Fix `load_balance` -> `load-balance` in jdbc driver
[docs] Fix `load_balance` -> `loadBalance` in nodejs driver

* fixed compilation

* fix link, format

* format, links

* links, format

* format

* format

* minor edit

* best practice (#8)

* moved sections

* moved pages

* added key concepts page

* added link to getting started

* Dynamic table doc changes (#11)

* icons

* added box for lead link

* revert ybclient change

* revert accidental change

* revert accidental change

* revert accidental change

* fix link block for getting started page

* format

* minor edit

* links, format

* format

* links

* format

* remove reminder references

* Modified output plugin docs (#12)

* Naming edits

* format

* review comments

* diagram

* review comment

* fix links

* format

* format

* link

* review comments

* copy to stable

* link

---------

Co-authored-by: siddharth2411 <43139012+siddharth2411@users.noreply.github.com>
Co-authored-by: Shubham <svarshney@yugabyte.com>
Co-authored-by: asharma-yb <asharma@yugabyte.com>
Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com>
Co-authored-by: Naorem Khogendro Singh <nsingh@yugabyte.com>
Co-authored-by: Hari Krishna Sunder <hari90@users.noreply.github.com>
Co-authored-by: Sumukh-Phalgaonkar <sumukhphalgaonkar@gmail.com>
Co-authored-by: Subramanian Neelakantan <sneelakantan@yugabyte.com>
Co-authored-by: Aishwarya Chakravarthy <ashchakravarthy@gmail.com>
Co-authored-by: Dwight Hodge <ghodge@yugabyte.com>
Co-authored-by: ddorian <dorian.hoxha@gmail.com>
Co-authored-by: Sumukh-Phalgaonkar <61342752+Sumukh-Phalgaonkar@users.noreply.github.com>
arybochkin added a commit that referenced this issue Aug 1, 2024
…kward scans improvement

Summary:
The change updates the cost based optimizer to take the backward scan improvements into account, so that
backward scans are picked instead of forward scan + sort when the fast backward scan feature is
enabled via `FLAGS_use_fast_backward_scan`.

TAQO run results first. The first 4 columns are the values for 'Best Execution Plan Picked';
the last 4 columns are the number of queries with backward scans in the execution plan. The cost based
optimizer is turned on for 'Master' and 'D36614'.

| Model                        | Master | D36614 | PG     | Num queries | Improved | Degraded | Plan changed
| ---------------------------- | ------ | ------ | ------ | ----------- | -------- | -------- | ------------
| basic                        |  91.04 |  91.04 |  96.64 |           0 |        0 |        0 |        0
| complex                      |  85.42 |  84.38 |  86.46 |           3 |        2 |        1 |        1
| cost-validation-joins        |  78.46 |  79.79 |  95.48 |          23 |       23 |        0 |        0
| cost-validation-misc         |  94.43 |   95.3 |  92.86 |          62 |       62 |        0 |        6
| cost-validation-single-table |  96.09 |  96.92 |   98.7 |           0 |        0 |        0 |        0
| join-order-benchmark         |  66.37 |   64.6 |  42.48 |           0 |        0 |        0 |        0
| subqueries                   |     80 |  86.67 |     80 |           0 |        0 |        0 |        0
| more-subqueries              |  77.94 |  76.47 |    100 |           1 |        1 |        0 |        0
| seek-next-estimation         |    100 |    100 |  96.88 |           0 |        0 |        0 |        0
| tpch                         |  72.73 |  68.18 |  72.73 |           0 |        0 |        0 |        0
| tuning_tests                 |  93.63 |  94.12 |  99.06 |          10 |       10 |        0 |        0

Some queries results by model (queries with backward scans):
| complex                          | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| 59ebf1c77e58cb2291c35486e2f96137 |                  |               |                  |                | plan changed
| Estimated cost                   |          2285.29 |       2285.29 |          1438.33 | 20000002970.96 |
| Execution time                   |             8.20 |          8.20 |           299.57 |          10.46 |
|                                  |                  |               |                  |                |
| 4039d9f16fb8f4ed263f48d5f5232215 |                  |               |                  |                |
| Estimated cost                   |         18222.67 |      18222.67 |         14353.61 |       14353.61 |
| Execution time                   |            22.04 |         22.04 |            13.19 |          13.19 |
|                                  |                  |               |                  |                |
| 5c918663b34f55e514fc6e6edc046556 |                  |               |                  |                |
| Estimated cost                   |         27084.99 |      27084.99 |         23715.02 |       23715.02 |
| Execution time                   |            35.53 |         35.53 |            27.63 |          27.63 |
|                                  |                  |               |                  |                |
| cost-validation-joins            | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| c8327c54b05e0781b1e095fefe6e314e |                  |               |                  |                |
| Estimated cost                   |           387.39 |        387.39 |           384.04 |         384.04 |
| Execution time                   |              6.3 |          6.30 |             3.16 |           3.16 |
|                                  |                  |               |                  |                |
| 4ce5afcdad024545f07198bd75eb5312 |                  |               |                  |                |
| Estimated cost                   |           361.45 |        384.96 |           361.41 |         604.03 |
| Execution time                   |           153.51 |          5.99 |            137.8 |           2.41 |
|                                  |                  |               |                  |                |
| 01c77bebc5d6fe9d1444170d92a14ec1 |                  |               |                  |                |
| Estimated cost                   |           387.77 |        387.77 |           384.44 |         384.44 |
| Execution time                   |             6.34 |          6.34 |              3.3 |           3.30 |
|                                  |                  |               |                  |                |
| 3ca0ff16775634091419ecf365bbfcf5 |                  |               |                  |                |
| Estimated cost                   |           362.67 |        386.53 |           362.34 |         389.85 |
| Execution time                   |            21.43 |          6.09 |            16.82 |           2.47 |
|                                  |                  |               |                  |                |
| e4e31014bb2b4b3ab9478f7200516310 |                  |               |                  |                |
| Estimated cost                   |           374.17 |        386.53 |           373.83 |         383.19 |
| Execution time                   |            75.73 |          5.97 |            46.86 |           3.01 |
|                                  |                  |               |                  |                |
| 7a993deaf6d49e2b053987240fe78c02 |                  |               |                  |                |
| Estimated cost                   |           362.68 |        362.68 |           362.34 |         362.34 |
| Execution time                   |            10.21 |         10.21 |              6.7 |           6.70 |
|                                  |                  |               |                  |                |
| cost-validation-misc             | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| e8b548c59a52976c30bde07e5576fef3 |                  |               |                  |                | best plan changed
| Estimated cost                   |           379.01 |   10000000674 |           372.27 |         372.27 |
| Execution time                   |                1 |          0.73 |             0.65 |           0.65 |
|                                  |                  |               |                  |                |
| 7a353cc0498dfa2d44a441b37c5a1be2 |                  |               |                  |                |
| Estimated cost                   |          13766.3 |       13766.3 |         10364.91 |       10364.91 |
| Execution time                   |            13.91 |         13.91 |             7.62 |           7.62 |
|                                  |                  |               |                  |                |
| 45241b1e1c945e2d13ba488e42fa9f5f |                  |               |                  |                | plan changed
| Estimated cost                   |          1138.17 |       1327.55 |           956.81 |         956.81 |
| Execution time                   |             1.36 |          0.88 |             0.64 |           0.64 |
|                                  |                  |               |                  |                |
| df22d138bac1990b9a2b664478be3940 |                  |               |                  |                | best plan changed
| Estimated cost                   |         81398.55 |     133322.29 |         81398.55 |       99993.43 |
| Execution time                   |            92.79 |         14.05 |             80.1 |           7.41 |
|                                  |                  |               |                  |                |
| bd56ddf3c3edcbb4daed5e90222f3f43 |                  |               |                  |                | plan changed
| Estimated cost                   |          1156.06 |       1214.59 |           880.92 |         880.92 |
| Execution time                   |             1.84 |          0.93 |              0.7 |           0.70 |
|                                  |                  |               |                  |                |
| c2b94854c7334aaeab12fb6a7ad90bbe |                  |               |                  |                | best plan changed
| Estimated cost                   |          6621.15 |      16245.17 |          6621.15 |       11662.79 |
| Execution time                   |              8.3 |          1.06 |             7.53 |           0.87 |
|                                  |                  |               |                  |                |
| 8c35b5522e8da2fbb14154afbb949c17 |                  |               |                  |                | best plan changed
| Estimated cost                   |         61271.63 |     249323.79 |         61271.63 |      185166.06 |
| Execution time                   |            77.92 |          2.98 |            70.86 |           1.89 |
|                                  |                  |               |                  |                |
| 3e17a5426c1240a7cb15413483c6857b |                  |               |                  |                | best plan changed
| Estimated cost                   |         83146.63 |     286833.39 |         83146.63 |      200156.46 |
| Execution time                   |           102.32 |          3.01 |            89.26 |           1.91 |
|                                  |                  |               |                  |                |
| more-subqueries                  | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| df720fdc87e9d33aa1006ede6868310f |                  |               |                  |                |
| Estimated cost                   |        432396.84 |     432396.84 |        389020.18 |      389020.18 |
| Execution time                   |           268.49 |        268.49 |           220.67 |         220.67 |
|                                  |                  |               |                  |                |
| tuning_tests                     | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| a7a1a762f96b9445990d3057904a8f9b |                  |               |                  |                |
| Estimated cost                   |        102825.88 |     102825.88 |         56113.48 |       56113.48 |
| Execution time                   |            70.95 |         70.95 |            38.41 |          38.41 |
|                                  |                  |               |                  |                |
| 13026bfac847da1ec9f13fdaad5c39cf |                  |               |                  |                |
| Estimated cost                   |        117489.58 |     117489.58 |         64103.98 |       64103.98 |
| Execution time                   |            79.83 |         79.83 |            42.35 |          42.35 |
|                                  |                  |               |                  |                |
| 90e1ac9dfdbd77fa5ad96b1fe3526a41 |                  |               |                  |                |
| Estimated cost                   |        132153.27 |     132153.27 |         72094.47 |       72094.47 |
| Execution time                   |            90.19 |         90.19 |            46.56 |          46.56 |
|                                  |                  |               |                  |                |
| 90c2ad6894acac53c20efdd886e0fc03 |                  |               |                  |                |
| Estimated cost                   |        146816.97 |     146816.97 |         80084.97 |       80084.97 |
| Execution time                   |           101.19 |        101.19 |            52.69 |          52.69 |

The full report: https://taqo.dev.yugabyte.com/regression/33

Most of the queries with backward scans improved their execution time. However, one query,
59ebf1c77e58cb2291c35486e2f96137, shows a regression. Other tests clearly show that the new
approach improves backward scans, so the regression may come from other parts of the Cost Based
Optimizer that were tweaked at the same time (such as the costs for seeks, nexts, etc.). This
requires additional analysis and will be covered by a separate ticket.
Query link: https://taqo.dev.yugabyte.com/reports/b5e885c8e491050e70320e4b801469b0/20240719-115328/tags/distinct.html#59ebf1c77e58cb2291c35486e2f96137

Jira: DB-11271

Test Plan:
Test case #1 (backward scan improvements are turned off).
1. Start a cluster: `./bin/yb-ctl start --rf=1`
2. Open `ysqlsh`
3. Create a table with some data:
`# CREATE TABLE ttable(h INT, r INT, c INT, PRIMARY KEY(h, r ASC));`
`# INSERT INTO ttable SELECT i, i, i FROM generate_series(1, 10) AS i;`
4. Turn CBO on: `# SET yb_enable_base_scans_cost_model TO true;`
5. Run the query: `# EXPLAIN ANALYZE SELECT c, r FROM ttable WHERE h = 1 ORDER BY r DESC;`
6. Result:
```
 Sort  (cost=555.50..555.51 rows=5 width=8) (actual time=0.706..0.706 rows=1 loops=1)
   Sort Key: r DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using ttable_pkey on ttable  (cost=180.00..555.44 rows=5 width=8) (actual time=0.674..0.677 rows=1 loops=1)
         Index Cond: (h = 1)
 Planning Time: 6.147 ms
 Execution Time: 0.776 ms
 Peak Memory Usage: 60 kB
(8 rows)
```
A Forward Scan + Sort plan is expected when fast backward scan is turned off.

Test case #2 (backward scan improvements are turned on).
1. Start a cluster: `./bin/yb-ctl start --rf=1 --tserver_flags=allowed_preview_flags_csv=use_fast_backward_scan,use_fast_backward_scan=true`
2. Open `ysqlsh`
3. Create a table with some data:
`# CREATE TABLE ttable(h INT, r INT, c INT, PRIMARY KEY(h, r ASC));`
`# INSERT INTO ttable SELECT i, i, i FROM generate_series(1, 10) AS i;`
4. Turn CBO on: `# SET yb_enable_base_scans_cost_model TO true;`
5. Run the query: `# EXPLAIN ANALYZE SELECT c, r FROM ttable WHERE h = 1 ORDER BY r DESC;`
6. Result:
```
 Index Scan Backward using ttable_pkey on ttable  (cost=180.00..557.77 rows=5 width=8) (actual time=1.075..1.079 rows=1 loops=1)
   Index Cond: (h = 1)
 Planning Time: 0.073 ms
 Execution Time: 1.129 ms
 Peak Memory Usage: 24 kB
(5 rows)
```
The CBO now takes the backward scan improvements into account, and the planner prefers an Index Scan Backward over Forward Scan + Sort.

Reviewers: rthallam, gkukreja, amartsinchyk

Reviewed By: rthallam, gkukreja, amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36614
myang2021 added a commit that referenced this issue Aug 8, 2024
Summary:
The DDL atomicity stress tests failed more often on the pg15 branch with an error like:

```
WARNING: ThreadSanitizer: data race (pid=180911)
  Write of size 8 at 0x7b2c000257b8 by thread T17 (mutexes: write M0):
    #0 profile_open_file prof_file.c (libkrb5.so.3+0xf45b3)
    #1 profile_init_flags <null> (libkrb5.so.3+0xfb056)
    #2 k5_os_init_context <null> (libkrb5.so.3+0xe5546)
    #3 krb5_init_context_profile <null> (libkrb5.so.3+0xabc90)
    #4 krb5_init_context <null> (libkrb5.so.3+0xabbd5)
    #5 krb5_gss_init_context init_sec_context.c (libgssapi_krb5.so.2+0x448da)
    #6 acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39159)
    #7 krb5_gss_acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39072)
    #8 gss_add_cred_from <null> (libgssapi_krb5.so.2+0x1fcd3)
    #9 gss_acquire_cred_from <null> (libgssapi_krb5.so.2+0x1f69d)
    #10 gss_acquire_cred <null> (libgssapi_krb5.so.2+0x1f431)
    #11 pg_GSS_have_cred_cache ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-gssapi-common.c:68:10 (libpq.so.5+0x543fe)
    #12 PQconnectPoll ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2909:22 (libpq.so.5+0x359ca)
    #13 connectDBComplete ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2241:10 (libpq.so.5+0x30807)
    #14 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:719:10 (libpq.so.5+0x30af1)
    #15 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:348:24 (libpq_utils.so+0x13c5b)
    #16 yb::pgwrapper::PGConn::Connect(string const&, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.h:254:12 (libpq_utils.so+0x1a77e)
    #17 yb::pgwrapper::PGConnBuilder::Connect(bool) const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:743:10 (libpq_utils.so+0x1a77e)
    #18 yb::pgwrapper::LibPqTestBase::ConnectToDBAsUser(string const&, string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:54:6 (libpg_wrapper_test_base.so+0x26f34)
    #19 yb::pgwrapper::LibPqTestBase::ConnectToDB(string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:44:10 (libpg_wrapper_test_base.so+0x26b1e)
    #20 yb::pgwrapper::LibPqTestBase::Connect(bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:40:10 (libpg_wrapper_test_base.so+0x26b1e)
    #21 yb::pgwrapper::PgDdlAtomicityStressTest::Connect() ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:147:25 (pg_ddl_atomicity_stress-test+0x136d6c)
    #22 yb::pgwrapper::PgDdlAtomicityStressTest::TestDdl(std::vector<string, std::allocator<string>> const&, int) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:165:15 (pg_ddl_atomicity_stress-test+0x136df5)
    #23 yb::pgwrapper::PgDdlAtomicityStressTest_StressTest_Test::TestBody()::$_2::operator()() const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:316:5 (pg_ddl_atomicity_stress-test+0x13d2eb)
```

It appears that the function `yb::pgwrapper::LibPqTestBase::Connect` isn't
thread-safe. I restructured the code to make the connections in a single thread
and then pass them to the various concurrent threads used for testing.
Jira: DB-2996
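
A minimal sketch of that restructuring, using hypothetical names rather than the actual gtest/PGConn scaffolding: every connection is opened serially on the main thread, and each worker thread only uses the already-established connection it was handed, so the non-thread-safe connect path never runs concurrently.

```
#include <thread>
#include <vector>

// Hypothetical stand-in for a libpq-backed connection wrapper such as PGConn.
struct Connection {
  explicit Connection(int id) : id(id) {}  // imagine the PQconnectdb() call here
  void RunDdl() { /* issue a DDL statement over this connection */ }
  int id;
};

int main() {
  constexpr int kNumThreads = 4;

  // Open every connection up front on the main thread.
  std::vector<Connection> conns;
  for (int i = 0; i < kNumThreads; ++i) {
    conns.emplace_back(i);
  }

  // Each worker only uses the connection it was given; no concurrent connects.
  std::vector<std::thread> workers;
  for (int i = 0; i < kNumThreads; ++i) {
    workers.emplace_back([&conn = conns[i]] { conn.RunDdl(); });
  }
  for (auto& t : workers) {
    t.join();
  }
  return 0;
}
```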

Test Plan:
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17

Verified that there are no more TSAN errors.

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D37111
myang2021 added a commit that referenced this issue Aug 8, 2024
… in tsan build

Summary:
The DDL atomicity stress tests failed more often on the pg15 branch with an error like:

```
WARNING: ThreadSanitizer: data race (pid=180911)
  Write of size 8 at 0x7b2c000257b8 by thread T17 (mutexes: write M0):
    #0 profile_open_file prof_file.c (libkrb5.so.3+0xf45b3)
    #1 profile_init_flags <null> (libkrb5.so.3+0xfb056)
    #2 k5_os_init_context <null> (libkrb5.so.3+0xe5546)
    #3 krb5_init_context_profile <null> (libkrb5.so.3+0xabc90)
    #4 krb5_init_context <null> (libkrb5.so.3+0xabbd5)
    #5 krb5_gss_init_context init_sec_context.c (libgssapi_krb5.so.2+0x448da)
    #6 acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39159)
    #7 krb5_gss_acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39072)
    #8 gss_add_cred_from <null> (libgssapi_krb5.so.2+0x1fcd3)
    #9 gss_acquire_cred_from <null> (libgssapi_krb5.so.2+0x1f69d)
    #10 gss_acquire_cred <null> (libgssapi_krb5.so.2+0x1f431)
    #11 pg_GSS_have_cred_cache ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-gssapi-common.c:68:10 (libpq.so.5+0x543fe)
    #12 PQconnectPoll ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2909:22 (libpq.so.5+0x359ca)
    #13 connectDBComplete ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2241:10 (libpq.so.5+0x30807)
    #14 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:719:10 (libpq.so.5+0x30af1)
    #15 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:348:24 (libpq_utils.so+0x13c5b)
    #16 yb::pgwrapper::PGConn::Connect(string const&, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.h:254:12 (libpq_utils.so+0x1a77e)
    #17 yb::pgwrapper::PGConnBuilder::Connect(bool) const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:743:10 (libpq_utils.so+0x1a77e)
    #18 yb::pgwrapper::LibPqTestBase::ConnectToDBAsUser(string const&, string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:54:6 (libpg_wrapper_test_base.so+0x26f34)
    #19 yb::pgwrapper::LibPqTestBase::ConnectToDB(string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:44:10 (libpg_wrapper_test_base.so+0x26b1e)
    #20 yb::pgwrapper::LibPqTestBase::Connect(bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:40:10 (libpg_wrapper_test_base.so+0x26b1e)
    #21 yb::pgwrapper::PgDdlAtomicityStressTest::Connect() ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:147:25 (pg_ddl_atomicity_stress-test+0x136d6c)
    #22 yb::pgwrapper::PgDdlAtomicityStressTest::TestDdl(std::vector<string, std::allocator<string>> const&, int) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:165:15 (pg_ddl_atomicity_stress-test+0x136df5)
    #23 yb::pgwrapper::PgDdlAtomicityStressTest_StressTest_Test::TestBody()::$_2::operator()() const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:316:5 (pg_ddl_atomicity_stress-test+0x13d2eb)
```

It appears that the function `yb::pgwrapper::LibPqTestBase::Connect` isn't
thread-safe. I restructured the code to make the connections in a single thread
and then pass them to the various concurrent threads used for testing.
Jira: DB-2996

Original commit: bd4874b / D37111

Test Plan:
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17

Verified that there are no more TSAN errors.

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37167
amitanandaiyer added a commit that referenced this issue Sep 5, 2024
Summary:
Call the callback only in the ScopeExit block, not while holding the lock.

Without this fix, a thread can get into a deadlock by requesting a shared_lock on a mutex while already holding an exclusive lock on the same mutex.

This deadlock can be triggered if there are active read/write requests to a table (from more than one thread) right after the table has gone through a tablet split.

If there is only one thread, it is unlikely to hit the deadlock, because that thread notices -- as part of the callback -- that the table's partition info is stale. A different thread refreshing the partition version before the main thread checks whether the table version is stale is likely necessary to trigger the stack trace seen below.

e.g:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00005640c3eb441b in std::__1::shared_timed_mutex::lock_shared() ()
#2  0x00005640c3ffcbff in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>, yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
#3  0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#4  0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#5  0x00005640c401aa76 in yb::client::(anonymous namespace)::BatcherFlushDone(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig) ()
#6  0x00005640c401b371 in boost::detail::function::void_function_obj_invoker1<std::__1::__bind<void (*)(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig), std::__1::shared_ptr<yb::client::internal::Batcher> const&, std::__1::placeholders::__ph<1> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig&>, void, yb::Status const&>::invoke(boost::detail::function::function_buffer&, yb::Status const&) ()
#7  0x00005640c3f70398 in yb::client::internal::Batcher::Run() ()
#8  0x00005640c3f72656 in yb::client::internal::Batcher::FlushFinished() ()
#9  0x00005640c3f74a4d in yb::client::internal::Batcher::TabletLookupFinished(yb::client::internal::InFlightOp*, yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> >) ()
#10 0x00005640c3f759bc in std::__1::__function::__func<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0, std::__1::allocator<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0>, void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>::operator()(yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&) ()

#11 0x00005640c3fff05d in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>,  yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
** Is holding an exclusive lock in MetaCache::LookupTabletByKey/DoLookupTabletByKey **

#12 0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#13 0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#14 0x00005640c4017130 in yb::client::YBSession::FlushAsync(boost::function<void (yb::client::FlushStatus*)>) ()
#15 0x00005640c5225a0c in yb::tserver::PgClientServiceImpl::Perform(yb::tserver::PgPerformRequestPB const*, yb::tserver::PgPerformResponsePB*, yb::rpc::RpcContext) ()
#16 0x00005640c51c4487 in std::__1::__function::__func<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20, std::__1::allocator<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) ()
#17 0x00005640c51d374f in yb::tserver::PgClientServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#18 0x00005640c4f5f420 in yb::rpc::ServicePoolImpl::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#19 0x00005640c4e845af in yb::rpc::InboundCall::InboundCallTask::Run() ()
#20 0x00005640c4f6e243 in yb::rpc::(anonymous namespace)::Worker::Execute() ()
#21 0x00005640c570ecb4 in yb::Thread::SuperviseThread(void*) ()
#22 0x00007f808b7c6694 in start_thread (arg=0x7f76d8caf700) at pthread_create.c:333
#23 0x00007f808bac341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
Jira: DB-12651
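
For intuition, here is a minimal sketch of the pattern the fix adopts, with hypothetical names rather than the actual MetaCache/ScopeExit code: the callback is captured while the lock is held but only invoked after the lock has been released, so a callback that re-enters the lookup path cannot self-deadlock.

```
#include <functional>
#include <iostream>
#include <shared_mutex>
#include <utility>

// Hypothetical cache, loosely modeled on the lookup-with-callback shape of
// MetaCache::LookupTabletByKey; not the real implementation.
class LookupCache {
 public:
  using Callback = std::function<void(int /* tablet id */)>;

  void LookupByKey(int key, Callback callback) {
    Callback to_run;
    {
      std::unique_lock<std::shared_timed_mutex> lock(mutex_);
      // ... update internal state, decide the lookup can complete now ...
      to_run = std::move(callback);
    }  // lock released here, like deferring the callback to a ScopeExit block
    // Invoking the callback outside the critical section means it may safely
    // call LookupByKey() again (which takes the lock) without deadlocking.
    if (to_run) {
      to_run(key);
    }
  }

 private:
  std::shared_timed_mutex mutex_;
};

int main() {
  LookupCache cache;
  cache.LookupByKey(1, [&cache](int id) {
    std::cout << "first lookup resolved tablet " << id << "\n";
    // Re-entrant lookup from inside the callback: fine, the lock is free.
    cache.LookupByKey(2, [](int id2) {
      std::cout << "nested lookup resolved tablet " << id2 << "\n";
    });
  });
  return 0;
}
```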

Test Plan:
Jenkins
yb_build.sh --cxx-test ql-stress-test QLStressTest.ReproMetaCacheDeadlock

Reviewers: rthallam, hsunder, qhu, timur

Reviewed By: hsunder

Subscribers: svc_phabricator, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D37706
arpang added a commit that referenced this issue Sep 5, 2024
Summary:
The test CreateInitialSysCatalogSnapshot intermittently fails in the ASAN build with the following stack trace:

   [m-1]     #1 0x5578be7959dc in pg_realloc /home/centos/code/yugabyte-db/src/postgres/src/common/../../../../../src/postgres/src/common/fe_memutils.c:72:8
   [m-1]     #2 0x5578be77b6b1 in readfile /home/centos/code/yugabyte-db/src/postgres/src/bin/initdb/../../../../../../src/postgres/src/bin/initdb/initdb.c:537:23
   [m-1]     #3 0x5578be77837b in bootstrap_template1 /home/centos/code/yugabyte-db/src/postgres/src/bin/initdb/../../../../../../src/postgres/src/bin/initdb/initdb.c:1434:14

This happens because, during the execution of `bootstrap_template1()` in initdb, `bki_lines` first points to the memory allocated by readfile() and is then repointed at the memory allocated by replace_token() without freeing the memory it previously pointed to. Fix the issue by freeing the memory allocated by readfile().

Note that there are more memory leaks in initdb that, for some reason, are not detected by ASAN runs. For instance, the memory allocated by replace_token() is never freed. These leaks are present in the YB master branch and in upstream PG as well. Upstream PG does not consider them worth fixing (https://www.postgresql.org/message-id/28473.1582440206%40sss.pgh.pa.us), and the same reasoning applies to YB. Also, to prevent unnecessary deviation from the PG code, we can let them remain.
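
As a rough illustration of the leak and the fix, with hypothetical helpers standing in for readfile()/replace_token() (the real functions operate on arrays of lines, not single buffers):

```
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for initdb's readfile() and replace_token(); both
// hand back heap-allocated buffers that the caller owns.
static char* read_file_contents() {
  const char* bki = "insert ( OID = 1 ... )";
  char* buf = static_cast<char*>(malloc(strlen(bki) + 1));
  strcpy(buf, bki);
  return buf;
}

static char* replace_token(const char* in) {
  char* buf = static_cast<char*>(malloc(strlen(in) + 1));
  strcpy(buf, in);  // the real helper substitutes tokens; a copy is enough here
  return buf;
}

int main() {
  char* lines = read_file_contents();      // first allocation
  char* replaced = replace_token(lines);   // second allocation
  free(lines);      // the fix: release the readfile()-style buffer ...
  lines = replaced; // ... before repointing `lines` at the new one
  free(lines);
  return 0;
}
```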

Test Plan:
Jenkins: rebase: pg15
   ./yb_build.sh asan --cxx-test create_initial_sys_catalog_snapshot --gtest_filter CreateInitialSysCatalogSnapshotTest.CreateInitialSysCatalogSnapshot -n 100

Reviewers: jason

Reviewed By: jason

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D37736
amitanandaiyer added a commit that referenced this issue Sep 6, 2024
…ile holding the lock

Summary:
Original commit: c770d79 / D37706
Call the callback only in the ScopeExit block, not while holding the lock.

Without this fix, a thread can get into a deadlock by requesting a shared_lock on a mutex while already holding an exclusive lock on the same mutex.

This deadlock can be triggered if there are active read/write requests to a table (from more than one thread) right after the table has gone through a tablet split.

If there is only one thread, it is unlikely to hit the deadlock, because that thread notices -- as part of the callback -- that the table's partition info is stale. A different thread refreshing the partition version before the main thread checks whether the table version is stale is likely necessary to trigger the stack trace seen below.

e.g:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00005640c3eb441b in std::__1::shared_timed_mutex::lock_shared() ()
#2  0x00005640c3ffcbff in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>, yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
#3  0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#4  0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#5  0x00005640c401aa76 in yb::client::(anonymous namespace)::BatcherFlushDone(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig) ()
#6  0x00005640c401b371 in boost::detail::function::void_function_obj_invoker1<std::__1::__bind<void (*)(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig), std::__1::shared_ptr<yb::client::internal::Batcher> const&, std::__1::placeholders::__ph<1> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig&>, void, yb::Status const&>::invoke(boost::detail::function::function_buffer&, yb::Status const&) ()
#7  0x00005640c3f70398 in yb::client::internal::Batcher::Run() ()
#8  0x00005640c3f72656 in yb::client::internal::Batcher::FlushFinished() ()
#9  0x00005640c3f74a4d in yb::client::internal::Batcher::TabletLookupFinished(yb::client::internal::InFlightOp*, yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> >) ()
#10 0x00005640c3f759bc in std::__1::__function::__func<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0, std::__1::allocator<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0>, void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>::operator()(yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&) ()

#11 0x00005640c3fff05d in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>,  yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
** Is holding an exclusive lock in MetaCache::LookupTabletByKey/DoLookupTabletByKey **

#12 0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#13 0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#14 0x00005640c4017130 in yb::client::YBSession::FlushAsync(boost::function<void (yb::client::FlushStatus*)>) ()
#15 0x00005640c5225a0c in yb::tserver::PgClientServiceImpl::Perform(yb::tserver::PgPerformRequestPB const*, yb::tserver::PgPerformResponsePB*, yb::rpc::RpcContext) ()
#16 0x00005640c51c4487 in std::__1::__function::__func<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20, std::__1::allocator<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) ()
#17 0x00005640c51d374f in yb::tserver::PgClientServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#18 0x00005640c4f5f420 in yb::rpc::ServicePoolImpl::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#19 0x00005640c4e845af in yb::rpc::InboundCall::InboundCallTask::Run() ()
#20 0x00005640c4f6e243 in yb::rpc::(anonymous namespace)::Worker::Execute() ()
#21 0x00005640c570ecb4 in yb::Thread::SuperviseThread(void*) ()
#22 0x00007f808b7c6694 in start_thread (arg=0x7f76d8caf700) at pthread_create.c:333
#23 0x00007f808bac341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
Jira: DB-12651

Test Plan:
Jenkins
yb_build.sh --cxx-test ql-stress-test QLStressTest.ReproMetaCacheDeadlock

Reviewers: rthallam, hsunder, qhu, timur

Reviewed By: rthallam

Subscribers: ybase, svc_phabricator

Differential Revision: https://phorge.dev.yugabyte.com/D37788
amitanandaiyer added a commit that referenced this issue Sep 6, 2024
…e holding the lock

Summary:
Original commit: c770d79 / D37706
Call the callback only in the ScopeExit block, not while holding the lock.

Without this fix, a thread can get into a deadlock by requesting a shared_lock on a mutex while already holding an exclusive lock on the same mutex.

This deadlock can be triggered if there are active read/write requests to a table (from more than one thread) right after the table has gone through a tablet split.

If there is only one thread, it is unlikely to hit the deadlock, because that thread notices -- as part of the callback -- that the table's partition info is stale. A different thread refreshing the partition version before the main thread checks whether the table version is stale is likely necessary to trigger the stack trace seen below.

e.g:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00005640c3eb441b in std::__1::shared_timed_mutex::lock_shared() ()
#2  0x00005640c3ffcbff in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>, yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
#3  0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#4  0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#5  0x00005640c401aa76 in yb::client::(anonymous namespace)::BatcherFlushDone(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig) ()
#6  0x00005640c401b371 in boost::detail::function::void_function_obj_invoker1<std::__1::__bind<void (*)(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig), std::__1::shared_ptr<yb::client::internal::Batcher> const&, std::__1::placeholders::__ph<1> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig&>, void, yb::Status const&>::invoke(boost::detail::function::function_buffer&, yb::Status const&) ()
#7  0x00005640c3f70398 in yb::client::internal::Batcher::Run() ()
#8  0x00005640c3f72656 in yb::client::internal::Batcher::FlushFinished() ()
#9  0x00005640c3f74a4d in yb::client::internal::Batcher::TabletLookupFinished(yb::client::internal::InFlightOp*, yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> >) ()
#10 0x00005640c3f759bc in std::__1::__function::__func<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0, std::__1::allocator<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0>, void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>::operator()(yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&) ()

#11 0x00005640c3fff05d in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>,  yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
** Is holding an exclusive lock in MetaCache::LookupTabletByKey/DoLookupTabletByKey **

#12 0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#13 0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#14 0x00005640c4017130 in yb::client::YBSession::FlushAsync(boost::function<void (yb::client::FlushStatus*)>) ()
#15 0x00005640c5225a0c in yb::tserver::PgClientServiceImpl::Perform(yb::tserver::PgPerformRequestPB const*, yb::tserver::PgPerformResponsePB*, yb::rpc::RpcContext) ()
#16 0x00005640c51c4487 in std::__1::__function::__func<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20, std::__1::allocator<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) ()
#17 0x00005640c51d374f in yb::tserver::PgClientServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#18 0x00005640c4f5f420 in yb::rpc::ServicePoolImpl::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#19 0x00005640c4e845af in yb::rpc::InboundCall::InboundCallTask::Run() ()
#20 0x00005640c4f6e243 in yb::rpc::(anonymous namespace)::Worker::Execute() ()
#21 0x00005640c570ecb4 in yb::Thread::SuperviseThread(void*) ()
#22 0x00007f808b7c6694 in start_thread (arg=0x7f76d8caf700) at pthread_create.c:333
#23 0x00007f808bac341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
Jira: DB-12651

Test Plan:
Jenkins
yb_build.sh --cxx-test ql-stress-test QLStressTest.ReproMetaCacheDeadlock

Reviewers: rthallam, hsunder, qhu, timur

Reviewed By: rthallam

Subscribers: svc_phabricator, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D37789
amitanandaiyer added a commit that referenced this issue Sep 6, 2024
…ile holding the lock

Summary:
Original commit: c770d79 / D37706
Call the callback only in the ScopeExit block, not while holding the lock.

Without this fix, a thread can get into a deadlock by requesting a shared_lock on a mutex while already holding an exclusive lock on the same mutex.

This deadlock can be triggered if there are active read/write requests to a table (from more than one thread) right after the table has gone through a tablet split.

If there is only one thread, it is unlikely to hit the deadlock, because that thread notices -- as part of the callback -- that the table's partition info is stale. A different thread refreshing the partition version before the main thread checks whether the table version is stale is likely necessary to trigger the stack trace seen below.

e.g:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00005640c3eb441b in std::__1::shared_timed_mutex::lock_shared() ()
#2  0x00005640c3ffcbff in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>, yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
#3  0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#4  0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#5  0x00005640c401aa76 in yb::client::(anonymous namespace)::BatcherFlushDone(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig) ()
#6  0x00005640c401b371 in boost::detail::function::void_function_obj_invoker1<std::__1::__bind<void (*)(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig), std::__1::shared_ptr<yb::client::internal::Batcher> const&, std::__1::placeholders::__ph<1> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig&>, void, yb::Status const&>::invoke(boost::detail::function::function_buffer&, yb::Status const&) ()
#7  0x00005640c3f70398 in yb::client::internal::Batcher::Run() ()
#8  0x00005640c3f72656 in yb::client::internal::Batcher::FlushFinished() ()
#9  0x00005640c3f74a4d in yb::client::internal::Batcher::TabletLookupFinished(yb::client::internal::InFlightOp*, yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> >) ()
#10 0x00005640c3f759bc in std::__1::__function::__func<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0, std::__1::allocator<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0>, void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>::operator()(yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&) ()

#11 0x00005640c3fff05d in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>,  yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
** Is holding an exclusive lock in MetaCache::LookupTabletByKey/DoLookupTabletByKey **

#12 0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#13 0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#14 0x00005640c4017130 in yb::client::YBSession::FlushAsync(boost::function<void (yb::client::FlushStatus*)>) ()
#15 0x00005640c5225a0c in yb::tserver::PgClientServiceImpl::Perform(yb::tserver::PgPerformRequestPB const*, yb::tserver::PgPerformResponsePB*, yb::rpc::RpcContext) ()
#16 0x00005640c51c4487 in std::__1::__function::__func<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20, std::__1::allocator<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) ()
#17 0x00005640c51d374f in yb::tserver::PgClientServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#18 0x00005640c4f5f420 in yb::rpc::ServicePoolImpl::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#19 0x00005640c4e845af in yb::rpc::InboundCall::InboundCallTask::Run() ()
#20 0x00005640c4f6e243 in yb::rpc::(anonymous namespace)::Worker::Execute() ()
#21 0x00005640c570ecb4 in yb::Thread::SuperviseThread(void*) ()
#22 0x00007f808b7c6694 in start_thread (arg=0x7f76d8caf700) at pthread_create.c:333
#23 0x00007f808bac341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
Jira: DB-12651

Test Plan:
Jenkins
yb_build.sh --cxx-test ql-stress-test QLStressTest.ReproMetaCacheDeadlock

Reviewers: rthallam, hsunder, qhu, timur

Reviewed By: rthallam

Subscribers: ybase, svc_phabricator

Differential Revision: https://phorge.dev.yugabyte.com/D37831
es1024 added a commit that referenced this issue Sep 25, 2024
Summary:
It is possible for the tablet peer's `tablet_` to be null when a RocksDB flush finishes. We call `tablet_->MaxPersistentOpId()` after the flush to clean up recently applied transaction state, and this causes a SIGSEGV:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str=<unavailable>) at string:898:9
    frame #1: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] yb::RWOperationCounter::resource_name(this=0x0000000000000378) const at operation_counter.h:95:12
    frame #2: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=0x0000000000000378, abort_status_holder=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.cc:190:62
    frame #3: 0x000055885b247ea6 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.h:140:9
    frame #4: 0x000055885b247e9f yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::tablet::Tablet::CreateScopedRWOperationBlockingRocksDbShutdownStart(this=0x0000000000000000, deadline=yb::CoarseTimePoint @ 0x00007f9455305d98) const at tablet.cc:3375:10
    frame #5: 0x000055885b247e90 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(this=0x0000000000000000, invalid_if_no_new_data=<unavailable>) const at tablet.cc:3540:32
    frame #6: 0x000055885b277f5e yb-tserver`yb::tablet::TabletPeer::MaxPersistentOpId(this=<unavailable>) const at tablet_peer.cc:946:23
    frame #7: 0x000055885b278e52 yb-tserver`non-virtual thunk to yb::tablet::TabletPeer::MaxPersistentOpId() const at tablet_peer.cc:0
    frame #8: 0x000055885b2dec44 yb-tserver`yb::tablet::TransactionParticipant::Impl::DoProcessRecentlyAppliedTransactions(this=0x0000153123151500, retryable_requests_flushed_op_id=<unavailable>, persist=<unavailable>) at transaction_participant.cc:2186:22
    frame #9: 0x000055885b2e0a8e yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions() [inlined] yb::tablet::TransactionParticipant::Impl::ProcessRecentlyAppliedTransactions(this=0x0000153123151500) at transaction_participant.cc:1440:27
    frame #10: 0x000055885b2e0a63 yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions(this=<unavailable>) at transaction_participant.cc:2629:17
    frame #11: 0x000055885b226093 yb-tserver`yb::tablet::Tablet::RocksDbListener::OnFlushCompleted(this=0x0000153110c2da58, (null)=<unavailable>, (null)=<unavailable>) at tablet.cc:503:34
    frame #12: 0x000055885af0e507 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) at db_impl.cc:2121:19
    frame #13: 0x000055885af0e275 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::FlushMemTableToOutputFile(this=0x0000153123150a80, cfd=0x000015317d651600, mutable_cf_options=0x00007f94553077d8, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048) at db_impl.cc:2008:3
    frame #14: 0x000055885af0d859 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::BackgroundFlush(this=0x0000153123150a80, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048, cfd=0x000015317d651600) at db_impl.cc:3399:10
    frame #15: 0x000055885af0d21f yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(this=0x0000153123150a80, cfd=<unavailable>) at db_impl.cc:3470:31
    frame #16: 0x000055885b024a53 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() at thread_posix.cc:133:5
    frame #17: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] rocksdb::ThreadPool::StartBGThreads(this=<unavailable>)::$_0::operator()() const at thread_posix.cc:172:5
    frame #18: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_0&>()()) std::__1::__invoke[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads()::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:340:25
    frame #19: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads(__args=<unavailable>)::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:415:5
    frame #20: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)[abi:ue170006]() at function.h:192:16
    frame #21: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)() at function.h:363:12
    frame #22: 0x000055885b9c1543 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x000015313de3b380)[abi:ue170006]() const at function.h:517:16
    frame #23: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x000015313de3b380)() const at function.h:1168:12
    frame #24: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(arg=0x000015313de3b320) at thread.cc:866:3
    frame #25: 0x00007f94994d81ca libpthread.so.0`start_thread + 234
    frame #26: 0x00007f9499729e73 libc.so.6`__clone + 67
```

This diff adds a null check and returns `OpId::Min()` (i.e. don't clean anything up) if `tablet_` is null and we cannot call `MaxPersistentOpId`.
Jira: DB-12915
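
A self-contained sketch of the guard, with stub types standing in for `yb::OpId` and the tablet object (they are not the real YugabyteDB types; the actual change is in the TabletPeer::MaxPersistentOpId path):

```
#include <memory>

// Stub stand-ins, only to make the null-guard pattern concrete.
struct OpId {
  long term;
  long index;
  static OpId Min() { return OpId{0, 0}; }
};

struct Tablet {
  OpId MaxPersistentOpId() const { return OpId{1, 42}; }
};

// If the tablet has already been shut down (null pointer), report OpId::Min()
// so the caller cleans up nothing instead of dereferencing a null pointer.
OpId SafeMaxPersistentOpId(const std::shared_ptr<Tablet>& tablet) {
  if (!tablet) {
    return OpId::Min();
  }
  return tablet->MaxPersistentOpId();
}

int main() {
  std::shared_ptr<Tablet> gone;              // simulates a shut-down tablet peer
  OpId op_id = SafeMaxPersistentOpId(gone);  // returns OpId::Min(), no SIGSEGV
  (void)op_id;
  return 0;
}
```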

Test Plan: Jenkins

Reviewers: sergei, rthallam

Reviewed By: sergei, rthallam

Subscribers: rthallam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38323
es1024 added a commit that referenced this issue Sep 26, 2024
…fter flush

Summary:
Original commit: 250a4d5 / D38323
It is possible for the tablet peer's `tablet_` to be null when a RocksDB flush finishes. We call `tablet_->MaxPersistentOpId()` after the flush to clean up recently applied transaction state, and this causes a SIGSEGV:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str=<unavailable>) at string:898:9
    frame #1: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] yb::RWOperationCounter::resource_name(this=0x0000000000000378) const at operation_counter.h:95:12
    frame #2: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=0x0000000000000378, abort_status_holder=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.cc:190:62
    frame #3: 0x000055885b247ea6 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.h:140:9
    frame #4: 0x000055885b247e9f yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::tablet::Tablet::CreateScopedRWOperationBlockingRocksDbShutdownStart(this=0x0000000000000000, deadline=yb::CoarseTimePoint @ 0x00007f9455305d98) const at tablet.cc:3375:10
    frame #5: 0x000055885b247e90 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(this=0x0000000000000000, invalid_if_no_new_data=<unavailable>) const at tablet.cc:3540:32
    frame #6: 0x000055885b277f5e yb-tserver`yb::tablet::TabletPeer::MaxPersistentOpId(this=<unavailable>) const at tablet_peer.cc:946:23
    frame #7: 0x000055885b278e52 yb-tserver`non-virtual thunk to yb::tablet::TabletPeer::MaxPersistentOpId() const at tablet_peer.cc:0
    frame #8: 0x000055885b2dec44 yb-tserver`yb::tablet::TransactionParticipant::Impl::DoProcessRecentlyAppliedTransactions(this=0x0000153123151500, retryable_requests_flushed_op_id=<unavailable>, persist=<unavailable>) at transaction_participant.cc:2186:22
    frame #9: 0x000055885b2e0a8e yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions() [inlined] yb::tablet::TransactionParticipant::Impl::ProcessRecentlyAppliedTransactions(this=0x0000153123151500) at transaction_participant.cc:1440:27
    frame #10: 0x000055885b2e0a63 yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions(this=<unavailable>) at transaction_participant.cc:2629:17
    frame #11: 0x000055885b226093 yb-tserver`yb::tablet::Tablet::RocksDbListener::OnFlushCompleted(this=0x0000153110c2da58, (null)=<unavailable>, (null)=<unavailable>) at tablet.cc:503:34
    frame #12: 0x000055885af0e507 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) at db_impl.cc:2121:19
    frame #13: 0x000055885af0e275 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::FlushMemTableToOutputFile(this=0x0000153123150a80, cfd=0x000015317d651600, mutable_cf_options=0x00007f94553077d8, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048) at db_impl.cc:2008:3
    frame #14: 0x000055885af0d859 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::BackgroundFlush(this=0x0000153123150a80, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048, cfd=0x000015317d651600) at db_impl.cc:3399:10
    frame #15: 0x000055885af0d21f yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(this=0x0000153123150a80, cfd=<unavailable>) at db_impl.cc:3470:31
    frame #16: 0x000055885b024a53 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() at thread_posix.cc:133:5
    frame #17: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] rocksdb::ThreadPool::StartBGThreads(this=<unavailable>)::$_0::operator()() const at thread_posix.cc:172:5
    frame #18: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_0&>()()) std::__1::__invoke[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads()::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:340:25
    frame #19: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads(__args=<unavailable>)::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:415:5
    frame #20: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)[abi:ue170006]() at function.h:192:16
    frame #21: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)() at function.h:363:12
    frame #22: 0x000055885b9c1543 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x000015313de3b380)[abi:ue170006]() const at function.h:517:16
    frame #23: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x000015313de3b380)() const at function.h:1168:12
    frame #24: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(arg=0x000015313de3b320) at thread.cc:866:3
    frame #25: 0x00007f94994d81ca libpthread.so.0`start_thread + 234
    frame #26: 0x00007f9499729e73 libc.so.6`__clone + 67
```

This diff adds a null check: if `tablet_` is null we cannot call `MaxPersistentOpId`, so we return `OpId::Min()` (i.e., nothing is cleaned up).
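
For illustration, a minimal sketch of the guard, with stand-in types (the real `OpId` and tablet classes live elsewhere in the codebase; only the null-check pattern mirrors the change):

```
#include <cstdint>

// Hypothetical stand-ins for yb::OpId and the tablet interface.
struct OpId {
  int64_t term;
  int64_t index;
  static OpId Min() { return OpId{0, 0}; }
};

struct TabletLike {
  OpId MaxPersistentOpId() const { return OpId{1, 42}; }
};

// If the tablet has already gone away, return OpId::Min() so the caller
// treats nothing as persisted and cleans nothing up.
OpId MaxPersistentOpIdOrMin(const TabletLike* tablet) {
  if (tablet == nullptr) {
    return OpId::Min();
  }
  return tablet->MaxPersistentOpId();
}
```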
Jira: DB-12915

Test Plan: Jenkins

Reviewers: sergei, rthallam

Reviewed By: rthallam

Subscribers: ybase, rthallam

Differential Revision: https://phorge.dev.yugabyte.com/D38431
pao214 added a commit that referenced this issue Oct 23, 2024
Summary:
### Issue

The test ClockSynchronizationTest.TestClockSkewError fails with a TSAN data race:

```
WARNING: ThreadSanitizer: data race (pid=226462)
  Read of size 8 at 0x7b4000000bf0 by thread T82:
    #0 boost::intrusive_ptr<yb::Status::State>::get() const ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:181:16 (libyb_util.so+0x3c5994)
    #1 bool boost::operator==<yb::Status::State>(boost::intrusive_ptr<yb::Status::State> const&, std::nullptr_t) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:263:14 (libyb_util.so+0x3c5994)
    #2 yb::Status::ok() const ${YB_SRC_ROOT}/src/yb/util/status.h:120:51 (libyb_util.so+0x3c5994)
    #3 yb::MockClock::Now() ${YB_SRC_ROOT}/src/yb/util/physical_time.cc:141:3 (libyb_util.so+0x3c5994)
    #4 yb::server::HybridClock::NowWithError(yb::HybridTime*, unsigned long*) ${YB_SRC_ROOT}/src/yb/server/hybrid_clock.cc:155:22 (libserver_common.so+0xa5e12)
    #5 yb::server::HybridClock::NowRange() ${YB_SRC_ROOT}/src/yb/server/hybrid_clock.cc:144:3 (libserver_common.so+0xa5ceb)
    #6 yb::ClockBase::Now() ${YB_SRC_ROOT}/src/yb/common/clock.h:26:29 (libtserver.so+0x23a77a)
    #7 yb::tserver::Heartbeater::Thread::TryHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:437:41 (libtserver.so+0x23a77a)
    #8 yb::tserver::Heartbeater::Thread::DoHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:650:19 (libtserver.so+0x23d05f)
    #9 yb::tserver::Heartbeater::Thread::RunThread() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:697:16 (libtserver.so+0x23d74d)
    #10 decltype(*std::declval<yb::tserver::Heartbeater::Thread*&>().*std::declval<void (yb::tserver::Heartbeater::Thread::*&)()>()()) std::__invoke[abi:ue170006]<void (yb::tserver::Heartbeater::Thread::*&)(), yb::tserver::Heartbeater::Thread*&, void>(void (yb::tserver::Heartbeater::Thread::*&)(), yb::tserver::Heartbeater::Thread*&) ${YB_THIRDPARTY_DIR}/installed/tsan/libcxx/include/c++/v1/__type_traits/invoke.h:308:25 (libtserver.so+0x24206b)
...

  Previous write of size 8 at 0x7b4000000bf0 by main thread:
    #0 boost::intrusive_ptr<yb::Status::State>::swap(boost::intrusive_ptr<yb::Status::State>&) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:210:16 (libyb_util.so+0x3c5c54)
    #1 boost::intrusive_ptr<yb::Status::State>::operator=(boost::intrusive_ptr<yb::Status::State>&&) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:122:61 (libyb_util.so+0x3c5c54)
    #2 yb::Status::operator=(yb::Status&&) ${YB_SRC_ROOT}/src/yb/util/status.h:98:7 (libyb_util.so+0x3c5c54)
    #3 yb::MockClock::Set(yb::PhysicalTime const&) ${YB_SRC_ROOT}/src/yb/util/physical_time.cc:147:16 (libyb_util.so+0x3c5c54)
    #4 yb::ClockSynchronizationTest_TestClockSkewError_Test::TestBody() ${YB_SRC_ROOT}/src/yb/integration-tests/clock_synchronization-itest.cc:131:15 (clock_synchronization-itest+0x12e3ca)
    #5 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2599:10 (libgtest.so.1.12.1+0x894f9)
    #6 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2635:14 (libgtest.so.1.12.1+0x894f9)
    #7 testing::Test::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2674:5 (libgtest.so.1.12.1+0x6123f)
    #8 testing::TestInfo::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2853:11 (libgtest.so.1.12.1+0x62a05)
    #9 testing::TestSuite::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:3012:30 (libgtest.so.1.12.1+0x63f04)
    #10 testing::internal::UnitTestImpl::RunAllTests() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:5870:44 (libgtest.so.1.12.1+0x7be3d)
...

**SUMMARY**: ThreadSanitizer: data race ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:181:16 in boost::intrusive_ptr<yb::Status::State>::get() const
```

### Fix

Do what `value_` already does: wrap `mock_status_` in `boost::atomic`.
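
As a rough sketch of the idea, using `std::atomic` over a simplified value the way `value_` is handled (the actual diff wraps `mock_status_` in `boost::atomic`; class and member names here are illustrative):

```
#include <atomic>
#include <cstdint>

using MicrosTime = uint64_t;

// Simplified MockClock-like class: the reader (heartbeater calling Now()) and
// the writer (the test calling Set()) no longer race because the value is atomic.
class MockClockSketch {
 public:
  MicrosTime Now() const { return value_.load(std::memory_order_acquire); }
  void Set(MicrosTime t) { value_.store(t, std::memory_order_release); }

 private:
  std::atomic<MicrosTime> value_{0};
};
```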
Jira: DB-13604

Test Plan:
Jenkins

Ran

```
./yb_build.sh tsan --cxx-test integration-tests_clock_synchronization-itest --gtest_filter ClockSynchronizationTest.TestClockSkewError -n 50
```

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D39315
pao214 added a commit that referenced this issue Oct 24, 2024
Summary:
Original commit: 21d7ad3 / D39315
### Issue

The test ClockSynchronizationTest.TestClockSkewError fails with a TSAN data race:

```
WARNING: ThreadSanitizer: data race (pid=226462)
  Read of size 8 at 0x7b4000000bf0 by thread T82:
    #0 boost::intrusive_ptr<yb::Status::State>::get() const ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:181:16 (libyb_util.so+0x3c5994)
    #1 bool boost::operator==<yb::Status::State>(boost::intrusive_ptr<yb::Status::State> const&, std::nullptr_t) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:263:14 (libyb_util.so+0x3c5994)
    #2 yb::Status::ok() const ${YB_SRC_ROOT}/src/yb/util/status.h:120:51 (libyb_util.so+0x3c5994)
    #3 yb::MockClock::Now() ${YB_SRC_ROOT}/src/yb/util/physical_time.cc:141:3 (libyb_util.so+0x3c5994)
    #4 yb::server::HybridClock::NowWithError(yb::HybridTime*, unsigned long*) ${YB_SRC_ROOT}/src/yb/server/hybrid_clock.cc:155:22 (libserver_common.so+0xa5e12)
    #5 yb::server::HybridClock::NowRange() ${YB_SRC_ROOT}/src/yb/server/hybrid_clock.cc:144:3 (libserver_common.so+0xa5ceb)
    #6 yb::ClockBase::Now() ${YB_SRC_ROOT}/src/yb/common/clock.h:26:29 (libtserver.so+0x23a77a)
    #7 yb::tserver::Heartbeater::Thread::TryHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:437:41 (libtserver.so+0x23a77a)
    #8 yb::tserver::Heartbeater::Thread::DoHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:650:19 (libtserver.so+0x23d05f)
    #9 yb::tserver::Heartbeater::Thread::RunThread() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:697:16 (libtserver.so+0x23d74d)
    #10 decltype(*std::declval<yb::tserver::Heartbeater::Thread*&>().*std::declval<void (yb::tserver::Heartbeater::Thread::*&)()>()()) std::__invoke[abi:ue170006]<void (yb::tserver::Heartbeater::Thread::*&)(), yb::tserver::Heartbeater::Thread*&, void>(void (yb::tserver::Heartbeater::Thread::*&)(), yb::tserver::Heartbeater::Thread*&) ${YB_THIRDPARTY_DIR}/installed/tsan/libcxx/include/c++/v1/__type_traits/invoke.h:308:25 (libtserver.so+0x24206b)
...

  Previous write of size 8 at 0x7b4000000bf0 by main thread:
    #0 boost::intrusive_ptr<yb::Status::State>::swap(boost::intrusive_ptr<yb::Status::State>&) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:210:16 (libyb_util.so+0x3c5c54)
    #1 boost::intrusive_ptr<yb::Status::State>::operator=(boost::intrusive_ptr<yb::Status::State>&&) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:122:61 (libyb_util.so+0x3c5c54)
    #2 yb::Status::operator=(yb::Status&&) ${YB_SRC_ROOT}/src/yb/util/status.h:98:7 (libyb_util.so+0x3c5c54)
    #3 yb::MockClock::Set(yb::PhysicalTime const&) ${YB_SRC_ROOT}/src/yb/util/physical_time.cc:147:16 (libyb_util.so+0x3c5c54)
    #4 yb::ClockSynchronizationTest_TestClockSkewError_Test::TestBody() ${YB_SRC_ROOT}/src/yb/integration-tests/clock_synchronization-itest.cc:131:15 (clock_synchronization-itest+0x12e3ca)
    #5 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2599:10 (libgtest.so.1.12.1+0x894f9)
    #6 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2635:14 (libgtest.so.1.12.1+0x894f9)
    #7 testing::Test::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2674:5 (libgtest.so.1.12.1+0x6123f)
    #8 testing::TestInfo::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2853:11 (libgtest.so.1.12.1+0x62a05)
    #9 testing::TestSuite::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:3012:30 (libgtest.so.1.12.1+0x63f04)
    #10 testing::internal::UnitTestImpl::RunAllTests() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:5870:44 (libgtest.so.1.12.1+0x7be3d)
...

**SUMMARY**: ThreadSanitizer: data race ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:181:16 in boost::intrusive_ptr<yb::Status::State>::get() const
```

### Fix

Use a mutex to prevent the data race on `mock_status_`.
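
A minimal sketch of the mutex-based variant, with simplified types (the real `mock_status_` is a `yb::Status`):

```
#include <mutex>
#include <string>

// Both the read of the mock status and the write go through the same mutex,
// so TSAN no longer sees conflicting unsynchronized accesses.
class MockClockSketch {
 public:
  std::string GetMockStatus() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return mock_status_;
  }

  void SetMockStatus(std::string status) {
    std::lock_guard<std::mutex> lock(mutex_);
    mock_status_ = std::move(status);
  }

 private:
  mutable std::mutex mutex_;
  std::string mock_status_;  // stands in for the real yb::Status
};
```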

Jira: DB-13604

Test Plan:
Jenkins

Ran

```
./yb_build.sh tsan --cxx-test integration-tests_clock_synchronization-itest --gtest_filter ClockSynchronizationTest.TestClockSkewError -n 50
```

Backport-through: 2024.2

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D39359
manav-yb pushed a commit that referenced this issue Nov 20, 2024
…tion manager

Summary:
Set the thread stack size to 512 KB via `pthread_attr_setstacksize` in the YSQL connection manager. This fixes crashes observed on AlmaLinux 9 machines:
```
#0  0x000055e430191fd7 in tcmalloc::tcmalloc_internal::PageTracker::Get(tcmalloc::tcmalloc_internal::Length) ()
#1  0x000055e430192774 in tcmalloc::tcmalloc_internal::HugePageFiller<tcmalloc::tcmalloc_internal::PageTracker>::TryGet(tcmalloc::tcmalloc_internal::Length, unsigned long)
    ()
#2  0x000055e430160b3a in tcmalloc::tcmalloc_internal::HugePageAwareAllocator::New(tcmalloc::tcmalloc_internal::Length, unsigned long) ()
#3  0x000055e4301437a2 in void* tcmalloc::tcmalloc_internal::SampleifyAllocation<tcmalloc::tcmalloc_internal::Static, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy> >(tcmalloc::tcmalloc_internal::Static&, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, unsigned long, unsigned long, void*, tcmalloc::tcmalloc_internal::Span*, unsigned long*) ()
#4  0x000055e43014339b in void* slow_alloc<tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, decltype(nullptr)>(tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, decltype(nullptr))
    ()
#5  0x000055e4301404e6 in malloc ()
#6  0x00007fc3a67b1d7e in ssl3_setup_write_buffer () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#7  0x00007fc3a67ae824 in do_ssl3_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#8  0x00007fc3a67ae3e1 in ssl3_write_bytes () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#9  0x00007fc3a67cc251 in ssl3_do_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#10 0x00007fc3a67c1a6a in state_machine () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#11 0x000055e43013d303 in mm_tls_handshake_cb (handle=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/tls.c:453
#12 0x000055e43013b9e7 in mm_epoll_step (poll=0x37beffd71a60, timeout=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/epoll.c:79
#13 0x000055e43013b386 in mm_loop_step (loop=0x37beffd72980) at ../../src/odyssey/third_party/machinarium/sources/loop.c:64
#14 machine_main (arg=0x37beffd72780) at ../../src/odyssey/third_party/machinarium/sources/machine.c:56
#15 0x00007fc3a5e89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#16 0x00007fc3a5f0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
Note that this 512 KB value is the same as the value used by the tserver and master processes via the `min_thread_stack_size_bytes` GFlag introduced in https://phorge.dev.yugabyte.com/D38053.
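
For reference, a minimal sketch of creating a thread with an explicit 512 KB stack via `pthread_attr_setstacksize` (the thread function and names are illustrative, not the actual machinarium/odyssey code):

```
#include <pthread.h>

static void* WorkerMain(void*) { return nullptr; }

int StartWorkerThread(pthread_t* tid) {
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  // 512 KB, matching the min_thread_stack_size_bytes value used by the
  // tserver and master processes.
  pthread_attr_setstacksize(&attr, 512 * 1024);
  const int rc = pthread_create(tid, &attr, WorkerMain, nullptr);
  pthread_attr_destroy(&attr);
  return rc;
}
```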
Jira: DB-13388

Test Plan: Jenkins: enable connection manager, all tests

Reviewers: skumar, stiwary

Reviewed By: stiwary

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40087
manav-yb pushed a commit that referenced this issue Nov 20, 2024
…KB in ysql connection manager

Summary:
Original commit: None / D40087
Set the thread stack size to 512 KB via `pthread_attr_setstacksize` in the YSQL connection manager. This fixes crashes observed on AlmaLinux 9 machines:
```
#0  0x000055e430191fd7 in tcmalloc::tcmalloc_internal::PageTracker::Get(tcmalloc::tcmalloc_internal::Length) ()
#1  0x000055e430192774 in tcmalloc::tcmalloc_internal::HugePageFiller<tcmalloc::tcmalloc_internal::PageTracker>::TryGet(tcmalloc::tcmalloc_internal::Length, unsigned long)
    ()
#2  0x000055e430160b3a in tcmalloc::tcmalloc_internal::HugePageAwareAllocator::New(tcmalloc::tcmalloc_internal::Length, unsigned long) ()
#3  0x000055e4301437a2 in void* tcmalloc::tcmalloc_internal::SampleifyAllocation<tcmalloc::tcmalloc_internal::Static, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy> >(tcmalloc::tcmalloc_internal::Static&, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, unsigned long, unsigned long, void*, tcmalloc::tcmalloc_internal::Span*, unsigned long*) ()
#4  0x000055e43014339b in void* slow_alloc<tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, decltype(nullptr)>(tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, decltype(nullptr))
    ()
#5  0x000055e4301404e6 in malloc ()
#6  0x00007fc3a67b1d7e in ssl3_setup_write_buffer () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#7  0x00007fc3a67ae824 in do_ssl3_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#8  0x00007fc3a67ae3e1 in ssl3_write_bytes () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#9  0x00007fc3a67cc251 in ssl3_do_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#10 0x00007fc3a67c1a6a in state_machine () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#11 0x000055e43013d303 in mm_tls_handshake_cb (handle=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/tls.c:453
#12 0x000055e43013b9e7 in mm_epoll_step (poll=0x37beffd71a60, timeout=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/epoll.c:79
#13 0x000055e43013b386 in mm_loop_step (loop=0x37beffd72980) at ../../src/odyssey/third_party/machinarium/sources/loop.c:64
#14 machine_main (arg=0x37beffd72780) at ../../src/odyssey/third_party/machinarium/sources/machine.c:56
#15 0x00007fc3a5e89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#16 0x00007fc3a5f0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
Note that this 512 KB value is the same as the value used by the tserver and master processes via the `min_thread_stack_size_bytes` GFlag introduced in https://phorge.dev.yugabyte.com/D38053.
Jira: DB-13388

Test Plan: Jenkins: enable connection manager, all tests

Reviewers: skumar, stiwary

Reviewed By: stiwary

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40088
manav-yb pushed a commit that referenced this issue Nov 20, 2024
…2 KB in ysql connection manager

Summary:
Original commit: None / D40088
Set the thread stack size to 512 KB via `pthread_attr_setstacksize` in the YSQL connection manager. This fixes crashes observed on AlmaLinux 9 machines:
```
#0  0x000055e430191fd7 in tcmalloc::tcmalloc_internal::PageTracker::Get(tcmalloc::tcmalloc_internal::Length) ()
#1  0x000055e430192774 in tcmalloc::tcmalloc_internal::HugePageFiller<tcmalloc::tcmalloc_internal::PageTracker>::TryGet(tcmalloc::tcmalloc_internal::Length, unsigned long)
    ()
#2  0x000055e430160b3a in tcmalloc::tcmalloc_internal::HugePageAwareAllocator::New(tcmalloc::tcmalloc_internal::Length, unsigned long) ()
#3  0x000055e4301437a2 in void* tcmalloc::tcmalloc_internal::SampleifyAllocation<tcmalloc::tcmalloc_internal::Static, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy> >(tcmalloc::tcmalloc_internal::Static&, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, unsigned long, unsigned long, void*, tcmalloc::tcmalloc_internal::Span*, unsigned long*) ()
#4  0x000055e43014339b in void* slow_alloc<tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, decltype(nullptr)>(tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, decltype(nullptr))
    ()
#5  0x000055e4301404e6 in malloc ()
#6  0x00007fc3a67b1d7e in ssl3_setup_write_buffer () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#7  0x00007fc3a67ae824 in do_ssl3_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#8  0x00007fc3a67ae3e1 in ssl3_write_bytes () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#9  0x00007fc3a67cc251 in ssl3_do_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#10 0x00007fc3a67c1a6a in state_machine () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#11 0x000055e43013d303 in mm_tls_handshake_cb (handle=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/tls.c:453
#12 0x000055e43013b9e7 in mm_epoll_step (poll=0x37beffd71a60, timeout=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/epoll.c:79
#13 0x000055e43013b386 in mm_loop_step (loop=0x37beffd72980) at ../../src/odyssey/third_party/machinarium/sources/loop.c:64
#14 machine_main (arg=0x37beffd72780) at ../../src/odyssey/third_party/machinarium/sources/machine.c:56
#15 0x00007fc3a5e89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#16 0x00007fc3a5f0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
Note that this 512 KB value is the same as the value used by the tserver and master processes via the `min_thread_stack_size_bytes` GFlag introduced in https://phorge.dev.yugabyte.com/D38053.
Jira: DB-13388

Test Plan: Jenkins: enable connection manager, all tests

Reviewers: skumar, stiwary

Reviewed By: stiwary

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40089
arpang added a commit that referenced this issue Nov 27, 2024
…ABLE

Summary:
In some specific cases, a failed ALTER TABLE operation does not adequately invalidate the table cache, causing a "schema mismatch" error in the subsequent command. Consider the following example:

```
CREATE TABLE pk(a int primary key);
INSERT INTO pk values (1);
CREATE TABLE fk(a int);
INSERT INTO fk values (2);
ALTER TABLE fk ADD FOREIGN KEY (a) REFERENCES pk; -- fails due to FK constraint violation
BEGIN;
SELECT * from pk; -- throws schema version mismatch error
COMMIT;
```

Before the start of ALTER TABLE, the schema version of both fk and pk is zero.

The above ALTER TABLE does the following:
1. Increments the schema version of both relations to 1 (YBCPrepareAlterTableCmd() increments the schema version of both the main relation and its dependent relations).
2. Invalidates any pre-existing table schema of the two relations (ATRewriteCatalogs()).
3. Loads schema of both the relations (version 1) to check for FK violation, which it finds to be there (ATRewriteTables()).
4. ybAlteredTableIds contains only the oid corresponding to fk. Hence, only fk's table cache is invalidated during error recovery. This is where the bug is (explained below).
5. DDL txn verification on master increments the schema version of both the relations to 2 (see YsqlDdlTxnAlterTableHelper()).

Because of step #4, the table cache still contains the stale entry of pk (corresponding to version 1). The subsequent SELECT operation ends up using it. This leads to the "schema version mismatch, expected 2, got 1" error.

Resolution:
On failure, in step #4, invalidate the table cache entries of all relations whose schema version is incremented at the start (step #1) and again at the end by YBResetDdlState() (step #5). This is done by including the oids of the dependent relations in `ybAlteredTableIds`.
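
Very roughly, the bookkeeping change looks like the following sketch with hypothetical names (the real change is in the PostgreSQL layer's handling of `ybAlteredTableIds`):

```
#include <vector>

using Oid = unsigned int;

// Record every relation whose schema version was bumped by the ALTER TABLE,
// not just the main relation, so that error recovery invalidates all of their
// table cache entries.
void RecordAlteredTables(std::vector<Oid>* yb_altered_table_ids,
                         Oid main_relation,
                         const std::vector<Oid>& dependent_relations) {
  yb_altered_table_ids->push_back(main_relation);
  yb_altered_table_ids->insert(yb_altered_table_ids->end(),
                               dependent_relations.begin(),
                               dependent_relations.end());
}
```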
Jira: DB-14126

Test Plan:
   ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTable'

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40275
Huqicheng added a commit that referenced this issue Dec 12, 2024
…on::GetDocPaths

Summary:
```
../../src/yb/docdb/conflict_resolution.cc:865:5: runtime error: load of value 4458368, which is not a valid value for type 'IsolationLevel'
[m-1] W1212 08:27:12.423846 99051 master_heartbeat_service.cc:426] Could not get YSQL db catalog versions for heartbeat response:
[m-1] W1212 08:27:12.425787 99052 master_heartbeat_service.cc:426] Could not get YSQL db catalog versions for heartbeat response:
    #0 0x7fd9ae920c0e in yb::docdb::(anonymous namespace)::GetWriteRequestIntents(std::vector<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>, std::allocator<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>>> const&, yb::dockv::KeyBytes*, yb::StronglyTypedBool<yb::dockv::PartialRangeKeyIntents_Tag>, yb::IsolationLevel) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:865:5
    #1 0x7fd9ae9190ce in yb::docdb::(anonymous namespace)::TransactionConflictResolverContext::GetRequestedIntents(yb::docdb::(anonymous namespace)::ConflictResolver*, yb::dockv::KeyBytes*) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1130:22
    #2 0x7fd9ae90fcee in yb::docdb::(anonymous namespace)::TransactionConflictResolverContext::ReadConflicts(yb::docdb::(anonymous namespace)::ConflictResolver*) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1164:22
    #3 0x7fd9ae8fd333 in yb::docdb::(anonymous namespace)::ConflictResolver::Resolve() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:198:26
    #4 0x7fd9ae8fc3b2 in yb::docdb::(anonymous namespace)::WaitOnConflictResolver::TryPreWait() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:697:25
    #5 0x7fd9ae8fc3b2 in yb::docdb::(anonymous namespace)::WaitOnConflictResolver::Run() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:670:7
    #6 0x7fd9ae8fa47d in yb::docdb::ResolveTransactionConflicts(std::vector<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>, std::allocator<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>>> const&, yb::docdb::ConflictManagementPolicy, yb::docdb::LWKeyValueWriteBatchPB const&, yb::HybridTime, yb::HybridTime, long, unsigned long, long, yb::docdb::DocDB const&, yb::StronglyTypedBool<yb::dockv::PartialRangeKeyIntents_Tag>, yb::TransactionStatusManager*, yb::tablet::TabletMetrics*, yb::docdb::LockBatch*, yb::docdb::WaitQueue*, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, boost::function<void (yb::Result<yb::HybridTime> const&)>) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1401:15
    #7 0x7fd9aff22ca9 in yb::tablet::WriteQuery::DoExecute() ${YB_SRC_ROOT}/src/yb/tablet/write_query.cc:801:10
    #8 0x7fd9aff1fda3 in yb::tablet::WriteQuery::Execute(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/write_query.cc:618:28
    #9 0x7fd9afb9b708 in yb::tablet::Tablet::AcquireLocksAndPerformDocOperations(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/tablet.cc:2147:3
    #10 0x7fd9afd166d7 in yb::tablet::TabletPeer::WriteAsync(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/tablet_peer.cc:704:12
    #11 0x7fd9b0fd4d57 in yb::tserver::TabletServiceImpl::PerformWrite(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext*) ${YB_SRC_ROOT}/src/yb/tserver/tablet_service.cc:2325:16
    #12 0x7fd9b0fd711a in yb::tserver::TabletServiceImpl::Write(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext) ${YB_SRC_ROOT}/src/yb/tserver/tablet_service.cc:2345:17
    #13 0x7fd9a91098ec in yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext) const ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:848:9
    #14 0x7fd9a91098ec in auto yb::rpc::HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::tserver::WriteRequestPB, yb::tserver::WriteResponsePB>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)>(std::shared_ptr<yb::rpc::InboundCall>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)) ${YB_SRC_ROOT}/src/yb/rpc/local_call.h:126:7
    #15 0x7fd9a91098ec in yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:846:7
    #16 0x7fd9a91098ec in decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&>()(std::declval<std::shared_ptr<yb::rpc::InboundCall>>())) std::__invoke[abi:ue170006]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__type_traits/invoke.h:340:25
    #17 0x7fd9a91098ec in void std::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__type_traits/invoke.h:415:5
    #18 0x7fd9a91098ec in std::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0, std::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0>, void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ue170006](std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:192:16
    #19 0x7fd9a91098ec in std::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0, std::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0>, void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:363:12
    #20 0x7fd9a9108b34 in std::__function::__value_func<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ue170006](std::shared_ptr<yb::rpc::InboundCall>&&) const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:16
    #21 0x7fd9a9108b34 in std::function<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>) const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:12
    #22 0x7fd9a9108b34 in yb::tserver::TabletServerServiceIf::Handle(std::shared_ptr<yb::rpc::InboundCall>) ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:831:3
    #23 0x7fd9a5b7a892 in yb::rpc::ServicePoolImpl::Handle(std::shared_ptr<yb::rpc::InboundCall>) ${YB_SRC_ROOT}/src/yb/rpc/service_pool.cc:269:19
    #24 0x7fd9a59fcea5 in yb::rpc::InboundCall::InboundCallTask::Run() ${YB_SRC_ROOT}/src/yb/rpc/inbound_call.cc:317:13
    #25 0x7fd9a5bad15d in yb::rpc::(anonymous namespace)::Worker::Execute() ${YB_SRC_ROOT}/src/yb/rpc/thread_pool.cc:115:15
    #26 0x7fd9a42ad037 in std::__function::__value_func<void ()>::operator()[abi:ue170006]() const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:16
    #27 0x7fd9a42ad037 in std::function<void ()>::operator()() const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:12
    #28 0x7fd9a42ad037 in yb::Thread::SuperviseThread(void*) ${YB_SRC_ROOT}/src/yb/util/thread.cc:895:3
    #29 0x56176114abea in asan_thread_start(void*) ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:225:31
    #30 0x7fd99f1071c9 in start_thread (/lib64/libpthread.so.0+0x81c9) (BuildId: 1962602ac5dc3011b6d697b38b05ddc244197114)
    #31 0x7fd99eb488d2 in clone (/lib64/libc.so.6+0x398d2) (BuildId: 37e4ac6a7fb96950b0e6bf72d73d94f3296c77eb)

UndefinedBehaviorSanitizer: undefined-behavior ../../src/yb/docdb/conflict_resolution.cc:865:5 in
```

IsolationLevel is left uninitialized in `PgsqlLockOperation::GetDocPaths`, so ASAN builds can hit the `not a valid value` failure reported by UndefinedBehaviorSanitizer.
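
A minimal sketch of this bug class and the fix pattern (illustrative enum and struct, not the actual `GetDocPaths` code):

```
enum class IsolationLevelSketch : int {
  NON_TRANSACTIONAL = 0,
  SNAPSHOT_ISOLATION = 1,
  SERIALIZABLE_ISOLATION = 2,
};

struct LockOperationSketch {
  // Bug pattern: an uninitialized enum member is read before any code path
  // assigns it, which UBSAN reports as "load of ..., not a valid value":
  //   IsolationLevelSketch isolation;
  //
  // Fix pattern: initialize it to a defined default.
  IsolationLevelSketch isolation = IsolationLevelSketch::NON_TRANSACTIONAL;
};
```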
Jira: DB-14458

Test Plan: advisory_lock-test

Reviewers: bkolagani

Reviewed By: bkolagani

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D40638
pkj415 added a commit that referenced this issue Jan 10, 2025
Summary:
Fix the ASAN (LeakSanitizer) failure for tests running the isolation regress framework: org.yb.pgsql.TestPgRegressWaitQueues, org.yb.pgsql.TestPgRegressIsolationWithoutWaitQueues, and org.yb.pgsql.TestPgRegressIsolation.

The failure is:
```
+=================================================================
+==31476==ERROR: LeakSanitizer: detected memory leaks
+
+Direct leak of 864 byte(s) in 4 object(s) allocated from:
+    #0 0x55fc1116466e in malloc /opt/yb-build/llvm/yb-llvm-v17.0.6-yb-1-1720414757-9b881774-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
+    #1 0x7f94aa200490 in PQmakeEmptyPGresult /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:164:24
+    #2 0x7f94aa21df9d in getRowDescriptions /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-protocol3.c
+    #3 0x7f94aa21c0bc in pqParseInput3 /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
+    #4 0x7f94aa207028 in parseInput /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
+    #5 0x7f94aa207028 in PQgetResult /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
+    #6 0x7f94aa208437 in PQexecFinish /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
+    #7 0x7f94aa208437 in PQexecParams /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2279:9
+    #8 0x55fc111a2881 in main /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/test/isolation/../../../../../../src/postgres/src/test/isolation/isolationtester.c:201:9
+    #9 0x7f94a8caa7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: 37e4ac6a7fb96950b0e6bf72d73d94f3296c77eb)
+
+Objects leaked above:
+0x511000004640 (216 bytes)
+0x511000008ec0 (216 bytes)
+0x5110000133c0 (216 bytes)
+0x5110000179c0 (216 bytes)
```
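
The summary does not spell out the code change, but the leak is PGresult objects returned by libpq that are never freed. A minimal sketch of the `PQclear` pattern that avoids this class of leak (the function name is illustrative, not the actual isolationtester code):

```
#include <cstdio>
#include <libpq-fe.h>

// Every PGresult returned by PQexecParams() is heap-allocated and must be
// released with PQclear(), including on error paths; otherwise LeakSanitizer
// reports leaks like the 216-byte objects above.
void RunStep(PGconn* conn, const char* sql) {
  PGresult* res = PQexecParams(conn, sql, /*nParams=*/0, nullptr, nullptr,
                               nullptr, nullptr, /*resultFormat=*/0);
  if (PQresultStatus(res) != PGRES_TUPLES_OK &&
      PQresultStatus(res) != PGRES_COMMAND_OK) {
    std::fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
  }
  PQclear(res);
}
```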
Jira: DB-13172

Test Plan: Jenkins: test regex: .*RegressIsolation.*|.*RegressWaitQueues.*

Reviewers: patnaik.balivada

Reviewed By: patnaik.balivada

Subscribers: jason, yql

Differential Revision: https://phorge.dev.yugabyte.com/D40987