-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docdb] Speedup local bootstrap: do not do LogReader::Open twice #2124
Comments
@bmatican - |
Forgot to reply to this @kmuthukk , it's not related to that optimization work, nor a regression from @rahuldesirazu 's work on not rebuilding indexes, as much as before that, the code would need 2 passes to open the logs (rebuild, actually read) and we just never removed that, since his work was still under a feature flag. |
…nabled Summary: Use TabletBootstrap::log_->reader_ instead of instantiating TabletBootstrap::log_reader_ Test Plan: Diff used for BenchmarkA: ``` diff --git a/bin/test_bsopt.py b/bin/test_bsopt.py index 81767050f..10c4aaf07 100755 --- a/bin/test_bsopt.py +++ b/bin/test_bsopt.py @@ -17,13 +17,13 @@ YUGABYTE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) YBSAMPLEAPPS_DIR = os.path.normpath(os.path.join(YUGABYTE_DIR, '..', 'yb-sample-apps')) # Number of tablets to use -NUM_TABLETS = 2 +NUM_TABLETS = 1 # Time to run the CQL test for (seconds) -CQL_TEST_TIME_SEC = 10 * 60 +CQL_TEST_TIME_SEC = 1 * 60 # Number of trials, both for the optimization on and off -NUM_TRIALS = 15 +NUM_TRIALS = 10 def test_cluster(opt_on): @@ -32,7 +32,7 @@ def test_cluster(opt_on): "java", "-jar", YBSAMPLEAPPS_DIR + "/target/yb-sample-apps.jar", "--workload", "CassandraBatchKeyValue", "--nodes", "127.0.0.1:9042", "--num_threads_read", "1", "--num_threads_write", "1", - "--num_unique_keys", "100000", "--nouuid", "--value_size", "1024", "--batch_size", "64"] + "--num_unique_keys", "1000000", "--nouuid", "--value_size", "1024", "--batch_size", "64"] proc = subprocess.Popen(args) # After time is up, kill the test @@ -41,17 +41,10 @@ def test_cluster(opt_on): proc.wait() timer.cancel() - # Use yb-admin to flush all writes to RocksDB - subprocess.check_call( - YUGABYTE_DIR + - "/build/latest/bin/yb-admin -master_addresses 127.0.0.1 flush_table " + - "ybdemo_keyspace cassandrakeyvalue 60", - shell=True) - # Restart the cluster subprocess.check_call([YUGABYTE_DIR + "/bin/yb-ctl", "stop"]) subprocess.check_call( - YUGABYTE_DIR + "/bin/yb-ctl start --tserver_flags \"skip_flushed_entries=" + + YUGABYTE_DIR + "/bin/yb-ctl start --tserver_flags \"skip_flushed_entries=1,log_min_segments_to_retain=4,skip_log_reader_double_open=" + str(opt_on).lower() + "\"", shell=True) time.sleep(10) @@ -81,7 +74,7 @@ def remake_cluster(): subprocess.check_call(YUGABYTE_DIR + "/bin/yb-ctl stop", shell=True) subprocess.check_call(YUGABYTE_DIR + "/bin/yb-ctl destroy", shell=True) subprocess.check_call( - YUGABYTE_DIR + "/bin/yb-ctl --num_shards_per_tserver " + str(NUM_TABLETS) + " create", + YUGABYTE_DIR + "/bin/yb-ctl --replication_factor 1 --num_shards_per_tserver " + str(NUM_TABLETS) + " create", shell=True) time.sleep(5) diff --git a/src/yb/tablet/tablet_bootstrap.cc b/src/yb/tablet/tablet_bootstrap.cc index 30620d626..2d7b56414 100644 --- a/src/yb/tablet/tablet_bootstrap.cc +++ b/src/yb/tablet/tablet_bootstrap.cc @@ -63,6 +63,11 @@ DEFINE_bool(skip_wal_rewrite, true, TAG_FLAG(skip_wal_rewrite, experimental); TAG_FLAG(skip_wal_rewrite, runtime); +DEFINE_bool(skip_log_reader_double_open, true, + "Do not open log reader twice if skip_wal_rewrite is on."); +TAG_FLAG(skip_log_reader_double_open, experimental); +TAG_FLAG(skip_log_reader_double_open, runtime); + DEFINE_test_flag(double, fault_crash_during_log_replay, 0.0, "Fraction of the time when the tablet will crash immediately " "after processing a log entry during log replay."); @@ -349,7 +354,7 @@ Status TabletBootstrap::Bootstrap(shared_ptr<TabletClass>* rebuilt_tablet, bool needs_recovery; RETURN_NOT_OK(PrepareToReplay(&needs_recovery)); - if (needs_recovery && !skip_wal_rewrite_) { + if (needs_recovery && (!skip_wal_rewrite_ || !FLAGS_skip_log_reader_double_open)) { RETURN_NOT_OK(OpenLogReader()); } ``` BenchmarkA Results: Without optimization: 1557139 1610705 1757151 1733478 2099114 1553073 1716388 1882349 1680560 1611217 Mean = 1720117.4, Stdev = 167126.8 (9.7% of Mean) With optimization: 1557828 2027292 1562075 1810216 1619249 1597930 1592700 1919192 1564048 1744308 Mean = 1699483.8, Stdev = 168471.8 (9.9% of Mean) Change in Mean = -1% (not significant) Setup: GCP - Ubuntu - Release build - 8 Cores - 16 GB RAM BenchmarkB: PF-1 Cluster Delay in bootstrapping all tablets - without Optimization: 110s Delay in bootstrapping all tablets - with Optimization: 102s Improvement: 7% LogReader::Open() overhead observed using `perf record -ag` and `perf report -i perf.data`: 2.4% Reviewers: rahuldesirazu Reviewed By: rahuldesirazu Subscribers: kannan, hector, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D7352
Did a perf record/report during local bootstrap and noticed the above picture.
It seems that we are opening a
LogReader
instance twice and taking about the same time under it (whooping ~20%). I suspect we could unify that into one single instance, created in theTabletBootstrap::OpenLogReader
path and then passed into the newLog
instance explicitly...cc @kmuthukk @rahuldesirazu @hectorgcr
The text was updated successfully, but these errors were encountered: