[2DC] Starting replication after a backup restore should be efficient and safe #3319
Labels: area/cdc (Change Data Capture)
hectorgcr added a commit that referenced this issue on Feb 13, 2020:
Summary: With this new API we will be able to start log retention by op id without setting up any replication. We want this so that we can take a snapshot and restore it in a new cluster and keep all the necessary log files that will be needed once replication is enabled on the new cluster.

Test Plan: New unit tests TwoDCTest.SetupUniverseReplicationWithProducerBootstrapId and CDCServiceTest.TestBootstrapProducer. CDCServiceTest.TestBootstrapProducer tests that whenever a producer bootstrap is generated, the correct rows are inserted into cdc_state table. Also manual test:

```
++ ./bin/yb-ctl create --rf 3 --data_dir /tmp/yb-datacenter-A --ip_start 11
Creating cluster.
Waiting for cluster to be ready.
----------------------------------------------------------------------------------------------------
| Node Count: 3 | Replication Factor: 3 |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.11:5433/postgres |
| YSQL Shell : build/latest/bin/ysqlsh -h 127.0.0.11 |
| YCQL Shell : build/latest/bin/cqlsh 127.0.0.11 |
| YEDIS Shell : build/latest/bin/redis-cli -h 127.0.0.11 |
| Web UI : http://127.0.0.11:7000/ |
| Cluster Data : /tmp/yb-datacenter-A |
----------------------------------------------------------------------------------------------------
For more info, please use: yb-ctl --data_dir /tmp/yb-datacenter-A status
++ ./bin/yb-ctl create --rf 3 --data_dir /tmp/yb-datacenter-B --ip_start 22
Creating cluster.
Waiting for cluster to be ready.
----------------------------------------------------------------------------------------------------
| Node Count: 3 | Replication Factor: 3 |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.22:5433/postgres |
| YSQL Shell : build/latest/bin/ysqlsh -h 127.0.0.22 |
| YCQL Shell : build/latest/bin/cqlsh 127.0.0.22 |
| YEDIS Shell : build/latest/bin/redis-cli -h 127.0.0.22 |
| Web UI : http://127.0.0.22:7000/ |
| Cluster Data : /tmp/yb-datacenter-B |
----------------------------------------------------------------------------------------------------
For more info, please use: yb-ctl --data_dir /tmp/yb-datacenter-B status
++ for i in 4 5 6
++ ./bin/yb-ctl add_node --data_dir /tmp/yb-datacenter-A
Adding node.
Waiting for cluster to be ready.
----------------------------------------------------------------------------------------------------
| Node 4: yb-tserver (pid 28465) |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.14:5433/postgres |
| YSQL Shell : build/latest/bin/ysqlsh -h 127.0.0.14 |
| YCQL Shell : build/latest/bin/cqlsh 127.0.0.14 |
| YEDIS Shell : build/latest/bin/redis-cli -h 127.0.0.14 |
| data-dir[0] : /tmp/yb-datacenter-A/node-4/disk-1/yb-data |
| yb-tserver Logs : /tmp/yb-datacenter-A/node-4/disk-1/yb-data/tserver/logs |
----------------------------------------------------------------------------------------------------
++ for i in 4 5 6
++ ./bin/yb-ctl add_node --data_dir /tmp/yb-datacenter-A
Adding node.
Waiting for cluster to be ready.
----------------------------------------------------------------------------------------------------
| Node 5: yb-tserver (pid 28914) |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.15:5433/postgres |
| YSQL Shell : build/latest/bin/ysqlsh -h 127.0.0.15 |
| YCQL Shell : build/latest/bin/cqlsh 127.0.0.15 |
| YEDIS Shell : build/latest/bin/redis-cli -h 127.0.0.15 |
| data-dir[0] : /tmp/yb-datacenter-A/node-5/disk-1/yb-data |
| yb-tserver Logs : /tmp/yb-datacenter-A/node-5/disk-1/yb-data/tserver/logs |
----------------------------------------------------------------------------------------------------
++ for i in 4 5 6
++ ./bin/yb-ctl add_node --data_dir /tmp/yb-datacenter-A
Adding node.
Waiting for cluster to be ready.
----------------------------------------------------------------------------------------------------
| Node 6: yb-tserver (pid 29256) |
----------------------------------------------------------------------------------------------------
| JDBC : jdbc:postgresql://127.0.0.16:5433/postgres |
| YSQL Shell : build/latest/bin/ysqlsh -h 127.0.0.16 |
| YCQL Shell : build/latest/bin/cqlsh 127.0.0.16 |
| YEDIS Shell : build/latest/bin/redis-cli -h 127.0.0.16 |
| data-dir[0] : /tmp/yb-datacenter-A/node-6/disk-1/yb-data |
| yb-tserver Logs : /tmp/yb-datacenter-A/node-6/disk-1/yb-data/tserver/logs |
----------------------------------------------------------------------------------------------------
++ ./bin/cqlsh 127.0.0.11 -e 'create keyspace k; create table k.t1(k int primary key); create table k.t2(k int primary key); create table k.t3(k int primary key);'
++ ./bin/cqlsh 127.0.0.22 -e 'create keyspace k; create table k.t1(k int primary key); create table k.t2(k int primary key); create table k.t3(k int primary key);'
++ ./bin/cqlsh 127.0.0.11 -e 'insert into k.t1(k) values (0); insert into k.t2(k) values(0); insert into k.t3(k) values(0); insert into k.t1(k) values (1); insert into k.t2(k) values(1); insert into k.t3(k) values(1); insert into k.t1(k) values (2); insert into k.t2(k) values(2); insert into k.t3(k) values(2); insert into k.t1(k) values (3); insert into k.t2(k) values(3); insert into k.t3(k) values(3); insert into k.t1(k) values (4); insert into k.t2(k) values(4); insert into k.t3(k) values(4); insert into k.t1(k) values (5); insert into k.t2(k) values(5); insert into k.t3(k) values(5); insert into k.t1(k) values (6); insert into k.t2(k) values(6); insert into k.t3(k) values(6); insert into k.t1(k) values (7); insert into k.t2(k) values(7); insert into k.t3(k) values(7); insert into k.t1(k) values (8); insert into k.t2(k) values(8); insert into k.t3(k) values(8); insert into k.t1(k) values (9); insert into k.t2(k) values(9); insert into k.t3(k) values(9); insert into k.t1(k) values (10); insert into k.t2(k) values(10); insert into k.t3(k) values(10); insert into k.t1(k) values (11); insert into k.t2(k) values(11); insert into k.t3(k) values(11); insert into k.t1(k) values (12); insert into k.t2(k) values(12); insert into k.t3(k) values(12); insert into k.t1(k) values (13); insert into k.t2(k) values(13); insert into k.t3(k) values(13); insert into k.t1(k) values (14); insert into k.t2(k) values(14); insert into k.t3(k) values(14); insert into k.t1(k) values (15); insert into k.t2(k) values(15); insert into k.t3(k) values(15); insert into k.t1(k) values (16); insert into k.t2(k) values(16); insert into k.t3(k) values(16); insert into k.t1(k) values (17); insert into k.t2(k) values(17); insert into k.t3(k) values(17); insert into k.t1(k) values (18); insert into k.t2(k) values(18); insert into k.t3(k) values(18); insert into k.t1(k) values (19); insert into k.t2(k) values(19); insert into k.t3(k) values(19);'
+++ grep 'Successfully created table' /tmp/yb-datacenter-A/node-1/disk-1/yb-data/master/logs/yb-master.INFO /tmp/yb-datacenter-A/node-2/disk-1/yb-data/master/logs/yb-master.INFO /tmp/yb-datacenter-A/node-3/disk-1/yb-data/master/logs/yb-master.INFO
+++ grep -v transactions
+++ awk '{print $9}'
+++ sed 's/\[id=//g'
+++ sed 's/\]//g'
+++ paste -s -d, -
++ TABLE_IDS=52e4a27cf4f84fb691e5889501ee792f,89653b0617884dd0ac619550990b85bb,a3c51eec1e444d869cf93d9261bb335e
++ echo 'TABLE IDS: 52e4a27cf4f84fb691e5889501ee792f,89653b0617884dd0ac619550990b85bb,a3c51eec1e444d869cf93d9261bb335e'
TABLE IDS: 52e4a27cf4f84fb691e5889501ee792f,89653b0617884dd0ac619550990b85bb,a3c51eec1e444d869cf93d9261bb335e
++ sleep 5
+++ ./build/latest/bin/yb-admin -master_addresses 127.0.0.11:7100,127.0.0.12:7100,127.0.0.13:7100 bootstrap_cdc_producer 52e4a27cf4f84fb691e5889501ee792f,89653b0617884dd0ac619550990b85bb,a3c51eec1e444d869cf93d9261bb335e
+++ tee /dev/tty
+++ grep bootstrap
+++ awk '{print $7}'
+++ paste -s -d, -
I0213 02:36:27.881943 30371 mem_tracker.cc:249] MemTracker: hard memory limit is 53.314545 GB
I0213 02:36:27.882117 30371 mem_tracker.cc:251] MemTracker: soft memory limit is 45.317360 GB
table id: 52e4a27cf4f84fb691e5889501ee792f, CDC bootstrap id: 25ffb2eb13ef4f1ebb91e0187f5f5849
table id: 89653b0617884dd0ac619550990b85bb, CDC bootstrap id: f6aa7e0412f449929ec6fdd5da497a31
table id: a3c51eec1e444d869cf93d9261bb335e, CDC bootstrap id: 2ab79073e56847b5af0f47e7db4cf676
++ BOOTSTRAP_IDS=25ffb2eb13ef4f1ebb91e0187f5f5849,f6aa7e0412f449929ec6fdd5da497a31,2ab79073e56847b5af0f47e7db4cf676
++ build/latest/bin/yb-admin -master_addresses 127.0.0.22:7100,127.0.0.23:7100,127.0.0.24:7100 setup_universe_replication cluster-A 127.0.0.11:7100,127.0.0.12:7100,127.0.0.13:7100 52e4a27cf4f84fb691e5889501ee792f,89653b0617884dd0ac619550990b85bb,a3c51eec1e444d869cf93d9261bb335e 25ffb2eb13ef4f1ebb91e0187f5f5849,f6aa7e0412f449929ec6fdd5da497a31,2ab79073e56847b5af0f47e7db4cf676
I0213 02:36:28.768317 30600 mem_tracker.cc:249] MemTracker: hard memory limit is 53.314545 GB
I0213 02:36:28.768566 30600 mem_tracker.cc:251] MemTracker: soft memory limit is 45.317360 GB
Replication setup successfully
++ ./bin/cqlsh 127.0.0.11 -e 'select * from system.cdc_state'
 tablet_id | stream_id | checkpoint | data | last_replication_time
----------------------------------+----------------------------------+------------+------+---------------------------------
 c93dc3a34bcf44368b24a3650020baf7 | f6aa7e0412f449929ec6fdd5da497a31 | 2.3 | null | 2020-02-13 02:36:28.997000+0000
 ad541fac5134413e815480a303a5125f | f6aa7e0412f449929ec6fdd5da497a31 | 1.3 | null | 2020-02-13 02:36:28.999000+0000
 d225c06ffe3f4138863b7938992259e3 | f6aa7e0412f449929ec6fdd5da497a31 | 1.2 | null | 2020-02-13 02:36:29.000000+0000
 5a2765db337a49b9922ec5167c0dae8c | f6aa7e0412f449929ec6fdd5da497a31 | 2.6 | null | null
 463baa30e99c4b569836301c24a9ee12 | f6aa7e0412f449929ec6fdd5da497a31 | 2.9 | null | null
 6b78cf035db840aa838a2c6f85583635 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.3 | null | 2020-02-13 02:36:28.998000+0000
 b902734faacf461e96cc91e8d2398636 | f6aa7e0412f449929ec6fdd5da497a31 | 1.3 | null | 2020-02-13 02:36:28.997000+0000
 b39b0c6df6994f76b91b0cd18e2e92b0 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.5 | null | null
 3ae0bcdce184400caebdda88d55800b0 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.5 | null | null
 d8ad0a9084d64bbcbe8c8dccd1577d2f | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.4 | null | 2020-02-13 02:36:28.999000+0000
 bb0e47dea2564eecb28d993850853967 | f6aa7e0412f449929ec6fdd5da497a31 | 1.4 | null | 2020-02-13 02:36:28.998000+0000
 e11aa509c6124d218549e7db0499acfa | f6aa7e0412f449929ec6fdd5da497a31 | 1.4 | null | 2020-02-13 02:36:28.997000+0000
 8b6e97b3a1254e3c849b8067e6d2ef97 | 2ab79073e56847b5af0f47e7db4cf676 | 1.3 | null | 2020-02-13 02:36:28.998000+0000
 593fc4e9898b48e4990154ff7b31fef5 | 2ab79073e56847b5af0f47e7db4cf676 | 2.9 | null | 2020-02-13 02:36:28.997000+0000
 cf58343d949e421f9698dd72fcb3ee27 | 2ab79073e56847b5af0f47e7db4cf676 | 1.2 | null | null
 785800f83bff4820a9c81d682f430ffc | 2ab79073e56847b5af0f47e7db4cf676 | 2.5 | null | 2020-02-13 02:36:28.999000+0000
 52f8c76c45954c00899865807f8d7d52 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.3 | null | 2020-02-13 02:36:28.998000+0000
 7ba88aa78e80483cb4f840fbea7cacb6 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.2 | null | null
 20a6941b594740d98feb35edc309a588 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.4 | null | 2020-02-13 02:36:28.999000+0000
 f86bff2c10334e6f9288e68e89eda553 | 2ab79073e56847b5af0f47e7db4cf676 | 1.4 | null | 2020-02-13 02:36:28.998000+0000
 d4560bd99a7e41ae9a931bc426cfc0f6 | 2ab79073e56847b5af0f47e7db4cf676 | 1.6 | null | 2020-02-13 02:36:28.997000+0000
 288fab7eca2c487182bac1beb6d62b8c | 2ab79073e56847b5af0f47e7db4cf676 | 2.5 | null | 2020-02-13 02:36:28.999000+0000
 d8ddbf22c3824beba49fd5dbedb08015 | f6aa7e0412f449929ec6fdd5da497a31 | 1.6 | null | 2020-02-13 02:36:28.998000+0000
 978166e97af24abdafab52a34c204c71 | f6aa7e0412f449929ec6fdd5da497a31 | 1.3 | null | 2020-02-13 02:36:28.999000+0000
 147c4fc97c13484684d687c0e1228552 | 2ab79073e56847b5af0f47e7db4cf676 | 1.3 | null | 2020-02-13 02:36:29.006000+0000
 b5fd8b8ba9de46ff9c9d349330d07fde | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.3 | null | 2020-02-13 02:36:28.998000+0000
 63bbfb36c3b94eb8bdf57afaeaf89b87 | 2ab79073e56847b5af0f47e7db4cf676 | 3.7 | null | 2020-02-13 02:36:28.997000+0000
 4794ba14fa4c4709af73ce9b189daa0b | 2ab79073e56847b5af0f47e7db4cf676 | 1.3 | null | 2020-02-13 02:36:28.998000+0000
 33f4eefa8a9a4277acb1ccd617b64110 | f6aa7e0412f449929ec6fdd5da497a31 | 1.4 | null | 2020-02-13 02:36:28.999000+0000
 acdaeb97fab143d5854ed5e634ef86aa | 2ab79073e56847b5af0f47e7db4cf676 | 1.5 | null | 2020-02-13 02:36:28.998000+0000
 c260c0feda6942039dc686352064bfc7 | f6aa7e0412f449929ec6fdd5da497a31 | 1.3 | null | 2020-02-13 02:36:28.998000+0000
 bea787315d9a41eca07f395d2266a4e2 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.2 | null | 2020-02-13 02:36:28.999000+0000
 05e85008263b4902a01736e39453e534 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.5 | null | 2020-02-13 02:36:29.000000+0000
 e11744c2e3bc4f79b2782504dcfe9323 | 2ab79073e56847b5af0f47e7db4cf676 | 1.3 | null | null
 f3028b7fccda42b8b5a51fb94e7fd49d | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.6 | null | 2020-02-13 02:36:28.998000+0000
 dfba7f06f4724171a75a76ebf63168c5 | 25ffb2eb13ef4f1ebb91e0187f5f5849 | 1.4 | null | 2020-02-13 02:36:28.998000+0000
(36 rows)
```

Reviewers: nicolas, rahuldesirazu, bogdan, neha
Reviewed By: neha
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D7712
We need a way for the producer or the consumer to know what is a safe checkpoint to start replication from once a backup has been restored in the consumer. In other words, once the backup has been restored, the producer must still have all the log entries from that checkpoint onward, and the consumer must know from which op id replication should start.
This will be achieved by introducing a new step which allows the user to create a checkpoint of the most recent op ids for all the tablets of a specific table. We call this step bootstrapping the producer. In practice this will be done by creating a new `yb-admin` command, `bootstrap_cdc_producer`. When this command is executed, it will:

- create the `cdc_state` table if needed, and
- insert, for each tablet of the table, a row of the form `tablet_id, stream_id, <latest op id for that tablet>`.
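As an illustration only (the full flow is shown in the manual test in the commit above), bootstrapping a producer and inspecting the resulting checkpoints might look like this; the master addresses and table ids are placeholders:

```
# Placeholders: substitute your own producer master addresses and table ids.
PRODUCER_MASTERS=127.0.0.11:7100,127.0.0.12:7100,127.0.0.13:7100
TABLE_IDS=<table_id_1>,<table_id_2>,<table_id_3>

# Checkpoint the latest op id of every tablet of the given tables.
# Prints one "table id: ..., CDC bootstrap id: ..." line per table.
./bin/yb-admin -master_addresses "$PRODUCER_MASTERS" bootstrap_cdc_producer "$TABLE_IDS"

# Each tablet now has a (tablet_id, stream_id, checkpoint) row in cdc_state.
./bin/cqlsh 127.0.0.11 -e 'select tablet_id, stream_id, checkpoint from system.cdc_state'
```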
To make log retention work based on the op id, the flag `enable_log_retention_by_op_idx` needs to be set to true. This is necessary so that the user has enough time to make a backup, restore the backup on the consumer, and start replication.

After the producer is bootstrapped, a backup of the producer is taken. This backup is guaranteed to have all the entries with op ids created during the bootstrap process.
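For example (a sketch, not part of the proposal itself), on a local `yb-ctl` cluster like the one in the manual test above, the flag could be passed as a tserver gflag when the producer cluster is created:

```
# Assumption: a local test cluster managed by yb-ctl, as in the manual test above.
# The flag keeps WAL segments around based on op id, so the entries referenced by
# the bootstrap checkpoints are not garbage collected before replication starts.
./bin/yb-ctl create --rf 3 --data_dir /tmp/yb-datacenter-A --ip_start 11 \
  --tserver_flags "enable_log_retention_by_op_idx=true"
```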
Then, the backup is restored on the consumer. Finally, unidirectional replication is started on the consumer (producer -> consumer) and the bootstrap ids are specified for each table. This will allow the producer to reuse the `cdc_state` entries created during the bootstrap step. When the first GetChanges request for a tablet is sent, it doesn't contain an op id to start from; in this case, the producer will use the op id inserted during the bootstrap step.
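A minimal sketch of that final step, modeled on the `yb-admin setup_universe_replication` invocation from the manual test above (all values are placeholders):

```
# Placeholders: consumer/producer master addresses, a name for the producer
# universe, and the comma-separated table ids with their matching bootstrap ids
# (same order), as printed by bootstrap_cdc_producer.
CONSUMER_MASTERS=127.0.0.22:7100,127.0.0.23:7100,127.0.0.24:7100
PRODUCER_MASTERS=127.0.0.11:7100,127.0.0.12:7100,127.0.0.13:7100
TABLE_IDS=<table_id_1>,<table_id_2>
BOOTSTRAP_IDS=<bootstrap_id_1>,<bootstrap_id_2>

# Run against the consumer; replication starts from the bootstrapped op ids.
./bin/yb-admin -master_addresses "$CONSUMER_MASTERS" setup_universe_replication \
  cluster-A "$PRODUCER_MASTERS" "$TABLE_IDS" "$BOOTSTRAP_IDS"
```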
In summary, these will be the steps:

1. Set `enable_log_retention_by_op_idx` to true in all the tservers.
2. Run the `yb-admin` command `bootstrap_cdc_producer` for the tables to be replicated.
3. Take a backup of the producer.
4. Restore the backup on the consumer.
5. Start unidirectional replication on the consumer, specifying the bootstrap ids for each table.

A quick way to sanity-check the result is sketched below.
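Not part of the proposal, but as an illustrative sanity check (the keyspace, table, and addresses are assumptions modeled on the manual test above): write to the producer and confirm the row shows up on the consumer, and that `cdc_state` records replication progress.

```
# Assumes table k.t1 exists on both clusters, as in the manual test.
./bin/cqlsh 127.0.0.11 -e 'insert into k.t1(k) values (100);'   # producer
sleep 5
./bin/cqlsh 127.0.0.22 -e 'select * from k.t1 where k = 100;'   # consumer

# last_replication_time in cdc_state advances once changes are being pulled.
./bin/cqlsh 127.0.0.11 -e 'select tablet_id, checkpoint, last_replication_time from system.cdc_state'
```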