Created a 3-node YugabyteDB universe with encryption at rest enabled.
Loaded a bunch of data. Txn logs and SSTable files (storage files) were all encrypted as expected.
Expanded the universe from 3 to 6 nodes while the workload was still running. Most tablets rebalanced to the new nodes, but the balancing seemed to get stuck at some point.
Upon inspection, the cluster balance as seen from yb-master leader logs was waiting on this:
W0928 05:07:36.013736 24768 cluster_balance.cc:227] Skipping add replicas for 260b2de60d1b4b93b1ad7dbab31a1190: Operation failed. Try again. (yb/master/cluster_balance.cc:496): Cannot add replicas. Currently have a total overreplication of 1, when max allowed is 1
W0928 05:07:36.013907 24768 cluster_balance.cc:227] Skipping add replicas for c412bfaf16b543c89da9c5898b2adf70: Operation failed. Try again. (yb/master/cluster_balance.cc:489): Cannot add replicas. Currently remote bootstrapping 4 tablets, when our max allowed is 2
But the real cause of the error seems to be this error message in the yb-tserver logs on one of the new nodes:
I0928 05:18:59.487653 24528 tablet.cc:556] Opening RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489454 24528 db_impl.cc:401] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [R]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489470 24528 db_impl.cc:439] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [R]: Pending 0 compactions and 0 flushes
E0928 05:18:59.489503 24528 tablet.cc:560] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Failed to open a RocksDB database in directory /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34: Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
I0928 05:18:59.489540 24528 tablet_bootstrap.cc:420] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Time spent opening tablet: real 0.003s user 0.000s sys 0.001s
E0928 05:18:59.489579 24528 ts_tablet_manager.cc:1114] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Tablet failed to bootstrap: Illegal state (yb/tablet/tablet.cc:565): Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
I0928 05:18:59.489596 24528 tablet_peer.cc:974] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [state=FAILED]: Changed state from BOOTSTRAPPING to FAILED
I0928 05:18:59.489605 24528 ts_tablet_manager.cc:1086] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Time spent bootstrapping tablet: real 0.003s user 0.000s sys 0.001s
I0928 05:18:59.489614 24528 tablet_peer.cc:335] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [state=FAILED]: Initiating TabletPeer shutdown
I0928 05:18:59.489619 24528 tablet_peer.cc:349] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [state=QUIESCING]: Started shutdown from state: FAILED
W0928 05:18:59.489629 24528 ts_tablet_manager.cc:1869] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Remote bootstrap: OpenTablet() failed: Illegal state (yb/tablet/tablet.cc:565): Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
I0928 05:18:59.489637 24528 ts_tablet_manager.cc:1872] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Tombstoning tablet after failed remote bootstrap
I0928 05:18:59.489642 24528 ts_tablet_manager.cc:1830] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Deleting tablet data with delete state TABLET_DATA_TOMBSTONED
I0928 05:18:59.489656 24528 tablet_metadata.cc:385] Destroying regular db at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489832 24528 tablet_metadata.cc:391] Successfully destroyed regular DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489981 24528 tablet_metadata.cc:402] Successfully destroyed provisional records DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34.intents
I0928 05:18:59.494750 24528 ts_tablet_manager.cc:1840] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Tablet deleted. Last logged OpId: { term: 0 index: 0 }
I0928 05:18:59.494773 24528 log.cc:1006] T 356fd22e59574e6f85c6773ab681aa34P 4318548741284f6cabb29fa70b07c2a9: Deleting WAL dir /mnt/d0/yb-data/tserver/wals/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.494849 24528 ts_tablet_manager.cc:1913] Deleted transition in progress remote bootstrapping tablet from peer 97b3b1e7151f4d62aa1d8e67f15e64ab for tablet 356fd22e59574e6f85c6773ab681aa34
W0928 05:18:59.494864 24528 tablet_service.cc:1751] Start remote bootstrap failed: Illegal state (yb/tablet/tablet.cc:565): Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
Summary:
In encrypted clusters, we use two envs: an encrypted env and a checkpoint env. The encrypted env decrypts on read and encrypts on write, while the checkpoint env treats the files as plaintext and is used for checkpointing files.
As an example, let's say we have a 100-byte MANIFEST file with a 50-byte encryption header and 50 bytes of content. Then `plaintext_env_size(file) = 100`, while `encrypted_env_size(file) = 50`. When we add a file to the MANIFEST, we update the version set with the newly calculated `plaintext_env_size(file)`. However, on initial MANIFEST creation, we use `encrypted_env_size(file)`. The checkpoint uses the locked version set to get the MANIFEST file size. This means that if no files have been added to the MANIFEST since its creation, the checkpoint will cut off the MANIFEST, since it thinks the file size is 50 when it is actually 100.
The fix is to use the checkpoint_env to get the file size both on creation and when adding new files.
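The size mismatch can be illustrated with a small Python model. This is only a sketch of the arithmetic in the example, not the actual RocksDB or YugabyteDB code: the 50-byte header length comes from the example, `plaintext_env_size` and `encrypted_env_size` mirror the names used in this summary, and `checkpoint_copy` is a hypothetical helper standing in for the checkpoint's file copy.

```python
HEADER_SIZE = 50   # encryption header length, taken from the example
CONTENT_SIZE = 50  # MANIFEST content length, taken from the example


def plaintext_env_size(raw: bytes) -> int:
    """Size as seen by the checkpoint (plaintext) env: every byte on disk."""
    return len(raw)


def encrypted_env_size(raw: bytes) -> int:
    """Size as seen by the encrypted env: the header is hidden from callers."""
    return len(raw) - HEADER_SIZE


def checkpoint_copy(raw: bytes, recorded_size: int) -> bytes:
    """Hypothetical checkpoint step: copy raw on-disk bytes, trusting the
    size recorded in the version set."""
    return raw[:recorded_size]


# On-disk MANIFEST: 50 header bytes followed by 50 content bytes.
manifest = b"H" * HEADER_SIZE + b"C" * CONTENT_SIZE

# Before the fix: the size recorded at MANIFEST creation came from the encrypted env.
buggy_copy = checkpoint_copy(manifest, encrypted_env_size(manifest))

# After the fix: the size always comes from the checkpoint (plaintext) env.
fixed_copy = checkpoint_copy(manifest, plaintext_env_size(manifest))

print(len(buggy_copy))  # 50
print(len(fixed_copy))  # 100
```

In this model the buggy copy keeps only the first 50 of the 100 on-disk bytes, which is the truncation described above, while the fixed copy is byte-for-byte identical to the source file.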
Test Plan: Two new integration tests to test adding a server on an empty table and enabling encryption in a plaintext cluster.
Reviewers: sergei, mikhail, hector
Reviewed By: hector
Subscribers: kannan, ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D7320