encryption-at-rest cluster expand test: Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4 #2462

Closed
kmuthukk opened this issue Sep 28, 2019 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/high High Priority

Comments

@kmuthukk
Collaborator

  1. Created a 3-node YugabyteDB universe with encryption at rest enabled.

  2. Loaded a bunch of data. Txn logs and SSTable files (storage files) were all encrypted as expected.

  3. Expanded the universe from 3 to 6 nodes while the workload was still running. Most tablets rebalanced to the new nodes, but the balancing seemed to get stuck at some point.

Upon inspection, the cluster balancer, as seen in the yb-master leader logs, was waiting on this:

W0928 05:07:36.013736 24768 cluster_balance.cc:227] Skipping add replicas for 260b2de60d1b4b93b1ad7dbab31a1190: Operation failed. Try again. (yb/master/cluster_balance.cc:496): Cannot add replicas. Currently have a total overreplication of 1, when max allowed is 1
W0928 05:07:36.013907 24768 cluster_balance.cc:227] Skipping add replicas for c412bfaf16b543c89da9c5898b2adf70: Operation failed. Try again. (yb/master/cluster_balance.cc:489): Cannot add replicas. Currently remote bootstrapping 4 tablets, when our max allowed is 2

But the real cause seems to be this error in the yb-tserver logs on one of the new nodes:

I0928 05:18:59.487653 24528 tablet.cc:556] Opening RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489454 24528 db_impl.cc:401] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [R]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489470 24528 db_impl.cc:439] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [R]: Pending 0 compactions and 0 flushes
E0928 05:18:59.489503 24528 tablet.cc:560] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Failed to open a RocksDB database in directory /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34: Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
I0928 05:18:59.489540 24528 tablet_bootstrap.cc:420] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Time spent opening tablet: real 0.003s      user 0.000s     sys 0.001s
E0928 05:18:59.489579 24528 ts_tablet_manager.cc:1114] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Tablet failed to bootstrap: Illegal state (yb/tablet/tablet.cc:565): Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
I0928 05:18:59.489596 24528 tablet_peer.cc:974] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [state=FAILED]: Changed state from BOOTSTRAPPING to FAILED
I0928 05:18:59.489605 24528 ts_tablet_manager.cc:1086] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Time spent bootstrapping tablet: real 0.003s      user 0.000s     sys 0.001s
I0928 05:18:59.489614 24528 tablet_peer.cc:335] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [state=FAILED]: Initiating TabletPeer shutdown
I0928 05:18:59.489619 24528 tablet_peer.cc:349] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9 [state=QUIESCING]: Started shutdown from state: FAILED
W0928 05:18:59.489629 24528 ts_tablet_manager.cc:1869] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Remote bootstrap: OpenTablet() failed: Illegal state (yb/tablet/tablet.cc:565): Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
I0928 05:18:59.489637 24528 ts_tablet_manager.cc:1872] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Tombstoning tablet after failed remote bootstrap
I0928 05:18:59.489642 24528 ts_tablet_manager.cc:1830] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Deleting tablet data with delete state TABLET_DATA_TOMBSTONED
I0928 05:18:59.489656 24528 tablet_metadata.cc:385] Destroying regular db at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489832 24528 tablet_metadata.cc:391] Successfully destroyed regular DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.489981 24528 tablet_metadata.cc:402] Successfully destroyed provisional records DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34.intents
I0928 05:18:59.494750 24528 ts_tablet_manager.cc:1840] T 356fd22e59574e6f85c6773ab681aa34 P 4318548741284f6cabb29fa70b07c2a9: Tablet deleted. Last logged OpId: { term: 0 index: 0 }
I0928 05:18:59.494773 24528 log.cc:1006] T 356fd22e59574e6f85c6773ab681aa34P 4318548741284f6cabb29fa70b07c2a9: Deleting WAL dir /mnt/d0/yb-data/tserver/wals/table-c412bfaf16b543c89da9c5898b2adf70/tablet-356fd22e59574e6f85c6773ab681aa34
I0928 05:18:59.494849 24528 ts_tablet_manager.cc:1913] Deleted transition in progress remote bootstrapping tablet from peer 97b3b1e7151f4d62aa1d8e67f15e64ab for tablet 356fd22e59574e6f85c6773ab681aa34
W0928 05:18:59.494864 24528 tablet_service.cc:1751] Start remote bootstrap failed: Illegal state (yb/tablet/tablet.cc:565): Invalid argument (yb/tserver/header_manager_impl.cc:135): Error parsing field universe key id: expect 4 bytes found 4
@kmuthukk kmuthukk added area/docdb YugabyteDB core features priority/high High Priority labels Sep 28, 2019
@kmuthukk kmuthukk added the kind/bug This issue is a bug label Sep 28, 2019
rahuldesirazu added a commit that referenced this issue Oct 2, 2019
Summary:
In encrypted clusters, we use two envs: an encrypted env and a checkpoint env. The encrypted env decrypts on read and encrypts on write, while the checkpoint env treats the files as plaintext and is used for checkpointing files.

As an example, let's say we have a 100-byte MANIFEST file with a 50-byte encryption header and 50 bytes of content. Then `plaintext_env_size(file) = 100`, while `encrypted_env_size(file) = 50`. When we add a file to the MANIFEST, we update the version set with the newly calculated `plaintext_env_size(file)`. However, on initial MANIFEST creation, we use `encrypted_env_size(file)`. The checkpoint uses the locked version set to get the MANIFEST file size. This means that if no new files have been added to the MANIFEST, the checkpoint will cut off the MANIFEST, since it thinks the file size is 50 when it is actually 100.

The fix is to use the checkpoint_env to get the file size on both creation and when adding new files.
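
The size mismatch described above can be sketched in a few lines of Python. This is only an illustrative model, not the YugabyteDB code: the helper names (`make_manifest`, `checkpoint_copy`) and the byte layout are assumptions made for the example, and the 50/50 split is taken from the example in the summary.

```python
HEADER_SIZE = 50   # assumed encryption header length, from the example above
CONTENT_SIZE = 50  # assumed content length, from the example above


def make_manifest() -> bytes:
    """Hypothetical on-disk MANIFEST: encryption header followed by content."""
    return b"H" * HEADER_SIZE + b"C" * CONTENT_SIZE


def plaintext_env_size(raw: bytes) -> int:
    """Size as seen by the checkpoint (plaintext) env: every byte on disk."""
    return len(raw)


def encrypted_env_size(raw: bytes) -> int:
    """Size as seen by the encrypted env: the header is hidden from the reader."""
    return len(raw) - HEADER_SIZE


def checkpoint_copy(raw: bytes, recorded_size: int) -> bytes:
    """Model of a checkpoint: copy raw on-disk bytes up to the recorded size."""
    return raw[:recorded_size]


manifest = make_manifest()

# Size recorded at creation through the encrypted env: the copy is cut short.
buggy = checkpoint_copy(manifest, encrypted_env_size(manifest))

# Size recorded through the checkpoint env: the copy is complete.
fixed = checkpoint_copy(manifest, plaintext_env_size(manifest))

print(len(manifest), len(buggy), len(fixed))
```

In this model the truncated copy contains only the header bytes and none of the content, which is consistent with a remote-bootstrapped replica receiving a cut-off MANIFEST.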

Test Plan: Two new integration tests to test adding a server on an empty table and enabling encryption in a plaintext cluster.

Reviewers: sergei, mikhail, hector

Reviewed By: hector

Subscribers: kannan, ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7320