Occasional block checksum mismatch when trying to read an encrypted file #3707
Labels: area/docdb (YugabyteDB core features)
mbautin added a commit that referenced this issue on Mar 5, 2020:
Summary: This is a workaround for the encrypted file corruption issue described at #3707. When reading a block from an encrypted SSTable and getting a checksum mismatch, we now try to increment byte 11 of the initialization vector (carrying over into earlier bytes when encountering 0xff) and decrypt and verify the checksum once again. Similarly, when reading an SSTable footer, we use magic number comparison instead of checksum verification. This workaround is on by default and is controlled by the new `--encryption_counter_overflow_read_path_workaround` flag.

This diff does not fix the underlying ignored-overflow issue yet. While it would be easy to do so, the old read path would not be able to read files written with the new write path. Instead, we will deploy this workaround, disable encryption, perform major compactions on all data, and then deploy the real fix and re-enable encryption.

We also restrict the range of the randomly generated unsigned 32-bit initial counter value to [0, 0x7fffffff] by default. This reduces the effective key size by 1 bit but eliminates the overflow issue for all files up to 32 GiB in size. The range can be customized using the new flags `--encryption_counter_min` and `--encryption_counter_max` (both bounds are inclusive).

Also avoid storing a reference to a shared pointer in TableReaderOptions; this fixes an ASAN issue in the new test.

Test Plan: Jenkins. New test, encrypted_sstable-test, that fails without the fix and passes with the fix.

Reviewers: rahuldesirazu, kannan, sergei, bogdan
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D8020
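The retry logic the commit describes, incrementing byte 11 of the IV and carrying into earlier bytes on 0xff, can be sketched as follows. This is an illustrative helper (the function name and structure are not YugabyteDB's actual code), assuming a 16-byte CTR-mode IV whose last 4 bytes hold the block counter:

```python
def bump_iv_counter_overflow(iv: bytes) -> bytes:
    """Return a copy of a 16-byte CTR IV with byte 11 incremented,
    carrying into earlier bytes when a byte is 0xff.

    Mirrors the read-path workaround: if a block fails checksum
    verification, decryption is retried with this adjusted IV.
    """
    assert len(iv) == 16
    out = bytearray(iv)
    i = 11
    while i >= 0:
        if out[i] == 0xFF:
            out[i] = 0      # this byte wraps; carry continues leftward
            i -= 1
        else:
            out[i] += 1     # absorb the carry and stop
            break
    return bytes(out)
```

On a checksum mismatch, the reader would decrypt the block once more with `bump_iv_counter_overflow(original_iv)` before reporting corruption.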
rahuldesirazu pushed a commit that referenced this issue on Mar 25, 2020 (same commit message as above).
rahuldesirazu added a commit that referenced this issue on Mar 25, 2020:
Summary: Support the following yb-admin commands for encryption at rest:

```
add_universe_key_to_all_masters(String keyid, String key_path)
all_masters_have_universe_key_in_memory(String keyId)
rotate_universe_key_in_memory(String keyId)
disable_encryption_in_memory()
```

[#3455] Modify ldb tool to be encryption aware

Add a `--key_file` option to ldb to pass in a `<key_id>:</path/to/key/file>`, and an `--only_verify_checksums` flag to `scan` to skip printing entries and only do checksum validation. Also provide a yb-admin `write_universe_key_to_file` command to retrieve the universe key and write it to a local file.

[#3454] Make OpenSSL usage thread-safe

https://www.openssl.org/docs/man1.0.2/man3/CRYPTO_set_locking_callback.html

Set locking and thread-id callbacks to enable thread-safe usage of OpenSSL. Move OpenSSL initialization to a once-per-process step in RpcServerBase.

[#3707] A workaround for encryption counter overflow

This is the workaround for the encrypted file corruption issue described at #3707 (see the full description in the commit message above): on a block checksum mismatch, retry decryption with byte 11 of the initialization vector incremented; verify SSTable footers by magic number comparison instead of checksum; and restrict the randomly generated unsigned 32-bit initial counter value to [0, 0x7fffffff] by default. This reduces the effective key size by 1 bit but eliminates the overflow issue for all files up to 32 GiB in size. The range can be customized using the new flags `--encryption_counter_min` and `--encryption_counter_max` (both bounds are inclusive). Also avoid storing a reference to a shared pointer in TableReaderOptions; this fixes an ASAN issue in the new test.

[#3974] Enable checksum verification for meta blocks of encrypted files

When I tried to enable checksum verification for meta blocks unconditionally, many RocksDB unit tests failed, so checksum verification for meta blocks of unencrypted files is left for a future diff. Also enable an encrypted SSTable test that is fixed by this change and fails without it.

[#3976] Fix encryption format for newly created files

For newly created encrypted files, overflow the counter into the rest of the initialization vector to match OpenSSL behavior. To do this, add a proto field to encrypted files indicating whether a file is of the old or new format. For old block-based files, use the checksum-retry read path (first try the non-overflowed counter, then the overflowed one). For new block-based files, use the old read path, which only tries to decrypt using overflowed keys.

Test Plan: manual testing with yb-admin. Jenkins: build platform: mac. Jenkins: compile only. `ybd --cxx-test ctr_cipher_stream-test --gtest_filter TestCipherStream.ConcurrentEncryption`. New test, encrypted_sstable-test, that fails without the fix and passes with the fix. Unit tests at the stream level with the counter set right before overflow. Integration cluster tests that exercise series of flushes and compactions.

Subscribers: kannan, bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D8183
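The "32 GiB" figure for the capped counter range follows directly from CTR-mode arithmetic. A back-of-the-envelope check (not YugabyteDB code; constant names are illustrative):

```python
# CTR mode advances the counter once per 16-byte AES block, so the
# question is how many increments fit between the largest allowed
# initial counter (0x7fffffff) and the point where the 32-bit counter
# in the last 4 IV bytes would wrap into byte 11.
AES_BLOCK_SIZE = 16
COUNTER_MAX_INITIAL = 0x7FFFFFFF   # default --encryption_counter_max (inclusive)
COUNTER_SPACE = 2 ** 32            # low 4 IV bytes hold a 32-bit counter

headroom_blocks = COUNTER_SPACE - 1 - COUNTER_MAX_INITIAL   # 2**31 increments
max_safe_file_size = headroom_blocks * AES_BLOCK_SIZE       # 2**35 bytes

print(max_safe_file_size // 2 ** 30, "GiB")   # prints: 32 GiB
```

Halving the counter space is also where the "1 bit of effective key size" cost in the commit message comes from: the random initial counter now carries 31 bits of entropy instead of 32.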
In some of our encrypted cluster tests we encountered the following error on the write path. It turned out it was really an error on the compaction read path that was stored in RocksDB's background operation status variable and surfaced during the next write attempt.
When analyzing the corrupted file using @rahuldesirazu's encryption-aware version of the ldb tool, it turned out that the corrupted blocks occupied a contiguous segment of the corrupted metadata SSTable file (offsets 9660259 to 10352680, 692421 bytes). 59 blocks out of a total of 1332 were corrupted in that file (approximately 14 MB total file size). Examining a hex dump of good vs. bad blocks, the bad blocks looked like they had been decrypted with the wrong key. Every RocksDB block is suffixed with a compression type byte (1 for Snappy) and 4 bytes of a CRC32 checksum, and in addition to the checksum being wrong, the compression type byte was also invalid for these blocks.
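The two trailer checks mentioned, a recognized compression-type byte and a matching checksum, are what flag a wrongly decrypted block. A sketch of that validation, with the caveat that RocksDB actually uses a masked CRC32C while this illustration substitutes plain CRC32 from the standard library:

```python
import zlib  # zlib.crc32 stands in for RocksDB's masked CRC32C here

# Compression-type values mentioned in this issue: 0 = none, 1 = Snappy.
KNOWN_COMPRESSION_TYPES = {0, 1}

def block_looks_valid(block: bytes, trailer: bytes) -> bool:
    """Heuristic trailer check for a RocksDB-style block.

    The 5-byte trailer is 1 compression-type byte followed by a 4-byte
    checksum. A block decrypted with the wrong IV almost always fails
    both the type-byte and the checksum test. Illustrative only.
    """
    if len(trailer) != 5 or trailer[0] not in KNOWN_COMPRESSION_TYPES:
        return False
    stored = int.from_bytes(trailer[1:], "little")
    computed = zlib.crc32(block + trailer[:1]) & 0xFFFFFFFF
    return stored == computed
```

In the corrupted file above, all 59 bad blocks would fail a check like this while the surrounding blocks pass, which is how the contiguous corrupted segment was identified.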
Then, when analyzing the exact OpenSSL code path we were using, the following aspect came up (from ctr128.c): when the counter in the low bytes of the initialization vector overflows, OpenSSL carries internally into the earlier IV bytes, incrementing byte 11.

However, in our own code, when adding the block index to the file's initial counter value and writing the result to the last 4 bytes of the initialization vector, we ignore the overflow. This explains the decryption issue. Suppose data is encrypted in one large chunk and written to a file (e.g. using buffering in WritableFileWriter), starting at an offset small enough that the counter has not yet overflowed at that point, so the overflow happens internally in OpenSSL code partway through (incrementing byte 11 of the initialization vector). If a portion of that data starting after the overflow offset is later loaded and decrypted, we compute the IV ourselves with the overflow ignored, so we are effectively using a different initialization vector for decryption than the one used for encryption.
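The divergence between the two IV computations can be reproduced with plain integer arithmetic, no actual AES needed. A minimal sketch (function names are illustrative, and it assumes the counter occupies the last 4 IV bytes in big-endian order, consistent with a carry landing in byte 11):

```python
def iv_buggy(prefix12: bytes, initial_counter: int, block_index: int) -> bytes:
    """The problematic path: (initial + index) truncated to 32 bits is
    written into the last 4 IV bytes; bytes 0..11 never change."""
    ctr = (initial_counter + block_index) & 0xFFFFFFFF
    return prefix12 + ctr.to_bytes(4, "big")

def iv_openssl_style(prefix12: bytes, initial_counter: int, block_index: int) -> bytes:
    """What OpenSSL's ctr128 effectively does: treat the whole 16-byte IV
    as one big-endian counter, so overflow carries into byte 11."""
    iv0 = int.from_bytes(prefix12 + initial_counter.to_bytes(4, "big"), "big")
    return ((iv0 + block_index) % 2 ** 128).to_bytes(16, "big")

prefix = bytes(12)
start = 0xFFFFFFFE            # initial counter one step from the boundary
# Before the 32-bit boundary the two computations agree...
assert iv_buggy(prefix, start, 1) == iv_openssl_style(prefix, start, 1)
# ...but the block that crosses it gets a different IV on each path,
# so decryption at that offset uses the wrong keystream.
assert iv_buggy(prefix, start, 2) != iv_openssl_style(prefix, start, 2)
```

Encryption of a long buffered write follows the `iv_openssl_style` sequence (OpenSSL carries internally), while a later positioned read past the overflow point recomputes the IV via the `iv_buggy` path, which is exactly the mismatch observed in the corrupted blocks.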