Occasional block checksum mismatch when trying to read an encrypted file #3707

mbautin · 2020-02-22T06:26:57Z

In some of our encrypted cluster tests we encountered the following error on the write path. It turned out it was really an error on the compaction read path that was stored in RocksDB's background operation status variable and surfaced during the next write attempt.

F20200213 01:48:56 ../../../../../src/yb/tablet/tablet.cc:953] T fedf3ed2d2e6471d86bbdea04d9e832b P 110bfdd115f24969b9e3338cf3de50e1: Failed to write a batch with 1000 operations into RocksDB: Corruption (yb/rocksdb/table/format.cc:321): block checksum mismatch in file: /mnt/d0/yb-data/tserver/data/rocksdb/table-11fd1f27765b46658112ede0cd881ba4/tablet-fedf3ed2d2e6471d86bbdea04d9e832b/252991.sst, block handle: BlockHandle { offset: 9660259 size: 11743 }
   @     0x7f25d7b1856c  yb::LogFatalHandlerSink::send()
   @     0x7f25d6d0d336  google::LogMessage::SendToLog()
   @     0x7f25d6d0a79a  google::LogMessage::Flush()
   @     0x7f25d6d0d869  google::LogMessageFatal::~LogMessageFatal()
   @     0x7f25dee8b881  yb::tablet::Tablet::WriteToRocksDB()
   @     0x7f25dee930c2  yb::tablet::Tablet::ApplyKeyValueRowOperations()
   @     0x7f25dee932dc  yb::tablet::Tablet::ApplyRowOperations()
   @     0x7f25deef9a63  yb::tablet::WriteOperation::DoReplicated()
   @     0x7f25deeeb88c  yb::tablet::Operation::Replicated()
   @     0x7f25deef1c6b  yb::tablet::OperationDriver::ApplyTask()
   @     0x7f25deef200d  yb::tablet::OperationDriver::ApplyOperation()
   @     0x7f25deef2a14  yb::tablet::OperationDriver::ReplicationFinished()
   @     0x7f25deb3cce9  yb::consensus::ConsensusRound::NotifyReplicationFinished()
   @     0x7f25deb91568  yb::consensus::ReplicaState::NotifyReplicationFinishedUnlocked()
   @     0x7f25deb96fae  yb::consensus::ReplicaState::ApplyPendingOperationsUnlocked()
   @     0x7f25deb97a9f  yb::consensus::ReplicaState::AdvanceCommittedOpIdUnlocked()
   @     0x7f25deb7e5f3  yb::consensus::RaftConsensus::EarlyCommitUnlocked()
   @     0x7f25deb84b9f  yb::consensus::RaftConsensus::UpdateReplica()
   @     0x7f25deb85ab3  yb::consensus::RaftConsensus::Update()
   @     0x7f25df73ecf1  yb::tserver::ConsensusServiceImpl::UpdateConsensus()
   @     0x7f25dd33a6aa  yb::consensus::ConsensusServiceIf::Handle()
   @     0x7f25d95d0c19  yb::rpc::ServicePoolImpl::Handle()
   @     0x7f25d95748f4  yb::rpc::InboundCall::InboundCallTask::Run()
   @     0x7f25d95dc868  yb::rpc::(anonymous namespace)::Worker::Execute()
   @     0x7f25d7ba5e0f  yb::Thread::SuperviseThread()
   @     0x7f25d2664694  start_thread
   @     0x7f25d1da141d  __clone
   @              (nil)  (unknown)

When analyzing the corrupted file using @rahuldesirazu's encryption-aware version of the ldb tool, it turned out that corrupted blocks occupied a contiguous segment of the corrupted metadata sstable file (offsets 9660259 to 10352680, 692421 bytes). 59 blocks out of total of 1332 were corrupted in that file (approximately 14 MB total file size). Examining the hex dump of good vs. bad blocks, the bad blocks looked like they were decrypted with the wrong key. Every RocksDB block is suffixed with a compression type byte (1 for snappy) and 4 bytes of a CRC32 checksum, and in addition to the checksum being wrong, the compression type byte was also invalid for these blocks.

Then, when analyzing the exact OpenSSL codepath we were using, the following aspect came up:

/*
 * The input encrypted as though 128bit counter mode is being used.  The
 * extra state information to record how much of the 128bit block we have
 * used is contained in *num, and the encrypted counter is kept in
 * ecount_buf.  Both *num and ecount_buf must be initialised with zeros
 * before the first call to CRYPTO_ctr128_encrypt(). This algorithm assumes
 * that the counter is in the x lower bits of the IV (ivec), and that the
 * application has full control over overflow and the rest of the IV.  This
 * implementation takes NO responsability for checking that the counter
 * doesn't overflow into the rest of the IV when incremented.
 */

(from ctr128.c).

However, in our own code, when adding the block index to the initial value of the counter for the file, and writing that to the last 4 bytes of the initialization vector, we ignore the overflow. This explains the decryption issue: if data is encrypted in a large block and written to a file (e.g. using buffering in WritableFileWriter) where the starting offset is small enough that the counter does not overflow yet at that offset (and it overflows internally in OpenSSL code, incrementing byte 11 of the initialization vector) and a portion of that data, with the starting offset after the overflow offset is later loaded and decrypted, then we are effectively using a different initialization vector for decryption compared to what was used for encryption.

  // Set the last 4 bytes of the iv based on counter + block_index.
  uint8_t iv[EncryptionParams::kBlockSize];
  memcpy(iv, encryption_params_->nonce, EncryptionParams::kBlockSize - 4);
  uint64_t start_index = encryption_params_->counter + block_index;
  BigEndian::Store32(iv + EncryptionParams::kBlockSize - 4, start_index);

The text was updated successfully, but these errors were encountered:

Summary: This is a workaround for the encrypted file corruption issue described at #3707. When reading a block from an encrypted SSTable and getting a checksum mismatch, we now try to increment byte 11 of the initialization vector (carrying over into earlier bytes when encountering 0xff) and decrypt and verify checksum once again. Similarly, when reading an SSTable footer, we use magic number comparison instead of checksum verification. This workaround is turned on by default and is controlled using the new `--encryption_counter_overflow_read_path_workaround` flag. This diff is not fixing the underlying ignored overflow issue yet. While it would be easy to do so, the old read path would not be able to read files written with the new write path. Instead, we will deploy this workaround, disable encryption, perform major compactions on all data, and then deploy the real fix and re-enable encryption. Also we restrict the range of the unsigned 32-bit randomly-generated initial counter value to [0, 0x7fffffff] by default. This reduces the effective key size by 1 bit but eliminates the overflow issue for all files up to 32 GiB in size. This range could be customized using the new flags: `--encryption_counter_min` and `--encryption_counter_max` (both of these bounds are inclusive). Also avoid storing a reference to a shared pointer in TableReaderOptions. This fixes an ASAN issue in the new test. Test Plan: Jenkins New test, encrypted_sstable-test, that fails without the fix and passes with the fix. Reviewers: rahuldesirazu, kannan, sergei, bogdan Reviewed By: sergei Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D8020

Summary: Support the following yb-admin commands for encryption at rest: ``` add_universe_key_to_all_masters(String keyid, String key_path) all_masters_have_universe_key_in_memory(String keyId) rotate_universe_key_in_memory(String keyId) disable_encryption_in_memory() ``` [#3455] Modify ldb Tool to be Encryption Aware Add `--key_file` option to ldb to pass in a `<key_id>:</path/to/key/file>` and an option `--only_verify_checksums` flag to `scan` to not print out entries and only do checksum validation. Also provide a yb-admin `write_universe_key_to_file` command to retrieve the universe key and write it to a local file. [#3454] Make OpenSSL usage thread-safe https://www.openssl.org/docs/man1.0.2/man3/CRYPTO_set_locking_callback.html Set a locking and thread id callback to enable thread-safe usage for openssl. Move the openssl init process to a once per-process initialization in RpcServerBase. [#3707] A workaround for encryption counter overflow This is a workaround for the encrypted file corruption issue described at #3707. When reading a block from an encrypted SSTable and getting a checksum mismatch, we now try to increment byte 11 of the initialization vector (carrying over into earlier bytes when encountering 0xff) and decrypt and verify checksum once again. Similarly, when reading an SSTable footer, we use magic number comparison instead of checksum verification. This workaround is turned on by default and is controlled using the new `--encryption_counter_overflow_read_path_workaround` flag. This diff is not fixing the underlying ignored overflow issue yet. While it would be easy to do so, the old read path would not be able to read files written with the new write path. Instead, we will deploy this workaround, disable encryption, perform major compactions on all data, and then deploy the real fix and re-enable encryption. Also we restrict the range of the unsigned 32-bit randomly-generated initial counter value to [0, 0x7fffffff] by default. This reduces the effective key size by 1 bit but eliminates the overflow issue for all files up to 32 GiB in size. This range could be customized using the new flags: `--encryption_counter_min` and `--encryption_counter_max` (both of these bounds are inclusive). Also avoid storing a reference to a shared pointer in TableReaderOptions. This fixes an ASAN issue in the new test. #3974 Enable checksum verification for meta blocks of encrypted files Enable checksum verification for meta blocks of encrypted files. When I tried to enable checksum verification for meta blocks unconditionally, a lot of RocksDB unit tests fail, so I am leaving checksum verification for meta blocks of unencrypted files for a future diff. Also enable an encrypted SSTable test that is fixed by this change and fails without this change. [#3976] Fix encryption format for newly created files For newly created encrypted files, overflow the counter into the rest of the initialization vector to match OpenSSL behavior. To do this add a proto field to encrypted files to indicate whether it is of the old or new format. For old block-based files, do the checksum retry read path (first try non-overflow, then overflow key). For new block-based files, use the old read path that only tries to decrypt using overflowed keys. Test Plan: manual testing with yb-admin Jenkins: build platform: mac Jenkins: compile only ybd --cxx-test ctr_cipher_stream-test --gtest_filter TestCipherStream.ConcurrentEncryption Jenkins New test, encrypted_sstable-test, that fails without the fix and passes with the fix. Jenkins Unit tests at the stream level with counter set right before overflow. Integration cluster tests that test series of flushes and compactions. Subscribers: kannan, bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D8183

mbautin self-assigned this Feb 22, 2020

mbautin mentioned this issue Mar 16, 2020

Checksums are not being verified in meta_blocks.cc #3974

Closed

bmatican added the area/docdb YugabyteDB core features label Jun 24, 2020

bmatican closed this as completed Jun 24, 2020

def- mentioned this issue Sep 19, 2022

[DocDB] FATAL: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/rocksdb/table/format.cc:350): Block checksum mismatch in file #14064

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Occasional block checksum mismatch when trying to read an encrypted file #3707

Occasional block checksum mismatch when trying to read an encrypted file #3707

mbautin commented Feb 22, 2020

Occasional block checksum mismatch when trying to read an encrypted file #3707

Occasional block checksum mismatch when trying to read an encrypted file #3707

Comments

mbautin commented Feb 22, 2020