Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional block checksum mismatch when trying to read an encrypted file #3707

Closed
mbautin opened this issue Feb 22, 2020 · 0 comments
Closed
Assignees
Labels
area/docdb YugabyteDB core features

Comments

@mbautin
Copy link
Contributor

mbautin commented Feb 22, 2020

In some of our encrypted cluster tests we encountered the following error on the write path. It turned out it was really an error on the compaction read path that was stored in RocksDB's background operation status variable and surfaced during the next write attempt.

F20200213 01:48:56 ../../../../../src/yb/tablet/tablet.cc:953] T fedf3ed2d2e6471d86bbdea04d9e832b P 110bfdd115f24969b9e3338cf3de50e1: Failed to write a batch with 1000 operations into RocksDB: Corruption (yb/rocksdb/table/format.cc:321): block checksum mismatch in file: /mnt/d0/yb-data/tserver/data/rocksdb/table-11fd1f27765b46658112ede0cd881ba4/tablet-fedf3ed2d2e6471d86bbdea04d9e832b/252991.sst, block handle: BlockHandle { offset: 9660259 size: 11743 }
   @     0x7f25d7b1856c  yb::LogFatalHandlerSink::send()
   @     0x7f25d6d0d336  google::LogMessage::SendToLog()
   @     0x7f25d6d0a79a  google::LogMessage::Flush()
   @     0x7f25d6d0d869  google::LogMessageFatal::~LogMessageFatal()
   @     0x7f25dee8b881  yb::tablet::Tablet::WriteToRocksDB()
   @     0x7f25dee930c2  yb::tablet::Tablet::ApplyKeyValueRowOperations()
   @     0x7f25dee932dc  yb::tablet::Tablet::ApplyRowOperations()
   @     0x7f25deef9a63  yb::tablet::WriteOperation::DoReplicated()
   @     0x7f25deeeb88c  yb::tablet::Operation::Replicated()
   @     0x7f25deef1c6b  yb::tablet::OperationDriver::ApplyTask()
   @     0x7f25deef200d  yb::tablet::OperationDriver::ApplyOperation()
   @     0x7f25deef2a14  yb::tablet::OperationDriver::ReplicationFinished()
   @     0x7f25deb3cce9  yb::consensus::ConsensusRound::NotifyReplicationFinished()
   @     0x7f25deb91568  yb::consensus::ReplicaState::NotifyReplicationFinishedUnlocked()
   @     0x7f25deb96fae  yb::consensus::ReplicaState::ApplyPendingOperationsUnlocked()
   @     0x7f25deb97a9f  yb::consensus::ReplicaState::AdvanceCommittedOpIdUnlocked()
   @     0x7f25deb7e5f3  yb::consensus::RaftConsensus::EarlyCommitUnlocked()
   @     0x7f25deb84b9f  yb::consensus::RaftConsensus::UpdateReplica()
   @     0x7f25deb85ab3  yb::consensus::RaftConsensus::Update()
   @     0x7f25df73ecf1  yb::tserver::ConsensusServiceImpl::UpdateConsensus()
   @     0x7f25dd33a6aa  yb::consensus::ConsensusServiceIf::Handle()
   @     0x7f25d95d0c19  yb::rpc::ServicePoolImpl::Handle()
   @     0x7f25d95748f4  yb::rpc::InboundCall::InboundCallTask::Run()
   @     0x7f25d95dc868  yb::rpc::(anonymous namespace)::Worker::Execute()
   @     0x7f25d7ba5e0f  yb::Thread::SuperviseThread()
   @     0x7f25d2664694  start_thread
   @     0x7f25d1da141d  __clone
   @              (nil)  (unknown)

When analyzing the corrupted file using @rahuldesirazu's encryption-aware version of the ldb tool, it turned out that corrupted blocks occupied a contiguous segment of the corrupted metadata sstable file (offsets 9660259 to 10352680, 692421 bytes). 59 blocks out of total of 1332 were corrupted in that file (approximately 14 MB total file size). Examining the hex dump of good vs. bad blocks, the bad blocks looked like they were decrypted with the wrong key. Every RocksDB block is suffixed with a compression type byte (1 for snappy) and 4 bytes of a CRC32 checksum, and in addition to the checksum being wrong, the compression type byte was also invalid for these blocks.

Then, when analyzing the exact OpenSSL codepath we were using, the following aspect came up:

/*
 * The input encrypted as though 128bit counter mode is being used.  The
 * extra state information to record how much of the 128bit block we have
 * used is contained in *num, and the encrypted counter is kept in
 * ecount_buf.  Both *num and ecount_buf must be initialised with zeros
 * before the first call to CRYPTO_ctr128_encrypt(). This algorithm assumes
 * that the counter is in the x lower bits of the IV (ivec), and that the
 * application has full control over overflow and the rest of the IV.  This
 * implementation takes NO responsability for checking that the counter
 * doesn't overflow into the rest of the IV when incremented.
 */

(from ctr128.c).

However, in our own code, when adding the block index to the initial value of the counter for the file, and writing that to the last 4 bytes of the initialization vector, we ignore the overflow. This explains the decryption issue: if data is encrypted in a large block and written to a file (e.g. using buffering in WritableFileWriter) where the starting offset is small enough that the counter does not overflow yet at that offset (and it overflows internally in OpenSSL code, incrementing byte 11 of the initialization vector) and a portion of that data, with the starting offset after the overflow offset is later loaded and decrypted, then we are effectively using a different initialization vector for decryption compared to what was used for encryption.

  // Set the last 4 bytes of the iv based on counter + block_index.
  uint8_t iv[EncryptionParams::kBlockSize];
  memcpy(iv, encryption_params_->nonce, EncryptionParams::kBlockSize - 4);
  uint64_t start_index = encryption_params_->counter + block_index;
  BigEndian::Store32(iv + EncryptionParams::kBlockSize - 4, start_index);
@mbautin mbautin self-assigned this Feb 22, 2020
mbautin added a commit that referenced this issue Mar 5, 2020
Summary:
This is a workaround for the encrypted file corruption issue described
at #3707. When reading a
block from an encrypted SSTable and getting a checksum mismatch, we now
try to increment byte 11 of the initialization vector (carrying over
into earlier bytes when encountering 0xff) and decrypt and verify
checksum once again. Similarly, when reading an SSTable footer, we use
magic number comparison instead of checksum verification.  This
workaround is turned on by default and is controlled using the new
`--encryption_counter_overflow_read_path_workaround` flag.

This diff is not fixing the underlying ignored overflow issue yet. While
it would be easy to do so, the old read path would not be able to read
files written with the new write path. Instead, we will deploy this
workaround, disable encryption, perform major compactions on all data,
and then deploy the real fix and re-enable encryption.

Also we restrict the range of the unsigned 32-bit randomly-generated
initial counter value to [0, 0x7fffffff] by default. This reduces the
effective key size by 1 bit but eliminates the overflow issue for all
files up to 32 GiB in size. This range could be customized using the new
flags: `--encryption_counter_min` and `--encryption_counter_max` (both
of these bounds are inclusive).

Also avoid storing a reference to a shared pointer in
TableReaderOptions.  This fixes an ASAN issue in the new test.

Test Plan:
Jenkins
New test, encrypted_sstable-test, that fails without the fix and passes
with the fix.

Reviewers: rahuldesirazu, kannan, sergei, bogdan

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D8020
rahuldesirazu pushed a commit that referenced this issue Mar 25, 2020
Summary:
This is a workaround for the encrypted file corruption issue described
at #3707. When reading a
block from an encrypted SSTable and getting a checksum mismatch, we now
try to increment byte 11 of the initialization vector (carrying over
into earlier bytes when encountering 0xff) and decrypt and verify
checksum once again. Similarly, when reading an SSTable footer, we use
magic number comparison instead of checksum verification.  This
workaround is turned on by default and is controlled using the new
`--encryption_counter_overflow_read_path_workaround` flag.

This diff is not fixing the underlying ignored overflow issue yet. While
it would be easy to do so, the old read path would not be able to read
files written with the new write path. Instead, we will deploy this
workaround, disable encryption, perform major compactions on all data,
and then deploy the real fix and re-enable encryption.

Also we restrict the range of the unsigned 32-bit randomly-generated
initial counter value to [0, 0x7fffffff] by default. This reduces the
effective key size by 1 bit but eliminates the overflow issue for all
files up to 32 GiB in size. This range could be customized using the new
flags: `--encryption_counter_min` and `--encryption_counter_max` (both
of these bounds are inclusive).

Also avoid storing a reference to a shared pointer in
TableReaderOptions.  This fixes an ASAN issue in the new test.

Test Plan:
Jenkins
New test, encrypted_sstable-test, that fails without the fix and passes
with the fix.

Reviewers: rahuldesirazu, kannan, sergei, bogdan

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D8020
rahuldesirazu added a commit that referenced this issue Mar 25, 2020
Summary:
Support the following yb-admin commands for encryption at rest:

```
add_universe_key_to_all_masters(String keyid, String key_path)
all_masters_have_universe_key_in_memory(String keyId)
rotate_universe_key_in_memory(String keyId)
disable_encryption_in_memory()
```

[#3455] Modify ldb Tool to be Encryption Aware

Add `--key_file` option to ldb to pass in a `<key_id>:</path/to/key/file>` and an option `--only_verify_checksums` flag to `scan` to not print out entries and only do checksum validation. Also provide a yb-admin `write_universe_key_to_file` command to retrieve the universe key and write it to a local file.

[#3454] Make OpenSSL usage thread-safe

https://www.openssl.org/docs/man1.0.2/man3/CRYPTO_set_locking_callback.html

Set a locking and thread id callback to enable thread-safe usage for openssl. Move the openssl init process to a once per-process initialization in RpcServerBase.

[#3707] A workaround for encryption counter overflow

This is a workaround for the encrypted file corruption issue described
at #3707. When reading a
block from an encrypted SSTable and getting a checksum mismatch, we now
try to increment byte 11 of the initialization vector (carrying over
into earlier bytes when encountering 0xff) and decrypt and verify
checksum once again. Similarly, when reading an SSTable footer, we use
magic number comparison instead of checksum verification.  This
workaround is turned on by default and is controlled using the new
`--encryption_counter_overflow_read_path_workaround` flag.

This diff is not fixing the underlying ignored overflow issue yet. While
it would be easy to do so, the old read path would not be able to read
files written with the new write path. Instead, we will deploy this
workaround, disable encryption, perform major compactions on all data,
and then deploy the real fix and re-enable encryption.

Also we restrict the range of the unsigned 32-bit randomly-generated
initial counter value to [0, 0x7fffffff] by default. This reduces the
effective key size by 1 bit but eliminates the overflow issue for all
files up to 32 GiB in size. This range could be customized using the new
flags: `--encryption_counter_min` and `--encryption_counter_max` (both
of these bounds are inclusive).

Also avoid storing a reference to a shared pointer in
TableReaderOptions.  This fixes an ASAN issue in the new test.

#3974 Enable checksum verification for meta blocks of encrypted files

Enable checksum verification for meta blocks of encrypted files. When I
tried to enable checksum verification for meta blocks unconditionally,
a lot of RocksDB unit tests fail, so I am leaving checksum verification
for meta blocks of unencrypted files for a future diff.

Also enable an encrypted SSTable test that is fixed by this change
and fails without this change.

[#3976] Fix encryption format for newly created files

For newly created encrypted files, overflow the counter into the rest
of the initialization vector to match OpenSSL behavior. To do this add
a proto field to encrypted files to indicate whether it is of the old
or new format. For old block-based files, do the checksum retry read
path (first try non-overflow, then overflow key). For new block-based
files, use the old read path that only tries to decrypt using
overflowed keys.

Test Plan:
manual testing with yb-admin

Jenkins: build platform: mac
Jenkins: compile only

ybd --cxx-test ctr_cipher_stream-test --gtest_filter TestCipherStream.ConcurrentEncryption

Jenkins
New test, encrypted_sstable-test, that fails without the fix and passes
with the fix.

Jenkins

Unit tests at the stream level with counter set right before overflow.

Integration cluster tests that test series of flushes and compactions.

Subscribers: kannan, bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D8183
@bmatican bmatican added the area/docdb YugabyteDB core features label Jun 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features
Projects
None yet
Development

No branches or pull requests

2 participants