
LevelIterator to avoid gap after prefix bloom filters out a file #5861

Closed
wants to merge 4 commits

Conversation

siying
Contributor

@siying siying commented Sep 27, 2019

Summary:
Right now, when LevelIterator::Seek() is called and a file is filtered out by the prefix bloom filter, the iterator is positioned at the beginning of the next file. This is a confusing internal interface because many keys in the level are skipped over. Avoid this behavior by checking the first key of the next file against the seek key, and invalidating the whole iterator if the prefix doesn't match.

Test Plan: Add a new unit test to validate the behavior; run all existing tests; run crash_test
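The new behavior can be sketched with a small, hypothetical model. This is not the actual RocksDB code: a fixed 4-byte prefix stands in for `options.prefix_extractor`, and `SeekModel` condenses `LevelIterator::Seek()` into the one decision this PR changes.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical fixed-length prefix extractor (stands in for
// options.prefix_extractor; the 4-byte length is an assumption).
constexpr size_t kPrefixLen = 4;

bool InDomain(const std::string& key) { return key.size() >= kPrefixLen; }
std::string Prefix(const std::string& key) { return key.substr(0, kPrefixLen); }

struct FileMeta {
  std::string smallest;  // smallest user key in the file
  bool bloom_may_match;  // result of the prefix bloom check (simplified)
};

// Simplified model of the post-PR behavior: if the bloom filter skips the
// file we initially land on, look at the first key of the next file; if its
// prefix differs from the seek key's prefix, invalidate the iterator instead
// of silently positioning past a gap of skipped keys. Returns the index of
// the file the iterator ends on, or nullopt if the iterator is invalidated.
std::optional<size_t> SeekModel(const std::vector<FileMeta>& files,
                                size_t initial, const std::string& target) {
  if (files[initial].bloom_may_match) return initial;
  size_t next = initial + 1;
  if (next >= files.size()) return std::nullopt;
  const std::string& file_key = files[next].smallest;
  if (!InDomain(file_key) || Prefix(target) != Prefix(file_key)) {
    return std::nullopt;  // prefix mismatch: invalidate the whole iterator
  }
  return next;
}
```

With this model, seeking to "aaaa0" when the bloom filter skips the first file invalidates the iterator if the next file starts with a different prefix, but still advances when the next file continues the same prefix.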

// We've skipped the file we initially positioned to. In the prefix
// seek case, it is likely that the file is skipped because of
// prefix bloom or hash, where more keys are skipped. We then check
// the current key and invalidate the iterator if the prefix is
Contributor

Based on the inline comment, it is not clear to me whether this is an optimization or a bug fix.

Contributor Author

It's neither an optimization nor a bug fix. It's to make the internal interface clearer.

Contributor

What happens if we do not invalidate? Does it get invalidated anyway somewhere else? Is the goal to invalidate early?

The comment below says "This avoid LevelIterator to skip keys". So before the patch it did not skip the keys? Was it a bug?

Contributor Author

It may never be invalidated elsewhere, but that still follows the contract. The goal is not to invalidate earlier; the purpose of the PR is to avoid leaving the iterator in a very surprising location. Skipping keys is not a correctness bug as long as the skipped keys don't share the seek key's prefix.

(!prefix_extractor_->InDomain(file_user_key) ||
user_comparator_.Compare(
prefix_extractor_->Transform(target_user_key),
prefix_extractor_->Transform(file_user_key)) != 0)) {
Contributor

It is not clear to me why we should invalidate the iterator if the "first key" of the file has a different prefix than the target key. Couldn't a key after that one have the same prefix as our target key?

Contributor Author

We don't have to invalidate the iterator, but we can. The contract of prefix iterating after the prefix allows both cases. Is your question about why we want to pick the former case here?

Contributor

"The contract of prefix iterating after the prefix allows both cases." I think this would be a helpful sentence in the inline comments. I am thinking of a future refactorer trying to figure out how essential this step is. Saying that the contract allows both, that we choose to invalidate, and mentioning the benefit would help that future refactorer.
Currently it only says "This avoids LevelIterator to skip keys.", which is quite ambiguous to me. Then it talks about a side benefit, which sounds like an optimization. If that is the case, it is better to mention it.

What I understood so far from your comments is this: "The contract allows two implementations: to invalidate or not to invalidate. We change the implementation to 'to invalidate' to optimize for performance."
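The contract under discussion can be written down as a small, hypothetical checker (not RocksDB code; a fixed-length prefix models the extractor): within the seek key's prefix the result must be exact, while past the prefix both invalidating and surfacing some larger key are allowed.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Checks whether a Seek() outcome obeys the prefix-seek contract described
// in this thread (a hypothetical model). `keys` is the sorted key set and
// `prefix_len` models a fixed-length prefix extractor.
bool SatisfiesContract(const std::optional<std::string>& result,
                       const std::vector<std::string>& keys,
                       const std::string& target, size_t prefix_len) {
  const std::string prefix = target.substr(0, prefix_len);
  // Find the smallest key >= target that still carries the target's prefix.
  for (const std::string& k : keys) {
    if (k >= target && k.compare(0, prefix_len, prefix) == 0) {
      // Within the prefix, the result must be exact.
      return result.has_value() && *result == k;
    }
  }
  // Prefix exhausted: invalidating or returning any larger key are both OK.
  return !result.has_value() || *result > target;
}
```

For example, with keys {"aaa1", "aaa3", "bbb1"} and target "aaa2", only "aaa3" satisfies the contract; with target "aaa4" (prefix exhausted), both an invalidated iterator and "bbb1" satisfy it, which is the freedom this PR exploits.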

Contributor

Does this require using user_comparator for comparison? I think using Slice's own compare method is sufficient. Using the former to compare prefixes might result in undefined behavior.
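The reviewer's point can be sketched as follows: for checking whether two extracted prefixes are *equal*, a bytewise three-way compare in the spirit of Slice::compare is sufficient; a custom user comparator matters only when full-key *ordering* semantics are needed. The function below is an illustrative stand-in, not the RocksDB implementation.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// Bytewise three-way comparison in the spirit of rocksdb::Slice::compare
// (memcmp on the common length, then shorter-sorts-first).
int BytewiseCompare(std::string_view a, std::string_view b) {
  const size_t min_len = a.size() < b.size() ? a.size() : b.size();
  int r = std::memcmp(a.data(), b.data(), min_len);
  if (r == 0) {
    if (a.size() < b.size()) r = -1;
    else if (a.size() > b.size()) r = 1;
  }
  return r;
}

// Equality of two prefixes produced by the same extractor only needs byte
// equality, so a bytewise compare is enough for this check.
bool SamePrefix(std::string_view a, std::string_view b) {
  return BytewiseCompare(a, b) == 0;
}
```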

// been exhausted, it can jump any key that is larger. Here we are
// enforcing a stricter contract than that, in order to make it easier for
// higher layers (merging and DB iterator) to reason the correctness:
// 1. Within the prefix, the result should very accurate.
Contributor

Is this a typo?
s/very accurate/be accurate

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor Author

siying commented Oct 21, 2019

The Travis failure is a timeout and is not related.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

This pull request has been merged in a0cd920.

siying added a commit to siying/rocksdb that referenced this pull request Nov 13, 2019
…e delete

Summary: Recent change facebook#5861 mistakenly uses "prefix_extractor_ != nullptr" as the condition to determine whether the prefix bloom filter is used. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakenly applied to total-order seek, introducing a bug in the following corner case:
Because of RangeDelete(), a file's largest key is extended. The seek key falls into the range-deleted file, so the level iterator seeks into the previous file without finding any key. The correct behavior is to place the iterator at the first key of the next file. However, the optimization is triggered and invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added.
Fix the bug by setting prefix_extractor to null if total order seek is used.

Test Plan: Add a unit test which fails without the fix.
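The guard described in this follow-up commit can be sketched as a single predicate. This is a simplified, illustrative model (booleans stand in for the extractor pointer and ReadOptions), not the actual RocksDB internals.

```cpp
#include <cassert>

// Simplified model of the follow-up fix: the prefix-based invalidation
// introduced in this PR must apply only when prefix seek is actually in
// effect. Checking "prefix_extractor_ != nullptr" alone is not enough,
// because read_options.total_order_seek overrides prefix mode.
bool UsePrefixBloom(bool has_prefix_extractor, bool total_order_seek) {
  // The fix: total-order seek disables the prefix path entirely, so the
  // iterator keeps the old behavior of moving to the next file's first key.
  return has_prefix_extractor && !total_order_seek;
}
```

Under this model, a configured extractor enables the prefix path only when total_order_seek is off, which is exactly the condition the buggy "prefix_extractor_ != nullptr" check missed.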
facebook-github-bot pushed a commit that referenced this pull request Nov 13, 2019
…e delete (#6028)

Summary:
Recent change #5861 mistakenly uses "prefix_extractor_ != nullptr" as the condition to determine whether the prefix bloom filter is used. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakenly applied to total-order seek, introducing a bug in the following corner case:
Because of RangeDelete(), a file's largest key is extended. The seek key falls into the range-deleted file, so the level iterator seeks into the previous file without finding any key. The correct behavior is to place the iterator at the first key of the next file. However, the optimization is triggered and invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added.
Fix the bug by setting prefix_extractor to null if total order seek is used.
Pull Request resolved: #6028

Test Plan: Add a unit test which fails without the fix.

Differential Revision: D18479063

fbshipit-source-id: ac075f013029fcf69eb3a598f14c98cce3e810b3
merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
…ebook#5861)

Summary:
Right now, when LevelIterator::Seek() is called and a file is filtered out by the prefix bloom filter, the iterator is positioned at the beginning of the next file. This is a confusing internal interface because many keys in the level are skipped over. Avoid this behavior by checking the first key of the next file against the seek key, and invalidating the whole iterator if the prefix doesn't match.
Pull Request resolved: facebook#5861

Test Plan: Add a new unit test to validate the behavior; run all existing tests; run crash_test

Differential Revision: D17918213

fbshipit-source-id: f06b47d937c7cc8919001f18dcc3af5b28c9cdac
merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
…e delete (facebook#6028)

Summary:
Recent change facebook#5861 mistakenly uses "prefix_extractor_ != nullptr" as the condition to determine whether the prefix bloom filter is used. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakenly applied to total-order seek, introducing a bug in the following corner case:
Because of RangeDelete(), a file's largest key is extended. The seek key falls into the range-deleted file, so the level iterator seeks into the previous file without finding any key. The correct behavior is to place the iterator at the first key of the next file. However, the optimization is triggered and invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added.
Fix the bug by setting prefix_extractor to null if total order seek is used.
Pull Request resolved: facebook#6028

Test Plan: Add a unit test which fails without the fix.

Differential Revision: D18479063

fbshipit-source-id: ac075f013029fcf69eb3a598f14c98cce3e810b3