-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LevelIterator to avoid gap after prefix bloom filters out a file #5861
Conversation
Summary: Right now, when LevelIterator::Seek() is called, when a file is filtered out by prefix bloom filter, the position is put to the beginning of the next file. This is a confusing internal interface because many keys in the levels are skipped. Avoid this behavior by checking the key of the next file against the seek key, and invalidate the whole iterator if the prefix doesn't match. Test Plan: Add a new unit test to validate the behavior; run all exsiting tests; run crash_test
// We've skipped the file we initially positioned to. In the prefix | ||
// seek case, it is likely that the file is skipped because of | ||
// prefix bloom or hash, where more keys are skipped. We then check | ||
// the current key and invalidate the iterator if the prefix is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the inline comment, it is not clear to me that if this is an optimization or bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an optimization nor bug. It's to make the internal interface clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we do not invalidate? Does it get invalidated anyway somewhere else? Is the goal to invalidate early?
The comment below says "This avoid LevelIterator to skip keys". So before the patch it did not skip the keys? Was it a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may ever be invalidated, but it's still follow the contract. The goal is not to invalidate earlier. The purpose of the PR is not to put it in a very surprising location. Skipping keys is not a correctness bug if the keys skipped don't belong to the same prefix as the seek key.
(!prefix_extractor_->InDomain(file_user_key) || | ||
user_comparator_.Compare( | ||
prefix_extractor_->Transform(target_user_key), | ||
prefix_extractor_->Transform(file_user_key)) != 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me that why we should invalidate the iterator if the "first key" of the file has a different prefix that the target key. Could not the key after that have the same prefix as our target key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to invalidate the iterator, but we can. The contract of prefix iterating after the prefix allows both cases. Is your question about why we want to pick the former case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The contract of prefix iterating after the prefix allows both cases." I think this would be a helpful sentence in inline comments. I am thinking of a future refactorer trying to figure how essential this step. Saying that the contract allow both but we change it to do invalidate, and mentioning the benefit would help the future refactorer.
Currently it only says "This avoids LevelIterator to skip keys. " which is quite ambiguous to me. The it talks about a side-benefit, which sounds like an optimization. If that is the case, it is better to be mentioned.
What I understood so far from your comments is this: "The contract allow two implementations: to invalidate or not to invalidate. We change the implementation to "to invalidate" to optimize for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require using user_comparator for comparison? I think using Slice's own compare method is sufficient. Using the former to compare prefixes might result in undefined behavior.
db/version_set.cc
Outdated
// been exhausted, it can jump any key that is larger. Here we are | ||
// enforcing a stricter contract than that, in order to make it easier for | ||
// higher layers (merging and DB iterator) to reason the correctness: | ||
// 1. Within the prefix, the result should very accurate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo?
s/very accurate/be accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
The Travis failure is a timeout and is not related. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This pull request has been merged in a0cd920. |
…e delete Summary: Recent change facebook#5861 mistakely use "prefix_extractor_ != nullptr" as the condition to determine whehter prefix bloom filter isused. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakely applied to total order seek, and introduces a bug in following corner case: Because of RangeDelete(), a file's largest key is extended. Seek key falls into the range deleted file, so level iterator seeks into the previous file without getting any key. The correct behavior is to place the iterator to the first key of the next file. However, an optimization is triggered and invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added. Fix the bug by setting prefix_extractor to be null if total order seek is used. Test Plan: Add a unit test which fails without the fix.
…e delete (#6028) Summary: Recent change #5861 mistakely use "prefix_extractor_ != nullptr" as the condition to determine whehter prefix bloom filter isused. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakely applied to total order seek, and introduces a bug in following corner case: Because of RangeDelete(), a file's largest key is extended. Seek key falls into the range deleted file, so level iterator seeks into the previous file without getting any key. The correct behavior is to place the iterator to the first key of the next file. However, an optimization is triggered and invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added. Fix the bug by setting prefix_extractor to be null if total order seek is used. Pull Request resolved: #6028 Test Plan: Add a unit test which fails without the fix. Differential Revision: D18479063 fbshipit-source-id: ac075f013029fcf69eb3a598f14c98cce3e810b3
…ebook#5861) Summary: Right now, when LevelIterator::Seek() is called, when a file is filtered out by prefix bloom filter, the position is put to the beginning of the next file. This is a confusing internal interface because many keys in the levels are skipped. Avoid this behavior by checking the key of the next file against the seek key, and invalidate the whole iterator if the prefix doesn't match. Pull Request resolved: facebook#5861 Test Plan: Add a new unit test to validate the behavior; run all exsiting tests; run crash_test Differential Revision: D17918213 fbshipit-source-id: f06b47d937c7cc8919001f18dcc3af5b28c9cdac
…e delete (facebook#6028) Summary: Recent change facebook#5861 mistakely use "prefix_extractor_ != nullptr" as the condition to determine whehter prefix bloom filter isused. It fails to consider read_options.total_order_seek, so it is wrong. The result is that an optimization for non-total-order seek is mistakely applied to total order seek, and introduces a bug in following corner case: Because of RangeDelete(), a file's largest key is extended. Seek key falls into the range deleted file, so level iterator seeks into the previous file without getting any key. The correct behavior is to place the iterator to the first key of the next file. However, an optimization is triggered and invalidates the iterator because it is out of the prefix range, causing wrong results. This behavior is reproduced in the unit test added. Fix the bug by setting prefix_extractor to be null if total order seek is used. Pull Request resolved: facebook#6028 Test Plan: Add a unit test which fails without the fix. Differential Revision: D18479063 fbshipit-source-id: ac075f013029fcf69eb3a598f14c98cce3e810b3
Summary:
Right now, when LevelIterator::Seek() is called, when a file is filtered out by prefix bloom filter, the position is put to the beginning of the next file. This is a confusing internal interface because many keys in the levels are skipped. Avoid this behavior by checking the key of the next file against the seek key, and invalidate the whole iterator if the prefix doesn't match.
Test Plan: Add a new unit test to validate the behavior; run all exsiting tests; run crash_test