-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete by prefix operator in kvdb #360
Conversation
…_del_range Block on liftime for column familly.
probably not an issue, dbiterator is one as it will fail on close db).
@@ -129,3 +149,20 @@ pub trait KeyValueDB: Sync + Send + parity_util_mem::MallocSizeOf { | |||
IoStats::empty() | |||
} | |||
} | |||
|
|||
/// Return for a start inclusive prefix, the non inclusive end. | |||
pub fn end_prefix(prefix: &[u8]) -> Vec<u8> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function can be simplified
fn end_prefix(mut prefix: Vec<u8>) -> Vec<u8> {
while let Some(0xff) = prefix.last() {
prefix.pop();
}
if let Some(byte) = prefix.last_mut() {
*byte += 1;
}
prefix
}
I'm not sure we want to expose it kvdb
, I see it's used as a utility function in kvdb-rocksdb
and kvdb-memorydb
, but for example, we don't expose other_io_error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if the prefix is empty or 0xfffffff...ff, will it delete the whole db?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we could reuse this function for iter_from_prefix by setting https://docs.rs/rocksdb/0.13.0/rocksdb/struct.ReadOptions.html#method.set_iterate_upper_bound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for empty prefix it does indeed delete every values (not in a very efficient way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about 0xff
prefix? I think we should add some tests and document the difference between 0xff
and 0xff00
as it's not intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how to document that, I will add a test it may make thing more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oups, I actually meant 0xff
and 0x00ff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how to document that,
Let's add a few examples and include interesting cases such as 0xff
and 0xff00
among the examples?
} else { | ||
// delete the whole column | ||
let end_range = &[255u8]; | ||
batch.delete_range_cf(cf, &prefix[..], &end_range[..]).map_err(other_io_err)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to delete a range, if we're deleting the whole column later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not delete the whole column (at first I tried to but it requires mutable access and some changes overall). So I end up thinking that if we use the delete range to delete the column we are doing something wrong, so using delete range for it felt more consistent.
(I will change the comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batch.delete_range_cf(cf, &prefix[..], &end_range[..]).map_err(other_io_err)?; | |
batch.delete_range_cf(cf, &[], &end_range[..]).map_err(other_io_err)?; |
(the CI can be fixed with |
Co-Authored-By: Andronik Ordian <write@reusable.software>
Co-Authored-By: Andronik Ordian <write@reusable.software>
Co-Authored-By: Andronik Ordian <write@reusable.software>
Co-Authored-By: Andronik Ordian <write@reusable.software>
…n into upgraded_del_range_pr
let end_range = Bound::Excluded(kvdb::end_prefix(&prefix[..])); | ||
let keys: Vec<_> = col.range((start_range, end_range)).map(|(k, _)| k.clone()).collect(); | ||
for key in keys.into_iter() { | ||
col.remove(&key[..]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be optimized by using split_off
and append
, but currently append is slow: rust-lang/rust#34666 and I think memorydb is only used for tests and kvdb-web, so it's fine for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall (modulo a couple of unresolved comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits.
kvdb-rocksdb/src/lib.rs
Outdated
let end_range = kvdb::end_prefix(&prefix[..]); | ||
batch.delete_range_cf(cf, &prefix[..], &end_range[..]).map_err(other_io_err)?; | ||
} else { | ||
// delete every values in the column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// delete every values in the column | |
// deletes all values in the column |
@@ -80,6 +83,11 @@ impl DBTransaction { | |||
pub fn delete(&mut self, col: u32, key: &[u8]) { | |||
self.ops.push(DBOp::Delete { col, key: DBKey::from_slice(key) }); | |||
} | |||
|
|||
/// Delete all values with the given key prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document what happens if called with an empty prefix (deletes all values)?
@@ -129,3 +149,20 @@ pub trait KeyValueDB: Sync + Send + parity_util_mem::MallocSizeOf { | |||
IoStats::empty() | |||
} | |||
} | |||
|
|||
/// Return for a start inclusive prefix, the non inclusive end. | |||
pub fn end_prefix(prefix: &[u8]) -> Vec<u8> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how to document that,
Let's add a few examples and include interesting cases such as 0xff
and 0xff00
among the examples?
kvdb/src/lib.rs
Outdated
fn end_prefix_test() { | ||
assert_eq!(end_prefix(&[5, 6, 7]), vec![5, 6, 8]); | ||
assert_eq!(end_prefix(&[5, 6, 255]), vec![5, 7]); | ||
// this is incorrect as the result is before start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what sense is it incorrect? Is it "misusage"? Or is the end_prefix()
handling this incorrectly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought this comment was a bit confusing, but the test below uses assert not equal
, which maybe redundant in the presence of assert_eq
below that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is redundant, it is mostly documentation (at some point I was not remembering why we can recurse on popping the 0xff).
Which use-cases of generic range deletion can you think of? |
Co-Authored-By: David <dvdplm@gmail.com>
Co-Authored-By: David <dvdplm@gmail.com>
I do not have any, I was just considering that having range in operation could make things more extendable (it is doable with all three implementations, and probably most db key value impl will get more efficient implementation for range (not radix tree)). So not 'needed'. |
* master: kvdb-rocksdb: optimize and rename iter_from_prefix (#365) bump parity-util-mem (#376) parity-util-mem: fix for windows (#375) keccak-hash: fix bench and add one for range (#372) [parity-crypto] Release 0.6.1 (#373) keccak-hash: bump version to 0.5.1 (#371) keccak-hash: add keccak256_range and keccak512_range functions (#370) Allow pubkey recovery for all-zero messages (#369) Delete by prefix operator in kvdb (#360) kvdb: no overlay (#313) Ban duplicates of parity-uil-mem from being linked into the same program (#363) Use correct license ID (#362) Memtest example for Rocksdb (#349) Prep for release (#361) parity-util-mem: prepare release for 0.5.2 (#359) travis: test parity-util-mem on android (#358) parity-util-mem: update mimalloc feature (#352) kvdb: remove parity-bytes dependency (#351) parity-util-mem: use malloc for usable_size on android (#355) CI: troubleshoot macOS build (#356)
* master: (56 commits) primitive-types: add no_std support for serde feature (#385) Add Rocksdb Secondary Instance Api (#384) kvdb-rocksdb: update rocksdb to 0.14 (#379) prepare releases for a few crates (#382) uint: fix UB in uint::from_big_endian (#381) Fix limit prefix delete case (#368) Add arbitrary trait implementation (#378) kvdb-rocksdb: optimize and rename iter_from_prefix (#365) bump parity-util-mem (#376) parity-util-mem: fix for windows (#375) keccak-hash: fix bench and add one for range (#372) [parity-crypto] Release 0.6.1 (#373) keccak-hash: bump version to 0.5.1 (#371) keccak-hash: add keccak256_range and keccak512_range functions (#370) Allow pubkey recovery for all-zero messages (#369) Delete by prefix operator in kvdb (#360) kvdb: no overlay (#313) Ban duplicates of parity-uil-mem from being linked into the same program (#363) Use correct license ID (#362) Memtest example for Rocksdb (#349) ...
This PR add a delete by prefix operator in kvdb.
This is something that I want to use in a PR that bulk delete a substrate child trie (substrate child trie got all their keys prefixed by a unique id, so a deletion by prefix at rocksdb level can be use).
Since it is currently only related to this future child trie deletion PR (in progress), it should not be merge until really needed (unless it seems good to have either way).
I am still creating the PR for review, feedback and visibility.
Writing this PR description, I realize that we could also use an operator that delete a range (start inclusive, end exclusive), which could allow more use cases but is not really needed.