-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor stream implementation #499
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty good! I like the pattern. I'm pretty sure we can use a concrete type for children_iter
though.
enum IteratorState<'a> { | ||
struct BranchIterator { | ||
// The nibbles of the key at this node. | ||
key_nibbles: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to keep this as a Box<[u8]>
. It's a little bit smaller (no capacity
) and assures that no one tries go grow or shrink the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might also be a good opportunity to bytes::Bytes
to cut down on allocations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried making this a Box<[u8]>
but ran into some compiler issues. Maybe you could help me out with that next time we chat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to fix it here
key_nibbles: Vec<u8>, | ||
// Returns the non-empty children of this node | ||
// and their positions in the node's children array. | ||
children_iter: Box<dyn Iterator<Item = (DiskAddress, u8)> + Send>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getting rid of this dynamic dispatch should be fairly straight forward... I'll try to remember to come back up here and comment after reading the rest of the code.
let mut child_key_nibbles = branch_iter.key_nibbles.clone(); // TODO reduce¸cloning | ||
child_key_nibbles.push(pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any way to actually reduce cloning here without storing partial-paths as some sort of borrow. However, in theory, that push
could cause the new Vec
to double in size. If you use iterators, the compiler is more likely to know the exact size to allocate:
let mut child_key_nibbles = branch_iter.key_nibbles.clone(); // TODO reduce¸cloning | |
child_key_nibbles.push(pos); | |
let child_key_nibbles = branch_iter.key_nibbles.iter().copied().chain(Some(pos)).collect(); |
In this case, it's likely that the compiler would have known to allocate the proper size to begin with. There are articles out there that explain this, but my keyword game wasn't quite good enough today to find any quickly. Here's a thread with some explanation though:
https://users.rust-lang.org/t/performance-difference-between-iterator-and-for-loop/50254
I'm not 100% sure, but I think the Alice
from the thread is Alice Rhyl, who's one of the core Tokio contributors. She's got some of the better answers and explanations on that platform.
if let Some(value) = branch.value.as_ref() { | ||
let value = value.to_vec(); | ||
return Poll::Ready(Some(Ok(( | ||
key_from_nibble_iter(child_key_nibbles.into_iter().skip(1)), // skip the sentinel node leading 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just construct child_key_nibbles
without the leading 0 in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd have to special case so that we don't push the 0 index that points from the sentinel node to the root but it's certainly possible
Do you mind sharing what is the major motivation for the refactor? Is it for readability or for better performance? |
Just readability. I haven't tested the performance of this relative to the existing implementation. |
How hard do you think it would be to pop the starting logic into its own method? impl<'a, S: ShaleStore<Node> + Send + Sync, T> Stream for MerkleKeyValueStream<'a, S, T> {
type Item = Result<(Key, Value), api::Error>;
fn poll_next(
mut self: std::pin::Pin<&mut Self>,
_cx: &mut std::task::Context<'_>,
) -> Poll<Option<Self::Item>> {
// destructuring is necessary here because we need mutable access to `key_state`
// at the same time as immutable access to `merkle`
let Self {
key_state,
merkle_root,
merkle,
} = &mut *self;
match key_state {
IteratorState::StartAtKey(key) => {
self.start_at(key)? // might look slightly different than this
self.poll_next(_cx)
}
IteratorState::Iterating { branch_iter_stack } => {
// ...
}
}
}
} This is definitely something that I should have done in the previous pass. |
firewood/src/v2/api.rs
Outdated
#[error("Child not found")] | ||
ChildNotFound, | ||
|
||
#[error("Extension node is at the root; should be branch node")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only referring to sentinel root? as his can happen for the real root, right? The sentinel root we force it to be branch node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated name
firewood/src/merkle/stream.rs
Outdated
@@ -465,6 +444,84 @@ mod tests { | |||
check_stream_is_done(merkle.iter(root)).await; | |||
} | |||
|
|||
#[tokio::test] | |||
async fn no_start_key() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this somewhat overlapped with test at L407? If so, can we incorporate into one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR will also work with the extension node removal PR, right?
Not as it is right now. We will need to update the traversal logic in this PR to account for the fact that branch nodes may have partial paths in them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, will complete the remainder from L130 on after standups
enum IteratorState<'a> { | ||
struct BranchIterator { | ||
// The nibbles of the key at this node. | ||
key_nibbles: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to fix it here
) -> Result<IteratorState, api::Error> { | ||
let root_node = merkle | ||
.get_node(root_node) | ||
.map_err(|e| api::Error::InternalError(Box::new(e)))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling is poor here; if there is an IO error it gets mapped to an internal error. The right fix for this is to write From
implementations for MerkleError
and ShaleError
for v2::api::Error
.
This should be done in a follow up though.
|
||
// Get each node in [path_to_key]. Mark the last node as being the last | ||
// so we can use that information in the while loop below. | ||
let path_to_key = path_to_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a much easier way than doing all this work up front, see next two comments.
|
||
let mut matched_key_nibbles = vec![]; | ||
|
||
for node in path_to_key { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for node in path_to_key { | |
let mut path_iterator = path_to_key.into_iter().peekable(); | |
while let Some(node) = path_iterator.next() { |
|
||
match node.inner() { | ||
NodeType::Leaf(_) => (), | ||
NodeType::Branch(branch) if is_last => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeType::Branch(branch) if is_last => { | |
NodeType::Branch(branch) if path_iterator.peek().is_none() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you do this, I'd argue the iterator shouldn't actually pre-fetch all the nodes and should just be the iterator over path_to_key
, reading nodes as it needs them.
NodeType::Branch(branch) if is_last => { | ||
// This is the last node in the path to [key]. | ||
// Figure out whether to start iterating over this node's | ||
// children at [pos] or [pos + 1]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🏆 level commenting
// Get the child at [pos], if any. | ||
let Some(child) = branch.children.get(pos as usize) else { | ||
// This should never happen -- [pos] should never be OOB. | ||
return Err(api::Error::InternalError(Box::new( | ||
api::Error::ChildNotFound, | ||
))); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a valid case for a panic, as the code prevents this from ever happening.
See https://blog.burntsushi.net/unwrap/#so-one-should-never-use-unwrap-or-expect
We have a rule for prohibiting unwrap() but not expect() exactly for this case.
// Get the child at [pos], if any. | |
let Some(child) = branch.children.get(pos as usize) else { | |
// This should never happen -- [pos] should never be OOB. | |
return Err(api::Error::InternalError(Box::new( | |
api::Error::ChildNotFound, | |
))); | |
}; | |
// Get the child at [pos], if any. | |
let child = branch.children.get(pos as usize).expect("pos cannot be OOB"); |
let child = child | ||
.map(|child_addr| merkle.get_node(child_addr)) | ||
.transpose() | ||
.map_err(|e| api::Error::InternalError(Box::new(e)))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about errors as above. Should be fixed in a followup
let comparer = if child.is_none() { | ||
// The child doesn't exist; we don't need to iterate over [pos]. | ||
<u8 as PartialOrd>::gt | ||
} else { | ||
// The child does exist; the first key to iterate over must be at [pos]. | ||
<u8 as PartialOrd>::ge | ||
}; | ||
|
||
let children_iter = get_children_iter(branch) | ||
.filter(move |(_, child_pos)| comparer(child_pos, &pos)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments significantly improve readability but in this case I'd just include the logic in the closure, which makes it even more readable:
let comparer = if child.is_none() { | |
// The child doesn't exist; we don't need to iterate over [pos]. | |
<u8 as PartialOrd>::gt | |
} else { | |
// The child does exist; the first key to iterate over must be at [pos]. | |
<u8 as PartialOrd>::ge | |
}; | |
let children_iter = get_children_iter(branch) | |
.filter(move |(_, child_pos)| comparer(child_pos, &pos)); | |
let children_iter = get_children_iter(branch).filter(move |(_, child_pos)| { | |
if child.is_none() { | |
child_pos > &pos | |
} else { | |
child_pos >= &pos | |
} | |
}); |
Marking as WIP. Going to try to implement this by implementing using a node iterator. See #399. |
closing in favor of #517 |
This PR refactors the stream implementation. The iterator maintains a stack where each element represents an ongoing iteration of a branch node. In each iteration, the element at the top of the stack tells us which key-value pair to traverse next.
The special casing in the "setup" code that deals with branch/extension nodes which are the last element of
path_to_key
is kind of annoying and not ideal, but we can probably refactor this further if/when we implement a function that returns an iterator over all nodes on the path to a given key.