-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement append and split_off for BTreeMap and BTreeSet #26227
Conversation
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
reason = "recently added as part of collections reform 2")] | ||
pub fn append(&mut self, other: &mut Self) { | ||
// Read all values from `other` into `self`. | ||
// The use of `ptr::read` is safe as we clear `other` afterwards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look safe -- clear will run drop on all keys & values, so it leads to double drop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. 😳
I already have a fix for that.
I'm not familiar enough with the search stack code, so I'd love if that part were reviewed by some one else. |
bd1f270
to
6d959cc
Compare
Updated with fixes to bluss's remarks |
r? @bluss (I'm handing this off officially, since you've started reviewing it anyway!) |
@glaebhoerl is the most experienced with the current design, but I'll be happy to review this a bit later. |
@gankro I think that must have been a typo? |
Oops, yes I meant @gereeter |
Thanks, the parts I looked at look great now |
{ | ||
// `unwrap` won't panic because `self.len()` > 0. | ||
if at <= self.keys().next().unwrap().borrow() { | ||
should_swap = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this condition just be assigned to should_swap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course.
This looks like this should work fine, but I'm not sure about the algorithms. |
First, thanks everyone for review and input, this is great! |
I can't seem to find any references talking about splitting or merging B-Trees, unfortunately - it just isn't an operation that most users have to do. For For Suppose we want to split the following tree at 7:
We start at the root, looking for where seven would go:
Since that point is in the middle of the root, we split the root into two piece, one larger and one smaller:
Once we've done that, we move on to the next node searching for 7:
At that point, as in the root node, we split the node into a greater part and a lesster part:
Since that node we just split was a leaf node (also, because we actually found our splitting key), we don't need to do any more splitting, and we are left with two trees, one greater than our key and the other less than our key. There is still some more work involved, as this splitting process probably left many of the nodes that we split underfull, requiring steps to recoalesce nodes, but once that is done, the B-Tree is split. Note that since this just goes up and down to search path the the node, it only take |
Note: Since the splitting and merging algorithms are fairly involved and badly documented, I would not at all be opposed to merging this PR as is and opening performance issues to use the better algorithms. |
That sounds interesting, thank you for the explanation! I will try to implement something like that. |
☔ The latest upstream changes (presumably #26192) made this pull request unmergeable. Please resolve the merge conflicts. |
6d959cc
to
8026390
Compare
I pushed a linear time implementation of |
|
||
// Second, we build a tree from the sorted sequence in linear time. | ||
self.length = elements.len(); | ||
let (depth, root) = Node::from_sorted_iter(elements.into_iter(), self.length, self_b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can build from an iterator, it would be more efficient to make an iterator that merges two iterators instead of going through a Vec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't like allocating a new Vec
here, too, but for the algorithm in from_sorted_iter
to work, I have to know how many elements I have after the merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bleh, I hadn't though of the equal key case. It still might be possible and not too difficult, as in the worst case, only the "right edge" of the BTree will be left underfull, and every underfull node except for the root (which, if I remember correctly, is allowed to be underfull) is adjacent to a completely full node, allowing a simple steal to fix things.
|
||
loop { | ||
// Determine how many nodes we need on this level. | ||
let num_nodes = num_elements / (capacity + 1) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining why this calculation is correct?
@jooert what's the status of this PR? |
Changes the internal SearchStack API to return the key on removal as well.
8026390
to
5f1a116
Compare
@gankro Sorry for the very long delay and thanks for the kind words! I've just pushed a new version of I haven't started implementing the |
@jooert So I've become increasingly convinced that BTree needs to be refactored to use parent pointers (parent_ptr, edge_index). This would remove all allocations except for Is this something that you think would simplify your code? Something that you'd be interested in doing? Note that one can actually implement parent pointers but not bother to replace all the search stacks to start. That is it's theoretically possible for the two to co-exist temporarily. |
Parent pointers would simplify my code in so far as I wouldn't need to keep track of the different levels of the tree using the |
To my knowledge, it's a performance slam dunk. This is the strategy used in Google's https://code.google.com/p/cpp-btree/ |
@jooert Possibly stupid question. You said two pointers because we'd need not only a "parent pointer" but also an "index in parent"? We can always use u8 for len/cap/parent_index and move those to the end of the struct. That'd save quite a bit of space. The google implementation linked by @gankro uses u8 as default. |
@arthurprs Yes, exactly. |
Rust will pad the struct to have a size that's a multiple of its alignment, though. So having using a u8 doesn't actually save you anything if it will just be rounded up to what a u64 would have done. |
@gankro not really, it depends where in the struct you want to have your smaller types. If you stick a u8 in between 2 usizes it'll have 7 bytes of padding to allow aligned access on the second. At least in Cish standard ABI. What I'm proposing is sticking them at the end. Example: http://is.gd/b7ChEQ This way the new Node will have the same size as the current one (40 bytes in 64bit builds), the only limitation is having a B <= 255. If that's not enough we can use u16s instead and still keep the 40 bytes size, not true for 32bit builds though. |
Yes if you fold other values to be smaller, you will get savings. I was assuming you were just suggesting only making one field a u8 (which would be useless). |
Note that in a defunct PR I removed the ability to set B at all, so we can safely just make it a constant that "happens" to work. (the current default has always seemed fine, and gains from changing it are small to trivial). |
Cool, so if we ever go this route (parent pointers) we should consider using these smaller integers types to save space. |
Is anyone working on the parent pointer implementation? |
Am 08.08.2015 um 21:25 schrieb Andrew Paseltiner:
|
☔ The latest upstream changes (presumably #28043) made this pull request unmergeable. Please resolve the merge conflicts. |
What's the status of this? Blocked on the parent pointer rewrite? Ready to go with a rebase? Outstanding concerns? |
I think this should wait pending the rewrite at https://github.com/gereeter/btree-rewrite. |
Ah ok, in that case I'm gonna close this for now (clearing out the queue). |
Implement `append` for b-trees. I have finally found time to revive #26227, this time only with an `append` implementation. The algorithm implemented here is linear in the size of the two b-trees. It firsts creates a `MergeIter` from the two b-trees and then builds a new b-tree by pushing key-value pairs from the `MergeIter` into nodes at the right heights. Three functions for stealing have been added to the implementation of `Handle` as well as a getter for the height of a `NodeRef`. The docs have been updated with performance information about `BTreeMap::append` and the remark about B has been removed now that it is the same for all instances of `BTreeMap`. cc @gereeter @gankro @apasel422
Changes the internal SearchStack API to return the key on removal as well.