Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make functional BST more-functional. #6

Merged
merged 1 commit into from
Nov 24, 2021

Conversation

mlodato517
Copy link
Owner

This Commit
Adds history to the functional BST and no longer requires ownership to
insert/delete.

Why?
Mainly to make benchmarking easier but this has the benefit of more
closely matching a functional data structure in that, when new trees are
created from "mutating" operations, the history of trees/operations is
still accessible.

@mlodato517 mlodato517 self-assigned this Oct 21, 2021
@mlodato517 mlodato517 force-pushed the mlodato517-more-fully-functional-bst branch from 7cee4c3 to cce27e3 Compare October 21, 2021 16:59
@mlodato517 mlodato517 changed the title [WIP - testing GH Action] Make functional BST more-functional. Make functional BST more-functional. Oct 21, 2021
//! # Examples
//!
//! ```
//! use bst::functional::Tree;
Copy link
Owner Author

@mlodato517 mlodato517 Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of duplicative with other doctests but it makes for nice docs.

src/functional.rs Show resolved Hide resolved
where
K: cmp::Ord,
{
match self {
Tree::Leaf => self,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have to return an owned Self and self is a borrowed Self. We could add a Clone implementation to Tree (I think we'll want it later anyway) but this seems fine for now.

left: Box<Tree<K, V>>,
right: Box<Tree<K, V>>,
key: Rc<K>,
value: Rc<V>,
Copy link
Owner Author

@mlodato517 mlodato517 Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm conflicted here. I have this here because of how delete_largest works. It returns an owned key and value and we can't do that unless we Rc each individual field. I guess it could instead return Rc<Node>? I'm guessing we'd basically clone the predecessor node and return that. But it'd have to have a different subtree and we can't mutate it 🤔

Going to think on it more but I'm also guessing one of Chris's patented Good Questions™️ will clear this up presently.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still going to wait for Chris's wisdom but I'm semi-convinced this is correct (reference counting all the fields of the Node, that is) - if a new Node is inserted with the same key as another, we must make a new node that references the key and subtrees of the original Node and I think that's only possible by cloning each field.

mlodato517 added a commit that referenced this pull request Oct 25, 2021
**Why?**
I didn't think I could do this but the issue was that I had `derive`d
`Clone` on `Node` when I should've implemented it directly because it's
generic and leverages reference counting.

See #6 (comment)
@mlodato517
Copy link
Owner Author

@patientstreetlight bump <3

left: Box::new(n.left.insert(key, value)),
..n
left: Rc::new(n.left.insert(key, value)),
..n.clone()
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patientstreetlight wondered if this syntax was wasteful - if we were incrementing all the reference counts just to have to decrement the unused left's reference count after. The docs seem to indicate this would be true:

The entire expression uses the given values for the fields that were specified and moves or copies the remaining fields from the base expression.

(since here the base expression, which is n.clone() would need to be evaluated fully and then the relevant fields moved out). A test in godbolt confirms. Our options are:

  1. explicitly clone the three needed fields every time to save on cloning the one field we don't need unnecessarily
  2. accept the performance hit of incrementing and decrementing the counter for the readability

I lean towards the first one but won't die on either hill.

Copy link
Owner Author

@mlodato517 mlodato517 Nov 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes on performance - while we’re just incrementing and decrementing a counter the compiler does emit conditional instructions since it has to check:

  1. Will this increment overflow? If so, I should panic
  2. Will this decrement result in zero in which case we should drop the underlying value

so those may have a realistic performance impact (we could even leave it in for fun and remove it after adding the benchmarks to see!)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'd lean towards (1) as well just because it feels semantically cleaner to me to not do the unneeded increments/decrements, even if it is syntactically uglier.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mlodato517 added a commit that referenced this pull request Nov 22, 2021
**This Commit**
Removes instances of instantiating new `Node`s using the functional
update syntax of a clone of an existing `Node`. That is it replaces:

```rust
Node {
  some_field: some_value,
  ..existing_node.clone()
}
```

with

```rust
Node {
  some_field: some_value,
  other_field: existing_node.other_field.clone(),
  // etc.
}
```

**Why?**
For clarity - when using `..node.clone()` what's
happening is we are first cloning the node in its entirety (which
results in bumping the reference count for all its fields), moving the
fields we care about to the new `Node`, and then dropping the remaining
fields (and decrementing their reference count). This is a surprise to
the reader and is needless and not what we want to express. It also may
have a performance impact but that isn't measured.

For more details see [this comment thread][0].

[0]: #6 (comment)
..n
left: Rc::new(n.left.insert(key, value)),
key: n.key.clone(),
right: n.right.clone(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is some convention of preferring Rc::clone(&rcVal) over rcVal.clone() for reference counted pointers.

The Rust Book also says

We could have called a.clone() rather than Rc::clone(&a), but Rust’s convention is to use Rc::clone in this case. The implementation of Rc::clone doesn’t make a deep copy of all the data like most types’ implementations of clone do. The call to Rc::clone only increments the reference count, which doesn’t take much time. Deep copies of data can take a lot of time. By using Rc::clone for reference counting, we can visually distinguish between the deep-copy kinds of clones and the kinds of clones that increase the reference count. When looking for performance problems in the code, we only need to consider the deep-copy clones and can disregard calls to Rc::clone.

I'd say it's worth following that convention here as well.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! 18f0d44

mlodato517 added a commit that referenced this pull request Nov 24, 2021
**This Commit**
Denies the use of `.clone()` on reference counted pointers as advised by
[this `clippy` lint][0] and [The Rust Book][1]. It also addresses
instances where the lint was violated.

**Why?**
It's best practice to make clear the clone is cheap. There is [some
debate][2] on the issue and the [standard library no longer explicitly
recommends it][3] but it is still noted as more idiomatic.

In any case, for us there are no issues making the change and if there
is a possibility for being more expressive, we should take it.

See #6 (comment)
for more details.

[0]: https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_ref_ptr
[1]: https://doc.rust-lang.org/book/ch15-04-rc.html#using-rct-to-share-data
[2]: rust-lang/rust-clippy#2048
[3]: rust-lang/rust#76138
Copy link
Collaborator

@patientstreetlight patientstreetlight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

Comment on lines +171 to +172
(Tree::Leaf, Tree::Node(right)) => Tree::Node(right.clone()),
(Tree::Node(left), Tree::Leaf) => Tree::Node(left.clone()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More just curious - are these 2 .clone() calls intentionally not Rc::clone()?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct - right and left are of type Node instead of Rc so we can't use that syntax. We could use Node::clone(left) but that isn't obviously clearer (in my opinion).

**This Commit**
Adds history to the functional BST and no longer requires ownership to
`insert`/`delete`.

**Why?**
Mainly to make benchmarking easier but this has the benefit of more
closely matching a functional data structure in that, when new trees are
created from "mutating" operations, the history of trees/operations is
still accessible.

Use struct update syntax for conciseness

**Why?**
I didn't think I could do this but the issue was that I had `derive`d
`Clone` on `Node` when I should've implemented it directly because it's
generic and leverages reference counting.

See #6 (comment)

Explicitly clone needed fields of nodes

**This Commit**
Removes instances of instantiating new `Node`s using the functional
update syntax of a clone of an existing `Node`. That is it replaces:

```rust
Node {
  some_field: some_value,
  ..existing_node.clone()
}
```

with

```rust
Node {
  some_field: some_value,
  other_field: existing_node.other_field.clone(),
  // etc.
}
```

**Why?**
For clarity - when using `..node.clone()` what's
happening is we are first cloning the node in its entirety (which
results in bumping the reference count for all its fields), moving the
fields we care about to the new `Node`, and then dropping the remaining
fields (and decrementing their reference count). This is a surprise to
the reader and is needless and not what we want to express. It also may
have a performance impact but that isn't measured.

For more details see [this comment thread][0].

[0]: #6 (comment)

Deny `clippy::clone_on_ref_ptr`

**This Commit**
Denies the use of `.clone()` on reference counted pointers as advised by
[this `clippy` lint][0] and [The Rust Book][1]. It also addresses
instances where the lint was violated.

**Why?**
It's best practice to make clear the clone is cheap. There is [some
debate][2] on the issue and the [standard library no longer explicitly
recommends it][3] but it is still noted as more idiomatic.

In any case, for us there are no issues making the change and if there
is a possibility for being more expressive, we should take it.

See #6 (comment)
for more details.

[0]: https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_ref_ptr
[1]: https://doc.rust-lang.org/book/ch15-04-rc.html#using-rct-to-share-data
[2]: rust-lang/rust-clippy#2048
[3]: rust-lang/rust#76138
@mlodato517 mlodato517 force-pushed the mlodato517-more-fully-functional-bst branch from 18f0d44 to 42dd0c6 Compare November 24, 2021 23:47
@mlodato517 mlodato517 merged commit 7c30204 into main Nov 24, 2021
@mlodato517 mlodato517 deleted the mlodato517-more-fully-functional-bst branch November 24, 2021 23:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants