Skip to content

Commit

Permalink
Major rewrite based on enum design and comments
Browse files Browse the repository at this point in the history
  • Loading branch information
Gankra committed Sep 16, 2014
1 parent 9a77de8 commit b175db9
Showing 1 changed file with 88 additions and 83 deletions.
171 changes: 88 additions & 83 deletions active/0000-collection-views.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,17 @@

# Summary

Add additional iterator-like View objects to collections.
Views provide a composable mechanism for in-place observation and mutation of a
Add additional iterator-like Entry objects to collections.
Entries provide a composable mechanism for in-place observation and mutation of a
single element in the collection, without having to "re-find" the element multiple times.
This deprecates several "internal mutation" methods like hashmap's `find_or_insert_with`.

# Motivation

As we approach 1.0, we'd like to normalize the standard APIs to be consistent, composable,
and simple. However, this currently stands in opposition to manipulating the collections in
an *efficient* manner. For instance, if we wish to build an accumulating map on top of one
of our concrete maps, we need to distinguish between the case when the element we're inserting
an *efficient* manner. For instance, if one wishes to build an accumulating map on top of one
of the concrete maps, they need to distinguish between the case when the element they're inserting
is *already* in the map, and when it's *not*. One way to do this is the following:

```
Expand All @@ -27,11 +27,11 @@ if map.contains_key(&key) {
}
```

However, this requires us to search for `key` *twice* on every operation.
We might be able to squeeze out the `update` re-do by matching on the result
However, searches for `key` *twice* on every operation.
The second search can be squeezed out the `update` re-do by matching on the result
of `find_mut`, but the `insert` case will always require a re-search.

To solve this problem, we have an ad-hoc mix of "internal mutation" methods which
To solve this problem, Rust currently has an ad-hoc mix of "internal mutation" methods which
take multiple values or closures for the collection to use contextually. Hashmap in particular
has the following methods:

Expand All @@ -50,111 +50,120 @@ the same value (even though only one will ever be called). `find_with_or_insert_
is also actually performing the role of `insert_with_or_update_with`,
suggesting that these aren't well understood.

Rust has been in this position before: internal iteration. Internal iteration was (I'm told)
Rust has been in this position before: internal iteration. Internal iteration was (author's note: I'm told)
confusing and complicated. However the solution was simple: external iteration. You get
all the benefits of internal iteration, but with a much simpler interface, and greater
composability. Thus, we propose the same solution to the internal mutation problem.
composability. Thus, this RFC proposes the same solution to the internal mutation problem.

# Detailed design

A fully tested "proof of concept" draft of this design has been implemented on top of pczarn's
pending hashmap PR, as hashmap seems to be the worst offender, while still being easy
to work with. You can
[read the diff here](https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612).
A fully tested "proof of concept" draft of this design has been implemented on top of hashmap,
as it seems to be the worst offender, while still being easy to work with. You can
[read the diff here](https://github.com/Gankro/rust/commit/39a1fa7c7362a3e22e59ab6601ac09475daff39b).

We replace all the internal mutation methods with a single method on a collection: `view`.
The signature of `view` will depend on the specific collection, but generally it will be similar to
the signature for searching in that structure. `view` will in turn return a `View` object, which
All the internal mutation methods are replaced with a single method on a collection: `entry`.
The signature of `entry` will depend on the specific collection, but generally it will be similar to
the signature for searching in that structure. `entry` will in turn return an `Entry` object, which
captures the *state* of a completed search, and allows mutation of the area.

For convenience, we will use the hashmap draft as an example.

```
pub fn view<'a>(&'a mut self, key: K) -> Entry<'a, K, V>;
/// Get an Entry for where the given key would be inserted in the map
pub fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V>;
/// A view into a single occupied location in a HashMap
pub struct OccupiedEntry<'a, K, V>{ ... }
/// A view into a single empty location in a HashMap
pub struct VacantEntry<'a, K, V>{ ... }
/// A view into a single location in a HashMap
pub enum Entry<'a, K, V> {
/// An occupied Entry
Occupied(OccupiedEntry<'a, K, V>),
/// A vacant Entry
Vacant(VacantEntry<'a, K, V>),
}
```

Of course, the real meat of the API is in the View's interface (impl details removed):

```
impl<'a, K, V> Entry<'a, K, V> {
/// Get a reference to the value at the Entry's location
pub fn get(&self) -> Option<&V>;
impl<'a, K, V> OccupiedEntry<'a, K, V> {
/// Get a reference to the value of this Entry
pub fn get(&self) -> &V;
/// Get a mutable reference to the value at the Entry's location
pub fn get_mut(&mut self) -> Option<&mut V>;
/// Get a mutable reference to the value of this Entry
pub fn get_mut(&mut self) -> &mut V;
/// Get a reference to the key at the Entry's location
pub fn get_key(&self) -> Option<&K>;
/// Set the value stored in this Entry
pub fn set(mut self, value: V) -> V;
/// Return whether the Entry's location contains anything
pub fn is_empty(&self) -> bool;
/// Get a reference to the Entry's key
pub fn key(&self) -> &K;
/// Set the key and value of the location pointed to by the Entry, and return any old
/// key and value that might have been there
pub fn set(self, value: V) -> Option<(K, V)>;
/// Take the value stored in this Entry
pub fn take(self) -> V;
}
/// Retrieve the Entry's key
pub fn into_key(self) -> K;
impl<'a, K, V> VacantEntry<'a, K, V> {
/// Set the value stored in this Entry
pub fn set(self, value: V);
}
```

There are definitely some strange things here, so let's discuss the reasoning!

First, `view` takes a `key` by value, because we observe that this is how all the internal mutation
methods work. Further, taking the `key` up-front allows us to avoid *validating* provided keys if
we require an owned `key` later. This key is effectively a *guarantor* of the view.
To compensate, we provide an `into_key` method which converts the entry back into its guarantor.
We also provide a `key` method for getting an immutable reference to the guarantor, in case its
value effects any computations one wishes to perform.
First, `entry` takes a `key` by value, because this is the observed behaviour of the internal mutation
methods. Further, taking the `key` up-front allows implementations to avoid *validating* provided keys if
they require an owned `key` later for insertion. This key is effectively a *guarantor* of the entry.

Taking the key by-value might change once collections reform lands, and Borrow and ToOwned are available.
For now, it's an acceptable solution, because in particular, the primary use case of this functionality
is when you're *not sure* if you need to insert, in which case you should be prepared to insert.
Otherwise, `find_mut` is likely sufficient.

Taking the key by-value might change once associated types land,
and we successfully tackle the "equiv" problem. For now, it's an acceptable solution in our mind.
In particular, the primary use case of this functionality is when you're *not sure* if you need to
insert, in which case you should be prepared to insert. Otherwise, `find_mut` is likely sufficient.
The result is actually an enum, that will either be Occupied or Vacant. These two variants correspond
to concrete types for when the key matched something in the map, and when the key didn't, repsectively.

Next, we provide a nice simple suite of "standard" methods:
`get`, `get_mut`, `get_key`, and `is_empty`.
These do exactly what you would expect, and allow you to query the view to see if it is logically
empty, and if not, what it contains.
If there isn't a match, the user has exactly one option: insert a value using `set`, which will also insert
the guarantor, and destroy the Entry. This is to avoid the costs of maintaining the structure, which
otherwise isn't particularly interesting anymore.

Finally, we provide a `set` method which inserts the provided value using the guarantor key,
and yields the old key-value pair if it existed. Note that `set` consumes the View, because
we lose the guarantor, and the collection might have to shift around a lot to compensate.
Maintaining the entry after an insertion would add significant cost and complexity for no
clear gain.
If there is a match, a more robust set of options is provided. `get` and `get_mut` provide access to the
value found in the location. `set` behaves as the vacant variant, but also yields the old value. `take`
simply removes the found value, and destroys the entry for similar reasons as `set`.

Let's look at how we now `insert_or_update`:
Let's look at how we one now writes `insert_or_update`:

```
let mut view = map.view(key);
if !view.is_empty() {
let v = view.get_mut().unwrap();
let new_v = *v + 1;
*v = new_v;
} else {
view.set(1);
match map.entry(key) {
Occupied(entry) => {
let v = entry.get_mut();
let new_v = *v + 1;
*v = new_v;
}
Vacant(entry) => {
entry.set(1);
}
}
```

We can now write our "intuitive" inefficient code, but it is now as efficient as the complex
One can now write something equivalent to the "intuitive" inefficient code, but it is now as efficient as the complex
`insert_or_update` methods. In fact, this matches so closely to the inefficient manipulation
that users could reasonable ignore views *until performance becomes an issue*. At which point
it's an almost trivial migration. We also don't need closures to dance around the fact that we
want to avoid generating some values unless we have to, because that falls naturally out of our
that users could reasonable ignore Entries *until performance becomes an issue*. At which point
it's an almost trivial migration. Closures also aren't needed to dance around the fact that one may
want to avoid generating some values unless they have to, because that falls naturally out of
normal control flow.

If you look at the actual patch that does this, you'll see that Entry itself is exceptional
If you look at the actual patch that does this, you'll see that Entry itself is exceptionally
simple to implement. Most of the logic is trivial. The biggest amount of work was just
capturing the search state correctly, and even that was mostly a cut-and-paste job.

With Views, we also open up the gate for... *adaptors*!
You really want `insert_or_update`? We can provide that for you! Generically!
However we believe such discussion is out-of-scope for this RFC. Adaptors can
be tackled in a back-compat manner after this has landed, and we have a better sense
of what we need or want.
With Entries, the gate is also opened for... *adaptors*!
Really want `insert_or_update` back? That can be written on top of this generically with ease.
However, such discussion is out-of-scope for this RFC. Adaptors can
be tackled in a back-compat manner after this has landed, and usage is observed. Also, this
proposal does not provide any generic trait for Entries, preferring concrete implementations for
the time-being.

# Drawbacks

Expand All @@ -180,21 +189,17 @@ However, preventing invalidation would be more expensive, and it's not clear tha
cursor semantics would make sense on e.g. a HashMap, as you can't insert *any* key
in *any* location.

# Unresolved questions

One thing omitted from the design was a "take" method on the Entry. The reason for this
is primarily that this doesn't seem to be a thing people are interested in having for
internal manipulation. However, it also just would have meant more complexity, especially
if it *didn't* consume the View. Do we want this functionality?
* This RFC originally [proposed a design without enums that was substantially more complex]
(https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612).
However it had some interesting ideas about Key manipulation, so we mention it here for
historical purposes.

# Unresolved questions
The internal mutation methods cannot actually be implemented in terms of the View, because
they return a mutable reference at the end, and there's no way to do that with the current
View design. However, it's not clear why this is done by them. We believe it's simply to
validate what the method *actually did*. If this is the case, then Views make this functionality
obsolete. However, if this is *still* desirable, we could tweak `set` to do this as well.
Do we want this functionality?

Do we want to introduce a proper standard trait, or keep it all concrete and ad-hoc for a while
to figure out what does and doesn't work?
obsolete. However, if this is *still* desirable, `set` could be tweaked to do this as well.
However for some structures it may incur additional cost. Is this desirable functionality?

Naming bikesheds!

0 comments on commit b175db9

Please sign in to comment.