Skip to content

Commit

Permalink
Merge remote-tracking branch 'aturon/slice-notation'
Browse files Browse the repository at this point in the history
  • Loading branch information
alexcrichton committed Sep 11, 2014
2 parents 61c64a2 + c233ec7 commit 7e4046e
Showing 1 changed file with 227 additions and 0 deletions.
227 changes: 227 additions & 0 deletions active/0000-slice-notation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,227 @@
- Start Date: (fill me in with today's date, 2014-08-12)
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

This RFC adds *overloaded slice notation*:

- `foo[]` for `foo.as_slice()`
- `foo[n..m]` for `foo.slice(n, m)`
- `foo[n..]` for `foo.slice_from(n)`
- `foo[..m]` for `foo.slice_to(m)`
- `mut` variants of all the above

via two new traits, `Slice` and `SliceMut`.

It also changes the notation for range `match` patterns to `...`, to
signify that they are inclusive whereas `..` in slices are exclusive.

# Motivation

There are two primary motivations for introducing this feature.

### Ergonomics

Slicing operations, especially `as_slice`, are a very common and basic thing to
do with vectors, and potentially many other kinds of containers. We already
have notation for indexing via the `Index` trait, and this RFC is essentially a
continuation of that effort.

The `as_slice` operator is particularly important. Since we've moved away from
auto-slicing in coercions, explicit `as_slice` calls have become extremely
common, and are one of the
[leading ergonomic/first impression](https://github.com/rust-lang/rust/issues/14983)
problems with the language. There are a few other approaches to address this
particular problem, but these alternatives have downsides that are discussed
below (see "Alternatives").

### Error handling conventions

We are gradually moving toward a Python-like world where notation like `foo[n]`
calls `fail!` when `n` is out of bounds, while corresponding methods like `get`
return `Option` values rather than failing. By providing similar notation for
slicing, we open the door to following the same convention throughout
vector-like APIs.

# Detailed design

The design is a straightforward continuation of the `Index` trait design. We
introduce two new traits, for immutable and mutable slicing:

```rust
trait Slice<Idx, S> {
fn as_slice<'a>(&'a self) -> &'a S;
fn slice_from(&'a self, from: Idx) -> &'a S;
fn slice_to(&'a self, to: Idx) -> &'a S;
fn slice(&'a self, from: Idx, to: Idx) -> &'a S;
}

trait SliceMut<Idx, S> {
fn as_mut_slice<'a>(&'a mut self) -> &'a mut S;
fn slice_from_mut(&'a mut self, from: Idx) -> &'a mut S;
fn slice_to_mut(&'a mut self, to: Idx) -> &'a mut S;
fn slice_mut(&'a mut self, from: Idx, to: Idx) -> &'a mut S;
}
```

(Note, the mutable names here are part of likely changes to naming conventions
that will be described in a separate RFC).

These traits will be used when interpreting the following notation:

*Immutable slicing*

- `foo[]` for `foo.as_slice()`
- `foo[n..m]` for `foo.slice(n, m)`
- `foo[n..]` for `foo.slice_from(n)`
- `foo[..m]` for `foo.slice_to(m)`

*Mutable slicing*

- `foo[mut]` for `foo.as_mut_slice()`
- `foo[mut n..m]` for `foo.slice_mut(n, m)`
- `foo[mut n..]` for `foo.slice_from_mut(n)`
- `foo[mut ..m]` for `foo.slice_to_mut(m)`

Like `Index`, uses of this notation will auto-deref just as if they were method
invocations. So if `T` implements `Slice<uint, [U]>`, and `s: Smaht<T>`, then
`s[]` compiles and has type `&[U]`.

Note that slicing is "exclusive" (so `[n..m]` is the interval `n <= x
< m`), while `..` in `match` patterns is "inclusive". To avoid
confusion, we propose to change the `match` notation to `...` to
reflect the distinction. The reason to change the notation, rather
than the interpretation, is that the exclusive (respectively
inclusive) interpretation is the right default for slicing
(respectively matching).

## Rationale for the notation

The choice of square brackets for slicing is straightforward: it matches our
indexing notation, and slicing and indexing are closely related.

Some other languages (like Python and Go -- and Fortran) use `:` rather than
`..` in slice notation. The choice of `..` here is influenced by its use
elsewhere in Rust, for example for fixed-length array types `[T, ..n]`. The `..`
for slicing has precedent in Perl and D.

See [Wikipedia](http://en.wikipedia.org/wiki/Array_slicing) for more on the
history of slice notation in programming languages.

### The `mut` qualifier

It may be surprising that `mut` is used as a qualifier in the proposed
slice notation, but not for the indexing notation. The reason is that
indexing includes an implicit dereference. If `v: Vec<Foo>` then
`v[n]` has type `Foo`, not `&Foo` or `&mut Foo`. So if you want to get
a mutable reference via indexing, you write `&mut v[n]`. More
generally, this allows us to do resolution/typechecking prior to
resolving the mutability.

This treatment of `Index` matches the C tradition, and allows us to
write things like `v[0] = foo` instead of `*v[0] = foo`.

On the other hand, this approach is problematic for slicing, since in
general it would yield an unsized type (under DST) -- and of course,
slicing is meant to give you a fat pointer indicating the size of the
slice, which we don't want to immediately deref. But the consequence
is that we need to know the mutability of the slice up front, when we
take it, since it determines the type of the expression.

# Drawbacks

The main drawback is the increase in complexity of the language syntax. This
seems minor, especially since the notation here is essentially "finishing" what
was started with the `Index` trait.

## Limitations in the design

Like the `Index` trait, this forces the result to be a reference via
`&`, which may rule out some generalizations of slicing.

One way of solving this problem is for the slice methods to take
`self` (by value) rather than `&self`, and in turn to implement the
trait on `&T` rather than `T`. Whether this approach is viable in the
long run will depend on the final rules for method resolution and
auto-ref.

In general, the trait system works best when traits can be applied to
types `T` rather than borrowed types `&T`. Ultimately, if Rust gains
higher-kinded types (HKT), we could change the slice type `S` in the
trait to be higher-kinded, so that it is a *family* of types indexed
by lifetime. Then we could replace the `&'a S` in the return value
with `S<'a>`. It should be possible to transition from the current
`Index` and `Slice` trait designs to an HKT version in the future
without breaking backwards compatibility by using blanket
implementations of the new traits (say, `IndexHKT`) for types that
implement the old ones.

# Alternatives

For improving the ergonomics of `as_slice`, there are two main alternatives.

## Coercions: auto-slicing

One possibility would be re-introducing some kind of coercion that automatically
slices.
We used to have a coercion from (in today's terms) `Vec<T>` to
`&[T]`. Since we no longer coerce owned to borrowed values, we'd probably want a
coercion `&Vec<T>` to `&[T]` now:

```rust
fn use_slice(t: &[u8]) { ... }

let v = vec!(0u8, 1, 2);
use_slice(&v) // automatically coerce here
use_slice(v.as_slice()) // equivalent
```

Unfortunately, adding such a coercion requires choosing between the following:

* Tie the coercion to `Vec` and `String`. This would reintroduce special
treatment of these otherwise purely library types, and would mean that other
library types that support slicing would not benefit (defeating some of the
purpose of DST).

* Make the coercion extensible, via a trait. This is opening pandora's box,
however: the mechanism could likely be (ab)used to run arbitrary code during
coercion, so that any invocation `foo(a, b, c)` might involve running code to
pre-process each of the arguments. While we may eventually want such
user-extensible coercions, it is a *big* step to take with a lot of potential
downside when reasoning about code, so we should pursue more conservative
solutions first.

## Deref

Another possibility would be to make `String` implement `Deref<str>` and
`Vec<T>` implement `Deref<[T]>`, once DST lands. Doing so would allow explicit
coercions like:

```rust
fn use_slice(t: &[u8]) { ... }

let v = vec!(0u8, 1, 2);
use_slice(&*v) // take advantage of deref
use_slice(v.as_slice()) // equivalent
```

There are at least two downsides to doing so, however:

* It is not clear how the method resolution rules will ultimately interact with
`Deref`. In particular, a leading proposal is that for a smart pointer `s: Smaht<T>`
when you invoke `s.m(...)` only *inherent* methods `m` are considered for
`Smaht<T>`; *trait* methods are only considered for the maximally-derefed
value `*s`.

With such a resolution strategy, implementing `Deref` for `Vec` would make it
impossible to use trait methods on the `Vec` type except through UFCS,
severely limiting the ability of programmers to usefully implement new traits
for `Vec`.

* The idea of `Vec` as a smart pointer around a slice, and the use of `&*v` as
above, is somewhat counterintuitive, especially for such a basic type.

Ultimately, notation for slicing seems desireable on its own merits anyway, and
if it can eliminate the need to implement `Deref` for `Vec` and `String`, all
the better.

0 comments on commit 7e4046e

Please sign in to comment.