RFC: Add pub fn identity<T>(x: T) -> T { x } to core::convert #2306

Merged
merged 7 commits into from
Aug 19, 2018
201 changes: 201 additions & 0 deletions text/0000-convert-id.md
@@ -0,0 +1,201 @@
- Feature Name: convert_id
- Start Date: 2018-01-19
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Adds an identity function `pub fn id<T>(x: T) -> T { x }` as `core::convert::id`.
Member

I feel it should be called identity, id is really short and ambiguous.

Contributor

id has precedent in other languages like Haskell. I could go either way, but it being short and known as the identity function elsewhere is nice to have.

Member

Identity is a better name in the prelude, if we keep that possibility open. I think we go for not overly abbreviated names of functions in Rust

Member

It's also called identity in Java, which is probably more used than Haskell. Though that function may be used less in Java.

Familiarity arguments work if there's uniformity; there isn't, and it feels like plenty of folks won't have ever consciously noticed/used it, so I'd prefer to optimize for folks who don't know what it is. identity is clear and also should be obvious to those used to id

Contributor Author

@Manishearth I've provided this motivation for the naming:

Naming the function identity instead of id is a possibility. However, to make the id function more appetizing than using a |x| x, it is preferable for the identity function to have a shorter but still clear name.

Why is id ambiguous? Are you referring to a potential ambiguity with "identifier"?
And why is this potential ambiguity an overriding concern to the one I've listed?
I think id is clear from context - If there are other functions around in the context that look like T -> T, you know it's not going to be "identifier"...

Yes - I suspect it is used way less in Java than in Haskell. I prefer to optimize for folks who are most likely to use the operation, which are probably on balance more functional programmers.

Contributor Author

@Centril Centril Jan 19, 2018

I do not find length to be that pressing a concern when it comes to ergonomics.

Just to clarify: it is the relative length I am concerned about, not the absolute. I'd like to encourage people to use id instead of |x| x, but that might be harder with identity... Does that make sense to you?

[..] more short things [..]

Out of curiosity - since you say more short things - which ones are you referring to?

[..] (but I can't articulate why :/ ) [..]

But if you can, that will make your concern more actionable for me =)

Member

My point still stands. identity is more ergonomic than the closure even if it is longer, by virtue of being just an identifier. It also can be used in more contexts because it has the same type as other functions (almost).

I do not think that id being shorter than the closure has any benefit. We are rarely trying to optimize for less typing. See also

We already have short functions like drop in the prelude. drop is fine and well known, but adding more feels iffy to me. I feel this is going to clash with locals quite often. Clashes in the prelude are fine, but make things murkier because someone reading the code would expect it to be a local.

(Actually, for that matter there's a general argument to be made against adding functions to the prelude -- there are very few and it's going to be surprising for folks to see a different function that doesn't seem to be imported anywhere. Unless it becomes ubiquitous. Which I suspect it won't.)

Contributor Author

@Centril Centril Jan 19, 2018

I guess I'll change it to identity then by popular demand. What tipped me over was this note by @nox:

88,618 lines with 'id' in Servo.

I think it is necessary and essential that this be in the prelude however and there's certainly precedent for it.

Contributor

How embarrassing.

For the record, I forgot to filter the grep command to only *.rs files and thus it also included our local clone of web-platform-tests, a massive repository of Web tests running in browsers. There are actually only 1,533 lines with the word id in Servo.

That's still a huge number to me though, but nowhere near the one I mistakenly reported at first.

Contributor Author

And the function shall henceforth be known as identity.

The function is also re-exported to `std::convert::id` as well as the prelude of
both libcore and libstd.

# Motivation
[motivation]: #motivation

## The identity function is useful

While it might seem strange to have a function that just returns its input,
there are some cases where the function is useful.

### Using `id` to do nothing among a collection of mappers

When you have a collection of mapping functions, such as a map or an array
like the one below, and you want to dispatch to them, you sometimes need a
way of not transforming the input. The identity function provides exactly
that.

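```rust
use std::collections::HashMap;

// Let's assume that this and other functions do something non-trivial.
fn do_interesting_stuff(x: u32) -> u32 { .. }

// A dispatch-map of mapping functions. The explicit value type makes each
// function item coerce to the common function pointer type `fn(u32) -> u32`.
let mut map: HashMap<&str, fn(u32) -> u32> = HashMap::new();
map.insert("foo", do_interesting_stuff);
map.insert("bar", other_stuff);
map.insert("baz", id);
```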
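
For readers who want to try the dispatch idea, here is a minimal, self-contained sketch. It assumes a locally defined `id` with the signature proposed in this RFC; `double`, `increment`, `build_map`, and `dispatch` are hypothetical names chosen for illustration and stand in for the non-trivial functions mentioned above.

```rust
use std::collections::HashMap;

// Assumption: a local stand-in for the function proposed in this RFC.
fn id<T>(x: T) -> T { x }

// Hypothetical mappers standing in for the "interesting" functions.
fn double(x: u32) -> u32 { x * 2 }
fn increment(x: u32) -> u32 { x + 1 }

// Builds the dispatch map. The value type is a function pointer, so every
// function item inserted below is coerced to `fn(u32) -> u32`.
fn build_map() -> HashMap<&'static str, fn(u32) -> u32> {
    let mut map: HashMap<&'static str, fn(u32) -> u32> = HashMap::new();
    map.insert("foo", double);
    map.insert("bar", increment);
    map.insert("baz", id); // "baz" leaves the input untouched
    map
}

// Looks up a mapper by name and applies it; unknown names yield `None`.
fn dispatch(map: &HashMap<&str, fn(u32) -> u32>, key: &str, x: u32) -> Option<u32> {
    map.get(key).map(|f| f(x))
}

fn main() {
    let map = build_map();
    println!("{:?}", dispatch(&map, "foo", 21));
    println!("{:?}", dispatch(&map, "baz", 7));
    println!("{:?}", dispatch(&map, "missing", 7));
}
```

Storing function pointers rather than closures keeps the map's value type nameable, which is one reason a named identity function is convenient in this setting.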

### Using `id` as a no-op function in a conditional

This reasoning also applies to simpler yes/no dispatch as below:

```rust
let mapper = if condition { some_manipulation } else { id };

// do more interesting stuff in between..

mapper(42);
```
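
A runnable variant of the conditional case might look as follows. This is only a sketch: `id` is defined locally with the signature proposed in this RFC, and `some_manipulation` and `run` are hypothetical helpers introduced for illustration.

```rust
// Assumption: a local stand-in for the function proposed in this RFC.
fn id<T>(x: T) -> T { x }

// Hypothetical manipulation, used only for illustration.
fn some_manipulation(x: u32) -> u32 { x + 1 }

// Picks a mapper based on `condition` and applies it to `x`.
fn run(condition: bool, x: u32) -> u32 {
    // Both branches coerce to the function pointer type `fn(u32) -> u32`.
    let mapper: fn(u32) -> u32 = if condition { some_manipulation } else { id };
    mapper(x)
}

fn main() {
    println!("{}", run(true, 42));
    println!("{}", run(false, 42));
}
```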

### Using `id` to concatenate an iterator of iterators

Given the law `join = (>>= id)`, we use the identity function to perform
Member

This sentence needlessly complicates a simple operation; I don't see any use of including the monad reference

Contributor Author

The operation .flat_map(id) is equivalent to join and may have some use for Haskellers who are reading. I can make the used syntax more Rusty tho.

Member

I mean, don't mention monad at all. Rust doesn't typically use that term. There is no reason to include it here. You're writing for Rustaceans, not Haskellers, just say "flatten" and "flat_map"

Contributor Author

I'm trying to choose my battles here =) I tried to come up with a compromise solution... But I don't like the "don't mention the M word" attitude in general.. Rustaceans say "and_then", "flat_map", etc... but they may also say bind, "monad", and "join" at times since Rustaceans are a diverse group and some of them come from Haskell, Scala, F# and other such places. Also, I don't think that shying away from the word Monad helps learnability.

Member

No, that's the point. You aren't optimizing for Haskeller Rustaceans by using "monad". Haskeller Rustaceans know the Rust terminology perfectly well. You only end up disoptimizing for non-FP Rustaceans.

Rustaceans are a diverse group, but there's an option here that is adequate for all Rustaceans.

This is literally the kind of thing that made folks confused by the pi types RFC. Now, this is at a much smaller scale, but I'm trying to stomp out the attitude of using unfamiliar terminology where not necessary in RFCs as much as possible.

Contributor Author

We have a fundamental disagreement here.

Haskeller Rustaceans know the Rust terminology perfectly well.

Haskeller Rustaceans might not know that particular part of Rust... And nonetheless, there might be non-Rustacean functional programmers that might be interested in reading our RFCs or other documentation at some point. Besides, the current text of the RFC explains it in a way that is adequate for all Rustaceans since it includes the following language as well:

In other words we are concatenating an iterator of iterators into a single iterator,

However, in an effort to not make this RFC block on this disagreement I'll agree to remove the language from the RFC.

a monadic join on iterators in this example.

```rust
let vec_vec = vec![vec![1, 3, 4], vec![5, 6]];
let iter_iter = vec_vec.into_iter().map(Vec::into_iter);
let concatenated = iter_iter.flat_map(id).collect::<Vec<_>>();
assert_eq!(vec![1, 3, 4, 5, 6], concatenated);
```
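
As a point of comparison, the sketch below checks that `flat_map(id)` produces the same result as `Iterator::flatten`, which is available in current Rust. It assumes a locally defined `id` matching the proposed signature; `concat_with_id` and `concat_with_flatten` are hypothetical helper names.

```rust
// Assumption: a local stand-in for the function proposed in this RFC.
fn id<T>(x: T) -> T { x }

// Concatenates the inner vectors by flat-mapping with the identity function.
fn concat_with_id(vec_vec: Vec<Vec<i32>>) -> Vec<i32> {
    vec_vec.into_iter().flat_map(id).collect()
}

// Concatenates the inner vectors with `Iterator::flatten`.
fn concat_with_flatten(vec_vec: Vec<Vec<i32>>) -> Vec<i32> {
    vec_vec.into_iter().flatten().collect()
}

fn main() {
    let input = vec![vec![1, 3, 4], vec![5, 6]];
    println!("{:?}", concat_with_id(input.clone()));
    println!("{:?}", concat_with_flatten(input));
}
```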

### Using `id` to keep the `Some` variants of an iterator of `Option<T>`

We can keep all the `Some` variants by simply calling `iter.filter_map(id)`.

```rust
let iter = vec![Some(1), None, Some(3)].into_iter();
let filtered = iter.filter_map(id).collect::<Vec<_>>();
assert_eq!(vec![1, 3], filtered);
```

### To be clear that you intended to use an identity conversion

If you instead use a closure as in `|x| x` when you need an
identity conversion, it is less clear that this was intentional.
With `id`, this intent becomes clearer.

## The `drop` function as a precedent

The `drop` function in `core::mem` is defined as `pub fn drop<T>(_x: T) { }`.
The same effect can be achieved by writing `{ _x; }`. This presents us
with a precedent that such trivial functions are considered useful and
includable inside the standard library even though they can be written easily
inside a user's crate.

## Avoiding repetition in user crates

Here are a few examples of the identity function being defined and used:

+ https://docs.rs/functils/0.0.2/functils/fn.identity.html
+ https://docs.rs/tool/0.2.0/tool/fn.id.html
+ https://github.com/hephex/api/blob/ef67b209cd88d0af40af10b4a9f3e0e61a5924da/src/lib.rs

There is a smattering of further examples. To reduce duplication, the function
should be provided in the standard library as a common place where it is defined.
Member

I don't think this argument is persuasive for functions. There's a case for things like traits and structs having a common definition, since they're nominal and thus the commonality can improve interoperability (not that that place needs to be std even there). But that's not the case for functions, where it doesn't matter if you used the functils or tools one.


## Precedent from other languages

There are other languages that include an identity function in
their standard libraries, among these are:

+ [Haskell](http://hackage.haskell.org/package/base-4.10.1.0/docs/Prelude.html#v:id), which also exports this to the prelude.
+ [Scala](https://www.scala-lang.org/api/current/scala/Predef$.html#identity[A](x:A):A), which also exports this to the prelude.
+ [Java](https://docs.oracle.com/javase/8/docs/api/java/util/function/Function.html#identity--), which is a widely used language.
+ [Idris](https://www.idris-lang.org/docs/1.0/prelude_doc/docs/Prelude.Basics.html), which also exports this to the prelude.
+ [Ruby](http://ruby-doc.org/core-2.5.0/Object.html#method-i-itself), which exports it to what amounts to the top type.
+ [Racket](http://docs.racket-lang.org/reference/values.html)
+ [Julia](https://docs.julialang.org/en/release-0.4/stdlib/base/#Base.identity)
+ [R](https://stat.ethz.ch/R-manual/R-devel/library/base/html/identity.html)
+ [F#](https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/operators.id%5B%27t%5D-function-%5Bfsharp%5D)
+ [Clojure](https://clojuredocs.org/clojure.core/identity)
+ [Agda](http://www.cse.chalmers.se/~nad/repos/lib/src/Function.agda)
+ [Elm](http://package.elm-lang.org/packages/elm-lang/core/latest/Basics#identity)

## The case for inclusion in the prelude

Let's compare the effort required, assuming that each letter
typed has a uniform cost with respect to effort.

```rust
use std::convert::id; iter.filter_map(id)

fn id<T>(x: T) -> T { x } iter.filter_map(id)

iter.filter_map(::std::convert::id)

iter.filter_map(id)
```

Comparing the length of these lines, we see that there's not much difference in
length when defining the function yourself or when importing or using an absolute
path. But the prelude-using variant is considerably shorter. To encourage the
use of the function, exporting to the prelude is therefore a good idea.
Member

But the prelude-using variant is considerably shorter.

This is true for literally every possible library addition, so I don't think it's a good argument.


In addition, there's an argument to be made from similarity to other things in
`core::convert` as well as `drop` all of which are in the prelude. This is
especially relevant in the case of `drop` which is also a trivial function.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

An identity function is a mapping of one type onto itself such that the output
is the same as the input. In other words, a function `id : T -> T` for some
type `T` defined as `id(x) = x`. This RFC adds such a function for all types
in Rust into libcore at the module `core::convert` and defines it as:

```rust
pub fn id<T>(x: T) -> T { x }
```

This function is also re-exported to `std::convert::id` as well as
the prelude of both libcore and libstd.

It is important to note that the input `x` passed to the function is
moved since Rust uses move semantics by default.
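
The move behaviour can be illustrated with a short sketch. It assumes a locally defined `id` with the proposed signature; `moved_buffer_is_same` is a hypothetical helper that checks that passing a `String` through `id` hands back the same heap buffer rather than a copy.

```rust
// Assumption: a local stand-in for the function proposed in this RFC.
fn id<T>(x: T) -> T { x }

// Returns true if the `String` coming out of `id` still points at the same
// heap buffer as the one that went in, i.e. the value was moved, not cloned.
fn moved_buffer_is_same(s: String) -> bool {
    let before = s.as_ptr();
    let after = id(s); // `s` is moved here and can no longer be used
    before == after.as_ptr()
}

fn main() {
    let s = String::from("hello");
    let t = id(s);
    // `s` was moved into `id`; only `t` is usable from here on.
    println!("{}", t);

    // `i32` is `Copy`, so the original binding stays usable after the call.
    let n = 5;
    let m = id(n);
    println!("{} {}", n, m);

    println!("{}", moved_buffer_is_same(String::from("hello")));
}
```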

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

An identity function defined as `pub fn id<T>(x: T) -> T { x }` exists in
`core::convert::id`. The function is also re-exported to `std::convert::id`
as well as the prelude of both libcore and libstd.

Note that the identity function is not always equivalent to a closure
such as `|x| x` since the closure may coerce `x` into a different type
while the identity function never changes the type.
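
One way to see the difference is the sketch below, which is an illustration rather than part of the proposal. The closure's body is just `x`, yet its declared return type lets a boxed array coerce to a boxed slice, whereas `id` instantiated at `Box<[i32; 3]>` returns exactly that type. `id` is defined locally with the proposed signature, and `via_closure` and `via_id` are hypothetical helper names.

```rust
// Assumption: a local stand-in for the function proposed in this RFC.
fn id<T>(x: T) -> T { x }

// The closure body is only `x`, but the annotated return type triggers an
// unsizing coercion from `Box<[i32; 3]>` to `Box<[i32]>`.
fn via_closure(b: Box<[i32; 3]>) -> Box<[i32]> {
    let widen = |x: Box<[i32; 3]>| -> Box<[i32]> { x };
    widen(b)
}

// With `id`, the output type is always identical to the input type.
fn via_id(b: Box<[i32; 3]>) -> Box<[i32; 3]> {
    id(b)
}

fn main() {
    let slice: Box<[i32]> = via_closure(Box::new([1, 2, 3]));
    let array: Box<[i32; 3]> = via_id(Box::new([1, 2, 3]));
    println!("{:?} {:?}", slice, array);
}
```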

# Drawbacks
[drawbacks]: #drawbacks

It is already possible to do this in user code by:

+ using an identity closure: `|x| x`.
Member

What specific advantages do you see of identity over |x| x? identity is longer, and I think its meaning would be less obvious to most users, as they'd then have to go looking for the definition of a function called identity. I suppose there are potential codegen advantages, but those are small at best.

Member

A previous version of the RFC used id, but it was changed because id is ambiguous and for those who haven't heard of identity functions very non obvious (it's easy to guess what it means if you see "identity", but not "id")

I assert that length is not the defining factor of ergonomics. identity is still easier to read and type over the closure, but I agree that the closure is just easier to grok for those who haven't seen this function.

Member

identity is still easier to read and type

I guess that's what I'm disputing, but it's totally a matter of opinion. I'd expect people familiar with Rust's syntax to quickly grok |x| x, while identity could require looking around for a function called identity.

Contributor

I can't speak for everyone, but I always double take when I see |x| x and the way I grok it is by going "oh right it's just the identity function"

Contributor Author

@Centril Centril Jan 19, 2018

@cramertj

[..] I think its meaning would be less obvious to most users [..]

What users are you referring to specifically here? Current Rust users or all programmers in general? I believe you are correct in either case, but still - I have no evidence to offer myself that your assertion is correct, which is less than satisfactory.

What specific advantages do you see of identity over |x| x? identity is longer [..]

Now that the length "advantage" (if we believe it is that) of id has been removed with the renaming to identity, I can only offer these thoughts:

  • A lot of functional languages and others have this function, so functional programmers may expect the function to exist while they may be new to the language and don't understand what |x| x means without reading about it. Consistency with those (quite many) languages is an argument on its own.

  • Understanding identity for those that don't already understand (I think functional programmers will) it is a one time cost - this can also be said of closure syntax or any other identifier in the standard library, you have to learn it and that too is a one time setup cost. If you previously didn't know about identity, reading about it can also be a useful experience learning-wise.

  • I believe that using identity is more clearly showing that the identity-conversion was intentional compared to |x| x.

Member

@Centril Thanks for explaining! Those arguments seem reasonable to me. I personally still think that identity is less clear, but I'll leave it to others to voice their opinions one way or another, and I won't object to the feature if general consensus is that it would be beneficial.

+ writing the identity function as defined in the RFC yourself.

These are contrasted with the [motivation] for including the function
in the standard library.

# Rationale and alternatives
[alternatives]: #alternatives

The rationale for including this in `convert` and not `mem` is that the
former generally deals with conversions, and "identity conversion" is a
commonly used phrase. Meanwhile, `mem` does not relate to `identity` other
than that both deal with move semantics. Therefore, `convert` is the better
choice. Including it in `mem` is still an alternative, but as explained, it
isn't fitting.

The rationale for including this in the prelude has been previously
explained in the [motivation] section. Not including it in the prelude is an
alternative. If the function is not in the prelude, however, its utility is
so low that it may be a better idea to not add the function at all.

Naming the function `identity` instead of `id` is a possibility.
However, to make the `id` function more appetizing than using a `|x| x`, it is
preferable for the identity function to have a shorter but still clear name.

# Unresolved questions
[unresolved]: #unresolved-questions

There are no unresolved questions.