Coherence: terminology, rationale, alternatives considered #624

Merged
merged 25 commits into from
Dec 9, 2021
Commits
8bf642c
Good start
josh11b Jul 2, 2021
3f50459
Name overlap rule
josh11b Jul 4, 2021
ed24dfa
Checkpoint progress.
josh11b Jul 9, 2021
80f5448
Checkpoint progress.
josh11b Jul 9, 2021
f98822b
Another problem with dynamic impl binding
josh11b Jul 12, 2021
83d0939
Add reference
josh11b Jul 12, 2021
1271b6e
Good start
josh11b Jul 2, 2021
4046b24
Name overlap rule
josh11b Jul 4, 2021
dc9e09b
Checkpoint progress.
josh11b Jul 9, 2021
dc4666a
Checkpoint progress.
josh11b Jul 9, 2021
26f6a70
Another problem with dynamic impl binding
josh11b Jul 12, 2021
6506357
Add reference
josh11b Jul 12, 2021
3cd03ed
Checkpoint progress.
josh11b Jul 12, 2021
6de0aee
Merge remote-tracking branch 'origin/coherence' into coherence
josh11b Jul 13, 2021
53f0806
Merge remote-tracking branch 'upstream/trunk' into coherence
josh11b Jul 14, 2021
5c78cdb
TODO to fix link to Google doc
josh11b Jul 15, 2021
9c5df8d
Apply suggestions from code review
josh11b Jul 19, 2021
5182e05
Make all incoherence choices one rejected alternative
josh11b Jul 27, 2021
f16c6f8
Updates re: context sensistivity
josh11b Aug 3, 2021
683b4cd
Add Swift alternative
josh11b Aug 12, 2021
19d7e87
Apply suggestions from code review
josh11b Dec 1, 2021
147c6e5
Merge remote-tracking branch 'upstream/trunk' into coherence
josh11b Dec 2, 2021
d47dd80
Add link to Rust orphan rule concerns
josh11b Dec 6, 2021
ba53310
Add suggested caveat of dynamic dispatch
josh11b Dec 7, 2021
61cffe7
Merge remote-tracking branch 'upstream/trunk' into coherence
josh11b Dec 7, 2021
237 changes: 237 additions & 0 deletions docs/design/generics/appendix-coherence.md
@@ -0,0 +1,237 @@
# Carbon: alternatives to coherence

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

This document explains the rationale for choosing to make
[implementation coherence](terminology.md#coherence)
[a goal for Carbon](goals.md#coherence), and the alternatives considered.

<!-- toc -->

## Table of contents

- [Approach taken: coherence](#approach-taken-coherence)
- [The "Hashtable Problem"](#the-hashtable-problem)
- [Rejected alternative: no orphan rule](#rejected-alternative-no-orphan-rule)
- [Rejected alternative: incoherence](#rejected-alternative-incoherence)
    - [Incoherence means context sensitivity](#incoherence-means-context-sensitivity)
    - [Rejected variation: dynamic implementation binding](#rejected-variation-dynamic-implementation-binding)
    - [Rejected variation: manual conflict resolution](#rejected-variation-manual-conflict-resolution)

<!-- tocstop -->

## Approach taken: coherence

Coherence is a desirable property, but getting it requires an orphan rule, and
that rule has a cost. In particular, it limits how much control users of a type
have over how that type implements interfaces. There are a few main problematic
use cases to consider:

- Selecting between multiple implementations of an interface for a type. For
  example, selecting the implementation of the `Comparable` interface for a
  `Song` type to support "by title", "by artist", and "by album" orderings.
- Implementing an interface for a type when there is no relationship between
  the libraries defining the interface and the type.
- Implementing an interface for a type when the implementation uses an
  associated type that can't be referenced from the file or files where the
  implementation is allowed to be defined.

These last two cases are highlighted as concerns in Rust in
[Rust RFC #1856: orphan rules are stricter than we would like](https://github.com/rust-lang/rfcs/issues/1856).

Since Carbon is bundling interface implementations into types, for the
convenience and expressiveness that provides, we satisfy those use cases by
giving the user control over the type of a value. This means having facilities
for defining new [compatible types](terminology.md#compatible-types) with different
interface implementations, and casting between those types as needed.
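
As a rough illustration of that idea, the sketch below models it in Python rather than Carbon. Each ordering is a distinct wrapper type around the same `Song` data, and converting a value to a wrapper type selects which comparison is used. The names `ByTitle` and `ByArtist` are hypothetical, and Python wrapper objects only approximate compatible types, which are meant to be cast without any dynamic checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    artist: str


@dataclass(frozen=True)
class ByTitle:
    """Hypothetical adapter type: wraps a Song and orders it by title."""
    song: Song

    def __lt__(self, other: "ByTitle") -> bool:
        return self.song.title < other.song.title


@dataclass(frozen=True)
class ByArtist:
    """Hypothetical adapter type: wraps the same Song but orders it by artist."""
    song: Song

    def __lt__(self, other: "ByArtist") -> bool:
        return self.song.artist < other.song.artist


songs = [
    Song("Yesterday", "The Beatles"),
    Song("Unchained Melody", "The Righteous Brothers"),
    Song("Imagine", "John Lennon"),
]

# "Casting" to an adapter type picks the ordering; `.song` casts back.
by_title = [w.song.title for w in sorted(ByTitle(s) for s in songs)]
by_artist = [w.song.artist for w in sorted(ByArtist(s) for s in songs)]
```

Because the ordering is part of the type, a container of `ByTitle` values can never be handed to code expecting `ByArtist` values by accident, which is the property the coherent design relies on.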

## The "Hashtable Problem"

The "Hashtable problem" is that the specific hash function used to compute the
hash of keys in a hashtable must be the same when adding an entry, when looking
it up, and during other operations like resizing. So a hashtable type depends on
both the key type and the key type's implementation of the `Hashable`
interface. If the key type can have more than one implementation of `Hashable`,
there needs to be some mechanism for choosing a single one to be used
consistently by the hashtable type, or the invariants of the type will be
violated.

Without the orphan rule to enforce coherence, we might have a situation like
this:

- Package `Containers` defines a `HashSet` type.

```
package Containers;
struct HashSet(Key:! Hashable) { ... }
```

- A `Song` type is defined in package `SongLib`.
- Package `SongHashArtistAndTitle` defines an implementation of `Hashable` for
`SongLib.Song`.

```
package SongHashArtistAndTitle;
import SongLib;
impl SongLib.Song as Hashable {
  fn Hash[me: Self]() -> u64 { ... }
}
```

- Package `SongUtil` uses the `Hashable` implementation from
`SongHashArtistAndTitle` to define a function `IsInHashSet`.

```
package SongUtil;
import SongLib;
import SongHashArtistAndTitle;
import Containers;

fn IsInHashSet(
    s: SongLib.Song,
    h: Containers.HashSet(SongLib.Song)*) -> bool {
  return h->Contains(s);
}
```

- Package `SongHashAppleMusicURL` defines a different implementation of
`Hashable` for `SongLib.Song` than package `SongHashArtistAndTitle`.

```
package SongHashAppleMusicURL;
import SongLib;
impl SongLib.Song as Hashable {
  fn Hash[me: Self]() -> u64 { ... }
}
```

- Finally, package `Trouble` imports `SongHashAppleMusicURL`, creates a hash
set, and then calls the `IsInHashSet` function from package `SongUtil`.

```
package Trouble;
import SongLib;
import SongHashAppleMusicURL;
import Containers;
import SongUtil;

fn SomethingWeirdHappens() {
  var unchained_melody: SongLib.Song = ...;
  var song_set: auto = Containers.HashSet(SongLib.Song).Create();
  song_set.Add(unchained_melody);
  // Either this is a compile error or does something unexpected.
  if (SongUtil.IsInHashSet(unchained_melody, &song_set)) {
    Print("This is expected, but doesn't happen.");
  } else {
    Print("This is what happens even though it is unexpected.");
  }
}
```

The issue is that in package `Trouble`, the `song_set` is created in a context
where `SongLib.Song` has a `Hashable` implementation from
`SongHashAppleMusicURL`, and stores `unchained_melody` under that hash value.
When we go to look up the same song in `SongUtil.IsInHashSet`, it uses the hash
function from `SongHashArtistAndTitle` which returns a different hash value for
`unchained_melody`, and so reports the song is missing.
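
The failure can be reproduced in a small Python model. This is only a sketch under stated assumptions: the two hash functions stand in for the two `Hashable` implementations, the length-based hashing and the `example.com` URL are placeholders chosen to be deterministic, and the toy `HashSet` takes the hash function as an argument to mimic "whichever implementation is visible at the call site".

```python
from typing import Callable, List


class Song:
    def __init__(self, title: str, artist: str, url: str) -> None:
        self.title, self.artist, self.url = title, artist, url


# Two conflicting stand-ins for "Song as Hashable". Length-based hashing is a
# deliberately crude, deterministic placeholder.
def hash_artist_and_title(s: Song) -> int:
    return len(s.artist) + len(s.title)


def hash_url(s: Song) -> int:
    return len(s.url)


class HashSet:
    """Toy bucketed set whose hash function comes from the caller's context."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets: List[List[Song]] = [[] for _ in range(num_buckets)]

    def add(self, s: Song, hash_fn: Callable[[Song], int]) -> None:
        self.buckets[hash_fn(s) % len(self.buckets)].append(s)

    def contains(self, s: Song, hash_fn: Callable[[Song], int]) -> bool:
        return s in self.buckets[hash_fn(s) % len(self.buckets)]


melody = Song("Unchained Melody", "The Righteous Brothers", "example.com/song/1")
song_set = HashSet()

# `Trouble` adds the song where the URL-based implementation is visible...
song_set.add(melody, hash_url)
# ...but `SongUtil.IsInHashSet` looks it up with the artist-and-title one.
found_with_other_impl = song_set.contains(melody, hash_artist_and_title)
found_with_same_impl = song_set.contains(melody, hash_url)
```

With these inputs the two hash functions select different buckets, so the lookup through the other implementation misses a song that is in the set.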

**Background:** [This post](https://gist.github.com/nikomatsakis/1421744)
discusses the hashtable problem in the context of Haskell, and
[this 2011 Rust followup](https://mail.mozilla.org/pipermail/rust-dev/2011-December/001036.html)
discusses how to detect problems at compile time.

## Rejected alternative: no orphan rule

In Swift, an implementation of an interface, or a "protocol" as it is called in
Swift, can be provided in any module. As long as any module provides an
implementation, that implementation is
[used globally throughout the program](https://stackoverflow.com/questions/48762971/swift-protocol-conformance-by-extension-between-frameworks).

In Swift, since some protocol implementations can come from the runtime
environment provided by the operating system, a conflict between multiple
implementations of a protocol can surface as a runtime warning. When this
happens, Swift picks one implementation arbitrarily.

In Carbon, we could make this a build-time error. However, there would be
nothing preventing two independent libraries from providing conflicting
implementations. Furthermore, the error would only be diagnosed at link time.
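
To make the link-time nature of that error concrete, here is a minimal Python sketch. It assumes a hypothetical `link` step that merges per-library tables of implementations; nothing here reflects an actual Carbon or Swift toolchain. Each library is valid on its own, and the conflict appears only when both end up in the same program.

```python
from typing import Dict, Tuple

# Each library exports a table of (interface, type) -> implementation label.
ImplTable = Dict[Tuple[str, str], str]


def link(libraries: Dict[str, ImplTable]) -> ImplTable:
    """Merge per-library tables, failing on conflicting implementations."""
    merged: ImplTable = {}
    owner: Dict[Tuple[str, str], str] = {}
    for lib_name, table in libraries.items():
        for key, impl in table.items():
            if key in merged:
                interface, type_name = key
                raise RuntimeError(
                    f"{type_name} as {interface} is implemented in both "
                    f"{owner[key]} and {lib_name}"
                )
            merged[key] = impl
            owner[key] = lib_name
    return merged


lib_a = {"SongHashArtistAndTitle": {("Hashable", "Song"): "hash by artist and title"}}
lib_b = {"SongHashAppleMusicURL": {("Hashable", "Song"): "hash by URL"}}

alone_a = link(lib_a)  # fine in isolation
alone_b = link(lib_b)  # fine in isolation
try:
    link({**lib_a, **lib_b})
    conflict = None
except RuntimeError as e:
    conflict = str(e)
```

Neither library author can detect or prevent the conflict; only whoever combines the two libraries sees it.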

## Rejected alternative: incoherence

### Incoherence means context sensitivity

The undesirable result of incoherence is that the interpretation of source code
changes based on imports. In particular, imagine there is a function call that
depends on a type implementing an interface, and two different implementations
are defined in two different libraries. A call to that function will be treated
differently depending on which of those two libraries is imported:

- If neither is imported, it is an error.
- If both are imported, it is ambiguous.
- If only one is imported, you get totally different code executed depending
on which it is.

Furthermore, this means that the behavior of a file can depend on an import even
if nothing from that package is referenced explicitly. In general, Carbon is
[avoiding this sort of context sensitivity](/docs/project/principles/low_context_sensitivity.md).
This context sensitivity would make moving code between files when refactoring
more difficult and less safe.
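
The three cases can be modeled with a short Python sketch. The `resolve` function and the `IMPLS` table are hypothetical and stand in for an incoherent compiler's lookup of `Hashable` for `Song`, reusing the library names from the hashtable example.

```python
from typing import Set

# Which library provides which implementation of `Hashable` for `Song`.
IMPLS = {
    "SongHashArtistAndTitle": "hash by artist and title",
    "SongHashAppleMusicURL": "hash by URL",
}


def resolve(imports: Set[str]) -> str:
    """Pick the implementation visible in a file with the given imports."""
    visible = sorted(lib for lib in imports if lib in IMPLS)
    if not visible:
        raise LookupError("no implementation of Hashable for Song is visible")
    if len(visible) > 1:
        raise LookupError("ambiguous: " + ", ".join(visible))
    return IMPLS[visible[0]]


def outcome(imports: Set[str]) -> str:
    """Report either the selected implementation or the error."""
    try:
        return resolve(imports)
    except LookupError as e:
        return f"error: {e}"


neither = outcome({"SongLib"})
only_first = outcome({"SongLib", "SongHashArtistAndTitle"})
only_second = outcome({"SongLib", "SongHashAppleMusicURL"})
both = outcome({"SongLib", "SongHashArtistAndTitle", "SongHashAppleMusicURL"})
```

The same call site yields four different outcomes purely as a function of the import list, even though the calling code never names either implementing library.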

### Rejected variation: dynamic implementation binding

One possible approach would be to bind interface implementations to a value at
the point it was created. In [the example above](#the-hashtable-problem), the
implementation of the `Hashable` interface for `Song` would be fixed for the
`song_set` `HashSet` object based on which implementation was in scope in the
body of the `SomethingWeirdHappens` function.

This idea is discussed briefly in section 5.4 on separate compilation of WG21
proposal n1848 for implementing "Indiana" C++0x concepts
([1](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.9526&rep=rep1&type=pdf),
and [2](https://wg21.link/n1848)).

This has some downsides:

- It is harder to reason about. The behavior of `SongUtil.IsInHashSet` depends
on the dynamic behavior of the program. At the time of the call, we may have
no idea where the `HashSet` argument was created.
- An object may be created far from a call that has a particular interface
requirement, with no guarantee that the object was created with any
implementation of the interface at all. This error would only be detected at
runtime, not at type checking time.
- It requires more data space at runtime because we need to store a pointer to
the witness table representing the implementation with the object, since it
varies instead of being known statically.
- It is slower to execute because of dynamic dispatch and the inability to inline.

Contributor

... and dynamic dispatch may not even be feasible in all cases.

Contributor Author

What cases are you thinking of?

Contributor Author

Is this things like we need to know the size of the type?

Contributor

Yes, things like that to do with how we would generate code. I had in mind cases where, for example, a method in the interface returns an associated type, and we don't know the calling convention of the function without knowing some details about the type.

Contributor Author

I've added that example.

- In some cases it may not be feasible to use dynamic dispatch. For example,
if an interface method returns an associated type, we might not know the
calling convention of the function without knowing some details about the
type.

As a result, this doesn't make sense as the default behavior for Carbon based on
its [goals](/docs/project/goals.md). That being said, this could be a feature added
later as opt-in behavior to either allow users to reduce code size or support
use cases that require dynamic dispatch.
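
A small Python model shows what dynamic implementation binding would look like and where two of the listed downsides come from. It is a sketch, not a proposed design: the `hash_fn` field is a hypothetical stand-in for a per-object witness table pointer, and the length-based hash is a placeholder.

```python
from typing import Callable, List, Optional


class Song:
    def __init__(self, title: str, url: str) -> None:
        self.title, self.url = title, url


class HashSet:
    """Toy set that stores the hash function in scope when it was created."""

    def __init__(self, hash_fn: Optional[Callable[[Song], int]],
                 num_buckets: int = 8) -> None:
        self.hash_fn = hash_fn  # extra per-object data, like a witness table
        self.buckets: List[List[Song]] = [[] for _ in range(num_buckets)]

    def _bucket(self, s: Song) -> List[Song]:
        if self.hash_fn is None:
            # Only discovered when the set is used, not at type-checking time.
            raise TypeError("set was created without a Hashable implementation")
        return self.buckets[self.hash_fn(s) % len(self.buckets)]

    def add(self, s: Song) -> None:
        self._bucket(s).append(s)

    def contains(self, s: Song) -> bool:
        return s in self._bucket(s)


def is_in_hash_set(s: Song, h: HashSet) -> bool:
    # Which hash function runs here depends on where `h` was created.
    return h.contains(s)


melody = Song("Unchained Melody", "example.com/song/1")

bound = HashSet(hash_fn=lambda s: len(s.url))
bound.add(melody)
consistent = is_in_hash_set(melody, bound)

unbound = HashSet(hash_fn=None)
try:
    is_in_hash_set(melody, unbound)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True
```

Lookups are now consistent with insertions, but every set carries an extra field, every hash goes through an indirect call, and a set created without an implementation fails only when it is used.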

### Rejected variation: manual conflict resolution

Carbon could alternatively provide some kind of manual disambiguation syntax to
resolve problems where they arise. The problems with this approach have been
[considered in the context of Rust](https://github.com/Ixrec/rust-orphan-rules#whats-wrong-with-incoherence).

A specific example of this approach is called
[scoped conformance](https://forums.swift.org/t/scoped-conformances/37159),
where the conflict resolution is based on limiting the visibility of
implementations to particular scopes. This hasn't been implemented, but it has
the drawbacks described above. Depending on the details of the implementation,
either:

- there are incompatible values with types that have the same name, or
- it is difficult to reason about the program's behavior because it behaves
like
[dynamic implementation binding](#rejected-variation-dynamic-implementation-binding)
(though perhaps with a monomorphization cost instead of a runtime cost).
19 changes: 9 additions & 10 deletions docs/design/generics/goals.md
@@ -439,12 +439,11 @@ will necessarily be less incremental.

### Coherence

-We want the generics system to have the _coherence_ property. This means that
-there is a single answer to the question "what is the implementation of this
-interface for this type, if any?" independent of context, such as the libraries
-imported into a given file. Since a generic function only depends on interface
-implementations, they will always behave consistently on a given type,
-independent of context. For more on this, see
+We want the generics system to have the
+[_coherence_ property](terminology.md#coherence), so that the implementation of
+an interface for a type is well defined. Since a generic function only depends
+on interface implementations, they will always behave consistently on a given
+type, independent of context. For more on this, see
[this description of what coherence is and why Rust enforces it](https://github.com/Ixrec/rust-orphan-rules#what-is-coherence).

Coherence greatly simplifies the language design, since it reduces the need for
@@ -460,8 +459,8 @@ It also has a number of benefits for users:
Carbon template on that type.

The main downside of coherence is that there are some capabilities we would like
-for interfaces which are in tension with the coherence property. For example, we
-would like to address
+for interfaces that are in tension with having an orphan rule limiting where
+implementations may be defined. For example, we would like to address
[the expression problem](https://eli.thegreenplace.net/2016/the-expression-problem-and-its-solutions#another-clojure-solution-using-protocols).
We can get some of the way there by allowing the implementation of an interface
for a type to be defined with either the interface or the type. But some use
@@ -487,8 +486,8 @@ approaches that could work:
interface implementations. This is the approach used by Rust
([1](https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#using-the-newtype-pattern-to-implement-external-traits-on-external-types),
[2](https://github.com/Ixrec/rust-orphan-rules#user-content-why-are-the-orphan-rules-controversial)).
-- Carbon could support
-[scoped conformances](https://forums.swift.org/t/scoped-conformances/37159).

+Alternatives to coherence are discussed in [an appendix](appendix-coherence.md).
+
### No novel name lookup

26 changes: 24 additions & 2 deletions docs/design/generics/terminology.md
@@ -33,6 +33,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Qualified and unqualified member names](#qualified-and-unqualified-member-names)
- [Compatible types](#compatible-types)
- [Subtyping and casting](#subtyping-and-casting)
- [Coherence](#coherence)
- [Adapting a type](#adapting-a-type)
- [Type erasure](#type-erasure)
- [Facet type](#facet-type)
@@ -438,6 +439,26 @@ required, an [implicit conversion](../expressions/implicit_conversions.md) is
performed if it is considered safe to do so. Such an implicit conversion, if
permitted, always has the same meaning as an explicit cast.

## Coherence

A generics system has the _implementation coherence_ property, or simply
_coherence_, if there is a single answer to the question "what is the
implementation of this interface for this type, if any?" independent of context,
such as the libraries imported into a given file.

This is typically enforced by making sure that the definition of the
implementation is imported whenever both the interface and the type are
imported. This may be done by requiring the implementation to be in the same
library as the interface or the type. This is called an _orphan rule_, meaning
we don't allow an implementation that is not with either of its parents (parent
type or parent interface).

Note that in addition to an orphan rule ensuring that implementations are
visible when queried, coherence also requires a rule for resolving what happens
if there are multiple non-orphan implementations. In Rust, this is called the
[overlap rule or overlap check](https://rust-lang.github.io/chalk/book/clauses/coherence.html#chalk-overlap-check).
This could be just producing an error in that situation, or picking one using
some specialization rule.
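
The two rules can be sketched as a checker over a list of implementations. This Python model is illustrative only: the `Impl` record, the `home_library` mapping, and the library name `Core` are hypothetical, and the overlap handling shown is the simple "produce an error" option rather than specialization.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Impl:
    interface: str
    type_name: str
    defined_in: str  # library containing the `impl` declaration


def check_coherence(impls: List[Impl], home_library: Dict[str, str]) -> None:
    """Reject orphans and overlapping implementations.

    `home_library` maps each interface or type name to its defining library.
    """
    seen: Dict[Tuple[str, str], Impl] = {}
    for impl in impls:
        parents = {home_library[impl.interface], home_library[impl.type_name]}
        if impl.defined_in not in parents:
            raise ValueError(
                f"orphan: {impl.type_name} as {impl.interface} "
                f"in {impl.defined_in}"
            )
        key = (impl.interface, impl.type_name)
        if key in seen:
            raise ValueError(f"overlap: {impl.type_name} as {impl.interface}")
        seen[key] = impl


def error_for(impls: List[Impl]) -> Optional[str]:
    """Return the coherence error message, or None if the impls are accepted."""
    try:
        check_coherence(impls, home)
    except ValueError as e:
        return str(e)
    return None


home = {"Hashable": "Core", "Song": "SongLib"}

accepted = error_for([Impl("Hashable", "Song", "SongLib")])
orphan = error_for([Impl("Hashable", "Song", "SongHashAppleMusicURL")])
overlap = error_for([
    Impl("Hashable", "Song", "Core"),
    Impl("Hashable", "Song", "SongLib"),
])
```

The orphan check alone is not enough: the last case passes it twice, once for each parent library, and is only caught by the overlap check.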

## Adapting a type

A type can be adapted by creating a new type that is
@@ -452,11 +473,12 @@ between those two types without any dynamic checks or danger of
[object slicing](https://en.wikipedia.org/wiki/Object_slicing).

This is called "newtype" in Rust, and is used for capturing additional
-information in types to improve type safety of move some checking to compile
+information in types to improve type safety by moving some checking to compile
time ([1](https://doc.rust-lang.org/rust-by-example/generics/new_types.html),
[2](https://doc.rust-lang.org/book/ch19-04-advanced-types.html#using-the-newtype-pattern-for-type-safety-and-abstraction),
[3](https://www.worthe-it.co.za/blog/2020-10-31-newtype-pattern-in-rust.html))
-and as a workaround for Rust's orphan rules for coherence.
+and as a workaround for
+[Rust's orphan rules for coherence](https://github.com/Ixrec/rust-orphan-rules#why-are-the-orphan-rules-controversial).

## Type erasure
