diff --git a/docs/design/generics/details.md b/docs/design/generics/details.md index d985fc52d6e8d..3601b53d98622 100644 --- a/docs/design/generics/details.md +++ b/docs/design/generics/details.md @@ -21,12 +21,12 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Model](#model) - [Interfaces recap](#interfaces-recap) - [Type-of-types and facet types](#type-of-types-and-facet-types) -- [Structural interfaces](#structural-interfaces) +- [Named constraints](#named-constraints) - [Subtyping between type-of-types](#subtyping-between-type-of-types) - [Combining interfaces by anding type-of-types](#combining-interfaces-by-anding-type-of-types) - [Interface requiring other interfaces](#interface-requiring-other-interfaces) - [Interface extension](#interface-extension) - - [`extends` and `impl` with structural interfaces](#extends-and-impl-with-structural-interfaces) + - [`extends` and `impl` with named constraints](#extends-and-impl-with-named-constraints) - [Diamond dependency issue](#diamond-dependency-issue) - [Use case: overload resolution](#use-case-overload-resolution) - [Type compatibility](#type-compatibility) @@ -42,14 +42,40 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Model](#model-1) - [Parameterized interfaces](#parameterized-interfaces) - [Impl lookup](#impl-lookup) - - [Parameterized structural interfaces](#parameterized-structural-interfaces) + - [Parameterized named constraints](#parameterized-named-constraints) +- [Where constraints](#where-constraints) + - [Constraint use cases](#constraint-use-cases) + - [Set an associated constant to a specific value](#set-an-associated-constant-to-a-specific-value) + - [Same type constraints](#same-type-constraints) + - [Set an associated type to a specific value](#set-an-associated-type-to-a-specific-value) + - [Equal generic types](#equal-generic-types) + - [Satisfying both type-of-types](#satisfying-both-type-of-types) + - [Type bound for associated type](#type-bound-for-associated-type) + - [Type bounds on associated types in declarations](#type-bounds-on-associated-types-in-declarations) + - [Type bounds on associated types in interfaces](#type-bounds-on-associated-types-in-interfaces) + - [Combining constraints](#combining-constraints) + - [Recursive constraints](#recursive-constraints) + - [Parameterized type implements interface](#parameterized-type-implements-interface) + - [Another type implements parameterized interface](#another-type-implements-parameterized-interface) + - [Implied constraints](#implied-constraints) + - [Must be legal type argument constraints](#must-be-legal-type-argument-constraints) + - [Open question: referencing names in the interface being defined](#open-question-referencing-names-in-the-interface-being-defined) + - [Manual type equality](#manual-type-equality) + - [`observe` declarations](#observe-declarations) +- [Other constraints as type-of-types](#other-constraints-as-type-of-types) + - [Is a derived class](#is-a-derived-class) + - [Type compatible with another type](#type-compatible-with-another-type) + - [Same implementation restriction](#same-implementation-restriction) + - [Example: Multiple implementations of the same interface](#example-multiple-implementations-of-the-same-interface) + - [Example: Creating an impl out of other impls](#example-creating-an-impl-out-of-other-impls) + - [Type facet of another type](#type-facet-of-another-type) + - [Sized types and type-of-types](#sized-types-and-type-of-types) + - [Model](#model-2) + - [`TypeId`](#typeid) - [Future work](#future-work) - - [Constraints](#constraints) - [Conditional conformance](#conditional-conformance) - [Parameterized impls](#parameterized-impls) - [Lookup resolution and specialization](#lookup-resolution-and-specialization) - - [Other constraints as type-of-types](#other-constraints-as-type-of-types) - - [Sized types and type-of-types](#sized-types-and-type-of-types) - [Dynamic types](#dynamic-types) - [Runtime type parameters](#runtime-type-parameters) - [Runtime type fields](#runtime-type-fields) @@ -66,6 +92,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Generic type specialization](#generic-type-specialization) - [Bridge for C++ customization points](#bridge-for-c-customization-points) - [Variadic arguments](#variadic-arguments) + - [Range constraints on generic integers](#range-constraints-on-generic-integers) - [References](#references) @@ -748,23 +775,22 @@ type-of-type. This general structure of type-of-types holds not just for interfaces, but others described in the rest of this document. -## Structural interfaces +## Named constraints -If the nominal interfaces discussed above are the building blocks for -type-of-types, [structural interfaces](terminology.md#structural-interfaces) -describe how they may be composed together. Unlike nominal interfaces, the name -of a structural interface is not a part of its value. Two different structural -interfaces with the same definition are equivalent even if they have different -names. This is because types don't explicitly specify which structural -interfaces they implement, types automatically implement any structural -interfaces they can satisfy. +If the interfaces discussed above are the building blocks for type-of-types, +[generic named constraints](terminology.md#named-constraints) describe how they +may be composed together. Unlike interfaces which are nominal, the name of a +named constraint is not a part of its value. Two different named constraints +with the same definition are equivalent even if they have different names. This +is because types don't explicitly specify which named constraints they +implement, types automatically implement any named constraints they can satisfy. -A structural interface definition can contain interface requirements using -`impl` declarations and names using `alias` declarations. Note that this allows -us to declare the aspects of a type-of-type directly. +A named constraint definition can contain interface requirements using `impl` +declarations and names using `alias` declarations. Note that this allows us to +declare the aspects of a type-of-type directly. ``` -structural interface VectorLegoFish { +constraint VectorLegoFish { // Interface implementation requirements impl as Vector; impl as LegoFish; @@ -775,12 +801,12 @@ structural interface VectorLegoFish { } ``` -We don't expect users do directly define many structural interfaces, but other +We don't expect developers to directly define many named constraints, but other constructs we do expect them to use will be defined in terms of them. For example, we can define the Carbon builtin `Type` as: ``` -structural interface Type { } +constraint Type { } ``` That is, `Type` is the type-of-type with no requirements (so matches every @@ -805,11 +831,21 @@ type, and this function will be instantiated separately for every different type." This is consistent with the [use of `auto` in the C++20 Abbreviated function template feature](https://en.cppreference.com/w/cpp/language/function_template#Abbreviated_function_template). -In general we should support the same kinds of declarations in a -`structural interface` definitions as in an `interface`. Generally speaking -declarations in one kind of interface make sense in the other, and there is an -analogy between them. If an `interface` `I` has (non-`alias`) declarations `X`, -`Y`, and `Z`, like so: +In general, the declarations in `constraint` definition match a subset of the +declarations in an `interface`. Named constraints used with generics, as opposed +to templates, should only include required interfaces and aliases to named +members of those interfaces. + +To declare a named constraint that includes other declarations for use with +template parameters, use the `template` keyword before `constraint`. Method, +associated type, and associated function requirements may only be declared +inside a `template constraint`. Note that a generic constraint matches all +facets of a type if it matches any, but a template constraint can depend on the +specific names of members used in a particular facet. + +There is an analogy between declarations used in a `constraint` and in an +`interface` definition. If an `interface` `I` has (non-`alias`) declarations +`X`, `Y`, and `Z`, like so: ``` interface I { @@ -819,8 +855,6 @@ interface I { } ``` -(Here, `X` could be something like `fn F[me: Self]()`.) - Then a type implementing `I` would have `impl as I` with definitions for `X`, `Y`, and `Z`, as in: @@ -835,10 +869,11 @@ class ImplementsI { } ``` -But the corresponding `structural interface`, `S`: +But the corresponding `constraint` or `template constraint`, `S`: ``` -structural interface S { +// or template constraint S { +constraint S { X; Y; Z; @@ -856,6 +891,9 @@ class ImplementsS { } ``` +**TODO:** Move the `template constraint` and `auto` content to the template +design document, once it exists. + ### Subtyping between type-of-types There is a subtyping relationship between type-of-types that allows you to call @@ -872,11 +910,11 @@ implicitly converted to `T as I2`. For example: interface Printable { fn Print[me: Self](); } interface Renderable { fn Draw[me: Self](); } -structural interface PrintAndRender { +constraint PrintAndRender { impl as Printable; impl as Renderable; } -structural interface JustPrint { +constraint JustPrint { impl as Printable; } @@ -912,7 +950,7 @@ interface Renderable { } // `Printable & Renderable` is syntactic sugar for this type-of-type: -structural interface { +constraint { impl as Printable; impl as Renderable; alias Print = Printable.Print; @@ -954,7 +992,7 @@ interface EndOfGame { fn Winner[me: Self](player: i32); } // `Renderable & EndOfGame` is syntactic sugar for this type-of-type: -structural interface { +constraint { impl as Renderable; impl as EndOfGame; alias Center = Renderable.Center; @@ -966,11 +1004,11 @@ structural interface { ``` Conflicts can be resolved at the call site using -[the qualified name syntax](#qualified-member-names), or by defining a -structural interface explicitly and renaming the methods: +[the qualified name syntax](#qualified-member-names), or by defining a named +constraint explicitly and renaming the methods: ``` -structural interface RenderableAndEndOfGame { +constraint RenderableAndEndOfGame { impl as Renderable; impl as EndOfGame; alias Center = Renderable.Center; @@ -1013,14 +1051,14 @@ and `B [&] A` has the names of `B`. ``` // `Printable [&] Renderable` is syntactic sugar for this type-of-type: -structural interface { +constraint { impl as Printable; impl as Renderable; alias Print = Printable.Print; } // `Renderable [&] EndOfGame` is syntactic sugar for this type-of-type: -structural interface { +constraint { impl as Renderable; impl as EndOfGame; alias Center = Renderable.Center; @@ -1056,7 +1094,7 @@ requires all containers to also satisfy the requirements of `Swappable`. This is already a capability for [type-of-types in general](#type-of-types-and-facet-types). For consistency we will use the same semantics and syntax as we do for -[structural interfaces](#structural-interfaces): +[named constraints](#named-constraints): ``` interface Equatable { fn Equals[me: Self](that: Self) -> bool; } @@ -1082,9 +1120,9 @@ var x: Iota; DoAdvanceAndEquals(x); ``` -Like with structural interfaces, an interface implementation requirement doesn't -by itself add any names to the interface, but again those can be added with -`alias` declarations: +Like with named constraints, an interface implementation requirement doesn't by +itself add any names to the interface, but again those can be added with `alias` +declarations: ``` interface Hashable { @@ -1198,10 +1236,10 @@ interface PreferredConversion { } ``` -#### `extends` and `impl` with structural interfaces +#### `extends` and `impl` with named constraints The `extends` declaration makes sense with the same meaning inside a -[`structural interface`](#structural-interfaces), and so is also supported. +[`constraint`](#named-constraints) definition, and so is also supported. ``` interface Media { @@ -1211,7 +1249,7 @@ interface Job { fn Run[me: Self](); } -structural interface Combined { +constraint Combined { extends Media; extends Job; } @@ -1222,7 +1260,7 @@ This definition of `Combined` is equivalent to requiring both the `Media` and ``` // Equivalent -structural interface Combined { +constraint Combined { impl as Media; alias Play = Media.Play; impl as Job; @@ -1231,8 +1269,8 @@ structural interface Combined { ``` Notice how `Combined` has aliases for all the methods in the interfaces it -requires. That condition is sufficient to allow a type to `impl` the structural -interface: +requires. That condition is sufficient to allow a type to `impl` the named +constraint: ``` class Song { @@ -1260,7 +1298,7 @@ This is just like you get an implementation of `Equatable` by implementing `Hashable` when `Hashable` extends `Equatable`. This provides a tool useful for [evolution](#evolution). -Conversely, an `interface` can extend a `structural interface`: +Conversely, an `interface` can extend a `constraint`: ``` interface MovieCodec { @@ -2231,18 +2269,1132 @@ one can unambiguously be picked as most specific. [a goal for Carbon](goals.md#coherence). More detail can be found in [this appendix with the rationale and alternatives considered](appendix-coherence.md). -### Parameterized structural interfaces +### Parameterized named constraints -We should also allow the [structural interface](#structural-interfaces) -construct to support parameters. Parameters would work the same way as for -regular, that is nominal or non-structural, interfaces. +We should also allow the [named constraint](#named-constraints) construct to +support parameters. Parameters would work the same way as for interfaces. -## Future work +## Where constraints + +So far, we have restricted a generic type parameter by saying it has to +implement an interface or a set of interfaces. There are a variety of other +constraints we would like to be able to express, such as applying restrictions +to its associated types and associated constants. This is done using the `where` +operator that adds constraints to a type-of-type. + +The where operator can be applied to a type-of-type in a declaration context: + +``` +// Constraints on function parameters: +fn F[V:! D where ...](v: V) { ... } + +// Constraints on a class parameter: +class S(T:! B where ...) { + // Constraints on a method: + fn G[me: Self, V:! D where ...](v: V); +} + +// Constraints on an interface parameter: +interface A(T:! B where ...) { + // Constraints on an associated type: + let U:! C where ...; + // Constraints on an associated method: + fn G[me: Self, V:! D where ...](v: V); +} +``` + +We also allow you to name constraints using a `where` operator in a `let` or +`constraint` definition. The expressions that can follow the `where` keyword are +described in the ["constraint use cases"](#constraint-use-cases) section, but +generally look like boolean expressions that should evaluate to `true`. + +The result of applying a `where` operator to a type-of-type is another +type-of-type. Note that this expands the kinds of requirements that +type-of-types can have from just interface requirements to also include the +various kinds of constraints discussed later in this section. In addition, it +can introduce relationships between different type variables, such as that a +member of one is equal to the member of another. + +**Comparison with other languages:** Both Swift and Rust use `where` clauses on +declarations instead of in the expression syntax. These happen after the type +that is being constrained has been given a name and use that name to express the +constraint. + +Rust also supports +[directly passing in the values for associated types](https://rust-lang.github.io/rfcs/0195-associated-items.html#constraining-associated-types) +when using a trait as a constraint. This is helpful when specifying concrete +types for all associated types in a trait in order to +[make it object safe so it can be used to define a trait object type](https://rust-lang.github.io/rfcs/0195-associated-items.html#trait-objects). + +Rust is adding trait aliases +([RFC](https://github.com/rust-lang/rfcs/blob/master/text/1733-trait-alias.md), +[tracking issue](https://github.com/rust-lang/rust/issues/41517)) to support +naming some classes of constraints. + +### Constraint use cases + +#### Set an associated constant to a specific value + +We might need to write a function that only works with a specific value of an +[associated constant](#associated-constants) `N`. In this case, the name of the +associated constant is written first, followed by an `=`, and then the value: + +``` +fn PrintPoint2D[PointT:! NSpacePoint where .N = 2](p: PointT) { + Print(p.Get(0), ", ", p.Get(1)); +} +``` + +Similarly in an interface definition: + +``` +interface { + let PointT:! NSpacePoint where .N = 2; +} +``` + +To name such a constraint, you may use a `let` or a `constraint` declaration: + +``` +let Point2DInterface:! auto = NSpacePoint where .N = 2; +constraint Point2DInterface { + extends NSpacePoint where .N = 2; +} +``` + +**Concern:** Using `=` for this use case is not consistent with other `where` +clauses that write a boolean expression that evaluates to `true` when the +constraint is satisfied. + +A constraint to say that two associated constants should have the same value +without specifying what specific value they should have must use `==` instead of +`=`: + +``` +interface PointCloud { + let Dim:! i32; + let PointT:! NSpacePoint where .N == Dim; +} +``` + +#### Same type constraints + +##### Set an associated type to a specific value + +Functions accepting a generic type might also want to constrain one of its +associated types to be a specific, concrete type. For example, we might want to +have a function only accept stacks containing integers: + +``` +fn SumIntStack[T:! Stack where .ElementType = i32](s: T*) -> i32 { + var sum: i32 = 0; + while (!s->IsEmpty()) { + // s->Pop() has type `T.ElementType` == i32: + sum += s->Pop(); + } + return sum; +} +``` + +To name these sorts of constraints, we could use `let` statements or +`constraint` definitions: + +``` +let IntStack:! auto = Stack where .ElementType = i32; +constraint IntStack { + extends Stack where .ElementType = i32; +} +``` + +##### Equal generic types + +Alternatively, two generic types could be constrained to be equal to each other, +without specifying what that type is. This uses `==` instead of `=`. For +example, we could make the `ElementType` of an `Iterator` interface equal to the +`ElementType` of a `Container` interface as follows: + +``` +interface Iterator { + let ElementType:! Type; + ... +} +interface Container { + let ElementType:! Type; + let IteratorType:! Iterator where .ElementType == ElementType; + ... +} +``` + +Given an interface with two associated types + +``` +interface PairInterface { + let Left:! Type; + let Right:! Type; +} +``` + +we can constrain them to be equal in a function declaration: + +``` +fn F[MatchedPairType:! PairInterface where .Left == .Right] + (x: MatchedPairType*); +``` + +or in an interface definition: + +``` +interface HasEqualPair { + let P:! PairInterface where .Left == .Right; +} +``` + +This kind of constraint can be named: + +``` +let EqualPair:! auto = + PairInterface where .Left == .Right; +constraint EqualPair { + extends PairInterface where .Left == .Right; +} +``` + +Another example of same type constraints is when associated types of two +different interfaces are constrained to be equal: + +``` +fn Map[CT:! Container, + FT:! Function where .InputType == CT.ElementType] + (c: CT, f: FT) -> Vector(FT.OutputType); +``` + +###### Satisfying both type-of-types + +If the two types being constrained to be equal have been declared with different +type-of-types, then the actual type value they are set to will have to satisfy +both constraints. For example, if `SortedContainer.ElementType` is declared to +be `Comparable`, then in this declaration: + +``` +fn Contains + [SC:! SortedContainer, + CT:! Container where .ElementType == SC.ElementType] + (haystack: SC, needles: CT) -> bool; +``` + +the `where` constraint means `CT.ElementType` must satisfy `Comparable` as well. +However, inside the body of `Contains`, `CT.ElementType` will only act like the +implementation of `Comparable` is [external](#external-impl). That is, items +from the `needles` container won't have an unqualified `Compare` method member, +but can still be implicitly converted to `Comparable` and can still call +`Compare` using the qualified member syntax, `needle.(Comparable.Compare)(elt)`. +The rule is that an `==` `where` constraint between two type variables does not +modify the set of unqualified member names of either type. (If you write +`where .ElementType = String` with a `=` and a concrete type, then +`.ElementType` is actually set to `String` including the complete unqualified +`String` API.) + +Note that `==` constraints are symmetric, so the previous declaration of +`Contains` is equivalent to an alternative declaration where `CT` is declared +first and the `where` clause is attached to `SortedContainer`: + +``` +fn Contains + [CT:! Container, + SC:! SortedContainer where .ElementType == CT.ElementType] + (haystack: SC, needles: CT) -> bool; +``` + +#### Type bound for associated type + +A `where` clause can express that a type must implement an interface. This is +more flexible than the usual approach of including that interface in the type +since it can be applied to associated type members as well. + +##### Type bounds on associated types in declarations + +In the following example, normally the `ElementType` of a `Container` can be any +type. The `SortContainer` function, however, takes a pointer to a type +satisfying `Container` with the additional constraint that its `ElementType` +must satisfy the `Comparable` interface. + +``` +interface Container { + let ElementType:! Type; + ... +} + +fn SortContainer + [ContainerType:! Container where .ElementType is Comparable] + (container_to_sort: ContainerType*); +``` + +In contrast to [a same type constraint](#same-type-constraints), this does not +say what type `ElementType` exactly is, just that it must satisfy some +type-of-type. + +**Open question:** How do you spell that? Provisionally we are writing `is`, +following Swift, but maybe we should have another operator that more clearly +returns a boolean like `has_type`? + +**Note:** `Container` defines `ElementType` as having type `Type`, but +`ContainerType.ElementType` has type `Comparable`. This is because +`ContainerType` has type `Container where .ElementType is Comparable`, not +`Container`. This means we need to be a bit careful when talking about the type +of `ContainerType` when there is a `where` clause modifying it. + +##### Type bounds on associated types in interfaces + +Given these definitions (omitting `ElementType` for brevity): + +``` +interface IteratorInterface { ... } +interface ContainerInterface { + let IteratorType:! IteratorInterface; + ... +} +interface RandomAccessIterator { + extends IteratorInterface; + ... +} +``` + +We can then define a function that only accepts types that implement +`ContainerInterface` where its `IteratorType` associated type implements +`RandomAccessIterator`: + +``` +fn F[ContainerType:! ContainerInterface + where .IteratorType is RandomAccessIterator] + (c: ContainerType); +``` + +We would like to be able to name this constraint, defining a +`RandomAccessContainer` to be a type-of-type whose types satisfy +`ContainerInterface` with an `IteratorType` satisfying `RandomAccessIterator`. + +``` +let RandomAccessContainer:! auto = + ContainerInterface where .IteratorType is RandomAccessIterator; +// or +constraint RandomAccessContainer { + extends ContainerInterface + where .IteratorType is RandomAccessIterator; +} + +// With the above definition: +fn F[ContainerType:! RandomAccessContainer](c: ContainerType); +// is equivalent to: +fn F[ContainerType:! ContainerInterface + where .IteratorType is RandomAccessIterator] + (c: ContainerType); +``` + +#### Combining constraints -### Constraints +Constraints can be combined by separating constraint clauses with the `and` +keyword. This example expresses a constraint that two associated types are equal +and satisfy an interface: + +``` +fn EqualContainers + [CT1:! Container, + CT2:! Container where .ElementType is HasEquality + and .ElementType == CT1.ElementType] + (c1: CT1*, c2: CT2*) -> bool; +``` + +**Comparison with other languages:** Swift and Rust use commas `,` to separate +constraint clauses, but that only works because they place the `where` in a +different position in a declaration. In Carbon, the `where` is attached to a +type in a parameter list that is already using commas to separate parameters. + +#### Recursive constraints + +We sometimes need to constrain a type to equal one of its associated types. In +this first example, we want to represent the function `Abs` which will return +`Self` for some but not all types, so we use an associated type `MagnitudeType` +to encode the return type: + +``` +interface HasAbs { + extends Numeric; + let MagnitudeType:! Numeric; + fn Abs[me: Self]() -> MagnitudeType; +} +``` + +For types representing subsets of the real numbers, such as `i32` or `f32`, the +`MagnitudeType` will match `Self`, the type implementing an interface. For types +representing complex numbers, the types will be different. For example, the +`Abs()` applied to a `Complex64` value would produce a `f32` result. The goal is +to write a constraint to restrict to the first case. + +In a second example, when you take the slice of a type implementing `Container` +you get a type implementing `Container` which may or may not be the same type as +the original container type. However, taking the slice of a slice always gives +you the same type, and some functions want to only operate on containers whose +slice type is the same as the container type. + +To solve this problem, we think of `Self` as an actual associated type member of +every interface. We can then address it using `.Self` in a `where` clause, like +any other associated type member. + +``` +fn Relu[T:! HasAbs where .MagnitudeType == .Self](x: T) { + // T.MagnitudeType == T so the following is allowed: + return (x.Abs() + x) / 2; +} +fn UseContainer[T:! Container where .SliceType == .Self](c: T) -> bool { + // T.SliceType == T so `c` and `c.Slice(...)` can be compared: + return c == c.Slice(...); +} +``` + +Notice that in an interface definition, `Self` refers to the type implementing +this interface while `.Self` refers to the associated type currently being +defined. + +``` +interface Container { + let ElementType:! Type; + + let SliceType:! Container + where .ElementType == ElementType and + .SliceType == .Self; + + fn GetSlice[addr me: Self*] + (start: IteratorType, end: IteratorType) -> SliceType; +} +``` -We will need to be able to express constraints beyond "type implements these -interfaces." +These recursive constraints can be named: + +``` +let RealAbs:! auto = HasAbs where .MagnitudeType == .Self; +constraint RealAbs { + extends HasAbs where .MagnitudeType == Self; +} +let ContainerIsSlice:! auto = + Container where .SliceType == .Self; +constraint ContainerIsSlice { + extends Container where .SliceType == Self; +} +``` + +Note that using the `constraint` approach we can name these constraints using +`Self` instead of `.Self`, since they are facets of the same type: `Self` is the +facet corresponding to the containing interface and `.Self` is the facet +corresponding to the interface being extended. + +#### Parameterized type implements interface + +There are times when a function will pass a generic type parameter of the +function as an argument to a parameterized type, as in the previous case, and in +addition the function needs the result to implement a specific interface. + +``` +// Some parametized type. +class Vector(T:! Type) { ... } + +// Parameterized type implements interface only for some arguments. +external impl Vector(String) as Printable { ... } + +// Constraint: `T` such that `Vector(T)` implements `Printable` +fn PrintThree + [T:! Type where Vector(.Self) is Printable] + (a: T, b: T, c: T) { + var v: Vector(T) = (a, b, c); + Print(v); +} +``` + +**Comparison with other languages:** This use case was part of the +[Rust rationale for adding support for `where` clauses](https://rust-lang.github.io/rfcs/0135-where.html#motivation). + +#### Another type implements parameterized interface + +In this case, we need some other type to implement an interface parameterized by +a generic type parameter. The syntax for this case follows the previous case, +except now the `.Self` parameter is on the interface to the right of the `is`. +For example, we might need a type parameter `T` to support explicit conversion +from an integer type like `i32`: + +``` +interface As(T:! Type) { + fn Convert[me: Self]() -> T; +} + +fn Double[T:! Mul where i32 is As(.Self)](x: T) -> T { + return x * (2 as T); +} +``` + +### Implied constraints + +Imagine we have a generic function that accepts an arbitrary `HashMap`: + +``` +fn LookUp[KeyType:! Type](hm: HashMap(KeyType, i32)*, + k: KeyType) -> i32; + +fn PrintValueOrDefault[KeyType:! Printable, + ValueT:! Printable & HasDefault] + (map: HashMap(KeyType, ValueT), key: KeyT); +``` + +The `KeyType` in these declarations does not satisfy the requirements of +`HashMap`, which requires the type implement `Hashable` and other interfaces: + +``` +class HashMap( + KeyType:! Hashable & EqualityComparable & Movable, + ...) { ... } +``` + +In this case, `KeyType` gets `Hashable` and so on as _implied constraints_. +Effectively that means that these functions are automatically rewritten to add a +`where` constraint on `KeyType` attached to the `HashMap` type: + +``` +fn LookUp[KeyType:! Type] + (hm: HashMap(KeyType, i32)* + where KeyType is Hashable & EqualityComparable & Movable, + k: KeyType) -> i32; + +fn PrintValueOrDefault[KeyType:! Printable, + ValueT:! Printable & HasDefault] + (map: HashMap(KeyType, ValueT) + where KeyType is Hashable & EqualityComparable & Movable, + key: KeyT); +``` + +In this case, Carbon will accept the definition and infer the needed constraints +on the generic type parameter. This is both more concise for the author of the +code and follows the +["don't repeat yourself" principle](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). +This redundancy is undesirable since it means if the needed constraints for +`HashMap` are changed, then the code has to be updated in more locations. +Further it can add noise that obscures relevant information. In practice, any +user of these functions will have to pass in a valid `HashMap` instance, and so +will have already satisfied these constraints. + +This implied constraint is equivalent to the explicit constraint that each +parameter and return type [is legal](#must-be-legal-type-argument-constraints). + +**Note:** These implied constraints affect the _requirements_ of a generic type +parameter, but not its _unqualified member names_. This way you can always look +at the declaration to see how name resolution works, without having to look up +the definitions of everything it is used as an argument to. + +**Limitation:** To limit readability concerns and ambiguity, this feature is +limited to a single signature. Consider this interface declaration: + +``` +interface GraphNode { + let Edge:! Type; + fn EdgesFrom[me: Self]() -> HashSet(Edge); +} +``` + +One approach would be to say the use of `HashSet(Edge)` in the signature of the +`EdgesFrom` function would imply that `Edge` satisfies the requirements of an +argument to `HashSet`, such as being `Hashable`. Another approach would be to +say that the `EdgesFrom` would only be conditionally available when `Edge` does +satisfy the constraints on `HashSet` arguments. Instead, Carbon will reject this +definition, requiring the user to include all the constraints required for the +other declarations in the interface in the declaration of the `Edge` associated +type. Similarly, a parameter to a class must be declared with all the +constraints needed to declare the members of the class that depend on that +parameter. + +**Comparison with other languages:** Both Swift +([1](https://www.swiftbysundell.com/tips/inferred-generic-type-constraints/), +[2](https://github.com/apple/swift/blob/main/docs/Generics.rst#constraint-inference)) +and +[Rust](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0b2d645bd205f24a7a6e2330d652c32e) +support some form of this feature as part of their type inference. + +#### Must be legal type argument constraints + +Now consider the case that the generic type parameter is going to be used as an +argument to a parameterized type in a function body, not in the signature. If +the parameterized type was explicitly mentioned in the signature, the implied +constraint feature would ensure all of its requirements were met. The developer +can create a trivial +[parameterized type implements interface](#parameterized-type-implements-interface) +`where` constraint to just say the type is a legal with this argument, by saying +that the parameterized type implements `Type`, which all types do. + +For example, a function that adds its parameters to a `HashSet` to deduplicate +them, needs them to be `Hashable` and so on. To say "`T` is a type where +`HashSet(T)` is legal," we can write: + +``` +fn NumDistinct[T:! Type where HashSet(.Self) is Type] + (a: T, b: T, c: T) -> i32 { + var set: HashSet(T); + set.Add(a); + set.Add(b); + set.Add(c); + return set.Size(); +} +``` + +This has the same advantages over repeating the constraints on `HashSet` +arguments in the type of `T` as the general implied constraints above. + +### Open question: referencing names in the interface being defined + +Should the constraint in a `where` clause be required to only reference earlier +names from this scope, as in this example? + +``` +interface Graph { + let E: Edge; + let V: Vert where .E == E and .Self == E.V; +} +``` + +The downside is that if you could reference later names, there is a more +pleasingly symmetric formulation of those same constraints: + +``` +interface Graph { + let E: Edge where .V == V; + let V: Vert where .E == E; +} +``` + +**TODO:** Revisit this question once issue +[#472: Open question: Calling functions defined later in the same file](https://github.com/carbon-language/carbon-lang/issues/472) +and proposal +[#875: Principle: information accumulation](https://github.com/carbon-language/carbon-lang/pull/875) +are resolved. + +### Manual type equality + +Imagine we have some function with generic parameters: + +``` +fn F[T:! SomeInterface](x: T) { + x.G(x.H()); +} +``` + +We want to know if the return type of method `T.H` is the same as the parameter +type of `T.G` in order to typecheck the function. However, determining whether +two type expressions are transitively equal is in general undecidable, as +[has been shown in Swift](https://forums.swift.org/t/swift-type-checking-is-undecidable/39024). + +Carbon's approach is to only allow implicit conversions between two type +expressions that are constrained to be equal in a single where clause. This +means that if two type expressions are only transitively equal, the user will +need to include a sequence of casts or use an +[`observe` declaration](#observe-declarations) to convert between them. + +Given this interface `Transitive` that has associated types that are constrained +to all be equal, with interfaces `P`, `Q`, and `R`: + +``` +interface P { fn InP[me:Self](); } +interface Q { fn InQ[me:Self](); } +interface R { fn InR[me:Self](); } + +interface Transitive { + let A:! P; + let B:! Q where .Self == A; + let C:! R where .Self == B; + + fn GetA[me: Self]() -> A; + fn TakesC[me:Self](c: C); +} +``` + +A cast to `B` is needed to call `TakesC` with a value of type `A`, so each step +only relies on one equality: + +``` +fn F[T:! Transitive](t: T) { + // Allowed + t.TakesC(t.GetA() as T.B); + + // Allowed + let b: T.B = t.GetA(); + t.TakesC(b); + + // Not allowed: t.TakesC(t.GetA()); +} +``` + +A value of type `A`, such as the return value of `GetA()`, has the API of `P`. +Any such value also implements `Q`, and since the compiler can see that by way +of a single `where` equality, values of type `A` are treated as if they +implement `Q` externally. However, the compiler will require a cast to `B` or +`C` to see that the type implements `R`. + +``` +fn TakesPQR[U:! P & Q & R](u: U); + +fn G[T:! Transitive](t: T) { + var a: T.A = t.GetA(); + + // Allowed: `T.A` implements `P`. + a.InP(); + + // Allowed: `T.A` implements `Q` externally. + a.(Q.InQ)(); + + // Not allowed: a.InQ(); + + // Allowed: values of type `T.A` may be cast + // to `T.B`, which implements `Q`. + (a as T.B).InQ(); + + // Allowed: `T.B` implements `R` externally. + (a as T.B).(R.InR)(); + + // Not allowed: TakesPQR(a); + + // Allowed: `T.B` implements `P`, `Q`, and + // `R`, though the implementations of `P` + // and `R` are external. + TakesPQR(a as T.B); +} +``` + +The compiler may have several different `where` clauses to consider, +particularly when an interface has associated types that recursively satisfy the +same interface. For example, given this interface `Commute`: + +``` +interface Commute { + let X:! Commute; + let Y:! Commute where .X == X.Y; + + fn GetX[me: Self]() -> X; + fn GetY[me: Self]() -> Y; + fn TakesXXY[me:Self](xxy: X.X.Y); +} +``` + +and a function `H` taking a value with some type implementing this interface, +then the following would be legal statements in `H`: + +``` +fn H[C: Commute](c: C) { + // Legal: argument has type `C.X.X.Y` + c.TakesXXY(c.GetX().GetX().GetY()); + + // Legal: argument has type `C.X.Y.X` which is equal + // to `C.X.X.Y` following only one `where` clause. + c.TakesXXY(c.GetX().GetY().GetX()); + + // Legal: cast is legal since it matches a `where` + // clause, and produces an argument that has type + // `C.X.Y.X`. + c.TakesXXY(c.GetY().GetX().GetX() as C.X.Y.X); +} +``` + +That last call would not be legal without the cast, though. + +**Comparison with other languages:** Other languages such as Swift and Rust +instead perform automatic type equality. In practice this means that their +compiler can reject some legal programs based on heuristics simply to avoid +running for an unbounded length of time. + +The benefits of the manual approach include: + +- fast compilation, since the compiler does not need to explore a potentially + large set of combinations of equality restrictions, supporting + [Carbon's goal of fast and scalable development](/docs/project/goals.md#fast-and-scalable-development); +- expressive and predictable semantics, since there are no limitations on how + complex a set of constraints can be supported; and +- simplicity. + +The main downsides are: + +- manual work for the source code author to prove to the compiler that types + are equal; and +- verbosity. + +We expect that rich error messages and IDE tooling will be able to suggest +changes to the source code when a single equality constraint is not sufficient +to show two type expressions are equal, but a more extensive automated search +can find a sequence that prove they are equal. + +#### `observe` declarations + +An `observe` declaration lists a sequence of type expressions that are equal by +some same-type `where` constraints. These `observe` declarations may be included +in an `interface` definition or a function body, as in: + +``` +interface Commute { + let X:! Commute; + let Y:! Commute where .X == X.Y; + ... + observe X.X.Y == X.Y.X == Y.X.X; +} + +fn H[C: Commute](c: C) { + observe C.X.Y.Y == C.Y.X.Y == C.Y.Y.X; + ... +} +``` + +Every type expression after the first must be equal to some earlier type +expression in the sequence by a single `where` equality constraint. In this +example, + +``` +interface Commute { + let X:! Commute; + let Y:! Commute where .X == X.Y; + ... + // Legal: + observe X.X.Y.Y == X.Y.X.Y == Y.X.X.Y == X.Y.Y.X; +} +``` + +the expression `X.Y.Y.X` is one equality away from `X.Y.X.Y` and so it is +allowed. This is even though `X.Y.X.Y` isn't the type expression immediately +prior to `X.Y.Y.X`. + +After an `observe` declaration, all of the listed type expressions are +considered equal to each other using a single `where` equality. In this example, +the `observe` declaration in the `Transitive` interface definition provides the +link between associated types `A` and `C` that allows function `F` to type +check. + +``` +interface P { fn InP[me:Self](); } +interface Q { fn InQ[me:Self](); } +interface R { fn InR[me:Self](); } + +interface Transitive { + let A:! P; + let B:! Q where .Self == A; + let C:! R where .Self == B; + + fn GetA[me: Self]() -> A; + fn TakesC[me:Self](c: C); + + // Without this `observe` statement, the + // calls in `F` below would not be allowed. + observe A == B == C; +} + +fn TakesPQR[U:! P & Q & R](u: U); + +fn F[T:! Transitive](t: T) { + var a: T.A = t.GetA(); + + // Allowed: `T.A` == `T.C` + t.TakesC(a); + a.(R.InR()); + + // Allowed: `T.A` implements `P`, + // `T.A` == `T.B` that implements `Q`, and + // `T.A` == `T.C` that implements `R`. + TakesPQR(a); +} +``` + +Since adding an `observe` statement only adds external implementations of +interfaces to generic types, they may be added without breaking existing code. + +## Other constraints as type-of-types + +There are some constraints that we will naturally represent as named +type-of-types. These can either be used directly to constrain a generic type +parameter, or in a `where ... is ...` clause to constrain an associated type. + +The compiler determines which types implement these interfaces, developers can +not explicitly implement these interfaces for their own types. + +**Open question:** Are these names part of the prelude or in a standard library? + +### Is a derived class + +Given a type `T`, `Extends(T)` is a type-of-type whose values are types that are +derived from `T`. That is, `Extends(T)` is the set of all types `U` that are +subtypes of `T`. + +``` +fn F[T:! Extends(BaseType)](p: T*); +fn UpCast[T:! Type](p: T*, U:! Type where T is Extends(.Self)) -> U*; +fn DownCast[T:! Type](p: T*, U:! Extends(T)) -> U*; +``` + +**Open question:** Alternatively, we could define a new `extends` operator: + +``` +fn F[T:! Type where .Self extends BaseType](p: T*); +fn UpCast[T:! Type](p: T*, U:! Type where T extends .Self) -> U*; +fn DownCast[T:! Type](p: T*, U:! Type where .Self extends T) -> U*; +``` + +**Comparison to other languages:** In Swift, you can +[add a required superclass to a type bound using `&`](https://docs.swift.org/swift-book/LanguageGuide/Protocols.html#ID282). + +### Type compatible with another type + +Given a type `U`, define the type-of-type `CompatibleWith(U)` as follows: + +> `CompatibleWith(U)` is a type whose values are types `T` such that `T` and `U` +> are [compatible](terminology.md#compatible-types). That is values of types `T` +> and `U` can be cast back and forth without any change in representation (for +> example `T` is an [adapter](#adapting-types) for `U`). + +To support this, we extend the requirements that type-of-types are allowed to +have to include a "data representation requirement" option. + +`CompatibleWith` determines an equivalence relationship between types. +Specifically, given two types `T1` and `T2`, they are equivalent if +`T1 is CompatibleWith(T2)`. That is, if `T1` has the type `CompatibleWith(T2)`. + +**Note:** Just like interface parameters, we require the user to supply `U`, +they may not be deduced. Specifically, this code would be illegal: + +``` +fn Illegal[U:! Type, T:! CompatibleWith(U)](x: T*) ... +``` + +In general there would be multiple choices for `U` given a specific `T` here, +and no good way of picking one. However, similar code is allowed if there is +another way of determining `U`: + +``` +fn Allowed[U:! Type, T:! CompatibleWith(U)](x: U*, y: T*) ... +``` + +#### Same implementation restriction + +In some cases, we need to restrict to types that implement certain interfaces +the same way as the type `U`. + +> The values of type `CompatibleWith(U, TT)` are types satisfying +> `CompatibleWith(U)` that have the same implementation of `TT` as `U`. + +For example, if we have a type `HashSet(T)`: + +``` +class HashSet(T:! Hashable) { ... } +``` + +Then `HashSet(T)` may be cast to `HashSet(U)` if +`T is CompatibleWith(U, Hashable)`. The one-parameter interpretation of +`CompatibleWith(U)` is recovered by letting the default for the second `TT` +parameter be `Type`. + +#### Example: Multiple implementations of the same interface + +This allows us to represent functions that accept multiple implementations of +the same interface for a type. + +``` +enum CompareResult { Less, Equal, Greater } +interface Comparable { + fn Compare[me: Self](that: Self) -> CompareResult; +} +fn CombinedLess[T:! Type](a: T, b: T, + U:! CompatibleWith(T) & Comparable, + V:! CompatibleWith(T) & Comparable) -> bool { + match ((a as U).Compare(b as U)) { + case CompareResult.Less => { return True; } + case CompareResult.Greater => { return False; } + case CompareResult.Equal => { + return (a as V).Compare(b as V) == CompareResult.Less; + } + } +} +``` + +Used as: + +``` +class Song { ... } +adapter SongByArtist for Song { impl as Comparable { ... } } +adapter SongByTitle for Song { impl as Comparable { ... } } +var s1: Song = ...; +var s2: Song = ...; +assert(CombinedLess(s1, s2, SongByArtist, SongByTitle) == True); +``` + +We might generalize this to a list of implementations: + +``` +fn CombinedCompare[T:! Type] + (a: T, b: T, CompareList:! List(CompatibleWith(T) & Comparable)) + -> CompareResult { + for (let U:! auto in CompareList) { + var result: CompareResult = (a as U).Compare(b); + if (result != CompareResult.Equal) { + return result; + } + } + return CompareResult.Equal; +} + +assert(CombinedCompare(Song(...), Song(...), (SongByArtist, SongByTitle)) == + CompareResult.Less); +``` + +**Open question:** How are compile-time lists of types declared and iterated +through? They will also be needed for +[variadic argument support](#variadic-arguments). + +#### Example: Creating an impl out of other impls + +And then to package this functionality as an implementation of `Comparable`, we +combine `CompatibleWith` with [type adaptation](#adapting-types): + +``` +adapter ThenCompare(T:! Type, + CompareList:! List(CompatibleWith(T) & Comparable)) for T { + impl as Comparable { + fn Compare[me: Self](that: Self) -> CompareResult { + for (let U:! auto in CompareList) { + var result: CompareResult = (this as U).Compare(that); + if (result != CompareResult.Equal) { + return result; + } + } + return CompareResult.Equal; + } + } +} + +let SongByArtistThenTitle: auto = ThenCompare(Song, (SongByArtist, SongByTitle)); +var s1: Song = ...; +var s2: SongByArtistThenTitle = Song(...) as SongByArtistThenTitle; +assert((s1 as SongByArtistThenTitle).Compare(s2) == CompareResult.Less); +``` + +### Type facet of another type + +Similar to `CompatibleWith(T)`, `FacetOf(T)` introduces an equivalence +relationship between types. `T1 is FacetOf(T2)` if both `T1` and `T2` are facets +of the same type. + +### Sized types and type-of-types + +What is the size of a type? + +- It could be fully known and fixed at compile time -- this is true of + primitive types (`i32`, `f64`, and so on), most + [classes](/docs/design/classes.md), and most other concrete types. +- It could be known generically. This means that it will be known at codegen + time, but not at type-checking time. +- It could be dynamic. For example, it could be a + [dynamic type](#runtime-type-fields), a slice, variable-sized type (such as + [found in Rust](https://doc.rust-lang.org/nomicon/exotic-sizes.html#dynamically-sized-types-dsts)), + or you could dereference a pointer to a base class that could actually point + to a [derived class](/docs/design/classes.md#inheritance). +- It could be unknown which category the type is in. In practice this will be + essentially equivalent to having dynamic size. + +A type is called "sized" if it is in the first two categories, and "unsized" +otherwise. Note: something with size 0 is still considered "sized". The +type-of-type `Sized` is defined as follows: + +> `Sized` is a type whose values are types `T` that are "sized" -- that is the +> size of `T` is known, though possibly only generically. + +Knowing a type is sized is a precondition to declaring variables of that type, +taking values of that type as parameters, returning values of that type, and +defining arrays of that type. Users will not typically need to express the +`Sized` constraint explicitly, though, since it will usually be a dependency of +some other constraint the type will need such as `Movable`. + +**Note:** The compiler will determine which types are "sized", this is not +something types will implement explicitly like ordinary interfaces. + +Example: + +``` +// In the Carbon standard library +interface DefaultConstructible { + // Types must be sized to be default constructible. + impl as Sized; + fn Default() -> Self; +} + +// Classes are "sized" by default. +class Name { + impl as DefaultConstructible { + fn Default() -> Self { ... } + } + ... +} + +fn F[T:! Type](x: T*) { // T is unsized. + // ✅ Allowed: may access unsized values through a pointer. + var y: T* = x; + // ❌ Illegal: T is unsized. + var z: T; +} + +// T is sized, but its size is only known generically. +fn G[T: DefaultConstructible](x: T*) { + // ✅ Allowed: T is default constructible, which means sized. + var y: T = T.Default(); +} + +var z: Name; +// ✅ Allowed: `Name` is sized and implements `DefaultConstructible`. +G(&z); +``` + +**Open question:** Even if the size is fixed, it won't be known at the time of +compiling the generic function if we are using the dynamic strategy. Should we +automatically +[box]() +local variables when using the dynamic strategy? Or should we only allow +`MaybeBox` values to be instantiated locally? Or should this just be a case +where the compiler won't necessarily use the dynamic strategy? + +**Open question:** Should the `Sized` type-of-type expose an associated constant +with the size? So you could say `T.ByteSize` in the above example to get a +generic int value with the size of `T`. Similarly you might say `T.ByteStride` +to get the number of bytes used for each element of an array of `T`. + +#### Model + +This requires a special integer field be included in the witness table type to +hold the size of the type. This field will only be known generically, so if its +value is used for type checking, we need some way of evaluating those type tests +symbolically. + +### `TypeId` + +There are some capabilities every type can provide. For example, every type +should be able to return its name or identify whether it is equal to another +type. It is rare, however, for code to need to access these capabilities, so we +relegate these capabilities to an interface called `TypeId` that all types +automatically implement. This way generic code can indicate that it needs those +capabilities by including `TypeId` in the list of requirements. In the case +where no type capabilities are needed, for example the code is only manipulating +pointers to the type, you would write `T:! Type` and get the efficiency of +`void*` but without giving up type safety. + +``` +fn SortByAddress[T:! Type](v: Vector(T*)*) { ... } +``` + +In particular, the compiler should in general avoid monomorphizing to generate +multiple instantiations of the function in this case. + +**Note:** To achieve this goal, the user will not even be allowed to destroy a +value of type `T` in this case. + +**Open question:** Should `TypeId` be implemented externally for types to avoid +name pollution (`.TypeName`, `.TypeHash`, etc.) unless the function specifically +requests those capabilities? + +## Future work ### Conditional conformance @@ -2262,16 +3414,6 @@ For this to work, we need a rule that picks a single `impl` in the case where there are multiple `impl` definitions that match a particular type and interface combination. -### Other constraints as type-of-types - -There are some constraints that we will naturally represent as named -type-of-types that the user can specify. - -#### Sized types and type-of-types - -Like Rust, we may have types that have values whose size is only determined at -runtime. Many functions may want to restrict to types with known size. - ### Dynamic types Generics provide enough structure to support runtime dispatch for values with @@ -2376,7 +3518,24 @@ See details in [the goals document](goals.md#bridge-for-c-customization-points). Some facility for allowing a function to generically take a variable number of arguments. +### Range constraints on generic integers + +We currently only support `where` clauses on type-of-types. We may want to also +support constraints on generic integers. The constraint with the most expected +value is the ability to do comparisons like `<`, or `>=`. For example, you might +constrain the `N` member of [`NSpacePoint`](#associated-constants) using an +expression like `PointT:! NSpacePoint where 2 <= .N and .N <= 3`. + +The concern here is supporting this at compile time with more benefit than +complexity. For example, we probably don't want to support integer-range based +types at runtime, and there are also concerns about reasoning about comparisons +between multiple generic integer parameters. For example, if `J < K` and +`K <= L`, can we call a function that requires `J < L`? There is also a +secondary syntactic concern about how to write this kind of constraint on a +parameter, as opposed to an associated type, as in `N:! u32 where ___ >= 2`. + ## References - [#553: Generics details part 1](https://github.com/carbon-language/carbon-lang/pull/553) - [#731: Generics details 2: adapters, associated types, parameterized interfaces](https://github.com/carbon-language/carbon-lang/pull/731) +- [#818: Constraints for generics (generics details 3)](https://github.com/carbon-language/carbon-lang/pull/818) diff --git a/docs/design/generics/goals.md b/docs/design/generics/goals.md index 9b41e0b83a80f..a54e78eeedc57 100644 --- a/docs/design/generics/goals.md +++ b/docs/design/generics/goals.md @@ -532,7 +532,8 @@ are complicated and ### Interfaces are nominal -Interfaces can either be structural, as in Go, or nominal, as in Rust and Swift. +Interfaces can either be [structural](terminology.md#structural-interfaces), as +in Go, or [nominal](terminology.md#nominal-interfaces), as in Rust and Swift. Structural interfaces match any type that has the required methods, whereas nominal interfaces only match if there is an explicit declaration stating that the interface is implemented for that specific type. Carbon will support nominal diff --git a/docs/design/generics/overview.md b/docs/design/generics/overview.md index d44cd80c60dad..63510ae7c08b2 100644 --- a/docs/design/generics/overview.md +++ b/docs/design/generics/overview.md @@ -27,12 +27,13 @@ pointers to other design documents that dive deeper into individual topics. - [Generic type parameters](#generic-type-parameters) - [Requiring or extending another interface](#requiring-or-extending-another-interface) - [Combining interfaces](#combining-interfaces) - - [Structural interfaces](#structural-interfaces) + - [Named constraints](#named-constraints) - [Type erasure](#type-erasure) - [Adapting types](#adapting-types) - [Interface input and output types](#interface-input-and-output-types) - [Associated types](#associated-types) - [Parameterized interfaces](#parameterized-interfaces) + - [Constraints](#constraints) - [Future work](#future-work) - [References](#references) @@ -84,9 +85,9 @@ Summary of how Carbon generics work: - The `&` operation on type-of-types allows you conveniently combine interfaces. It gives you all the names that don't conflict. - You may also declare a new type-of-type directly using - ["structural interfaces"](terminology.md#structural-interfaces). Structural - interfaces can express requirements that multiple interfaces be implemented, - and give you control over how name conflicts are handled. + ["named constraints"](terminology.md#named-constraints). Named constraints + can express requirements that multiple interfaces be implemented, and give + you control over how name conflicts are handled. - Alternatively, you may resolve name conflicts by using a qualified syntax to directly call a function from a specific interface. @@ -251,7 +252,7 @@ the constraint on the type is that it must implement the interface `Comparable`. A type-of-type also defines a set of names and a mapping to corresponding qualified names. You may combine interfaces into new type-of-types using [the `&` operator](#combining-interfaces) or -[structural interfaces](#structural-interfaces). +[named constraints](#named-constraints). ### Generic functions @@ -406,16 +407,16 @@ fn BothDraws[T:! Renderable & EndOfGame](game_state: T*) { } ``` -#### Structural interfaces +#### Named constraints You may also declare a new type-of-type directly using -["structural interfaces"](terminology.md#structural-interfaces). Structural -interfaces can express requirements that multiple interfaces be implemented, and -give you control over how name conflicts are handled. Structural interfaces have -other applications and capabilities not covered here. +["named constraints"](terminology.md#named-constraints). Named constraints can +express requirements that multiple interfaces be implemented, and give you +control over how name conflicts are handled. Named constraints have other +applications and capabilities not covered here. ``` -structural interface Combined { +constraint Combined { impl as Renderable; impl as EndOfGame; alias Draw_Renderable = Renderable.Draw; @@ -431,7 +432,7 @@ fn CallItAll[T:! Combined](game_state: T*, int winner) { } game_state->Draw_Renderable(); // Can still use qualified syntax for names - // not defined in the structural interface + // not defined in the named constraint return game_state->(Renderable.Center)(); } ``` @@ -561,9 +562,30 @@ fn FindInVector[T:! Type, U:! Equatable(T)](v: Vector(T), needle: U) fn CompileError[T:! Type, U:! Equatable(T)](x: U) -> T; ``` +### Constraints + +Type-of-types can be further constrained using a `where` clause: + +``` +fn FindFirstPrime[T:! Container where .Element == i32] + (c: T, i: i32) -> Optional(i32) { + // The elements of `c` have type `T.Element`, which is `i32`. + ... +} + +fn PrintContainer[T:! Container where .Element is Printable](c: T) { + // The type of the elements of `c` is not known, but we do know + // that type satisfies the `Printable` interface. + ... +} +``` + +Constraints limit the types that the generic function can operate on, but +increase the knowledge that may be used in the body of the function to operate +on values of those types. + ## Future work -- Other kinds of constraints will be finalized. - Implementations can be parameterized to apply to multiple types. These implementations would be restricted to various conditions are true for the parameters. When there are two implementations that can apply, there is a @@ -583,3 +605,4 @@ fn CompileError[T:! Type, U:! Equatable(T)](x: U) -> T; - [#524: Generics overview](https://github.com/carbon-language/carbon-lang/pull/524) - [#731: Generics details 2: adapters, associated types, parameterized interfaces](https://github.com/carbon-language/carbon-lang/pull/731) +- [#818: Constraints for generics (generics details 3)](https://github.com/carbon-language/carbon-lang/pull/818) diff --git a/docs/design/generics/terminology.md b/docs/design/generics/terminology.md index 1f126f419e3e6..ac35b9d1135a4 100644 --- a/docs/design/generics/terminology.md +++ b/docs/design/generics/terminology.md @@ -24,6 +24,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Interface](#interface) - [Structural interfaces](#structural-interfaces) - [Nominal interfaces](#nominal-interfaces) + - [Named constraints](#named-constraints) - [Associated entity](#associated-entity) - [Impls: Implementations of interfaces](#impls-implementations-of-interfaces) - [Compatible types](#compatible-types) @@ -298,6 +299,14 @@ We use the "structural" versus "nominal" terminology as a generalization of the same terms being used in a [subtyping context](https://en.wikipedia.org/wiki/Subtyping#Subtyping_schemes). +### Named constraints + +Named constraints are "structural" in the sense that they match a type based on +meeting some criteria rather than an explicit statement in the type's +definition. The criteria for a named constraint, however, are less focused on +the type's API and instead might include a set of nominal interfaces that the +type must implement. + ## Associated entity An _associated entity_ is a requirement in an interface that a type's @@ -320,8 +329,10 @@ instead of associated entity. An _impl_ is an implementation of an interface for a specific type. It is the place where the function bodies are defined, values for associated types, etc. are given. Impls are needed for [nominal interfaces](#nominal-interfaces); -[structural interfaces](#structural-interfaces) define conformance implicitly -instead of by requiring an impl to be defined. +[structural interfaces](#structural-interfaces) and +[named constraints](#named-constraints) define conformance implicitly instead of +by requiring an impl to be defined. In can still make sense to implement a named +constraint as a way to implement all of the interfaces it requires. ## Compatible types diff --git a/docs/design/lexical_conventions/words.md b/docs/design/lexical_conventions/words.md index 98e7546022933..d61e8899c5201 100644 --- a/docs/design/lexical_conventions/words.md +++ b/docs/design/lexical_conventions/words.md @@ -43,6 +43,7 @@ The following words are interpreted as keywords: - `break` - `case` - `class` +- `constraint` - `continue` - `default` - `else` @@ -55,11 +56,13 @@ The following words are interpreted as keywords: - `impl` - `import` - `interface` +- `is` - `let` - `library` - `match` - `namespace` - `not` +- `observe` - `or` - `override` - `package` diff --git a/proposals/p0818.md b/proposals/p0818.md new file mode 100644 index 0000000000000..31441187a03a8 --- /dev/null +++ b/proposals/p0818.md @@ -0,0 +1,535 @@ +# Constraints for generics (generics details 3) + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/818) + + + +## Table of contents + +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) +- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) +- [Alternatives considered](#alternatives-considered) + - [Alternatives to `where` clauses modifying type-of-types](#alternatives-to-where-clauses-modifying-type-of-types) + - [Different keyword than `where`](#different-keyword-than-where) + - [Inline constraints instead of `.Self`](#inline-constraints-instead-of-self) + - [Self reference instead of `.Self`](#self-reference-instead-of-self) + - [Implied constraints across declarations](#implied-constraints-across-declarations) + - [No implied constraints](#no-implied-constraints) + - [Type inequality constraints](#type-inequality-constraints) + - [`where .Self is ...` could act like an external impl](#where-self-is--could-act-like-an-external-impl) + - [Other syntax for external impl](#other-syntax-for-external-impl) + - [Other syntax for must be legal type constraints](#other-syntax-for-must-be-legal-type-constraints) + - [Using only `==` instead of also `=`](#using-only--instead-of-also-) + - [Automatic type equality](#automatic-type-equality) + - [Restricted equality constraints](#restricted-equality-constraints) + - [No explicit restrictions](#no-explicit-restrictions) + + + +## Problem + +We want to Carbon to have a high quality generics feature that achieves the +goals set out in [#24](https://github.com/carbon-language/carbon-lang/pull/24). +This is too big to land in a single proposal. This proposal continues +[#553](https://github.com/carbon-language/carbon-lang/pull/553) defining the +details of: + +- adapters +- associated types and other constants +- parameterized interfaces + +## Background + +This is a follow on to these previous generics proposals: + +- [#24: Generics goals](https://github.com/carbon-language/carbon-lang/pull/24) +- [#447: Generics terminology](https://github.com/carbon-language/carbon-lang/pull/447) +- [#524: Generics overview](https://github.com/carbon-language/carbon-lang/pull/524) +- [#553: Generics details part 1](https://github.com/carbon-language/carbon-lang/pull/553) +- [#731: Generics details 2: adapters, associated types, parameterized interfaces](https://github.com/carbon-language/carbon-lang/pull/731) + +Some of the content for this proposal was extracted from a larger +[Generics combined draft proposal](https://github.com/carbon-language/carbon-lang/pull/36). + +## Proposal + +This is a proposal to add two sections to +[this design document on generics details](/docs/design/generics/details.md). + +## Rationale based on Carbon's goals + +Much of this rationale for generics was captured in the +[Generics goals proposal](https://github.com/carbon-language/carbon-lang/pull/24). +The specific decisions for this constraint design were motivated by: + +- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) + - The manual generic type equality approach being proposed could be + extended to automatically observe more types as being equal without + breaking existing code. + - Adding or growing an `observe` statement has been designed to be + non-breaking since they can't introduce name conflicts, only make more + code legal. + - This constraint system is expressive and does not have artificial limits + that would be needed to support automatic type equality. +- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) + - Attaching `where` clauses to type-of-types addresses more use cases than + attaching them to declarations, allowing the design to be smaller, see + [#780](https://github.com/carbon-language/carbon-lang/issues/780). + - The manual generic type equality approach is very simple and easy to + understand how it works. It unfortunately is a bit more verbose, leading + to more to read and write. +- [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) + - The manual generic type equality approach does much less work at compile + time than the automatic approach. The extra work would only be done to + produce an error message that includes the changes to the source needed + to avoid repeating that work. In effect, the source code holds a cache + of the facts needed to compile it. + +## Alternatives considered + +### Alternatives to `where` clauses modifying type-of-types + +Issue +[#780: How to write constraints](https://github.com/carbon-language/carbon-lang/issues/780) +considered other forms that constraints could be written. + +One alternative is to place `where` clauses on declarations instead of types. + +``` +fn F[W:! Type, V:! D & E](v:V) -> W + where V.X == W, V.Y == W, V.Z = i32; +``` + +The constraints are written after the names of types are complete, and so +constraints are naturally written in terms of those names. This is a common +approach used by several languages including Swift and Rust, and so is likely +familiar to a significant fraction of our users. The downside of this approach +is it doesn't let you put constraints on types that never get a names, such as +in +["protocols as types" or "existential types" in Swift](https://docs.swift.org/swift-book/LanguageGuide/Protocols.html#ID275) +and "trait objects" in Rust +([1](https://doc.rust-lang.org/book/ch17-02-trait-objects.html), +[2](https://doc.rust-lang.org/reference/types/trait-object.html)). This is a +problem in practice, since those are cases where constraints are needed, since +for type safety any associated types need to be constrained to be a specific +concrete type. This motivated Rust to add another syntax just for specifying +specific concrete types for associated types in the argument passing style. It +is also motivation for Swift to consider a feature they call +["generalized existentials"](https://github.com/apple/swift/blob/main/docs/GenericsManifesto.md#generalized-existentials), +that is very similar to this proposal. + +Another alternative is to generalize the argument passing approach Rust allows +to handle more kinds of constraints. + +``` +fn F[W:! Type, V:! D{.X = W} & E{.Y = W, .Z = i32}](v: V) -> W; +``` + +This would treat constraints on interface parameters and associated types in a +more uniform way, but doesn't handle the full breadth of constraints we would +like to express. We at some point believed that this framing of the constraints +made it harder to express undecidable type equality problems and easier to +enforce restrictions that would make the constraints decidable, but we +eventually discovered that this formulation had the essentially the same +difficulties. + +We considered a variation of argument passing that we called "whole expression +constraint intersections." + +``` +fn F[W:! Type, V:! D & E & {.X = W, .Y = W, .Z = i32}](v: V) -> W; +``` + +This variation made it easy to set associated types from two interfaces with the +same name to the same value, but introduced concerns that it would stop `&` from +being associative and commutative. It was not chosen because it had the same +downsides as the argument passing approach. + +### Different keyword than `where` + +We considered other keywords for introducing constraints, such as: + +- `requires`, like + [C++](https://en.cppreference.com/w/cpp/language/constraints) +- `with` +- `if`, like [D](http://rosettacode.org/wiki/Constrained_genericity#D) +- `when`, like [F#](http://rosettacode.org/wiki/Constrained_genericity#F.23). + +The most common choice across popular languages is `where`, including: + +- [C#](http://rosettacode.org/wiki/Constrained_genericity#C.23) +- [Haskell](http://rosettacode.org/wiki/Constrained_genericity#Haskell) +- [Rust](https://doc.rust-lang.org/rust-by-example/generics/where.html) +- Swift + ([1](https://docs.swift.org/swift-book/LanguageGuide/Generics.html#ID553), + [2](https://docs.swift.org/swift-book/LanguageGuide/Generics.html#ID557)) + +While C++ is particularly important to our target userbase, Carbon's constraints +work more similarly to languages with generics like Rust and Swift. + +### Inline constraints instead of `.Self` + +We considered the possibility of using [named constraints](#named-constraints) +instead of `.Self` for +[recursive constraints](/docs/design/generics/details.md#recursive-constraints). +This comes under consideration since `.Self` outside the named constraint is the +same as `Self` inside. However, you can't always avoid using `.Self`, since +naming the constraint before using it doesn't allow you to define this +`Container` interface: + +``` +interface Container { + ... + let SliceType:! Container where .SliceType == .Self; + ... +} +``` + +The problem that arises is to avoid using `.Self`, we would need to define the +named constraint using `Container` before `Container` is defined: + +``` +constraint ContainerIsSlice { + // Error: forward reference + extends Container where Container.SliceType == Self; +} + +interface Container { + ... + let SliceType:! ContainerIsSlice; + ... +} +``` + +To work around this problem, we could allow the named constraint to be defined +inline in the `Container` definition: + +``` +interface Container { + ... + constraint ContainerIsSlice { + extends Container where Container.SliceType == Self; + } + let SliceType:! ContainerIsSlice; + ... +} +``` + +This alternative seems too cumbersome. + +### Self reference instead of `.Self` + +Another alternative instead of using `.Self` for +[recursive constraints](/docs/design/generics/details.md#recursive-constraints), +is to use the name of the type being declared inside the type declaration, as in +`T:! HasAbs(.MagnitudeType = T)`. This had two downsides: + +- Using the name of the type before it is finished being defined created + questions about what that name meant. Using the reserved token sequence + `.Self` instead makes it clearer that it obeys different rules than other + identifier references. For example, that we don't allow members to be + accessed. +- It doesn't address use cases where we are defining a type-of-type that isn't + associated with a type that has a name. + +### Implied constraints across declarations + +If constraints on an associated type could be implied by any declaration in the +interface, readers and the type checker would be required to scan the entire +interface definition and perform many type declaration lookups to understand the +constraints on that associated type. This is particularly important when those +constraints can be obscured in recursive references to the same interface: + +``` +interface I { + let A:! Type; + let B:! Type; + let C:! Type; + let D:! Type; + let E:! Type; + let SwapType:! I where .A == B and .B == A and .C == C + and .D == D and .E == E; + let CycleType:! I where .A == B and .B == C and .C == D + and .D == E and .E == A; + fn LookUp(hm: HashMap(D, E)*) -> E; + fn Foo(x: Bar(A, B)); +} +``` + +This applies equally to parameters: + +``` +interface I(A:! Type, B:! Type, C:! Type, D:! Type, E:! Type) { + let SwapType:! I(B, A, C, D, E); + let CycleType:! I(B, C, D, E, A); + fn LookUp(hm: HashMap(D, E)*) -> E; + fn Foo(x: Bar(A, B)); +} +``` + +All of the type arguments to `I` must actually implement `Hashable`, since +[an adjacent swap and a cycle generate the full symmetry group on 5 elements](https://www.mathcounterexamples.net/generating-the-symmetric-group-with-a-transposition-and-a-maximal-length-cycle/)). +And additional restrictions on those types would depend on the definition of +`Bar`. For example, this definition + +``` +class Bar(A:! Type, B:! ComparableWith(A)) { ... } +``` + +would imply that all the type arguments to `I` would have to be comparable with +each other. This propagation problem means that allowing constraints to be +implied in this context is substantial, and potentially unbounded, work for the +compiler and human readers. + +From this we conclude that we need the initial declaration part of an +`interface`, type definition, or associated type declaration to include a +complete description of all needed constraints. + +It is possible that this reasoning will apply more generally, but we will wait +until we implement type checking to see what restrictions we actually need. For +example, we might need to restate "parameterized type implements interface" +constraints when it is on an associated type in a referenced interface, as in: + +``` +interface HasConstraint { + let T:! Type where Vector(.Self) is Printable; +} + +interface RestatesConstraint { + // This works, since it restates the constraint on + // `HasConstraint.T` that `U` is equal to. + let U:! Type where Vector(.Self) is Printable; + // This doesn't work: + // ❌ let U:! Type; + + let V:! HasConstraint where .T == U; +} +``` + +### No implied constraints + +Issue [#809](https://github.com/carbon-language/carbon-lang/issues/809) +considered whether Carbon would support implied constraints. The conclusion was +that implied constraints was an important ergonomic improvement. The framing as +a rewrite to a `where` restriction that did not affect the generic type +parameter's unqualified API seemed like something that could be explained to +users. Potential problems where a specialization of a generic type might allow +that type to be instantiated for types that don't satisfy the normal constraints +will be considered in the future along with the specialization feature. + +We also considered the alternative where the user would need to explicitly opt +in to this behavior by adding `& auto` or `& implied_requirements` to their type +constraint, as in: + +``` +fn LookUp[KeyType:! Type & auto](hm: HashMap(KeyType, i32)*, + k: KeyType) -> i32; + +fn PrintValueOrDefault[KeyType:! Printable & auto, + ValueT:! Printable & HasDefault] + (map: HashMap(KeyType, ValueT), key: KeyT); +``` + +The feeling was that implied constraints were natural enough that they didn't +need to be signaled by additional marking, with the accompanying ergonomic and +brevity loss. + +### Type inequality constraints + +You might want an inequality type constraint, for example, to control overload +resolution: + +``` +fn F[T:! Type](x: T) -> T { return x; } +fn F(x: Bool) -> String { + if (x) return "True"; else return "False"; +} + +fn G[T:! Type where .Self != Bool](x: T) -> T { + // We need T != Bool for this to type check. + return F(x); +} +``` + +There are some problems with supporting this feature, however. +[Negative reasoning in general has been a source of difficulties in the implementation of Rust's type system](http://aturon.github.io/tech/2017/04/24/negative-chalk/). +This is a type of constraint that is more difficult to use since it would have +to be repeated by any generic caller, since nothing else can generically +establish two types are different. + +Our manual type equality approach will not be able to detect situations where +the two sides of the inequality in fact have to be equal due to a sequence of +equality constraints. In those situations, no type will be ever be able to +satisfy the inequality constraint. + +We may be able to overcome these difficulties, but we don't expect this feature +is required since neither Rust nor Swift support it. + +### `where .Self is ...` could act like an external impl + +We considered making `T:! A where .Self is B` mean something different than +`T:! A & B`. In particular, in the former case we considered whether `B` might +be considered to be implemented externally for `T`. The advantage of this +alternative is it gives a convenient way of not polluting the members of `T` +with the members of `B`, however it wasn't clear how common that use case would +be. Furthermore, it introduced an inconsistency with other associated types. For +example: + +- `T:! Container where .ElementType = i32` means that `T.ElementType` has type + `i32`. +- Following that, given `T:! Container where .ElementType is Printable` and + `x: T`, we expect that `x.Front().Print()` to be legal. +- To be consistent with the last point, it seems like given + `T:! Container where .Self is Printable` and `x: T`, then `x.Print()` should + be legal as well. + +This matches Rust, where `T: Container` is considered a synonym for +`T where T is Container`. + +[See the Discord discussion](https://discord.com/channels/655572317891461132/708431657849585705/895794334723637258). + +#### Other syntax for external impl + +In a +[follow up Discord discussion](https://discord.com/channels/655572317891461132/708431657849585705/902728293080530954), +we considered allowing developers to write `where .Foo as Bar is Type` to mean +"`.Foo` implements interface `Bar` externally." We would consider this feature +in the future once we saw a demonstrated need. + +### Other syntax for must be legal type constraints + +Instead of `where HashSet(.Self) is Type` to say `.Self` must satisfy the +constraints on being an argument to `HashSet`, we considered other syntaxes like +`where HashSet(.Self) legal` and `where HashSet(.Self)`. The current choice is +intended to be consistent with the syntax for "Parameterized type implements +interface" and how implied constraints ensure that parameters to types +automatically are given their needed constraints. + +### Using only `==` instead of also `=` + +Originally this proposal only used `==` for both setting an associated type to a +specific concrete type and saying it was equal to another type variable. The two +cases affected the type differently. In the specific concrete type case, the +original type-of-type for the associated type doesn't affect the API, you just +get the API of the type. When equating two type variables, though, the +unqualified member names are unaffected. The only change was some interfaces may +be implemented externally. + +In +[this Discord discussion](https://discord.com/channels/655572317891461132/708431657849585705/902713789735116821), +we decided it might be clearer to use two different operators to reflect the +different effect. The compiler or a tool can easily correct cases where the +developer used the wrong one. However, using `=` comes with the downside that it +doesn't follow the pattern of other constraints of looking like a boolean +expression returning `true` when the constraint is satisfied. We would consider +a different pair of operators here to address this point, perhaps `~` for type +variables and `==` for concrete type case. + +### Automatic type equality + +Other languages, such as Swift and Rust, don't require `observe` declarations or +casting for the compiler to recognize two types as transitively equal. The +problem is that there is know way to know how many equality constraints need to +be considered before being able to conclude whether two type expressions are +equal. In fact, the equality relations form a semi-group, where in general +deciding whether two sequences are equivalent is undecidable, see +[Swift type checking is undecidable - Discussion - Swift Forums](https://forums.swift.org/t/swift-type-checking-is-undecidable/39024). + +#### Restricted equality constraints + +One possible approach to this problem is to apply limitations to the equality +constraints developers are allowed to express. The goal is to identify some +restrictions such that: + +- We have a terminating algorithm to decide if a set of constraints meets the + restrictions. +- For constraints that meet the restrictions we have a terminating algorithm + for deciding which expressions are equal. +- Expected use cases satisfy the restrictions. + +The expected cases are things of these forms: + +- `X == Y` +- `X == Y.Z` +- `X == X.Y` +- `X == Y.X` +- and _some_ combinations and variations + +For example, here are some interfaces that have been translated to Carbon syntax +from Swift standard library protocols: + +``` +interface IteratorProtocol { + let Element:! Type; +} + +interface Sequence { + let Element:! Type; + let Iterator:! IteratorProtocol + where .Element == Element; +} + +interface Collection { + extends Sequence; + let Index:! Type; + let SubSequence:! Collection + where .Element == Element + and .SubSequence == SubSequence + and .Index == Index; + let Indices:! Collection + where .Element == Index + and .Index == Index + and .SubSequence == Indices; +} +``` + +One approach we considered is +[regular equivalence classes](p0818/regular_equivalence_classes.md), however we +have not yet been able to figure out how to ensure the algorithm terminates. +This is an approach we would like to reconsider if we find solutions to this +problem once can share this problem more widely. + +Other approaches we considered worked in simple cases but had requirements that +could not be validated, since for example they were equivalent to solving the +same word problem that makes transitive equality undecidable in general. + +#### No explicit restrictions + +If you allow the full undecidable set of `where` clauses, there are some +unpleasant options: + +- Possibly the compiler won't realize that two expressions are equal even + though they should be. +- Possibly the compiler will reject some interface declarations as "too + complicated", due to "running for too many steps", based on some arbitrary + threshold. + +Either way, the compiler will have to perform a lot more work at compile time, +slowing down builds. There is also a danger that composition of things that +separately work or incremental evolution of working code could end up over a +complexity or search depth threshold. In effect, there are still restrictions on +what combinations of equality constraints are allowed, it is just that those +restrictions are implicit in the specifics of the algorithm chosen and execution +limits used. + +One approach that has been suggested by +[Slava Pestov](https://forums.swift.org/u/Slava_Pestov) in the context of Swift, +is to +[formalize type equality as a term rewriting system](https://forums.swift.org/t/formalizing-swift-generics-as-a-term-rewriting-system/45175). +Then the equality constraints in an interface or function declaration can be +completed by running the Knuth-Bendix completion algorithm +([1](https://en.wikipedia.org/wiki/Knuth%E2%80%93Bendix_completion_algorithm#Description_of_the_algorithm_for_finitely_presented_monoids), +[2](https://academic.oup.com/comjnl/article/34/1/2/427931)) for some limited +number of steps. If the algorithm completes successfully, type expressions may +then be efficiently canonicalized. However the algorithm can fail, or fail to +terminate before hitting the threshold number of steps, in which case the source +code would have to be rejected. See +[this example implementation](https://gist.github.com/slavapestov/75dbec34f9eba5fb4a4a00b1ee520d0b). diff --git a/proposals/p0818/regular_equivalence_classes.md b/proposals/p0818/regular_equivalence_classes.md new file mode 100644 index 0000000000000..0e21dab831085 --- /dev/null +++ b/proposals/p0818/regular_equivalence_classes.md @@ -0,0 +1,226 @@ +# Regular equivalence classes + + + +## Problem + +For generics, users will define _interfaces_ that describe types. These +interfaces may have associated types (see +[Swift](https://docs.swift.org/swift-book/LanguageGuide/Generics.html#ID189), +[Rust](https://doc.rust-lang.org/rust-by-example/generics/assoc_items/types.html)): + +``` +interface Container { + let Elt:! Type; + let Slice:! Container; + let Subseq:! Container; +} +``` + +The next step is to express constraints between these associated types. + +- `Container.Slice.Elt == Container.Elt` +- `Container.Slice.Slice == Container.Slice` +- `Container.Subseq.Elt == Container.Elt` +- `Container.Subseq.Slice == Container.Slice` +- `Container.Subseq.Subseq == Container.Subseq` + +Languages commonly express these constraints using `where` clauses, as in +[Swift](https://docs.swift.org/swift-book/LanguageGuide/Generics.html#ID557) and +[Rust](https://doc.rust-lang.org/rust-by-example/generics/where.html), that +directly represent these constraints as equations. + +``` +interface Container { + let Elt:! Type; + let Slice:! Container where .Elt == Elt + and .Slice == Slice; + let Subseq:! Container where .Elt == Elt + and .Slice == Slice + and .Subseq == Subseq; +} +``` + +These relations give us a semi-group, where in general deciding whether two +sequences are equivalent is undecidable, see +[Swift type checking is undecidable - Discussion - Swift Forums](https://forums.swift.org/t/swift-type-checking-is-undecidable/39024). + +If you allow the full undecidable set of `where` clauses, there are some +unpleasant options: + +- Possibly the compiler won't realize that two expressions are equal even + though they should be. +- Possibly the compiler will reject some interface declarations as "too + complicated", due to "running for too many steps", based on some arbitrary + threshold. + +Furthermore, the algorithms are expensive and perform extensive searches. + +### Goal + +The goal is to identify some restrictions such that: + +- We have a terminating algorithm to decide if a set of constraints meets the + restrictions. +- For constraints that meet the restrictions we have a terminating algorithm + for deciding which expressions are equal. +- Expected use cases satisfy the restrictions. + +The expected cases are things of these forms: + +- `X == Y` +- `X == Y.Z` +- `X == X.Y` +- `X == Y.X` +- and _some_ combinations and variations + +For ease of exposition, I'm going to assume that all associated types are +represented by single letters, and omit the dots `.` between them. + +## Idea + +Observe that the equivalence class of words you can rewrite to in these expected +cases are described by a regular expression: + +- `X` can be rewritten to `(X|Y)` using the rewrite `X == Y` +- `X` can be rewritten to `XY*` using the rewrite `X == XY` +- `X` can be rewritten to `Y*X` using the rewrite `X == YX` + +These regular expressions can be used instead of the rewrite rules: anything +matching the regular expression can be rewritten to anything else matched by the +same regular expression. This means we can take any sequence of letters, replace +anything that matches the regular expression by the regular expression itself, +and get a regular expression that matches the original sequence of letters. We +can even get a shortest/smallest representative of the class by dropping all the +things with a `*` and picking the shortest/smallest between alternatives +separated by a `|`. + +### Question + +Given a set of rewrite equations, is there a terminating algorithm that either: + +- gives a finite set of regular expressions (or finite automata) that + describes their equivalence classes, or +- rejects the rewrites. + +We would be willing to reject cases where there are more equivalence classes +than rewrite rules. An example of this would be the rewrite rules `A == XY` and +`B == YZ` have equivalence classes `A|XY`, `B|YZ`, `XYZ|AZ|XB`. + +## Problem + +In particular, we've identified two operations for combining two finite automata +depending on how they overlap: + +- **Containment operation:** If we have automatas `X` and `Y`, where `Y` + completely matches a subsequence matched by `X`, then we need to extend `X` + to include all the rewrites offered by `Y`. For example, if `X` is `AB*` and + `Y` is `(B|C)`, then we can extend `X` to `A(B|C)*`. +- **Union operation:** If we have automatas `X` and `Y`, and there is a string + `ABC` such that `X` matches the prefix `AB` and `Y` matches the suffix `BC`, + then we need to make sure there is a rule for matching `ABC`. + +Even though it is straightforward to translate a single rewrite rule into a +regular expression that matches at least some portion of the equivalence class, +to match the entire equivalence classes we need to apply these two operations in +ways that can run into difficulties. + +One concern is that an individual regular expression might not cover the entire +equivalence for a rewrite rule. We may need to apply the union and containment +operations to combine the automata _with itself_. For example, we may convert +the rewrite rule `AB == BA` into the regular expression `AB|BA`, but this +doesn't completely describe the set of equivalence classes that this rule +creates, since it triggers the union operation twice to add `AAB|ABA|BAA` and +`ABB|BAB|BBA`. These would also trigger the union operation, and so on +indefinitely, so the rewrite rule `AB == BA` does not have a finite set of +regular equivalence classes. Similarly the rule `A == BCB` has an infinite +family of equivalence classes: `BCB|A`, `BCBCB|ACB|BCA`, +`BCBCBCB|ACBCB|ACA|BCACB|BCBCA`, ... + +The other concern is that even in the case that each rule by itself can be +faithfully translated into a regular expression that captures the equivalence +classes of the rewrite, attempts to combine multiple rules using the two +operations might never reach a fixed point. + +In both cases, we are looking for criteria that we can use to decide if the +application of the operations will terminate. If it would not terminate, then we +need to report an error to the user instead of applying those two operations +endlessly. + +## Examples + +### Regular single rules + +- Anything where there are no shared letters between the two sides, and no + side has a prefix matching a suffix + - `A == B` => `A|B` + - `A == BC` => `A|BC` + - `A == BBC` => `A|BBC` + - `A == BCC` => `A|BCC` +- `A == AB` => `AB*` +- `A == BA` => `B*A` +- `A == AA` => `AA*` or `A*A` or `A+` +- `A == ABC` => `A(BC)*` +- `A == ABB` => `A(BB)*` +- `A == AAA` => `A(AA)*` +- `A == BCA` => `(BC)*A` +- `A == BBA` => `(BB)*A` +- `A == ACA` => `A(CA)*` or `(AC)*A` + +### Regular combinations + +- `A==AB` + `A==AC` => `AB*` + `AC*` => `A(B|C)*` +- `A==AB` + `B==BC` => `AB*` + `BC*` => `A(B|C)*`, `BC*` with relation + `A(B|C)* BC* == A(B|C)*`; Note: `A == AB == ABC == AC` +- `A==AB` + `C==CB` => `AB*`, `CB*`, no relations +- `A==AB` + `A==CA` => `AB*` + `C*A` => `C*AB*` +- `A==AB` + `C==BC` => `AB*` + `B*C` => `AB*`, `B*C`, where if you have `AB*C` + it doesn't matter how you split the `B`s between `AB*` and `B*C`. +- `A==AB` + `B==AB` => `AB*` + `A*B` => `(A|B)+` +- `A==AB` + `B==CB` => `AB*` + `C*B` => `A(C*B)*`, `C*B` with `A(C*B)* C*B` -> + `A(C*B)*` +- `A==AB` + `B==CB` + `C==CD` => `AB*` + `C*B` + `CD*` => `A((CD*)*B)*`, + `(CD*)*B`, `CD*` +- `A==ABC` + `D==CB` => `A(BC)*` + `(CB|D)` => `A(BD*C)*`, `A(BD*C)*BD*`, + `(CB|D)` +- `A==ABBBBB` + `A==ABBBBBBBB` (or `A==AB^5` + `A==AB^8`)=> `A(BBBBB)*` + + `A(BBBBBBBB)*` => `AB*`, since `A=AB^8=AB^16=AB^11=AB^6=AB` + +### Not regular + +- `AB == BA`, has equivalence classes: `AB|BA`, `AAB|ABA|BAA`, `ABB|BAB|BBA`, + ... +- Similarly: `AB==C` + `C==BA` +- `ABC == CBA`, has equivalence classes: `ABC|CBA`, `ABABC|ABCBA|CBABA`, + `ABCBC|CBABC|CBCBA`, ... +- `A == BAB`, involves back references or counting the number of `B`s: + `A|BAB|BBABB|BBBABBB|`... +- `A == BAC`, involves matching the number of `B`s to `C`s: + `A|BAC|BBACC|BBBACCC|`... +- `A == BCB`, has equivalence classes: `BCB|A`, `BCBCB|ACB|BCA`, + `BCBCBCB|ACBCB|ACA|BCACB|BCBCA`, ... +- `A == BB`, has equivalence classes: `A|BB`, `BBB|AB|BA`, `(A|BB)(A|BB)|BAB`, + ... +- `A == BBB`, has equivalence classes: `A|BBB`, `BBBB|AB|BA`, + `BBBBB|ABB|BAB|BBA`, ... +- `AA == BB`, has equivalence classes: `AA|BB`, `AAA|ABB|BBA`, `BAA|BBB|AAB`, + ... +- `AB == AA`, has equivalence classes: `A(A|B)`, `A(A|B)(A|B)`, + `A(A|B)(A|B)(A|B)`, ... +- `AB == BB`, similarly +- `AB == BC`, has equivalence classes: `(AB|BC)`, `(AAB|ABC|BCC)`, ..., + `A`^i`BC`^(N-i) +- `A == AAB` and `A == BAA` + +## References + +See these notes from discussions: + +- [Regular equivalence classes 2021-09-23](https://docs.google.com/document/d/1iyL7ZDfWT6ZmDSz6Fp8MrLcl5nOQc4ItQbeL3QlVXYU/edit#) +- [Carbon minutes (rolling): Open discussions: 2021-09-27](https://docs.google.com/document/d/1cRrhRrmaUf2hVi2lFcHsYo2j0jI6t9RGZoYjWhRxp14/edit?resourcekey=0-xWHBEZ8zIqnJiB4yfBSLfA#heading=h.qh4b1gy7o5el) +- [Carbon minutes (rolling): Open discussions: 2021-09-28](https://docs.google.com/document/d/1cRrhRrmaUf2hVi2lFcHsYo2j0jI6t9RGZoYjWhRxp14/edit?resourcekey=0-xWHBEZ8zIqnJiB4yfBSLfA#heading=h.9a1r39mla9ho) +- [Regular equivalence classes 2 2021-10-01](https://docs.google.com/document/d/1xGDRSV6q534-CoDZnzuZJmzhty2yPki2ZfDDPLSHYek/edit#)