Skip to content

Commit

Permalink
Merge pull request scala#2 from sjrd/spec-union-types
Browse files Browse the repository at this point in the history
Integrate the spec of union and intersection types.
  • Loading branch information
sjrd authored Apr 28, 2023
2 parents d1bce3a + 52ed944 commit 6d16acf
Show file tree
Hide file tree
Showing 8 changed files with 178 additions and 362 deletions.
117 changes: 90 additions & 27 deletions docs/_spec/03-types.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,9 @@ chapter: 3

```ebnf
Type ::= FunctionArgTypes ‘=>’ Type
| InfixType [ExistentialClause]
| InfixType
FunctionArgTypes ::= InfixType
| ‘(’ [ ParamType {‘,’ ParamType } ] ‘)’
ExistentialClause ::= ‘forSome’ ‘{’ ExistentialDcl
{semi ExistentialDcl} ‘}’
ExistentialDcl ::= ‘type’ TypeDcl
| ‘val’ ValDcl
InfixType ::= CompoundType {id [nl] CompoundType}
CompoundType ::= AnnotType {‘with’ AnnotType} [Refinement]
| Refinement
Expand Down Expand Up @@ -325,12 +321,64 @@ An _infix type_ ´T_1´ `op` ´T_2´ consists of an infix operator `op` which ge
The type is equivalent to the type application `op`´[T_1, T_2]´.
The infix operator `op` may be an arbitrary identifier.

All type infix operators have the same precedence; parentheses have to be used for grouping.
The [associativity](06-expressions.html#prefix,-infix,-and-postfix-operations) of a type operator is determined as for term operators: type operators ending in a colon ‘:’ are right-associative; all other operators are left-associative.
Type operators follow the same [precedence and associativity as term operators](06-expressions.html#prefix-infix-and-postfix-operations).
For example, `A + B * C` parses as `A + (B * C)` and `A | B & C` parses as `A | (B & C)`.
Type operators ending in a colon ‘:’ are right-associative; all other operators are left-associative.

In a sequence of consecutive type infix operations ´t_0 \, \mathit{op} \, t_1 \, \mathit{op_2} \, ... \, \mathit{op_n} \, t_n´, all operators ´\mathit{op}\_1, ..., \mathit{op}\_n´ must have the same associativity.
If they are all left-associative, the sequence is interpreted as ´(... (t_0 \mathit{op_1} t_1) \mathit{op_2} ...) \mathit{op_n} t_n´, otherwise it is interpreted as ´t_0 \mathit{op_1} (t_1 \mathit{op_2} ( ... \mathit{op_n} t_n) ...)´.

The type operators `|` and `&` are not really special.
Nevertheless, unless shadowed, they resolve to `scala.|` and `scala.&`, which represent [union and intersection types](#union-and-intersection-types), respectively.

### Union and Intersection Types

Syntactically, the types `S | T` and `S & T` are infix types, where the infix operators are `|` and `&`, respectively (see above).

However, in this specification, ´S | T´ and ´S & T´ refer to the underlying core concepts of *union and intersection types*, respectively.

- The type ´S | T´ represents the set of values that are represented by *either* ´S´ or ´T´.
- The type ´S & T´ represents the set of values that are represented by *both* ´S´ and ´T´.

From the [conformance rules](#conformance) rules on union and intersection types, we can show that ´&´ and ´|´ are *commutative* and *associative*.
Moreover, `` is distributive over ``.
For any type ´A´, ´B´ and ´C´, all of the following relationships hold:

- ´A & B \equiv B & A´,
- ´A | B \equiv B | A´,
- ´(A & B) & C \equiv A & (B & C)´,
- ´(A | B) | C \equiv A | (B | C)´, and
- ´A & (B | C) \equiv (A & B) | (A & C)´.

If ´C´ is a type constructor, then ´C[A] & C[B]´ can be simplified using the following three rules:

- If ´C´ is covariant, ´C[A] & C[B] \equiv C[A & B]´
- If ´C´ is contravariant, ´C[A] & C[B] \equiv C[A | B]´
- If ´C´ is invariant, emit a compile error

From the above rules, we can derive the following conformance relationships:

- When ´C´ is covariant, ´C[A & B] <: C[A] & C[B]´.
- When ´C´ is contravariant, ´C[A | B] <: C[A] & C[B]´.

#### Join of a union type

In some situations, a union type might need to be widened to a non-union type.
For this purpose, we define the _join_ of a union type ´T_1 | ... | T_n´ as the smallest intersection type of base class instances of ´T_1, ..., T_n´.
Note that union types might still appear as type arguments in the resulting type, this guarantees that the join is always finite.

For example, given

```scala
trait C[+T]
trait D
trait E
class A extends C[A] with D
class B extends C[B] with D with E
```

The join of ´A | B´ is ´C[A | B] & D´

### Function Types

```ebnf
Expand Down Expand Up @@ -574,6 +622,8 @@ If ´T´ is a possibly parameterized class type, where ´T´'s class is defined
1. The _member bindings_ of a type ´T´ are
1. all bindings ´d´ such that there exists a type instance of some class ´C´ among the base types of ´T´ and there exists a definition or declaration ´d'´ in ´C´ such that ´d´ results from ´d'´ by replacing every type ´T'´ in ´d'´ by ´T'´ in ´C´ seen from ´T´, and
2. all bindings of the type's [refinement](#compound-types), if it has one.
2. The member bindinds of ´S & T´ are all the binds of ´S´ *and* all the bindins of ´T´.
3. The member bindings of ´S | T´ are the member bindings of its [join](#join-of-a-union-type).

The _definition_ of a type projection `S#T` is the member binding ´d_T´ of the type `T` in `S`.
In that case, we also say that `S#T` _is defined by_ ´d_T´.
Expand Down Expand Up @@ -630,6 +680,10 @@ The conformance relation ´(<:)´ is the smallest transitive relation that satis
1. If the ´i´'th type parameter of ´T´ is declared neither covariant nor contravariant, then ´U_i \equiv T_i´.
- A compound type `´T_1´ with ... with ´T_n´ {´R\,´}` conforms to each of its component types ´T_i´.
- If ´T <: U_i´ for ´i \in \{ 1, ..., n \}´ and for every binding ´d´ of a type or value ´x´ in ´R´ there exists a member binding of ´x´ in ´T´ which subsumes ´d´, then ´T´ conforms to the compound type `´U_1´ with ... with ´U_n´ {´R\,´}`.
- If ´T <: U´, then ´T <: U | W´ and ´T <: W | U´.
- If ´T <: W´ and ´U <: W´, then ´T | U <: W´.
- If ´T <: U´ and ´T <: W´, then ´T <: U & W´.
- If ´T <: W´, then ´T & U <: W´ and ´U & T <: W´.
- If ´T_i \equiv T_i'´ for ´i \in \{ 1, ..., n\}´ and ´U´ conforms to ´U'´ then the method type ´(p_1:T_1, ..., p_n:T_n) U´ conforms to ´(p_1':T_1', ..., p_n':T_n') U'´.
- The polymorphic type ´[a_1 >: L_1 <: U_1, ..., a_n >: L_n <: U_n] T´ conforms to the polymorphic type ´[a_1 >: L_1' <: U_1', ..., a_n >: L_n' <: U_n'] T'´ if, assuming ´L_1' <: a_1 <: U_1', ..., L_n' <: a_n <: U_n'´ one has ´T <: T'´ and ´L_i <: L_i'´ and ´U_i' <: U_i´ for ´i \in \{ 1, ..., n \}´.
- Type constructors ´T´ and ´T'´ follow a similar discipline.
Expand All @@ -648,28 +702,15 @@ A declaration or definition in some compound type of class type ´C´ _subsumes_
- A type declaration `type ´t´[´T_1´, ..., ´T_n´] >: ´L´ <: ´U´` subsumes a type declaration `type ´t´[´T_1´, ..., ´T_n´] >: ´L'´ <: ´U'´` if ´L' <: L´ and ´U <: U'´.
- A type or class definition that binds a type name ´t´ subsumes an abstract type declaration `type t[´T_1´, ..., ´T_n´] >: L <: U` if ´L <: t <: U´.


#### Least upper bounds and greatest lower bounds

The ´(<:)´ relation forms pre-order between types, i.e. it is transitive and reflexive.
This allows us to define _least upper bounds_ and _greatest lower bounds_ of a set of types in terms of that order.
The least upper bound or greatest lower bound of a set of types does not always exist.
For instance, consider the class definitions:

```scala
class A[+T] {}
class B extends A[B]
class C extends A[C]
```

Then the types `A[Any], A[A[Any]], A[A[A[Any]]], ...` form a descending sequence of upper bounds for `B` and `C`.
The least upper bound would be the infinite limit of that sequence, which does not exist as a Scala type.
Since cases like this are in general impossible to detect, a Scala compiler is free to reject a term which has a type specified as a least upper or greatest lower bound, and that bound would be more complex than some compiler-set limit [^4].

The least upper bound or greatest lower bound might also not be unique.
For instance `A with B` and `B with A` are both greatest lower bounds of `A` and `B`.
If there are several least upper bounds or greatest lower bounds, the Scala compiler is free to pick any one of them.
- the _least upper bound_ of `A` and `B` is the smallest type `L` such that `A` <: `L` and `B` <: `L`.
- the _greatest lower bound_ of `A` and `B` is the largest type `G` such that `G` <: `A` and `G` <: `B`.

[^4]: The current Scala compiler limits the nesting level of parameterization in such bounds to be at most two deeper than the maximum nesting level of the operand types
By construction, for all types `A` and `B`, the least upper bound of `A` and `B` is `A | B`, and their greatest lower bound is `A & B`.

### Weak Conformance

Expand Down Expand Up @@ -747,7 +788,29 @@ The erasure mapping is defined as follows.
- The erasure of a singleton type `´p´.type` is the erasure of the type of ´p´.
- The erasure of a type projection `´T´#´x´` is `|´T´|#´x´`.
- The erasure of a compound type `´T_1´ with ... with ´T_n´ {´R\,´}` is the erasure of the intersection dominator of ´T_1, ..., T_n´.
- The erasure of a union type ´S | T´ is the _erased least upper bound_ (_elub_) of the erasures of ´S´ and ´T´.
- The erasure of an intersection type ´S & T´ is the _eglb_ (erased greatest lower bound) of the erasures of ´S´ and ´T´.

The erased LUB is computed as follows:

- if both argument are arrays of objects, an array of the erased LUB of the element types
- if both arguments are arrays of same primitives, an array of this primitive
- if one argument is array of primitives and the other is array of objects, [`Object`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html)
- if one argument is an array, [`Object`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html)
- otherwise a common superclass or trait S of the argument classes, with the following two properties:
- S is minimal: no other common superclass or trait derives from S, and
- S is last: in the linearization of the first argument type ´|A|´ there are no minimal common superclasses or traits that come after S.
The reason to pick last is that we prefer classes over traits that way, which leads to more predictable bytecode and (?) faster dynamic dispatch.

The _intersection dominator_ of a list of types ´T_1, ..., T_n´ is computed as follows.
Let ´T_{i_1}, ..., T_{i_m}´ be the subsequence of types ´T_i´ which are not supertypes of some other type ´T_j´.
If this subsequence contains a type designator ´T_c´ that refers to a class which is not a trait, the intersection dominator is ´T_c´. Otherwise, the intersection dominator is the first element of the subsequence, ´T_{i_1}´.
The rules for ´eglb(A, B)´ are given below in pseudocode:

```
eglb(scala.Array[A], JArray[B]) = scala.Array[eglb(A, B)]
eglb(scala.Array[T], _) = scala.Array[T]
eglb(_, scala.Array[T]) = scala.Array[T]
eglb(A, B) = A if A extends B
eglb(A, B) = B if B extends A
eglb(A, _) = A if A is not a trait
eglb(_, B) = B if B is not a trait
eglb(A, _) = A // use first
```
10 changes: 5 additions & 5 deletions docs/_spec/08-pattern-matching.md
Original file line number Diff line number Diff line change
Expand Up @@ -113,10 +113,10 @@ Taking XML as an example
implicit class XMLinterpolation(s: StringContext) = {
object xml {
def apply(exprs: Any*) =
// parse ‘s’ and build an XML tree with ‘exprs’
// parse ‘s’ and build an XML tree with ‘exprs’
//in the holes
def unapplySeq(xml: Node): Option[Seq[Node]] =
// match `s’ against `xml’ tree and produce
// match `s’ against `xml’ tree and produce
//subtrees in holes
}
}
Expand Down Expand Up @@ -503,7 +503,7 @@ Each case consists of a (possibly guarded) pattern ´p_i´ and a block ´b_i´.
Each ´p_i´ might be complemented by a guard `if ´e´` where ´e´ is a boolean expression.
The scope of the pattern variables in ´p_i´ comprises the pattern's guard and the corresponding block ´b_i´.

Let ´T´ be the type of the selector expression ´e´ and let ´a_1, ..., a_m´ be the type parameters of all methods enclosing the pattern matching expression.
Let ´T´ be the type of the selector expression ´e´ and let ´a_1, ..., a_m´ be the type parameters of all methods enclosing the pattern matching expression.
For every ´a_i´, let ´L_i´ be its lower bound and ´U_i´ be its higher bound.
Every pattern ´p \in \{p_1,, ..., p_n\}´ can be typed in two ways.
First, it is attempted to type ´p´ with ´T´ as its expected type.
Expand Down Expand Up @@ -537,7 +537,7 @@ In the interest of efficiency the evaluation of a pattern matching expression ma
This might affect evaluation through side effects in guards.
However, it is guaranteed that a guard expression is evaluated only if the pattern it guards matches.

If the selector of a pattern match is an instance of a [`sealed` class](05-classes-and-objects.html#modifiers), the compilation of pattern matching can emit warnings which diagnose that a given set of patterns is not exhaustive, i.e. that there is a possibility of a `MatchError` being raised at run-time.
If the selector of a pattern match is an instance of a [`sealed` class](05-classes-and-objects.html#modifiers), a [union type](03-types#union-and-intersection-types), or a combination thereof, the compilation of pattern matching can emit warnings which diagnose that a given set of patterns is not exhaustive, i.e. that there is a possibility of a `MatchError` being raised at run-time.

###### Example

Expand Down Expand Up @@ -569,7 +569,7 @@ def eval[T](t: Term[T]): T = t match {

Note that the evaluator makes crucial use of the fact that type parameters of enclosing methods can acquire new bounds through pattern matching.

For instance, the type of the pattern in the second case, `Succ(u)`, is `Int`.
For instance, the type of the pattern in the second case, `Succ(u)`, is `Int`.
It conforms to the selector type `T` only if we assume an upper and lower bound of `Int` for `T`.
Under the assumption `Int <: T <: Int` we can also verify that the type right hand side of the second case, `Int` conforms to its expected type, `T`.

Expand Down
7 changes: 6 additions & 1 deletion docs/_spec/A2-scala-2-compatibility.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,4 +32,9 @@ def f(): Unit = { ... }
Scala 3 accepts the old syntax under the `-source:3.0-migration` option.
If the `-migration` option is set, it can even rewrite old syntax to new.
The [Scalafix](https://scalacenter.github.io/scalafix/) tool also
can rewrite procedure syntax to make it Scala 3 compatible.
can rewrite procedure syntax to make it Scala 3 compatible.

## Compound Types (`with`)

Intersection types `A & B` replace compound types `A with B` in Scala 2.
For the moment, the syntax `A with B` is still allowed and interpreted as `A & B`, but its usage as a type (as opposed to in a `new` or `extends` clause) will be deprecated and removed in the future.
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,3 @@ to give at that point a definition of a `children` method with the required type
class C extends A, B:
def children: List[A & B] = ???
```


[More details](./intersection-types-spec.md)
77 changes: 77 additions & 0 deletions docs/_spec/APPLIEDreference/new-types/union-types.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
---
layout: doc-page
title: "Union Types"
nightlyOf: https://docs.scala-lang.org/scala3/reference/new-types/union-types.html
---

A union type `A | B` has as values all values of type `A` and also all values of type `B`.


```scala
case class UserName(name: String)
case class Password(hash: Hash)

def help(id: UserName | Password) =
val user = id match
case UserName(name) => lookupName(name)
case Password(hash) => lookupPassword(hash)
...
```

Union types are duals of intersection types. `|` is _commutative_:
`A | B` is the same type as `B | A`.

The compiler will assign a union type to an expression only if such a
type is explicitly given. This can be seen in the following [REPL](https://docs.scala-lang.org/overviews/repl/overview.html) transcript:

```scala
scala> val password = Password(123)
val password: Password = Password(123)

scala> val name = UserName("Eve")
val name: UserName = UserName(Eve)

scala> if true then name else password
val res2: Object = UserName(Eve)

scala> val either: Password | UserName = if true then name else password
val either: Password | UserName = UserName(Eve)
```

The type of `res2` is `Object & Product`, which is a supertype of
`UserName` and `Password`, but not the least supertype `Password |
UserName`. If we want the least supertype, we have to give it
explicitly, as is done for the type of `either`.

## Type inference

When inferring the result type of a definition (`val`, `var`, or `def`) and the
type we are about to infer is a union type, then we replace it by its join.
Similarly, when instantiating a type argument, if the corresponding type
parameter is not upper-bounded by a union type and the type we are about to
instantiate is a union type, we replace it by its join. This mirrors the
treatment of singleton types which are also widened to their underlying type
unless explicitly specified. The motivation is the same: inferring types
which are "too precise" can lead to unintuitive typechecking issues later on.

**Note:** Since this behavior limits the usability of union types, it might
be changed in the future. For example by not widening unions that have been
explicitly written down by the user and not inferred, or by not widening a type
argument when the corresponding type parameter is covariant.

See [PR #2330](https://github.com/lampepfl/dotty/pull/2330) and
[Issue #4867](https://github.com/lampepfl/dotty/issues/4867) for further discussions.

### Example

```scala
import scala.collection.mutable.ListBuffer
val x = ListBuffer(Right("foo"), Left(0))
val y: ListBuffer[Either[Int, String]] = x
```

This code typechecks because the inferred type argument to `ListBuffer` in the
right-hand side of `x` was `Left[Int, Nothing] | Right[Nothing, String]` which
was widened to `Either[Int, String]`. If the compiler hadn't done this widening,
the last line wouldn't typecheck because `ListBuffer` is invariant in its
argument.
Loading

0 comments on commit 6d16acf

Please sign in to comment.