Merge pull request scala#2 from sjrd/spec-union-types

Integrate the spec of union and intersection types.
smarter · Apr 28, 2023 · 6d16acf · 6d16acf
2 parents d1bce3a + 52ed944
commit 6d16acf
Show file tree

Hide file tree

Showing 8 changed files with 178 additions and 362 deletions.
diff --git a/docs/_spec/03-types.md b/docs/_spec/03-types.md
@@ -8,13 +8,9 @@ chapter: 3
 
 ```ebnf
   Type              ::=  FunctionArgTypes ‘=>’ Type
-                      |  InfixType [ExistentialClause]
+                      |  InfixType
   FunctionArgTypes  ::=  InfixType
                       |  ‘(’ [ ParamType {‘,’ ParamType } ] ‘)’
-  ExistentialClause ::=  ‘forSome’ ‘{’ ExistentialDcl
-                             {semi ExistentialDcl} ‘}’
-  ExistentialDcl    ::=  ‘type’ TypeDcl
-                      |  ‘val’ ValDcl
   InfixType         ::=  CompoundType {id [nl] CompoundType}
   CompoundType      ::=  AnnotType {‘with’ AnnotType} [Refinement]
                       |  Refinement
@@ -325,12 +321,64 @@ An _infix type_ ´T_1´ `op` ´T_2´ consists of an infix operator `op` which ge
 The type is equivalent to the type application `op`´[T_1, T_2]´.
 The infix operator `op` may be an arbitrary identifier.
 
-All type infix operators have the same precedence; parentheses have to be used for grouping.
-The [associativity](06-expressions.html#prefix,-infix,-and-postfix-operations) of a type operator is determined as for term operators: type operators ending in a colon ‘:’ are right-associative; all other operators are left-associative.
+Type operators follow the same [precedence and associativity as term operators](06-expressions.html#prefix-infix-and-postfix-operations).
+For example, `A + B * C` parses as `A + (B * C)` and `A | B & C` parses as `A | (B & C)`.
+Type operators ending in a colon ‘:’ are right-associative; all other operators are left-associative.
 
 In a sequence of consecutive type infix operations ´t_0 \, \mathit{op} \, t_1 \, \mathit{op_2} \, ... \, \mathit{op_n} \, t_n´, all operators ´\mathit{op}\_1, ..., \mathit{op}\_n´ must have the same associativity.
 If they are all left-associative, the sequence is interpreted as ´(... (t_0 \mathit{op_1} t_1) \mathit{op_2} ...) \mathit{op_n} t_n´, otherwise it is interpreted as ´t_0 \mathit{op_1} (t_1 \mathit{op_2} ( ... \mathit{op_n} t_n) ...)´.
 
+The type operators `|` and `&` are not really special.
+Nevertheless, unless shadowed, they resolve to `scala.|` and `scala.&`, which represent [union and intersection types](#union-and-intersection-types), respectively.
+
+### Union and Intersection Types
+
+Syntactically, the types `S | T` and `S & T` are infix types, where the infix operators are `|` and `&`, respectively (see above).
+
+However, in this specification, ´S ｜ T´ and ´S ＆ T´ refer to the underlying core concepts of *union and intersection types*, respectively.
+
+- The type ´S ｜ T´ represents the set of values that are represented by *either* ´S´ or ´T´.
+- The type ´S ＆ T´ represents the set of values that are represented by *both* ´S´ and ´T´.
+
+From the [conformance rules](#conformance) rules on union and intersection types, we can show that ´＆´ and ´｜´ are *commutative* and *associative*.
+Moreover, `＆` is distributive over `｜`.
+For any type ´A´, ´B´ and ´C´, all of the following relationships hold:
+
+- ´A ＆ B \equiv B ＆ A´,
+- ´A ｜ B \equiv B ｜ A´,
+- ´(A ＆ B) ＆ C \equiv A ＆ (B ＆ C)´,
+- ´(A ｜ B) ｜ C \equiv A ｜ (B ｜ C)´, and
+- ´A ＆ (B ｜ C) \equiv (A ＆ B) ｜ (A ＆ C)´.
+
+If ´C´ is a type constructor, then ´C[A] ＆ C[B]´ can be simplified using the following three rules:
+
+- If ´C´ is covariant, ´C[A] ＆ C[B] \equiv C[A ＆ B]´
+- If ´C´ is contravariant, ´C[A] ＆ C[B] \equiv C[A | B]´
+- If ´C´ is invariant, emit a compile error
+
+From the above rules, we can derive the following conformance relationships:
+
+- When ´C´ is covariant, ´C[A ＆ B] <: C[A] ＆ C[B]´.
+- When ´C´ is contravariant, ´C[A ｜ B] <: C[A] ＆ C[B]´.
+
+#### Join of a union type
+
+In some situations, a union type might need to be widened to a non-union type.
+For this purpose, we define the _join_ of a union type ´T_1 ｜ ... ｜ T_n´ as the smallest intersection type of base class instances of ´T_1, ..., T_n´.
+Note that union types might still appear as type arguments in the resulting type, this guarantees that the join is always finite.
+
+For example, given
+
+```scala
+trait C[+T]
+trait D
+trait E
+class A extends C[A] with D
+class B extends C[B] with D with E
+```
+
+The join of ´A ｜ B´ is ´C[A ｜ B] ＆ D´
+
 ### Function Types
 
 ```ebnf
@@ -574,6 +622,8 @@ If ´T´ is a possibly parameterized class type, where ´T´'s class is defined
 1. The _member bindings_ of a type ´T´ are
   1. all bindings ´d´ such that there exists a type instance of some class ´C´ among the base types of ´T´ and there exists a definition or declaration ´d'´ in ´C´ such that ´d´ results from ´d'´ by replacing every type ´T'´ in ´d'´ by ´T'´ in ´C´ seen from ´T´, and
   2. all bindings of the type's [refinement](#compound-types), if it has one.
+2. The member bindinds of ´S ＆ T´ are all the binds of ´S´ *and* all the bindins of ´T´.
+3. The member bindings of ´S ｜ T´ are the member bindings of its [join](#join-of-a-union-type).
 
 The _definition_ of a type projection `S#T` is the member binding ´d_T´ of the type `T` in `S`.
 In that case, we also say that `S#T` _is defined by_ ´d_T´.
@@ -630,6 +680,10 @@ The conformance relation ´(<:)´ is the smallest transitive relation that satis
   1. If the ´i´'th type parameter of ´T´ is declared neither covariant nor contravariant, then ´U_i \equiv T_i´.
 - A compound type `´T_1´ with ... with ´T_n´ {´R\,´}` conforms to each of its component types ´T_i´.
 - If ´T <: U_i´ for ´i \in \{ 1, ..., n \}´ and for every binding ´d´ of a type or value ´x´ in ´R´ there exists a member binding of ´x´ in ´T´ which subsumes ´d´, then ´T´ conforms to the compound type `´U_1´ with ... with ´U_n´ {´R\,´}`.
+- If ´T <: U´, then ´T <: U ｜ W´ and ´T <: W ｜ U´.
+- If ´T <: W´ and ´U <: W´, then ´T ｜ U <: W´.
+- If ´T <: U´ and ´T <: W´, then ´T <: U ＆ W´.
+- If ´T <: W´, then ´T ＆ U <: W´ and ´U ＆ T <: W´.
 - If ´T_i \equiv T_i'´ for ´i \in \{ 1, ..., n\}´ and ´U´ conforms to ´U'´ then the method type ´(p_1:T_1, ..., p_n:T_n) U´ conforms to ´(p_1':T_1', ..., p_n':T_n') U'´.
 - The polymorphic type ´[a_1 >: L_1 <: U_1, ..., a_n >: L_n <: U_n] T´ conforms to the polymorphic type ´[a_1 >: L_1' <: U_1', ..., a_n >: L_n' <: U_n'] T'´ if, assuming ´L_1' <: a_1 <: U_1', ..., L_n' <: a_n <: U_n'´ one has ´T <: T'´ and ´L_i <: L_i'´ and ´U_i' <: U_i´ for ´i \in \{ 1, ..., n \}´.
 - Type constructors ´T´ and ´T'´ follow a similar discipline.
@@ -648,28 +702,15 @@ A declaration or definition in some compound type of class type ´C´ _subsumes_
 - A type declaration `type ´t´[´T_1´, ..., ´T_n´] >: ´L´ <: ´U´` subsumes a type declaration `type ´t´[´T_1´, ..., ´T_n´] >: ´L'´ <: ´U'´` if ´L' <: L´ and ´U <: U'´.
 - A type or class definition that binds a type name ´t´ subsumes an abstract type declaration `type t[´T_1´, ..., ´T_n´] >: L <: U` if ´L <: t <: U´.
 
-
 #### Least upper bounds and greatest lower bounds
+
 The ´(<:)´ relation forms pre-order between types, i.e. it is transitive and reflexive.
 This allows us to define _least upper bounds_ and _greatest lower bounds_ of a set of types in terms of that order.
-The least upper bound or greatest lower bound of a set of types does not always exist.
-For instance, consider the class definitions:
-
-```scala
-class A[+T] {}
-class B extends A[B]
-class C extends A[C]
-```
-
-Then the types `A[Any], A[A[Any]], A[A[A[Any]]], ...` form a descending sequence of upper bounds for `B` and `C`.
-The least upper bound would be the infinite limit of that sequence, which does not exist as a Scala type.
-Since cases like this are in general impossible to detect, a Scala compiler is free to reject a term which has a type specified as a least upper or greatest lower bound, and that bound would be more complex than some compiler-set limit [^4].
 
-The least upper bound or greatest lower bound might also not be unique.
-For instance `A with B` and `B with A` are both greatest lower bounds of `A` and `B`.
-If there are several least upper bounds or greatest lower bounds, the Scala compiler is free to pick any one of them.
+- the _least upper bound_ of `A` and `B` is the smallest type `L` such that `A` <: `L` and `B` <: `L`.
+- the _greatest lower bound_ of `A` and `B` is the largest type `G` such that `G` <: `A` and `G` <: `B`.
 
-[^4]: The current Scala compiler limits the nesting level of parameterization in such bounds to be at most two deeper than the maximum nesting level of the operand types
+By construction, for all types `A` and `B`, the least upper bound of `A` and `B` is `A ｜ B`, and their greatest lower bound is `A ＆ B`.
 
 ### Weak Conformance
 
@@ -747,7 +788,29 @@ The erasure mapping is defined as follows.
 - The erasure of a singleton type `´p´.type` is the erasure of the type of ´p´.
 - The erasure of a type projection `´T´#´x´` is `|´T´|#´x´`.
 - The erasure of a compound type `´T_1´ with ... with ´T_n´ {´R\,´}` is the erasure of the intersection dominator of ´T_1, ..., T_n´.
+- The erasure of a union type ´S ｜ T´ is the _erased least upper bound_ (_elub_) of the erasures of ´S´ and ´T´.
+- The erasure of an intersection type ´S ＆ T´ is the _eglb_ (erased greatest lower bound) of the erasures of ´S´ and ´T´.
+
+The erased LUB is computed as follows:
+
+- if both argument are arrays of objects, an array of the erased LUB of the element types
+- if both arguments are arrays of same primitives, an array of this primitive
+- if one argument is array of primitives and the other is array of objects, [`Object`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html)
+- if one argument is an array, [`Object`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html)
+- otherwise a common superclass or trait S of the argument classes, with the following two properties:
+  - S is minimal: no other common superclass or trait derives from S, and
+  - S is last: in the linearization of the first argument type ´|A|´ there are no minimal common superclasses or traits that come after S.
+  The reason to pick last is that we prefer classes over traits that way, which leads to more predictable bytecode and (?) faster dynamic dispatch.
 
-The _intersection dominator_ of a list of types ´T_1, ..., T_n´ is computed as follows.
-Let ´T_{i_1}, ..., T_{i_m}´ be the subsequence of types ´T_i´ which are not supertypes of some other type ´T_j´.
-If this subsequence contains a type designator ´T_c´ that refers to a class which is not a trait, the intersection dominator is ´T_c´. Otherwise, the intersection dominator is the first element of the subsequence, ´T_{i_1}´.
+The rules for ´eglb(A, B)´ are given below in pseudocode:
+
+```
+eglb(scala.Array[A], JArray[B]) = scala.Array[eglb(A, B)]
+eglb(scala.Array[T], _)         = scala.Array[T]
+eglb(_, scala.Array[T])         = scala.Array[T]
+eglb(A, B)                      = A                     if A extends B
+eglb(A, B)                      = B                     if B extends A
+eglb(A, _)                      = A                     if A is not a trait
+eglb(_, B)                      = B                     if B is not a trait
+eglb(A, _)                      = A                     // use first
+```
diff --git a/docs/_spec/08-pattern-matching.md b/docs/_spec/08-pattern-matching.md
@@ -113,10 +113,10 @@ Taking XML as an example
 implicit class XMLinterpolation(s: StringContext) = {
     object xml {
         def apply(exprs: Any*) =
-            // parse ‘s’ and build an XML tree with ‘exprs’ 
+            // parse ‘s’ and build an XML tree with ‘exprs’
             //in the holes
         def unapplySeq(xml: Node): Option[Seq[Node]] =
-          // match `s’ against `xml’ tree and produce 
+          // match `s’ against `xml’ tree and produce
           //subtrees in holes
     }
 }
@@ -503,7 +503,7 @@ Each case consists of a (possibly guarded) pattern ´p_i´ and a block ´b_i´.
 Each ´p_i´ might be complemented by a guard `if ´e´` where ´e´ is a boolean expression.
 The scope of the pattern variables in ´p_i´ comprises the pattern's guard and the corresponding block ´b_i´.
 
-Let ´T´ be the type of the selector expression ´e´ and let ´a_1, ..., a_m´ be the type parameters of all methods enclosing the pattern matching expression.  
+Let ´T´ be the type of the selector expression ´e´ and let ´a_1, ..., a_m´ be the type parameters of all methods enclosing the pattern matching expression.
 For every ´a_i´, let ´L_i´ be its lower bound and ´U_i´ be its higher bound.
 Every pattern ´p \in \{p_1,, ..., p_n\}´ can be typed in two ways.
 First, it is attempted to type ´p´ with ´T´ as its expected type.
@@ -537,7 +537,7 @@ In the interest of efficiency the evaluation of a pattern matching expression ma
 This might affect evaluation through side effects in guards.
 However, it is guaranteed that a guard expression is evaluated only if the pattern it guards matches.
 
-If the selector of a pattern match is an instance of a [`sealed` class](05-classes-and-objects.html#modifiers), the compilation of pattern matching can emit warnings which diagnose that a given set of patterns is not exhaustive, i.e. that there is a possibility of a `MatchError` being raised at run-time.
+If the selector of a pattern match is an instance of a [`sealed` class](05-classes-and-objects.html#modifiers), a [union type](03-types#union-and-intersection-types), or a combination thereof, the compilation of pattern matching can emit warnings which diagnose that a given set of patterns is not exhaustive, i.e. that there is a possibility of a `MatchError` being raised at run-time.
 
 ###### Example
 
@@ -569,7 +569,7 @@ def eval[T](t: Term[T]): T = t match {
 
 Note that the evaluator makes crucial use of the fact that type parameters of enclosing methods can acquire new bounds through pattern matching.
 
-For instance, the type of the pattern in the second case, `Succ(u)`, is `Int`. 
+For instance, the type of the pattern in the second case, `Succ(u)`, is `Int`.
 It conforms to the selector type `T` only if we assume an upper and lower bound of `Int` for `T`.
 Under the assumption `Int <: T <: Int` we can also verify that the type right hand side of the second case, `Int` conforms to its expected type, `T`.
 

diff --git a/docs/_spec/A2-scala-2-compatibility.md b/docs/_spec/A2-scala-2-compatibility.md
@@ -32,4 +32,9 @@ def f(): Unit = { ... }
 Scala 3 accepts the old syntax under the `-source:3.0-migration` option.
 If the `-migration` option is set, it can even rewrite old syntax to new.
 The [Scalafix](https://scalacenter.github.io/scalafix/) tool also
-can rewrite procedure syntax to make it Scala 3 compatible.
+can rewrite procedure syntax to make it Scala 3 compatible.
+
+## Compound Types (`with`)
+
+Intersection types `A & B` replace compound types `A with B` in Scala 2.
+For the moment, the syntax `A with B` is still allowed and interpreted as `A & B`, but its usage as a type (as opposed to in a `new` or `extends` clause) will be deprecated and removed in the future.
diff --git a/...reference/new-types/intersection-types.md → ...reference/new-types/intersection-types.md b/...reference/new-types/intersection-types.md → ...reference/new-types/intersection-types.md
@@ -63,6 +63,3 @@ to give at that point a definition of a `children` method with the required type
 class C extends A, B:
   def children: List[A & B] = ???
 ```
-
-
-[More details](./intersection-types-spec.md)
diff --git a/docs/_spec/APPLIEDreference/new-types/union-types.md b/docs/_spec/APPLIEDreference/new-types/union-types.md
@@ -0,0 +1,77 @@
+---
+layout: doc-page
+title: "Union Types"
+nightlyOf: https://docs.scala-lang.org/scala3/reference/new-types/union-types.html
+---
+
+A union type `A | B` has as values all values of type `A` and also all values of type `B`.
+
+
+```scala
+case class UserName(name: String)
+case class Password(hash: Hash)
+
+def help(id: UserName | Password) =
+  val user = id match
+    case UserName(name) => lookupName(name)
+    case Password(hash) => lookupPassword(hash)
+  ...
+```
+
+Union types are duals of intersection types. `|` is _commutative_:
+`A | B` is the same type as `B | A`.
+
+The compiler will assign a union type to an expression only if such a
+type is explicitly given. This can be seen in the following [REPL](https://docs.scala-lang.org/overviews/repl/overview.html) transcript:
+
+```scala
+scala> val password = Password(123)
+val password: Password = Password(123)
+
+scala> val name = UserName("Eve")
+val name: UserName = UserName(Eve)
+
+scala> if true then name else password
+val res2: Object = UserName(Eve)
+
+scala> val either: Password | UserName = if true then name else password
+val either: Password | UserName = UserName(Eve)
+```
+
+The type of `res2` is `Object & Product`, which is a supertype of
+`UserName` and `Password`, but not the least supertype `Password |
+UserName`.  If we want the least supertype, we have to give it
+explicitly, as is done for the type of `either`.
+
+## Type inference
+
+When inferring the result type of a definition (`val`, `var`, or `def`) and the
+type we are about to infer is a union type, then we replace it by its join.
+Similarly, when instantiating a type argument, if the corresponding type
+parameter is not upper-bounded by a union type and the type we are about to
+instantiate is a union type, we replace it by its join. This mirrors the
+treatment of singleton types which are also widened to their underlying type
+unless explicitly specified. The motivation is the same: inferring types
+which are "too precise" can lead to unintuitive typechecking issues later on.
+
+**Note:** Since this behavior limits the usability of union types, it might
+be changed in the future. For example by not widening unions that have been
+explicitly written down by the user and not inferred, or by not widening a type
+argument when the corresponding type parameter is covariant.
+
+See [PR #2330](https://github.com/lampepfl/dotty/pull/2330) and
+[Issue #4867](https://github.com/lampepfl/dotty/issues/4867) for further discussions.
+
+### Example
+
+```scala
+import scala.collection.mutable.ListBuffer
+val x = ListBuffer(Right("foo"), Left(0))
+val y: ListBuffer[Either[Int, String]] = x
+```
+
+This code typechecks because the inferred type argument to `ListBuffer` in the
+right-hand side of `x` was `Left[Int, Nothing] | Right[Nothing, String]` which
+was widened to `Either[Int, String]`. If the compiler hadn't done this widening,
+the last line wouldn't typecheck because `ListBuffer` is invariant in its
+argument.