diff --git a/proposals/p0702.md b/proposals/p0702.md new file mode 100644 index 0000000000000..e6ca7b1e873f4 --- /dev/null +++ b/proposals/p0702.md @@ -0,0 +1,512 @@ +# Comparison operators + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/702) + + + +## Table of contents + +- [Problem](#problem) +- [Background](#background) + - [Terminology](#terminology) + - [Usage in existing languages](#usage-in-existing-languages) + - [Three-way comparisons](#three-way-comparisons) + - [Chained comparisons](#chained-comparisons) +- [Proposal](#proposal) +- [Details](#details) + - [Precedence](#precedence) + - [Associativity](#associativity) + - [Built-in comparisons and implicit conversions](#built-in-comparisons-and-implicit-conversions) + - [Consistency with implicit conversions](#consistency-with-implicit-conversions) + - [Comparisons with constants](#comparisons-with-constants) + - [Performance](#performance) + - [Overloading](#overloading) + - [Default implementations for basic types](#default-implementations-for-basic-types) +- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) +- [Alternatives considered](#alternatives-considered) + - [Alternative symbols](#alternative-symbols) + - [Chained comparisons](#chained-comparisons-1) + - [Convert operands like C++](#convert-operands-like-c) + - [Provide a three-way comparison operator](#provide-a-three-way-comparison-operator) + - [Allow comparisons as the operand of `not`](#allow-comparisons-as-the-operand-of-not) + + + +## Problem + +We need to be able to compare values for equality, and to compare ordered values +for relative ordering. + +## Background + +### Terminology + +We refer to tests that check whether two values are the same or different as +_equality_ comparisons, and to tests that determine the relative ordering of two +values as _relational_ comparisons. + +### Usage in existing languages + +There is near-universal convention on the use of the following symbols for +relational operators: + +- `<`, `<=`, `>`, and `>=` perform ordered comparisons (less than, less than + or equal to, greater than, greater than or equal to). + +There are rare exceptions in somewhat esoteric languages: some languages use `≤` +and `≥`, but these are not straightforward to type for many potential Carbon +developers. + +For equality operators, there is some divergence but still a very strong trend: + +- C-family languages, Rust, Swift, Kotlin, Zig, Nim, Ruby, etc. use `==` for + equality comparison and `!=` for inequality comparison. +- Some languages, such as ALGOL, APL, BASIC, and PL/I, use `=` as equality + comparison, with some using a different symbol (such as `:=` or `<-`) for + assignment and others distinguishing assignment from equality comparison + based on context. +- Haskell and Fortran use `==` for "equal to" and `/=` for "not equal to". The + latter is intended to resemble a ≠ symbol. +- Some languages, such as Pascal and BASIC, use `<>` for inequality + comparison. Python 2 permits this as a synonym for `!=`. +- Perl uses `eq` and `ne` for string comparisons; some shells and UNIX `test` + use `-eq` and `-ne` for for integer comparisons. + +Some languages support multiple different kinds of equality comparison, such as +both a value comparison (typically `==`) and an object identity comparison +(typically `===` or `is`). Some languages that freely convert between numbers +and strings have different operators to perform a string comparison versus a +numeric comparison. Fortran has custom `.eqv.` and `.neqv.` for equality +comparisons of Boolean values. + +Some languages have synonyms for equality operators. For example, Fortran allows +`.eq.`, `.ne.`, `.gt.`, and so on, as synonyms for `==`, `/=`, `>`, and so on. +This appears to be historical: FORTRAN 77 had only the dotted forms of these +operators. + +### Three-way comparisons + +C++ has three-way comparisons, written using the `<=>` operator. These provide a +useful mechanism to allow overloading the behavior of relational comparisons +without defining four separate operator overloads for relational comparisons. + +Similarly, Python provides a `__cmp__` special method that can be used to +implement all equality and relational comparisons. + +### Chained comparisons + +Python permits comparisons to be chained: that is, `a < b <= c` is interpreted +as `a < b and b <= c`, except that `b` is evaluated only once. In most C-family +languages, that expression is instead interpreted as `(a < b) <= c`, which +computes the value of `a < b`, maps `false` to `0` and `true` to `1`, then +compares the result to `c`. + +## Proposal + +Carbon will provide the following operators: + +- Equality comparison operators: `==` and `!=`. +- Relational comparison operators: `<`, `<=`, `>`, `>=`. + +Each has the obvious mathematical meaning, where `==` means =, `!=` means ≠, +`<=` means ≤, and `>=` means ≥. + +There will be no three-way comparison operator symbol. The interface used to +support overloading comparison operators will provide a named function to +perform three-way comparisons. + +Chained comparisons are an error: a comparison expression cannot appear as an +unparenthesized operand of another comparison operator. + +For built-in types, we will follow these rules: + +- Behave consistently with implicit conversions: if an operation is valid + between built-in types `T` and `U`, then it is valid between built-in types + that implicitly convert to `T` and `U`. +- Never invent an intermediate type that is larger than both operand types. +- A comparison produces either a mathematically correct answer or a + compilation error. + +The first two rules are expected to also apply for other built-in operators, +such as arithmetic. The third rule is specific to comparisons. + +One consequence of the first rule is that we do not convert operands in a way +that might lose information. This is generally also implied by the third rule. + +## Details + +All six operators are infix binary operators. For standard Carbon types, they +produce a `Bool` value. + +### Precedence + +The comparison operators are all at the same precedence level. This level is +lower than operators used to compute (non-Boolean) values, higher than the +logical operators `and` and `or`, and incomparable with the precedence of `not`. + +For example, this is OK: + +``` +if (n + m * 3 < n * n and 3 < m and m < 6) {} +``` + +... but these are errors: + +``` +// Error, ambiguous: `(not a) == b` or `not (a == b)`? +if (not a == b) {} +// Error, requires parentheses: `a == (not b)`. +if (a == not b) {} +// Error, requires parentheses: `not (f < 5.0)`. +if (not f < 5.0) {} +``` + +### Associativity + +The comparison operators are non-associative. For example: + +``` +// Error, need `3 < m and m < 6`. +if (3 < m < 6) {} +// Error, need `a == b and b == c`. +if (a == b == c) {} +// Error, need `(m > 1) == (n > 1)`. +if (m > 1 == n > 1) {} +``` + +### Built-in comparisons and implicit conversions + +Built-in comparisons are permitted: + +- when both operands are of standard Carbon integer types (`Int(n)` or + `Unsigned(n)`), or +- when both operands are of standard Carbon floating-point types (`Float(n)`), + or +- when one operand is of floating-point type and the other is of integer type, + if all values of the integer type can be exactly represented in the + floating-point type + +In each case, the result is the mathematically-correct answer. This applies even +when comparing `Int(n)` with `Unsigned(m)`. + +For example: + +``` +// The value of `v` is True, because `a` is less than `b`, even though the +// result of either an `i32` comparison or a `u32` comparison would be False. +fn f(a: i32, b: u32) -> Bool { return a < b; } +let v: Bool = f(-1, 4_000_000_000); + +// This does not compile, because `i64` values in general (and 10^18 in +// particular) are not exactly representable in the type `f32`. +let f: f32 = 1.0e18; +let n: i64 = 1_000_000_000_000_000_000; +let w: Bool = f == n; +``` + +Comparisons involving integer and floating-point constants are not covered by +these rules and are [discussed separately](#comparisons-with-constants). + +#### Consistency with implicit conversions + +As specified in [#820](https://github.com/carbon-language/carbon-lang/pull/820), +we support the following implicit conversions: + +- from `Int(n)` to `Int(m)` if `m > n`, +- from `Unsigned(n)` to `Int(m)` or `Unsigned(m)` if `m > n`, +- from `Float(n)` to `Float(m)` if `m > n`, and +- from `Int(n)` to `Float(m)` if `Float(m)` can represent all values of + `Int(n)`. + +These rules can be summarized as: a type `T` can be converted to `U` if every +value of type `T` is a value of type `U`. + +Additionally [#820](https://github.com/carbon-language/carbon-lang/pull/820) +permits conversions from certain kinds of integer and floating-point constants +to `Int(n)` and `Float(n)` types if the constant can be represented in the type. + +All built-in comparisons can be viewed as performing implicit conversions on at +most one of the operands in order to reach a suitable pair of identical or very +similar types, and then performing a comparison on those types. The target types +for these implicit conversions are, for each suitable value `n`: + +- `Int(n)` vs `Int(n)` +- `Unsigned(n)` vs `Unsigned(n)` +- `Int(n)` vs `Unsigned(n)` +- `Unsigned(n)` vs `Int(n)` +- `Float(n)` vs `Float(n)` + +There will in general be multiple combinations of implicit conversions that will +lead to one of the above forms, but we will arrive at the same result regardless +of which is selected, because all comparisons are mathematically correct and all +implicit conversions are lossless. Implementations are expected to do whatever +is most efficient: for example, for `u16 < i32` it is likely that the best +choice would be to promote the `u16` to `i32`, not `u32`. + +Because we only ever convert at most one operand, we never use an intermediate +type that is larger than both input types. For example, both `i32` and `f32` can +be implicitly converted to `f64`, but we do not permit comparisons between `i32` +and `f32` even though we could perform those comparisons in `f64`. If such +comparisons were permitted, the results could be surprising: + +``` +// OK, i32 can exactly represent this value. +var n: i32 = 2_000_000_001; +// OK, this value is within the representable range for f32. +var f: f32 = 2_000_000_001.0; +// This comparison could compare unequal, because f32 cannot exactly represent +// the value 2,000,000,001. +if (n == f) { ... } +// OK with explicit cast, but may still compare unequal. +if (n == f as f64) { ... } +if (n as f64 == f) { ... } +``` + +The two kinds of mixed-type comparison may be [less efficient](#performance) +than the other kinds due to the slightly wider domain. + +Note that this approach diverges from C++, which would convert both operands to +a common type first, sometimes performing a lossy conversion potentially giving +an incorrect result, sometimes converting both operands, and sometimes using a +wider type than either of the operand types. + +#### Comparisons with constants + +As described in [#820](https://github.com/carbon-language/carbon-lang/pull/820), +integer constants can be implicitly converted to any integer or floating-point +type that can represent their value, and floating-point constants can be +implicitly converted to any floating-point type that can represent their value. +We permit the following comparisons involving constants: + +- A constant can be compared with a value of any type to which it can be + implicitly converted. +- Any two constants can be compared, even if there is no type that can + represent both. + +Note that this disallows comparisons between, for example, `i32` and an integer +literal that cannot be represented in `i32`. Such comparisons would always be +tautological. This decision should be revisited if it proves problematic in +practice, for example in templated code where the literal is sometimes in range. + +#### Performance + +The choice to give correct results for signed/unsigned comparisons has a +performance impact in practice, because it exposes operations that some +processors do not currently directly support. +[Sample microbenchmarks](https://quick-bench.com/q/1_xA8G_jXci_yeOKt0WgCc6eGN4) +for implementations of several operations show the following performance on +x86_64: + +| Operation | Mathematical comparison time | C++ comparison time | Ratio | +| ----------- | ---------------------------- | ------------------- | ----- | +| `i64 < u64` | 1636 | 798 | 2.0x | +| `u64 < i64` | 1956 | 798 | 2.5x | + +The execution times here are computed as operation time minus no-op time. + +The mixed-type operations typically have 2-2.5x the execution time of the +same-type operations. However, this is a predictable performance change, and can +be controlled by the developer by converting the operands to a suitable type +prior to the conversion if a faster same-type comparison is preferred over a +correct mixed-type comparison. + +The above comparison attempts to demonstrate a worst-case difference. In many +cases, better code can be generated for the mixed-type comparison. For example, +when +[branching on the result of the comparison](https://quick-bench.com/q/mXJiHK3_RcCH4fgB88phQscLu88), +the difference is significantly reduced: + +| Operation | Mathematical comparison time | C++ comparison time | Ratio | +| ----------- | ---------------------------- | ------------------- | ----- | +| `i64 < u64` | 996 | 991 | 1.0x | +| `u64 < i64` | 1973 | 997 | 2.0x | + +### Overloading + +Separate interfaces will be provided to permit overloading equality and +relational comparisons. The exact design of those interfaces is left to a future +proposal. As non-binding design guidance for such a proposal: + +- The interface for equality comparisons should primarily provide the ability + to override the behavior of `==`. The `!=` operator can optionally also be + overridden, with a default implementation that returns `not (a == b)`. + Overriding `!=` separately from `==` is expected to be used to support + floating-point NaN comparisons and for C++ interoperability. +- The interface for relational comparisons should primarily provide the + ability to specify a three-way comparison operator. The individual + relational comparison operators can optionally be overridden separately, + with a default implementation in terms of the three-way comparison operator. + This facility is expected to be used primarily to support C++ + interoperability. +- Overloaded comparison operators may wish to produce a type other than + `Bool`, for uses such as a vector comparison producing a vector of `Bool` + values. We should decide whether we wish to support such uses. + +### Default implementations for basic types + +In addition to being defined for standard Carbon numeric types, equality and +relational comparisons are also defined for all "data" types: + +- Tuples. +- Structs (structural data classes). +- Classes implementing an interface that identifies them as data classes. + +Relational comparisons for these types provide a lexicographical ordering. This +proposal defers to +[#710](https://github.com/carbon-language/carbon-lang/issues/710) for details on +comparison support for classes. + +In each case, the comparison is only available if it is supported by all element +types. + +The `Bool` type should be treated as a choice type, and so should support +equality comparisons and relational comparisons if and only if choice types do +in general. That decision is left to a future proposal. + +## Rationale based on Carbon's goals + +- _Performance-critical software:_ + + - The use of a three-way comparison as the central primitive for + overloading relational comparisons provides predictable, composable + performance for comparing hierarchical data structures. + - The performance of mixed comparisons may be slower than in C++, but this + is because it's performing a different operation. This performance + change is predictable, and can be controlled by the programmer by + performing suitable non-value-preserving casts to a common type prior to + the comparison. + +- _Code that is easy to read, understand, and write:_ + + - The chosen precedence and associativity rules aim to avoid bugs and + ensure the code does what it appears to do, requiring parentheses in + cases where the intent is unclear. + - The choice to not perform lossy conversions on operands of a comparison + operator removes a source of bugs caused by unintended lossy + conversions. + +- _Interoperability with and migration from existing C++ code:_ + + - The use of the chosen operator symbols exactly matches C++, reducing + friction for developers and code moving between the two languages, and + for interoperability. + +## Alternatives considered + +### Alternative symbols + +We could use `/=` instead of `!=` for not-equal comparisons. + +Advantages: + +- Avoids overloading `!` for both "not equals" and template/generic use in + `:!` bindings. +- There is no other usage of `!` meaning "not" in the language because we use + a `not` operator. + +Disadvantages: + +- Unfamiliar to C++ programmers. +- `a /= b` would likely be expected to mean an `a = a / b` compound + assignment. +- Breaks consistency with Python, which uses `not` for logical negation and + `!=` for inequality comparison. + +We could use `=/=` instead of `!=` for not-equal comparisons. + +Advantages: + +- As above; also `=/=` looks like an `==` with a line through the middle. + +Disadvantages: + +- This would be inventive and unlike all other languages. As above, breaks + consistency with Python. +- This would make `=/=` one character longer, and harder to type on US-ASCII + keyboards because the keys are distant but likely to be typed with the same + finger. + +### Chained comparisons + +We could support Python-like chained comparisons. + +Advantages: + +- Small ergonomic improvement for range comparisons. +- Middle operand is evaluated only once. + +Disadvantages: + +- Using the middle expression as an argument to two different functions may + create problems, as the value will need to be stored somewhere, potentially + changing the semantics of the operator expression as we can no longer move + from the operand. +- Both short-circuiting behavior and non-short-circuiting behavior will be + surprising and unintuitive to some. The short-circiuting option will + introduce control flow without a keyword to announce it, which goes against + our design decision to use a keyword for `and` and `or` to announce the + control flow. The non-short-circuiting option will evaluate subexpressions + unnecessarily, which creates a tension with our performance goal. +- Experienced C++ developers may expect a different behavior, such as + `a < b == cmp` comparing the result of `a < b` against the Boolean value + `cmp`. + +See also the ongoing discussion in +[#451](https://github.com/carbon-language/carbon-lang/issues/451). + +### Convert operands like C++ + +We could convert the operands of comparison operators in a way that's equivalent +to C++'s behavior. + +Advantages: + +- May ease migration from C++. +- May allow programmers to reuse some intuition, for example when comparing + floating-point values against integer values. +- Allows more efficient machine code to be generated for source code that + takes no special care about the types of comparison operands. +- Improves performance predictability for C++ developers unfamiliar with + Carbon's rules. + +Disadvantages: + +- Produces incorrect results. +- Does not provide a simple syntax for correct mixed-type comparisons. + +### Provide a three-way comparison operator + +We could provide a symbol for three-way comparisons, such as C++20's `<=>`. + +Advantages: + +- The use of a symbol rather than a named member of an interface for this + functionality may ease migration from C++20. + +Disadvantages: + +- Reserves a symbol for an operation that should not be used directly except + in special circumstances, and that will produce a nuanced type even when + comparing standard Carbon types such as `f32`. + +### Allow comparisons as the operand of `not` + +We could permit comparisons to appear as the immediate operand of `not` without +parentheses. + +Advantages: + +- Provides an easier syntax for floating-point comparisons where the desired + result for a NaN operand is `True` rather than `False`: `not f < 5.0`. + +Disadvantages: + +- Introduces ambiguity when comparing Boolean values: `not cond1 == cond2` + might intend to compare `not cond1` to `cond2` rather than comparing + `cond1 != cond2`.