Comparison operators #702

Merged on Sep 24, 2021 (17 commits)
Changes from 3 commits
1 change: 1 addition & 0 deletions proposals/README.md
@@ -64,5 +64,6 @@ request:
- [0623 - Require braces](p0623.md)
- [0646 - Low context-sensitivity principle](p0646.md)
- [0680 - And, or, not](p0680.md)
- [0702 - Comparison operators](p0702.md)

<!-- endproposals -->
361 changes: 361 additions & 0 deletions proposals/p0702.md
@@ -0,0 +1,361 @@
# Comparison operators

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/702)

<!-- toc -->

## Table of contents

- [Problem](#problem)
- [Background](#background)
- [Terminology](#terminology)
- [Usage in existing languages](#usage-in-existing-languages)
- [Three-way comparisons](#three-way-comparisons)
- [Chained comparisons](#chained-comparisons)
- [Proposal](#proposal)
- [Details](#details)
- [Precedence](#precedence)
- [Associativity](#associativity)
- [Conversions](#conversions)
- [Overloading](#overloading)
- [Default implementations for basic types](#default-implementations-for-basic-types)
- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
- [Alternatives considered](#alternatives-considered)
- [Alternative symbols](#alternative-symbols)
- [Chained comparisons](#chained-comparisons-1)
- [Convert operands like C++](#convert-operands-like-c)
- [Provide a three-way comparison operator](#provide-a-three-way-comparison-operator)
- [Allow comparisons as the operand of `not`](#allow-comparisons-as-the-operand-of-not)
- [Disallow relational comparisons of Boolean values](#disallow-relational-comparisons-of-boolean-values)

<!-- tocstop -->

## Problem

We need to be able to compare values for equality, and to compare ordered values
for relative ordering.

## Background

### Terminology

We refer to tests that check whether two values are the same or different as
_equality_ comparisons, and to tests that determine the relative ordering of two
values as _relational_ comparisons.

### Usage in existing languages

There is a near-universal convention on the use of the following symbols for
relational operators:

- `<`, `<=`, `>`, and `>=` perform ordered comparisons (less than, less than
or equal to, greater than, greater than or equal to).

There are rare exceptions in somewhat esoteric languages: some languages use `≤`
and `≥`, but these are not straightforward to type for many potential Carbon
developers.

For equality operators, there is some divergence but still a very strong trend:

- C-family languages, Rust, Swift, Kotlin, Zig, Nim, Ruby, etc. use `==` for
equality comparison and `!=` for inequality comparison.
- Some languages, such as ALGOL, APL, BASIC, and PL/I, use `=` as equality
comparison, with some using a different symbol (such as `:=` or `<-`) for
assignment and others distinguishing assignment from equality comparison
based on context.
- Haskell and Fortran use `==` for "equal to" and `/=` for "not equal to". The
latter is intended to resemble a ≠ symbol.
- Some languages, such as Pascal and BASIC, use `<>` for inequality
comparison. Python 2 permits this as a synonym for `!=`.
- Perl uses `eq` and `ne` for string comparisons; some shells and UNIX `test`
use `-eq` and `-ne` for integer comparisons.

Some languages support multiple different kinds of equality comparison, such as
both a value comparison (typically `==`) and an object identity comparison
(typically `===` or `is`). Some languages that freely convert between numbers
and strings have different operators to perform a string comparison versus a
numeric comparison. Fortran has custom `.eqv.` and `.neqv.` for equality
comparisons of Boolean values.

Some languages have synonyms for comparison operators. For example, Fortran allows
`.eq.`, `.ne.`, `.gt.`, and so on, as synonyms for `==`, `/=`, `>`, and so on.
This appears to be historical: FORTRAN 77 had only the dotted forms of these
operators.

### Three-way comparisons

C++ has three-way comparisons, written using the `<=>` operator. These provide a
useful mechanism to allow overloading the behavior of relational comparisons
without defining four separate operator overloads for relational comparisons.

Similarly, Python 2 provided a `__cmp__` special method that could be used to
implement all equality and relational comparisons; Python 3 removed it.

### Chained comparisons

Python permits comparisons to be chained: that is, `a < b <= c` is interpreted
as `a < b and b <= c`, except that `b` is evaluated only once. In most C-family
languages, that expression is instead interpreted as `(a < b) <= c`, which
computes the value of `a < b`, maps `false` to `0` and `true` to `1`, then
compares the result to `c`.
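
The difference between the two readings can be sketched in C++. This is only an illustration: the function names `CFamilyChain`, `PythonStyleChain`, and `CheckChainExamples` are hypothetical and are not part of the proposal.

```cpp
#include <cassert>

// The C-family reading of `a < b <= c`: it parses as `(a < b) <= c`, so the
// Boolean result of `a < b` becomes 0 or 1 before it is compared with `c`.
bool CFamilyChain(int a, int b, int c) {
  return static_cast<int>(a < b) <= c;
}

// The Python-style reading of the same expression: `a < b and b <= c`.
bool PythonStyleChain(int a, int b, int c) {
  return a < b && b <= c;
}

// Hypothetical helper that exercises both readings.
void CheckChainExamples() {
  // 3 < 10 yields 1, and 1 <= 6, so the C-family reading is true...
  assert(CFamilyChain(3, 10, 6));
  // ...even though 10 does not lie in the range (3, 6].
  assert(!PythonStyleChain(3, 10, 6));
  // When the middle value really is in range, the two readings agree.
  assert(CFamilyChain(3, 5, 6) && PythonStyleChain(3, 5, 6));
}
```

With `b` equal to 10, the C-family reading silently accepts a value that is out of range, which is the kind of surprise a language rule on chained comparisons has to address.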

## Proposal

Carbon will provide the following operators:

- Equality comparison operators: `==` and `!=`.
- Relational comparison operators: `<`, `<=`, `>`, `>=`.

Each has the obvious mathematical meaning, where `==` means =, `!=` means ≠,
`<=` means ≤, and `>=` means ≥.

There will be no three-way comparison operator symbol. The interface used to
support overloading comparison operators will provide a named function to
perform three-way comparisons.

Chained comparisons are an error: a comparison expression cannot appear as an
unparenthesized operand of another comparison operator.

## Details

All six operators are infix binary operators. For standard Carbon types, they
produce a `Bool` value.

### Precedence

The comparison operators are all at the same precedence level. This level is
lower than operators used to compute (non-Boolean) values, higher than the
logical operators `and` and `or`, and incomparable with the precedence of `not`.

For example, this is OK:

```
if (n + m * 3 < n * n and 3 < m and m < 6) {}
```

... but these are errors:

```
// Error, ambiguous: `(not a) == b` or `not (a == b)`?
if (not a == b) {}
// Error, requires parentheses: `a == (not b)`.
if (a == not b) {}
// Error, requires parentheses: `not (f < 5.0)`.
if (not f < 5.0) {}
```

### Associativity

The comparison operators are non-associative. For example:

```
// Error, need `3 < m and m < 6`.
if (3 < m < 6) {}
// Error, need `a == b and b == c`.
if (a == b == c) {}
// Error, need `(m > 1) == (n > 1)`.
if (m > 1 == n > 1) {}
```

### Conversions

When both operands are of standard Carbon numeric types (`Int(n)` or
`Float(n)`), no conversions are performed on either operand, and the result is
the mathematically correct result for that comparison, or `False` if either
operand is a NaN. For example:

Review comment (Contributor): I think we should consider disallowing `==` and `!=` when one side is integral and the other is floating-point, because equality comparisons on floating-point numbers are in some ways much more like bitwise operations than mathematical operations, and from that point of view there is no "mathematically correct result".


```
// The value of `v` is True, because `a` is less than `b`, even though the
// result of either an `i32` comparison or a `u32` comparison would be False.
fn f(a: i32, b: u32) -> Bool { return a < b; }
let v: Bool = f(-1, 4_000_000_000);

// The value of `w` is False, because `f` has value 999999984306749440, which
// is not exactly equal to n.
let f: f32 = 1.0e18;
let n: i64 = 1_000_000_000_000_000_000;
let w: Bool = f == n;
```

An equivalent viewpoint is that the comparison is performed in a hypothetical
sufficiently large type. For example, a comparison of `i32` against `u32` can be
performed in `i64`, and a comparison of `f32` against `i32` can be performed in
`f64`. However, no such type is required to actually exist.

Note that this diverges from C++, which would convert both operands to a common
type first, sometimes performing a lossy conversion.

### Overloading

Separate interfaces will be provided to permit overloading equality and
relational comparisons. The exact design of those interfaces is left to a future
proposal. As non-binding design guidance for such a proposal:

- The interface for equality comparisons should primarily provide the ability
to override the behavior of `==`. The `!=` operator can optionally also be
overridden, with a default implementation that returns `not (a == b)`.
- The interface for relational comparisons should primarily provide the
ability to specify a three-way comparison operator. The individual
relational comparison operators can optionally be overridden separately,
with a default implementation in terms of the three-way comparison operator.
- Overloaded comparison operators may wish to produce a type other than
`Bool`, for uses such as a vector comparison producing a vector of `Bool`
values.
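
The first two points of this guidance can be sketched in C++ rather than Carbon, since the Carbon interfaces are not yet designed. The `Ordering` enumeration, the `Version` type, and the `Compare` function below are all hypothetical stand-ins: the author of the type writes `==` and a named three-way comparison, and the remaining operators are defined in terms of those.

```cpp
#include <cassert>

// Hypothetical result type for a named three-way comparison.
enum class Ordering { kLess, kEquivalent, kGreater };

// Hypothetical example type.
struct Version {
  int major_part;
  int minor_part;
};

// The named three-way comparison: the single relational primitive.
Ordering Compare(const Version& a, const Version& b) {
  if (a.major_part != b.major_part) {
    return a.major_part < b.major_part ? Ordering::kLess : Ordering::kGreater;
  }
  if (a.minor_part != b.minor_part) {
    return a.minor_part < b.minor_part ? Ordering::kLess : Ordering::kGreater;
  }
  return Ordering::kEquivalent;
}

// Equality is provided separately from ordering.
bool operator==(const Version& a, const Version& b) {
  return a.major_part == b.major_part && a.minor_part == b.minor_part;
}
// Default for `!=`: the negation of `==`.
bool operator!=(const Version& a, const Version& b) { return !(a == b); }

// Defaults for the relational operators, all in terms of `Compare`.
bool operator<(const Version& a, const Version& b) {
  return Compare(a, b) == Ordering::kLess;
}
bool operator<=(const Version& a, const Version& b) {
  return Compare(a, b) != Ordering::kGreater;
}
bool operator>(const Version& a, const Version& b) {
  return Compare(a, b) == Ordering::kGreater;
}
bool operator>=(const Version& a, const Version& b) {
  return Compare(a, b) != Ordering::kLess;
}

// Hypothetical helper that exercises the operators.
void CheckOverloadExamples() {
  const Version v1{1, 4};
  const Version v2{2, 0};
  assert(v1 < v2 && v1 <= v2 && v2 > v1 && v2 >= v1);
  assert(v1 != v2 && v1 == v1);
}
```

In this arrangement a composite type needs one pass over its fields per relational comparison, which is the property the rationale below relies on for hierarchical data.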

### Default implementations for basic types

In addition to being defined for standard Carbon numeric types, equality
comparisons are also defined for all "data" types:

- Tuples.
- Structs (structural data classes).
- Classes implementing an interface that identifies them as data classes.

In addition, relational comparisons are defined for tuples, and provide a
lexicographical ordering.

In each case, the ordering is only available if it is supported by all element
types.

The `Bool` type supports equality comparisons and relational comparisons. For
relational comparisons, `False` is treated as being less than `True`.
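
C++ happens to follow the same conventions for `std::tuple` and `bool`, so the intended behavior can be illustrated there. This is a sketch only; `Pair`, `PairLess`, `BoolLess`, and `CheckDataComparisons` are hypothetical names.

```cpp
#include <cassert>
#include <tuple>

using Pair = std::tuple<int, int>;

// Lexicographical ordering: compare the first elements, and consult the
// second elements only when the first are equal.
bool PairLess(const Pair& a, const Pair& b) { return a < b; }

// Relational comparison of Boolean values: false orders before true.
bool BoolLess(bool a, bool b) { return a < b; }

// Hypothetical helper that exercises both functions.
void CheckDataComparisons() {
  assert(PairLess(Pair(1, 2), Pair(1, 3)));   // First elements tie.
  assert(PairLess(Pair(1, 9), Pair(2, 0)));   // First elements decide.
  assert(!PairLess(Pair(1, 2), Pair(1, 2)));  // Equal tuples are not less.
  assert(BoolLess(false, true));
  assert(!BoolLess(true, false));
}
```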

## Rationale based on Carbon's goals

- _Performance-critical software:_

- The use of a three-way comparison as the central primitive for
overloading relational comparisons provides predictable, composable
performance for comparing hierarchical data structures.

- _Code that is easy to read, understand, and write:_

- The chosen precedence and associativity rules aim to avoid bugs and
ensure the code does what it appears to do, requiring parentheses in
cases where the intent is unclear.
- The choice to not convert operands of a comparison operator removes a
source of bugs caused by unintended lossy conversions.

- _Interoperability with and migration from existing C++ code:_

- The use of the chosen operator symbols exactly matches C++, reducing
friction for developers and code moving between the two languages, and
for interoperability.

## Alternatives considered

### Alternative symbols

We could use `/=` instead of `!=` for not-equal comparisons.

Advantages:

- Avoids overloading `!` for both "not equals" and template/generic use in
`:!` bindings.
- There is no other usage of `!` meaning "not" in the language because we use
a `not` operator.

Disadvantages:

- Unfamiliar to C++ programmers.
- `a /= b` would likely be expected to mean an `a = a / b` compound
assignment.

We could use `=/=` instead of `!=` for not-equal comparisons.

Advantages:

- As above; also `=/=` looks like an `==` with a line through the middle.

Disadvantages:

- This would be inventive and unlike all other languages.
- `=/=` is one character longer than `!=`, and harder to type on US-ASCII
keyboards because the keys are distant but likely to be typed with the same
finger.

### Chained comparisons

We could support Python-like chained comparisons.

Advantages:

- Small ergonomic improvement for range comparisons.

Disadvantages:

- Using the middle expression as an argument to two different functions may
create problems, as the value will need to be stored somewhere, potentially
changing the semantics of the operator expression as we can no longer move
from the operand.
- Both short-circuiting behavior and non-short-circuiting behavior will be
surprising and unintuitive to some.

### Convert operands like C++

We could convert the operands of comparison operators in a way that's equivalent
to C++'s behavior.

Advantages:

- May ease migration from C++.
- May allow programmers to reuse some intuition, for example when comparing
floating-point values against integer values.
- May allow more efficient machine code to be generated for source code that
takes no special care about the types of comparison operands.

Disadvantages:

- Produces mathematically incorrect results when the conversion to the common
type is lossy.
- Does not provide a simple syntax for correct mixed-type comparisons.

### Provide a three-way comparison operator

We could provide a symbol for three-way comparisons, such as C++20's `<=>`.

Advantages:

- The use of a symbol rather than a named member of an interface for this
functionality may ease migration from C++20.

Disadvantages:

- Reserves a symbol for an operation that should not be used directly except
in special circumstances, and that will produce a nuanced type even when
comparing standard Carbon types such as `f32`.

### Allow comparisons as the operand of `not`

We could permit comparisons to appear as the immediate operand of `not` without
parentheses.

Advantages:

- Provides an easier syntax for floating-point comparisons where the desired
result for a NaN operand is `True` rather than `False`: `not f < 5.0`.

Disadvantages:

- Introduces ambiguity when comparing Boolean values: `not cond1 == cond2`
might intend to compare `not cond1` to `cond2` rather than evaluating
`cond1 != cond2`.
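
The NaN behavior behind the advantage listed above can be illustrated in C++, assuming IEEE 754 semantics and no fast-math compiler options. The function names are hypothetical.

```cpp
#include <cassert>
#include <limits>

// `f < 5.0`: false when `f` is a NaN.
bool LessThanFive(double f) { return f < 5.0; }

// `f >= 5.0`: also false when `f` is a NaN.
bool AtLeastFive(double f) { return f >= 5.0; }

// The negated form: true when `f` is a NaN, unlike `f >= 5.0`.
bool NotLessThanFive(double f) { return !(f < 5.0); }

// Hypothetical helper that exercises the three predicates.
void CheckNanExamples() {
  const double not_a_number = std::numeric_limits<double>::quiet_NaN();
  assert(!LessThanFive(not_a_number));
  assert(!AtLeastFive(not_a_number));
  assert(NotLessThanFive(not_a_number));
  // For ordinary values the negation and `>=` agree.
  assert(NotLessThanFive(7.0) == AtLeastFive(7.0));
}
```

Because the negation and `>=` differ only for NaN operands, the parenthesized spelling `not (f < 5.0)` keeps that intent visible at the cost of two extra characters.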

### Disallow relational comparisons of Boolean values

We could disallow ordered comparisons of Boolean values.

Advantages:

- Disallows an operation that might be unintended.

Disadvantages:

- Disallows an operation that might be intended.
- Likely to make `Bool` behave differently from discriminated union types,
which are likely to be treated as data types.