Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparison operators #702

Merged
merged 17 commits into from
Sep 24, 2021
Merged
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
170 changes: 126 additions & 44 deletions proposals/p0702.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Details](#details)
- [Precedence](#precedence)
- [Associativity](#associativity)
- [Conversions](#conversions)
- [Built-in comparisons and implicit conversions](#built-in-comparisons-and-implicit-conversions)
- [Performance](#performance)
- [Overloading](#overloading)
- [Default implementations for basic types](#default-implementations-for-basic-types)
- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
Expand All @@ -32,7 +33,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Convert operands like C++](#convert-operands-like-c)
- [Provide a three-way comparison operator](#provide-a-three-way-comparison-operator)
- [Allow comparisons as the operand of `not`](#allow-comparisons-as-the-operand-of-not)
- [Disallow relational comparisons of Boolean values](#disallow-relational-comparisons-of-boolean-values)

<!-- tocstop -->

Expand Down Expand Up @@ -122,6 +122,16 @@ perform three-way comparisons.
Chained comparisons are an error: a comparison expression cannot appear as an
unparenthesized operand of another comparison operator.

For built-in types, we will follow these rules:

- A comparison produces either a mathematically correct answer or a
compilation error.
- Do not perform implicit conversions that lose information.
- Never invent an intermediate type that is larger than both operand types.

One indirect consequence of the second and third rule is that we never convert
both operands in a comparison.

## Details

All six operators are infix binary operators. For standard Carbon types, they
Expand Down Expand Up @@ -163,33 +173,101 @@ if (a == b == c) {}
if (m > 1 == n > 1) {}
```

### Conversions
### Built-in comparisons and implicit conversions

When both operands are of standard Carbon integer types (`Int(n)` or
`Unsigned(n)`), the narrower operand is converted to the width of the wider
operand, then a mathematically-correct comparison is performed.
chandlerc marked this conversation as resolved.
Show resolved Hide resolved

When both operands are of standard Carbon numeric types (`Int(n)` or
`Float(n)`), no conversions are performed on either operand, and the result is
the mathematically correct result for that comparison, or `False` if either
operand is a NaN. For example:
When both operands are of standard Carbon floating-point types (`Float(n)`), the
narrower operand is converted to the width of the wider operand, then a
`Float(n)` comparison is performed.

When one operand is of floating-point type and the other is of integer type, if
all values of the integer type can be exactly represented in the floating-point
type, it is converted to the floating-point type. Otherwise, the integer operand
is required to be a constant and is required to be exactly representable in the
zygoloid marked this conversation as resolved.
Show resolved Hide resolved
floating-point type. Otherwise the comparison is rejected.

For example:

```
// The value of `v` is True, because `a` is less than `b`, even though the
// result of either an `i32` comparison or a `u32` comparison would be False.
fn f(a: i32, b: u32) -> Bool { return a < b; }
let v: Bool = f(-1, 4_000_000_000);

// The value of `w` is False, because `f` has value 999999984306749440, which
// is not exactly equal to n.
// This does not compile, because `i64` values in general (and 10^18 in
// particular) are not exactly representable in the type `f32`.
let f: f32 = 1.0e18;
let n: i64 = 1_000_000_000_000_000_000;
let w: Bool = f == n;
```

An equivalent viewpoint is that the comparison is performed in a hypothetical
suffiicently large type. For example, a comparison of `i32` against `u32` can be
performed in `i64`, and a comparison of `f32` against `i32` can be performed in
`f64`. However, no such type is required to actually exist.

Note that this diverges from C++, which would convert both operands to a common
type first, sometimes performing a lossy conversion.
More generally, we support the following implicit conversions:

- from `Int(n)` to `Int(m)` if `m > n`,
- from `Unsigned(n)` to `Int(m)` or `Unsigned(m)` if `m > n`,
- from `Float(n)` to `Float(m)` if `m > n`, and
- from `Int(n)` to `Float(m)` if `Float(m)` can represent all values of
`Int(n)` or if the integer operand is a constant that is exactly
representable in the floating-point type.

This proposal does not take an explicit stance on what it means for the integer
operand to be a constant, except that integer literals are expected to qualify
(so `f >= 0` for a floating-point value `f` is valid). For the time being, no
expression forms other than integer literals should be permitted to implicitly
convert from integer to floating-point type, although we expect this to be
revised by future proposals.

Other than the rule for converting a constant integer to floating-point, these
rules can be summarized as: a type `T` can be converted to `U` if every value of
type `T` is a value of type `U`.

We support comparison operations (both equality and relational) on the following
operand types, for each suitable value `n`:

- `Int(n)` vs `Int(n)`
- `Unsigned(n)` vs `Unsigned(n)`
- `Int(n)` vs `Unsigned(n)`
- `Unsigned(n)` vs `Int(n)`
- `Float(n)` vs `Float(n)`

When searching for a suitable comparison, we only consider performing
conversions on one of the two operands, never both. We expect for this to also
apply to [overloaded](#overloading) comparison operators for user-defined types,
but this proposal does not constrain that decision.

The two kinds of mixed-type comparison may be [less efficient](#performance)
than the other kinds due to the slightly wider domain.

Note that this approach diverges from C++, which would convert both operands to
a common type first, sometimes performing a lossy conversion potentially giving
an incorrect result, sometimes converting both operands, and sometimes using a
wider type than either of the operand types.

#### Performance

The choice to give correct results for signed/unsigned comparisons has a
performance impact in practice, because it exposes operations that some
processors do not currently directly support.
[Sample microbenchmarks](https://godbolt.org/z/dfGe4MhEx) for implementations of
several operations show the following performance on x86_64 (use the Quick-bench
link in Compiler Explorer to run the benchmarks):

| Operation | Mathematical comparison time | C++ comparison time | Ratio |
| ----------- | ---------------------------- | ------------------- | ----- |
| `i64 < u64` | 2814 | 992 | 2.8x |
chandlerc marked this conversation as resolved.
Show resolved Hide resolved
| `u64 < i64` | 1957 | 1012 | 1.9x |

The mixed-type operations are typically 2-3x slower than the same-type
operations. However, this is a predictable performance change, and can be
controlled by the developer by converting the operands to a suitable type prior
to the conversion if a faster same-type comparison is preferred over a correct
mixed-type comparison.

It is likely that better code could be generated than that currently produced in
this benchmark for these operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://quick-bench.com/q/1_xA8G_jXci_yeOKt0WgCc6eGN4 seems to get to 1.3x and 1.4x. ;]

But it's still a "bad" test -- most users aren't conjuring booleans, they're branching. Most of the extra slowness is materializing the 0 and 1 value here. It'll be harder to measure the difference when used in real-world code.

But it's still a difference and so it may be something we have to pay attention to here long-term.

Copy link
Contributor Author

@zygoloid zygoloid Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calculation I'm using here is:

((correct operation benchmark time) - (no-op benchmark time)) /
((converting operation benchmark time) - (no-op benchmark time))

Using that calculation, I get 2.0x and 2.5x for the two operations in your revised approach rather than the 1.3x and 1.4x as reported by quick-bench. Even though these are similar to the numbers I had before, I've switched to your quick-bench link because I think your branchless implementation seems preferable in general. I previously reported 2.8x and 1.8x, but I'm now realizing that the 1.8x was cheating because it involved a branch that happened to be always taken, that skipped half of the comparison. And while the 2.8x involved a branch that was taken half the time, that was in predictable batches of 500 occurrences branching each way.

I've also added results for a case where we branch on the result of the computation, where the performance delta does seem smaller -- and the delta disappears entirely for one of the two operations; perhaps we're getting one of the two tests for free somehow. I expect that's still not really representative, but perhaps it's closer to being so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Good catch on the noop-baseline.

Probably spent too much time on this, but I've got a re-worked benchmark that I like a bit better:
https://godbolt.org/z/xj3n6PKfb

You can't run it on quick-bench.com because it needs Abseil, but it actually uses a controllable distribution of random values. As pasted it should use non-negative values that fit into an i64.

The result is, at least on my Skylake VM, identical performance for the branchless version and the converting version. The branching version behaves a bit differently. Note that these nearly never find "true" for equality test, and so that's a bit biasing here.

If you shift the dataset a bit to, for example, have negative integers then it skews very differently: the non-converting cases are actually much faster. Why? Because they can short circuit by only looking at one input when that input is negative! But we shouldn't expect this to be representative of anything, that's why I skewed the distribution back to make the negative test essentially always fail -- if that test ever succeeds then the converting case will be slower.

It also seems to show much tighter behavior with floating point as well, but I've not spent much time analyzing that.

Happy to iterate a bit here and get a benchmark that we can actually include in the repository if useful (and it seems like it might be) to ensure we're doing a good job of setting expectations here.

But regardless, this shouldn't block the proposal. Feel free to merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, to compile the above locally on Linux, I think the following will work:

brew install google-benchmark abseil
clang++ -o conv_bench conv_bench.cpp -O3 -I(brew --prefix google-benchmark)/include -I(brew --prefix abseil)/include (brew --prefix google-benchmark)/lib/libbenchmark.a (brew --prefix abseil)/lib/libabsl_*.so -lpthread -Wl,-rpath,(brew --prefix abseil)/lib -gmlt


### Overloading

Expand All @@ -198,9 +276,8 @@ relational comparisons. The exact design of those interfaces is left to a future
proposal. As non-binding design guidance for such a proposal:

- The interface for equality comparisons should primarily provide the ability
to override the behavior of `==`. The
!=`operator can optionally also be overridden, with a default implementation that returns`not
(a == b)`.
to override the behavior of `==`. The `!=` operator can optionally also be
overridden, with a default implementation that returns `not (a == b)`.
chandlerc marked this conversation as resolved.
Show resolved Hide resolved
- The interface for relational comparisons should primarily provide the
ability to specify a three-way comparison operator. The individual
relational comparison operators can optionally be overridden separately,
Expand All @@ -211,21 +288,24 @@ proposal. As non-binding design guidance for such a proposal:

### Default implementations for basic types

In addition to being defined for standard Carbon numeric types, equality
comparisons are also defined for all "data" types:
In addition to being defined for standard Carbon numeric types, equality and
relational comparisons are also defined for all "data" types:

- Tuples.
- Structs (structural data classes).
- Classes implementing an interface that identifies them as data classes.

In addition, relational comparisons are defined for tuples, and provide a
lexicographical ordering.
Relational comparisons for these types provide a lexicographical ordering. This
proposal defers to
[#561](https://github.com/carbon-language/carbon-lang/pull/561) for details on
zygoloid marked this conversation as resolved.
Show resolved Hide resolved
comparison support for classes.

In each case, the ordering is only available if it is supported by all element
In each case, the comparison is only available if it is supported by all element
types.

The `Bool` type supports equality comparisons and relational comparisons. For
relational comparisons, `False` is treated as being less than `True`.
The `Bool` type should be treated as a choice type, and so should support
equality comparisons and relational comparisons if and only if choice types do
in general. That decision is left to a future proposal.

## Rationale based on Carbon's goals

Expand Down Expand Up @@ -267,6 +347,8 @@ Disadvantages:
- Unfamiliar to C++ programmers.
jonmeow marked this conversation as resolved.
Show resolved Hide resolved
- `a /= b` would likely be expected to mean an `a = a / b` compound
assignment.
- Breaks consistency with Python, which uses `not` for logical negation and
`!=` for inequality comparison.

We could use `=/=` instead of `!=` for not-equal comparisons.

Expand All @@ -276,7 +358,8 @@ Advantages:

Disadvantages:

- This would be inventive and unlike all other languages.
- This would be inventive and unlike all other languages. As above, breaks
consistency with Python.
- This would make `=/=` one character longer, and harder to type on US-ASCII
keyboards because the keys are distant but likely to be typed with the same
finger.
Expand All @@ -288,6 +371,7 @@ We could support Python-like chained comparisons.
Advantages:

- Small ergonomic improvement for range comparisons.
josh11b marked this conversation as resolved.
Show resolved Hide resolved
- Middle operand is evaluated only once.

Disadvantages:
jonmeow marked this conversation as resolved.
Show resolved Hide resolved

Expand All @@ -296,7 +380,17 @@ Disadvantages:
changing the semantics of the operator expression as we can no longer move
from the operand.
- Both short-circuiting behavior and non-short-circuiting behavior will be
surprising and unintuitive to some.
surprising and unintuitive to some. The short-circiuting option will
introduce control flow without a keyword to announce it, which goes against
our design decision to use a keyword for `and` and `or` to announce the
control flow. The non-short-circuiting option will evaluate subexpressions
unnecessarily, which creates a tension with our performance goal.
- Experienced C++ developers may expect a different behavior, such as
`a < b == cmp` comparing the result of `a < b` against the Boolean value
`cmp`.

See also the ongoing discussion in
[#451](https://github.com/carbon-language/carbon-lang/issues/451).

### Convert operands like C++

Expand All @@ -308,8 +402,10 @@ Advantages:
- May ease migration from C++.
- May allow programmers to reuse some intuition, for example when comparing
floating-point values against integer values.
- May allow more efficient machine code to be generated for source code that
- Allows more efficient machine code to be generated for source code that
takes no special care about the types of comparison operands.
- Improves performance predictability for C++ developers unfamiliar with
Carbon's rules.

Disadvantages:

Expand Down Expand Up @@ -344,19 +440,5 @@ Advantages:
Disadvantages:

- Introduces ambiguity when comparing Boolean values: `not cond1 == cond2`
might intend to compare `not cond1` to `cond2` rather than cmoparing
might intend to compare `not cond1` to `cond2` rather than comparing
`cond1 != cond2`.

### Disallow relational comparisons of Boolean values

We could disallow ordered comparisons of Boolean values.

Advantages:

- Disallows an operation that might be unintended.

Disadvantages:

- Disallows an operation that might be intended.
- Likely to make `Bool` behave differently from discriminated union types,
which are likely to be treated as data types.