Add C++-like for loops #353

Merged May 14, 2021 (16 commits)
1 change: 1 addition & 0 deletions proposals/README.md
@@ -48,6 +48,7 @@ request:
- [0301 - Principle: Errors are values](p0301.md)
- [0339 - `var` statement](p0339.md)
- [0340 - while loops](p0340.md)
- [0353 - `for` loops](p0353.md)
- [0415 - Syntax: `return`](p0415.md)
- [0426 - Governance & evolution revamp](p0426.md)
- [0444 - GitHub Discussions](p0444.md)
357 changes: 357 additions & 0 deletions proposals/p0353.md
@@ -0,0 +1,357 @@
# `for` loops

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/353)

<!-- toc -->

## Table of contents

- [Problem](#problem)
- [Background](#background)
    - [C++](#c)
    - [Java](#java)
    - [TypeScript and JavaScript](#typescript-and-javascript)
    - [Python, Swift, and Rust](#python-swift-and-rust)
    - [Go](#go)
- [Proposal](#proposal)
- [Details](#details)
    - [Range inputs](#range-inputs)
    - [Executable semantics form](#executable-semantics-form)
- [Caveats](#caveats)
    - [C++ as baseline](#c-as-baseline)
    - [Semisemi support](#semisemi-support)
    - [Range literals](#range-literals)
    - [Enumerating containers](#enumerating-containers)
- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
- [Alternatives considered](#alternatives-considered)
    - [Include semisemi `for` loops](#include-semisemi-for-loops)
    - [Writing `in` instead of `:`](#writing-in-instead-of-)
    - [Multi-variable bindings](#multi-variable-bindings)

<!-- tocstop -->

## Problem

Control flow is documented in the
[language overview](/docs/design/README.md#control-flow). `for` loops are common
in C++, and Carbon should consider providing some form of them.

## Background

### C++

There are two forms of `for` loops in C++:

- **Semisemi** (semicolon, semicolon): `for (int i = 0; i < list.size(); ++i)`
- **Range-based**: `for (auto x : list)`

Semisemi `for` loops have been around for a long time, and are in C. Range-based
`for` loops were added in C++11.

For example, here is a basic semisemi:

```cc
for (int i = 0; i < list.size(); ++i) {
  printf("List at %d: %s\n", i, list[i].name);
}
```

An equivalent semisemi using iterators and the comma operator may look like:

```cc
int i = 0;
for (auto it = list.begin(); it != list.end(); ++it, ++i) {
  printf("List at %d: %s\n", i, it->name);
}
```

Range-based syntax can be simpler, but it can become more awkward when the loop
needs more than one piece of information, such as both an element and its index:

```cc
int i = 0;
for (const auto& x : list) {
  printf("List at %d: %s\n", i, x.name);
  ++i;
}
```

### Java

Java provides equivalent syntax to C++. Although Java doesn't have a comma
operator, it does provide for comma-separated statements in the first and third
sections of semisemi for loops.

### TypeScript and JavaScript

Both TypeScript and JavaScript offer three kinds of for loops:

- Semisemi, mirroring C++.
- `for (x of list)`, mirroring range-based for loops.
- `for (x in list)`, iterating over property keys, which are the indices for
  arrays.

For example, here is an `in` loop:

```javascript
for (i in list) {
  console.log('List at ' + i + ': ' + list[i].name);
}
```

### Python, Swift, and Rust

Python, Swift, and Rust support only range-based for loops, using
`for x in list` syntax.

### Go

Go uses `for` as its primary looping construct. It has:

- Semisemi, mirroring C++.
- `for i < list.size()` condition-only loops, mirroring C++ `while` loops.
- `for {` infinite loops.

## Proposal

Carbon should adopt C++-style range-based `for` loop syntax. Semisemi `for`
loops should be addressed through a different mechanism.

Related keywords are:

- `for`
- `continue`: continues with the next loop iteration.
- `break`: breaks out of the loop.

## Details

For loop syntax looks like: `for (` `var` _type_ _variable_ `:` _expression_
`) {` _statements_ `}`

Similar to the
[if/else proposal](https://github.com/carbon-language/carbon-lang/pull/285), the
braces are optional and must be paired (`{ ... }`) if present. When there are no
braces, only one statement is allowed.

`continue` will continue with the next loop iteration directly, skipping any
other statements in the loop body.

`break` exits the loop immediately.

All of this is consistent with C/C++ behavior.
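
As a minimal sketch of the C++ behavior being mirrored, the following function uses both keywords in a range-based `for` loop. The function name `VisitUntilZero` and the skip-negatives, stop-at-zero rules are assumptions chosen for illustration only.

```cpp
#include <vector>

// Hypothetical example: records the values a loop body processes.
// Negative values are skipped with `continue`; the first zero ends the loop
// with `break`.
std::vector<int> VisitUntilZero(const std::vector<int>& values) {
  std::vector<int> visited;
  for (int value : values) {
    if (value < 0) continue;  // Skip the rest of the body for this element.
    if (value == 0) break;    // Exit the loop; later elements are not seen.
    visited.push_back(value);
  }
  return visited;
}
```

For the input `{3, -1, 4, 0, 5}`, the result holds 3 and 4: the -1 is skipped, and the 5 is never reached.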

### Range inputs

The syntax for inputs is not being defined in this proposal. However, we can
still establish critical things to support:

- Interoperable C++ objects that work with C++'s range-based `for` loops, such
as containers with iterators.
- Carbon arrays and other containers.
- Range literals. These are not proposed, but for an example seen in other
languages, `0..2` may indicate the set of integers [0, 2).
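
To illustrate what a half-open integer range such as [0, 2) could look like as an input to a C++ range-based `for` loop, here is a rough sketch. The `IntRange` class and the `Collect` helper are hypothetical, are not proposed for Carbon, and only show the minimal iterator interface that C++ requires. Treating a reversed range as empty is an assumption of this sketch.

```cpp
#include <vector>

// Hypothetical half-open integer range [begin, end).
class IntRange {
 public:
  // Minimal iterator: just enough for a range-based `for` loop.
  class Iterator {
   public:
    explicit Iterator(int value) : value_(value) {}
    int operator*() const { return value_; }
    Iterator& operator++() {
      ++value_;
      return *this;
    }
    bool operator!=(const Iterator& other) const {
      return value_ != other.value_;
    }

   private:
    int value_;
  };

  // Assumption of this sketch: a reversed range (end < begin) is empty.
  IntRange(int begin, int end)
      : begin_(begin), end_(end < begin ? begin : end) {}

  Iterator begin() const { return Iterator(begin_); }
  Iterator end() const { return Iterator(end_); }

 private:
  int begin_;
  int end_;
};

// Gathers every value the range produces, in order.
std::vector<int> Collect(const IntRange& range) {
  std::vector<int> out;
  for (int i : range) {
    out.push_back(i);
  }
  return out;
}
```

With this sketch, `Collect(IntRange(0, 2))` yields 0 and 1; the upper bound is excluded.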

### Executable semantics form

```bison
%token FOR

statement:
  FOR "(" pattern ":" expression ")" statement
| /* pre-existing statements elided */
;
```

The `continue` and `break` statements are intended to be added as part of the
[while proposal](https://github.com/carbon-language/carbon-lang/pull/340).

## Caveats

### C++ as baseline

This baseline syntax is based on C++, following the migration sub-goal
[Familiarity for experienced C++ developers with a gentle learning curve](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code).
To the extent that this proposal anchors on a particular approach, it aims to
anchor on C++'s existing syntax, consistent with that sub-goal.

Alternatives will generally break consistency with C++ syntax. While most
proposals may weigh alternatives more thoroughly, this proposal sets a threshold
of only accepting alternatives that diverge from C++ syntax if they are clearly
better; the priority in this proposal is to _avoid debate_ and produce a trivial
proposal. Where an alternative would trigger debate, it should be examined by an
advocate in a separate proposal.

### Semisemi support

Carbon will not provide semisemi support. This decision is contingent upon
finding a better alternative loop structure, which neither `while` nor `for`
syntax currently provides. If Carbon doesn't evolve a better solution, semisemi
support will be added later.

For details, see [the alternative](#include-semisemi-for-loops).

### Range literals

Range literals are important to the ergonomics of range-based `for` loops, and
should be added. However, they should be examined separately as part of limiting
the scope of this proposal.

### Enumerating containers

Several languages have the concept of providing an index with the object in a
range-based for loop:

- Python does `for i, item in enumerate(items)`, with a global function.
- Go does `for i, item := range items`, with a keyword.
- Swift does `for (i, item) in items.enumerated()`, having removed an
  `enumerate()` global function.
- Rust does `for (i, item) in items.enumerate()`.

An equivalent pattern for Carbon should be examined separately as part of
limiting the scope of this proposal.
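
For comparison, the same pattern can be approximated in C++17 with a helper function and structured bindings. The sketch below is illustrative only: `Enumerate` and `Describe` are hypothetical names, and the helper eagerly copies elements for simplicity rather than providing a lazy view.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pairs each element with its index.
// Copies the elements eagerly; a production version would likely be lazy.
template <typename T>
std::vector<std::pair<std::size_t, T>> Enumerate(const std::vector<T>& items) {
  std::vector<std::pair<std::size_t, T>> result;
  result.reserve(items.size());
  std::size_t index = 0;
  for (const T& item : items) {
    result.emplace_back(index, item);
    ++index;
  }
  return result;
}

// Builds one "List at <index>: <name>" line per element.
std::string Describe(const std::vector<std::string>& names) {
  std::string out;
  for (const auto& [i, name] : Enumerate(names)) {
    out += "List at " + std::to_string(i) + ": " + name + "\n";
  }
  return out;
}
```

The counter lives inside the helper, so the loop body in `Describe` has no manual index bookkeeping.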

## Rationale based on Carbon's goals

Relevant goals are:

- [3. Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write):

    - Range-based `for` loops are easy to read and very helpful.
    - Semisemi `for` syntax is complex and can be error prone for cases where
      range-based loops work. Avoiding it, even by providing equivalent syntax
      with a different loop structure, should discourage its use and direct
      engineers towards better options. The alternative syntax should also be
      easier to understand than semisemi syntax, otherwise we should just keep
      semisemi syntax.

- [7. Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code):

    - Keeping syntax close to C++ will make it easier for developers to
      transition.

## Alternatives considered

Both alternatives from the
[`if`/`else` proposal](https://github.com/carbon-language/carbon-lang/pull/285)
apply to `for` as well: we could remove parentheses, require braces, or both.
The conclusions are mirrored here in order to avoid a divergence in syntax.

Additional alternatives follow.

### Include semisemi `for` loops

We could include semisemi for loops for greater consistency with C++.

This is in part important because switching from a semisemi `for` loop to a
`while` loop is not always straightforward due to how `for` evaluates the third
section of the semisemi. The inter-loop evaluation of the third section is
important given how it interacts with `continue`. In particular, consider the
loops:

```cc
for (int i = 0; i < 3; ++i) {
  if (i == 1) continue;
  printf("%d\n", i);
}

int j = 0;
while (j < 3) {
  if (j == 1) continue;
  printf("%d\n", j);
  ++j;
}

int k = 0;
while (k < 3) {
  ++k;
  if (k == 1) continue;
  printf("%d\n", k);
}

int l = 0;
while (l < 3) {
  if (l == 1) {
    ++l;
    continue;
  }
  printf("%d\n", l);
  ++l;
}
```

To explain the differences between these loops:

- The first loop will print 0 and 2.
- The second loop will print 0, then loop infinitely because the increment is
never reached.
- The third loop will print 2 and 3 because the increment happens too early:
  0 is never printed, and 3 is printed even though it fails the loop condition.
- Only the fourth loop is equivalent to the first loop, and it duplicates the
increment.

There is no easy place to put the increment in a `while` loop.
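
To make the equivalence between the first and fourth loops concrete, both can be rewritten to record values instead of printing them. This is an illustrative sketch; the function names `SemisemiLoop` and `WhileWithDuplicatedIncrement` are hypothetical.

```cpp
#include <vector>

// First loop: in a semisemi `for`, `continue` still runs `++i` before the
// condition is checked again.
std::vector<int> SemisemiLoop() {
  std::vector<int> recorded;
  for (int i = 0; i < 3; ++i) {
    if (i == 1) continue;
    recorded.push_back(i);
  }
  return recorded;
}

// Fourth loop: a `while` rewrite must repeat the increment on every path that
// reaches the next iteration, including the `continue` path.
std::vector<int> WhileWithDuplicatedIncrement() {
  std::vector<int> recorded;
  int l = 0;
  while (l < 3) {
    if (l == 1) {
      ++l;  // Duplicated increment; without it the loop would never advance.
      continue;
    }
    recorded.push_back(l);
    ++l;
  }
  return recorded;
}
```

Both functions record 0 and 2. Every additional `continue` in the `while` version would need its own copy of the increment.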

Advantages:

- We need a plan for
[migrating both developers and code from C++](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
  semisemi `for` loops, and providing them in Carbon is the easiest solution.
- Semisemis remain common in C++ code.
- Semisemis are much more flexible than range-based `for` loops.
- `while` loops do not offer a sufficient alternative.

Disadvantages:

- Semisemi loops can be error prone, such as `for (int i = 0; i < 3; --i)`,
  which moves `i` away from the bound instead of toward it.
    - Syntax such as `for (int x : range(0, 3))` leaves less room for
      developer mistakes.
- Removing semisemi syntax will likely improve
[understandability of Carbon code](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write),
a language goal.
- If we add semisemi loops, it would be very difficult to get rid of them.
    - Code using them should be expected to accumulate quickly, from both
      migrated code and developers familiar with C++ idioms.

If we want to remove semisemi `for` loops, we should avoid adding them. We do
need to ensure that developers are _happy_ with the replacement, although that
should be achievable through providing strong range support, including range
literals.

A story for migrating developers and code is still required. For developers, it
would be ideal if we could have a compiler error that detects semisemi loops and
advises the preferred Carbon constructs. For both developers and code, we need a
suitable loop syntax that is easy to use in cases that remain hard to write in
`while` or range-based `for` loops. This will depend on a separate proposal, but
there is at least some interest in this direction at present.

### Writing `in` instead of `:`

Range-based for loops could write `in` instead of `:`, such as:

```carbon
for (x in list) {
  ...
}
```

An argument for switching _now_, instead of using
[C++ as a baseline](#c-as-baseline), would be that `var` syntax has been
discussed as using a `:`, and avoiding `:` in range-based for loops may reduce
syntax ambiguity risks. However, the
[current `var` proposal](https://github.com/carbon-language/carbon-lang/pull/339)
does not use a `:`, and so this risk is only a potential future concern: it's
too early to require further evaluation.

Because the benefits of this alternative are debatable and would diverge from
C++, adopting `in` would run contrary to
[using C++ as a baseline](#c-as-baseline). Any divergence should be justified
and reviewed as a separate proposal.

### Multi-variable bindings

C++ allows `for (auto [x, y] : range_of_pairs)` which is not explicitly part of
the syntax here. Carbon is likely to support this through tuples, so adding
special `for` syntax for this would likely be redundant.
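
For reference, the C++17 form mentioned above looks roughly like the following sketch. The `JoinPairs` function and its `name=value;` output format are made up for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: destructures each pair into `x` and `y` directly in
// the loop header using C++17 structured bindings.
std::string JoinPairs(
    const std::vector<std::pair<std::string, int>>& range_of_pairs) {
  std::string out;
  for (const auto& [x, y] : range_of_pairs) {
    out += x + "=" + std::to_string(y) + ";";
  }
  return out;
}
```

Called with the pairs `{"a", 1}` and `{"b", 2}`, this returns `a=1;b=2;`.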