diff --git a/docs/design/README.md b/docs/design/README.md index c7a788cef2b7c..50a6819e21d3a 100644 --- a/docs/design/README.md +++ b/docs/design/README.md @@ -27,7 +27,9 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Floating-point literals](#floating-point-literals) - [String types](#string-types) - [String literals](#string-literals) -- [Value categories and value phases](#value-categories-and-value-phases) +- [Values, objects, and expressions](#values-objects-and-expressions) + - [Expression categories](#expression-categories) + - [Value phases](#value-phases) - [Composite types](#composite-types) - [Tuples](#tuples) - [Struct types](#struct-types) @@ -43,6 +45,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Constant `let` declarations](#constant-let-declarations) - [Variable `var` declarations](#variable-var-declarations) - [`auto`](#auto) + - [Global constants and variables](#global-constants-and-variables) - [Functions](#functions) - [Parameters](#parameters) - [`auto` return type](#auto-return-type) @@ -396,10 +399,10 @@ Some values, such as `()` and `{}`, may even be used as types, but only act like types when they are in a type position, like after a `:` in a variable declaration or the return type after a `->` in a function declaration. Any expression in a type position must be -[a constants or symbolic value](#value-categories-and-value-phases) so the -compiler can resolve whether the value can be used as a type. This also puts -limits on how much operators can do different things for types. This is good for -consistency, but is a significant restriction on Carbon's design. +[a constant or symbolic value](#value-phases) so the compiler can resolve +whether the value can be used as a type. This also limits how differently +operators can behave for types. This is good for consistency, but +is a significant restriction on Carbon's design.
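A minimal sketch of the type-position rule, assuming the Carbon syntax used elsewhere in this document; the function names `TypePositions` and `DoNothing` are hypothetical. The same spellings `()` and `{}` act as values on the right of `=` and as types after `:` or `->`:

```carbon
// Hypothetical function used only to illustrate type positions.
fn TypePositions() {
  // After `:`, `()` is in a type position and acts as the empty tuple type;
  // after `=`, `()` is the empty tuple value.
  let empty_tuple: () = ();

  // Likewise, `{}` acts as the empty struct type after `:` and as the
  // empty struct value after `=`.
  let empty_struct: {} = {};
}

// After `->`, `()` is in a type position and acts as the return type.
fn DoNothing() -> () {}
```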
## Primitive types @@ -637,21 +640,57 @@ are available for representing strings with `\`s and `"`s. > - Proposal > [#199: String literals](https://github.com/carbon-language/carbon-lang/pull/199) -## Value categories and value phases +## Values, objects, and expressions -Every expression has a -[value category](), -similar to [C++](https://en.cppreference.com/w/cpp/language/value_category), -that is either _l-value_ or _r-value_. Carbon will automatically convert an -l-value to an r-value, but not in the other direction. +Carbon has both abstract _values_ and concrete _objects_. Carbon _values_ are +things like `42`, `true`, and `i32` (a type value). Carbon _objects_ have +_storage_ where values can be read and written. Storage also allows taking the +address of an object in memory. -L-value expressions refer to values that have storage and a stable address. They -may be modified, assuming their type is not [`const`](#const). +> References: +> +> - [Values, variables, and pointers](values.md) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) + +### Expression categories + +A Carbon expression produces a value, references an object, or initializes an +object. Every expression has a +[category](), +similar to [C++](https://en.cppreference.com/w/cpp/language/value_category): + +- [_Value expressions_](values.md#value-expressions) produce abstract, + read-only _values_ that cannot be modified or have their address taken. +- [_Reference expressions_](values.md#reference-expressions) refer to + _objects_ with _storage_ where a value may be read or written and the + object's address can be taken. +- [_Initializing expressions_](values.md#initializing-expressions) require + storage to be provided implicitly when evaluating the expression. + The expression then initializes an object in that storage.
These are used to + model function returns, which can construct the returned value directly in + the caller's storage. + +Expressions in one category can be converted to any other category when needed. +The primitive conversion steps used are: + +- _Value binding_ converts a reference expression into a value expression. +- _Direct initialization_ converts a value expression into an initializing + expression. +- _Copy initialization_ converts a reference expression into an initializing + expression. +- _Temporary materialization_ converts an initializing expression into a + reference expression. + +> References: +> +> - [Expression categories](values.md#expression-categories) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) -R-value expressions evaluate to values that may not have dedicated storage. This -means they cannot be modified and their address generally cannot be taken. The -values of r-value expressions are broken down into three kinds, called _value -phases_: +### Value phases + +Value expressions are also broken down into three _value phases_: - A _constant_ has a value known at compile time, and that value is available during type checking, for example to use as the size of an array. These @@ -674,7 +713,7 @@ to a runtime value: ```mermaid graph TD; A(constant)-->B(symbolic value)-->C(runtime value); - D(l-value)-->C; + D(reference expression)-->C; ``` Constants convert to symbolic values and to runtime values. Symbolic values will @@ -682,9 +721,7 @@ generally convert into runtime values if an operation that inspects the value is performed on them. Runtime values will convert into constants or to symbolic values if constant evaluation of the runtime expression succeeds. -> **Note:** Conversion of runtime values to other phases is provisional, as are -> the semantics of r-values. 
See pending proposal -> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006). +> **Note:** Conversion of runtime values to other phases is provisional. ## Composite types @@ -762,30 +799,25 @@ not support [pointer arithmetic](); the only pointer [operations](#expressions) are: -- Dereference: given a pointer `p`, `*p` gives the value `p` points to as an - [l-value](#value-categories-and-value-phases). `p->m` is syntactic sugar for - `(*p).m`. -- Address-of: given an [l-value](#value-categories-and-value-phases) `x`, `&x` +- Dereference: given a pointer `p`, `*p` gives the value `p` points to as a + [reference expression](#expression-categories). `p->m` is syntactic sugar + for `(*p).m`. +- Address-of: given a [reference expression](#expression-categories) `x`, `&x` returns a pointer to `x`. There are no [null pointers](https://en.wikipedia.org/wiki/Null_pointer) in Carbon. To represent a pointer that may not refer to a valid object, use the type `Optional(T*)`. -**TODO:** Perhaps Carbon will have +**Future work:** Perhaps Carbon will have [stricter pointer provenance](https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html) or restrictions on casts between pointers and integers. -> **Note:** While the syntax for pointers has been decided, the semantics of -> pointers are provisional, as is the syntax for optionals. See pending proposal -> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006). 
- > References: > -> - Question-for-leads issue -> [#520: should we use whitespace-sensitive operator fixity?](https://github.com/carbon-language/carbon-lang/issues/520) -> - Question-for-leads issue -> [#523: what syntax should we use for pointer types?](https://github.com/carbon-language/carbon-lang/issues/523) +> - [Pointers](values.md#pointers) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) ### Arrays and slices @@ -846,7 +878,7 @@ Some common expressions in Carbon include: `not e` - [Indexing](#arrays-and-slices): `a[3]` - [Function](#functions) call: `f(4)` - - [Pointer](#pointer-types): `*p`, `p->m`, `&x` + - [Pointer](expressions/pointer_operators.md): `*p`, `p->m`, `&x` - [Move](#move): `~x` - [Conditionals](expressions/if.md): `if c then t else f` @@ -875,6 +907,8 @@ are applied to convert the expression to the target type. > [#911: Conditional expressions](https://github.com/carbon-language/carbon-lang/pull/911) > - Proposal > [#1083: Arithmetic expressions](https://github.com/carbon-language/carbon-lang/pull/1083) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) ## Declarations, Definitions, and Scopes @@ -954,14 +988,15 @@ binding any name to it. Binding patterns default to _`let` bindings_. The `var` keyword is used to make it a _`var` binding_. -- The result of a `let` binding is the name is bound to an - [r-value](#value-categories-and-value-phases). This means the value cannot - be modified, and its address generally cannot be taken. -- A `var` binding has dedicated storage, and so the name is an - [l-value](#value-categories-and-value-phases) which can be modified and has - a stable address. +- A `let` binding binds a name to a value, so the name can be used as a + [value expression](#expression-categories). 
This means the value cannot be + modified, and its address generally cannot be taken. +- A `var` binding creates an object with dedicated storage, and so the name + can be used as a [reference expression](#expression-categories) which can be + modified and has a stable address. -A `let`-binding may be implemented as an alias for the original value (like a +A `let`-binding may be [implemented](values.md#value-expressions) as an alias +for the original value (like a [`const` reference in C++]()), or it may be copied from the original value (if it is copyable), or it may be moved from the original value (if it was a temporary). The Carbon @@ -971,9 +1006,8 @@ the program's correctness must not depend on which option the Carbon implementation chooses. A [generic binding](#checked-and-template-parameters) uses `:!` instead of a -colon (`:`) and can only match -[constant or symbolic values](#value-categories-and-value-phases), not run-time -values. +colon (`:`) and can only match [constant or symbolic values](#value-phases), not +run-time values. The keyword `auto` may be used in place of the type in a binding pattern, as long as the type can be deduced from the type of a value in the same @@ -1049,14 +1083,17 @@ Here `x: i64` is the pattern, which is followed by an equal sign (`=`) and the value to match, `42`. The names from [binding patterns](#binding-patterns) are introduced into the enclosing [scope](#declarations-definitions-and-scopes). -> **Note:** `let` declarations are provisional. See pending proposal -> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006). 
+> References: +> +> - [Binding patterns and local variables with `let` and `var`](values.md#binding-patterns-and-local-variables-with-let-and-var) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) ### Variable `var` declarations -A `var` declaration is similar, except with `var` bindings, so `x` here is an -[l-value](#value-categories-and-value-phases) with storage and an address, and -so may be modified: +A `var` declaration is similar, except with `var` bindings, so `x` here is a +[reference expression](#expression-categories) for an object with storage and an +address, and so may be modified: ```carbon var x: i64 = 42; @@ -1069,7 +1106,7 @@ they are used. > References: > -> - [Variables](variables.md) +> - [Binding patterns and local variables with `let` and `var`](values.md#binding-patterns-and-local-variables-with-let-and-var) > - Proposal > [#162: Basic Syntax](https://github.com/carbon-language/carbon-lang/pull/162) > - Proposal @@ -1078,6 +1115,8 @@ they are used. > [#339: Add `var [ = ];` syntax for variables](https://github.com/carbon-language/carbon-lang/pull/339) > - Proposal > [#618: var ordering](https://github.com/carbon-language/carbon-lang/pull/618) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) ### `auto` @@ -1098,6 +1137,21 @@ var z: auto = (y > 1); > - Proposal > [#851: auto keyword for vars](https://github.com/carbon-language/carbon-lang/pull/851) +### Global constants and variables + +[Constant `let` declarations](#constant-let-declarations) may occur at a global +scope as well as local and member scopes. However, there are currently no global +variables. + +> **Note**: The semantics of global constant declarations and absence of global +> variable declarations are currently provisional.
+> +> We are exploring several different ideas for how to design less bug-prone +> patterns to replace the important use cases programmers still have for global +> variables. We may be unable to fully address them, at least for migrated code, +> and be forced to add some limited form of global variables back. We may also +> discover that their convenience outweighs any improvements afforded. + ## Functions Functions are the core unit of behavior. For example, this is a @@ -1149,7 +1203,7 @@ declaration. The parameter names in a forward declaration may be omitted using The bindings in the parameter list default to [`let` bindings](#binding-patterns), and so the parameter names are treated as -[r-values](#value-categories-and-value-phases). This is appropriate for input +[value expressions](#expression-categories). This is appropriate for input parameters. This binding will be implemented using a pointer, unless it is legal to copy and copying is cheaper. @@ -1167,9 +1221,11 @@ the caller, and dereferencing using `*` in the callee. Outputs of a function should prefer to be returned. Multiple values may be returned using a [tuple](#tuples) or [struct](#struct-types) type. -> **Note:** The semantics of parameter passing are provisional. See pending -> proposal -> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006). +> References: +> +> - [Binding patterns and local variables with `let` and `var`](values.md#binding-patterns-and-local-variables-with-let-and-var) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) ### `auto` return type @@ -1689,8 +1745,8 @@ two methods `Distance` and `Offset`: declaration. - `origin.Offset(`...`)` does modify the value of `origin`. This is signified using `[addr self: Self*]` in the method declaration. 
Since calling this - method requires taking the address of `origin`, it may only be called on - [non-`const`](#const) [l-values](#value-categories-and-value-phases). + method requires taking the [non-`const`](#const) address of `origin`, it may + only be called on [reference expressions](#expression-categories). - Methods may be declared lexically inline like `Distance`, or lexically out of line like `Offset`. @@ -1883,20 +1939,19 @@ type, use `UnsafeDelete`. #### `const` -> **Note:** This is provisional, no design for `const` has been through the -> proposal process yet. - For every type `MyClass`, there is the type `const MyClass` such that: - The data representation is the same, so a `MyClass*` value may be implicitly converted to a `(const MyClass)*`. -- A `const MyClass` [l-value](#value-categories-and-value-phases) may - automatically convert to a `MyClass` r-value, the same way that a `MyClass` - l-value can. +- A `const MyClass` [reference expression](#expression-categories) may + automatically convert to a `MyClass` value expression, the same way that a + `MyClass` reference expression can. - If member `x` of `MyClass` has type `T`, then member `x` of `const MyClass` has type `const T`. -- The API of a `const MyClass` is a subset of `MyClass`, excluding all methods - taking `[addr self: Self*]`. +- While all of the member names in `MyClass` are also member names in + `const MyClass`, the effective API of a `const MyClass` reference expression + is a subset of `MyClass`, because only `addr` methods accepting a + `const Self*` will be valid. Note that `const` binds more tightly than postfix-`*` for forming a pointer type, so `const MyClass*` is equal to `(const MyClass)*`. @@ -1911,8 +1966,8 @@ var origin: Point = {.x = 0, .y = 0}; // `const Point*`: let p: const Point* = &origin; -// ✅ Allowed conversion of `const Point` l-value -// to `Point` r-value. +// ✅ Allowed conversion of `const Point` reference expression +// to `Point` value expression. 
let five: f32 = p->Distance(3, 4); // ❌ Error: mutating method `Offset` excluded @@ -1924,6 +1979,12 @@ p->Offset(3, 4); p->x += 2; ``` +> References: +> +> - [`const`-qualified types](values.md#const-qualified-types) +> - Proposal +> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006) + #### Unformed state Types indicate that they support unformed states by @@ -1967,8 +2028,6 @@ value. > **Note:** This is provisional. The move operator was discussed but not > proposed in accepted proposal > [#257: Initialization of memory and variables](https://github.com/carbon-language/carbon-lang/pull/257). -> See pending proposal -> [#2006: Values, variables, pointers, and references](https://github.com/carbon-language/carbon-lang/pull/2006). #### Mixins @@ -2664,8 +2723,8 @@ templates. Constraints can then be added incrementally, with the compiler verifying that the semantics stay the same. Once all constraints have been added, removing the word `template` to switch to a checked parameter is safe. -The [value phase](#value-categories-and-value-phases) of a checked parameter is -a symbolic value whereas the value phase of a template parameter is constant. +The [value phase](#value-phases) of a checked parameter is a symbolic value +whereas the value phase of a template parameter is constant. Although checked generics are generally preferred, templates enable translation of code between C++ and Carbon, and address some cases where the type checking @@ -3111,9 +3170,10 @@ The interfaces that correspond to each operator are given by: The [logical operators can not be overloaded](expressions/logical_operators.md#overloading). -Operators that result in [l-values](#value-categories-and-value-phases), such as -dereferencing `*p` and indexing `a[3]`, have interfaces that return the address -of the value. Carbon automatically dereferences the pointer to get the l-value. 
+Operators that result in [reference expressions](#expression-categories), such +as dereferencing `*p` and indexing `a[3]`, have interfaces that return the +address of the value. Carbon automatically dereferences the pointer to form the +reference expression. Operators that can take multiple arguments, such as function calling operator `f(4)`, have a [variadic](generics/details.md#variadic-arguments) parameter diff --git a/docs/design/control_flow/return.md b/docs/design/control_flow/return.md index 57cbcbe57db70..09bf364491f37 100644 --- a/docs/design/control_flow/return.md +++ b/docs/design/control_flow/return.md @@ -80,8 +80,8 @@ fn MaybeDraw(should_draw: bool) -> () { ### `returned var` -[Variables](../variables.md) may be declared with a `returned` statement. Its -syntax is: +[Local variables](../values.md#binding-patterns-and-local-variables-with-let-and-var) +may be declared with a `returned` statement. Its syntax is: > `returned` _var statement_ diff --git a/docs/design/expressions/README.md b/docs/design/expressions/README.md index 65755a9017b10..e6c68d2773abc 100644 --- a/docs/design/expressions/README.md +++ b/docs/design/expressions/README.md @@ -61,10 +61,27 @@ graph BT unqualifiedName["x"] click unqualifiedName "https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/README.md#unqualified-names" + top((" ")) + memberAccess>"x.y<br> - x.(...)"] + x.(...)<br> + x->y<br> + x->(...)"] click memberAccess "https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/member_access.md" + constType["const T"] + click constType "https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/type_operators.md" + + pointerType>"T*"] + click pointerType "https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/type_operators.md" + + %% FIXME: Need to switch unary operators from a left/right associativity to + %% a "repeated" marker, as we only have one direction for associativity and + %% that is wrong in this specific case. + pointer>"*x<br> + &x<br>"] + click pointer "https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/pointer_operators.md" + negation["-x"] click negation "https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/arithmetic.md" @@ -124,15 +141,22 @@ graph BT expressionEnd["x;"] - memberAccess --> parens & braces & unqualifiedName - negation --> memberAccess - complement --> memberAccess + top --> parens & braces & unqualifiedName + + constType --> top + pointerType --> constType + as --> pointerType + + memberAccess --> top + pointer --> memberAccess + negation --> pointer + complement --> pointer unary --> negation & complement %% Use a longer arrow here to put `not` next to `and` and `or`. - not -----> memberAccess - multiplication & modulo & as & bitwise_and & bitwise_or & bitwise_xor & shift --> unary + not -------> memberAccess + as & multiplication & modulo & bitwise_and & bitwise_or & bitwise_xor & shift --> unary addition --> multiplication - comparison --> modulo & addition & as & bitwise_and & bitwise_or & bitwise_xor & shift + comparison --> as & addition & modulo & bitwise_and & bitwise_or & bitwise_xor & shift logicalOperand --> comparison & not and & or --> logicalOperand logicalExpression --> and & or @@ -179,12 +203,14 @@ keyword and is not preceded by a period (`.`). ### Qualified names and member access -A _qualified name_ is a word that appears immediately after a period. Qualified -names appear in the following contexts: +A _qualified name_ is a word that appears immediately after a period or a +rightward arrow.
Qualified names appear in the following contexts: - [Designators](/docs/design/classes.md#literals): `.` _word_ - [Simple member access expressions](member_access.md): _expression_ `.` _word_ +- [Simple pointer member access expressions](member_access.md): _expression_ + `->` _word_ ``` var x: auto = {.hello = 1, .world = 2}; @@ -194,6 +220,10 @@ var x: auto = {.hello = 1, .world = 2}; x.hello = x.world; ^^^^^ ^^^^^ qualified name ^^^^^^^ ^^^^^^^ member access expression + +x.hello = (&x)->world; + ^^^^^ qualified name + ^^^^^^^^^^^ pointer member access expression ``` Qualified names refer to members of an entity determined by the context in which @@ -231,6 +261,7 @@ complex than a single _word_, a compound member access expression can be used, with parentheses around the member name: - _expression_ `.` `(` _expression_ `)` +- _expression_ `->` `(` _expression_ `)` ``` interface I { fn F[self: Self](); } @@ -241,34 +272,40 @@ impl X as I { fn F[self: Self]() {} } fn Q(x: X) { x.(I.F)(); } ``` +Either simple or compound member access can be part of a _pointer_ member access +expression when an `->` is used instead of a `.`, where _expression_ `->` _..._ +is syntactic sugar for `(` `*` _expression_ `)` `.` _..._. + ## Operators Most expressions are modeled as operators: -| Category | Operator | Syntax | Function | -| ---------- | ------------------------------- | --------- | --------------------------------------------------------------------- | -| Arithmetic | [`-`](arithmetic.md) (unary) | `-x` | The negation of `x`. | -| Bitwise | [`^`](bitwise.md) (unary) | `^x` | The bitwise complement of `x`. | -| Arithmetic | [`+`](arithmetic.md) | `x + y` | The sum of `x` and `y`. | -| Arithmetic | [`-`](arithmetic.md) (binary) | `x - y` | The difference of `x` and `y`. | -| Arithmetic | [`*`](arithmetic.md) | `x * y` | The product of `x` and `y`. | -| Arithmetic | [`/`](arithmetic.md) | `x / y` | `x` divided by `y`, or the quotient thereof. 
| -| Arithmetic | [`%`](arithmetic.md) | `x % y` | `x` modulo `y`. | -| Bitwise | [`&`](bitwise.md) | `x & y` | The bitwise AND of `x` and `y`. | -| Bitwise | [`\|`](bitwise.md) | `x \| y` | The bitwise OR of `x` and `y`. | -| Bitwise | [`^`](bitwise.md) (binary) | `x ^ y` | The bitwise XOR of `x` and `y`. | -| Bitwise | [`<<`](bitwise.md) | `x << y` | `x` bit-shifted left `y` places. | -| Bitwise | [`>>`](bitwise.md) | `x >> y` | `x` bit-shifted right `y` places. | -| Conversion | [`as`](as_expressions.md) | `x as T` | Converts the value `x` to the type `T`. | -| Comparison | [`==`](comparison_operators.md) | `x == y` | Equality: `true` if `x` is equal to `y`. | -| Comparison | [`!=`](comparison_operators.md) | `x != y` | Inequality: `true` if `x` is not equal to `y`. | -| Comparison | [`<`](comparison_operators.md) | `x < y` | Less than: `true` if `x` is less than `y`. | -| Comparison | [`<=`](comparison_operators.md) | `x <= y` | Less than or equal: `true` if `x` is less than or equal to `y`. | -| Comparison | [`>`](comparison_operators.md) | `x > y` | Greater than: `true` if `x` is greater than to `y`. | -| Comparison | [`>=`](comparison_operators.md) | `x >= y` | Greater than or equal: `true` if `x` is greater than or equal to `y`. | -| Logical | [`and`](logical_operators.md) | `x and y` | A short-circuiting logical AND: `true` if both operands are `true`. | -| Logical | [`or`](logical_operators.md) | `x or y` | A short-circuiting logical OR: `true` if either operand is `true`. | -| Logical | [`not`](logical_operators.md) | `not x` | Logical NOT: `true` if the operand is `false`. | +| Category | Operator | Syntax | Function | +| ---------- | ----------------------------------- | --------- | --------------------------------------------------------------------- | +| Pointer | [`*`](pointer_operators.md) (unary) | `*x` | Pointer dereference: the object pointed to by `x`. 
| +| Pointer | [`&`](pointer_operators.md) (unary) | `&x` | Address-of: a pointer to the object `x`. | +| Arithmetic | [`-`](arithmetic.md) (unary) | `-x` | The negation of `x`. | +| Bitwise | [`^`](bitwise.md) (unary) | `^x` | The bitwise complement of `x`. | +| Arithmetic | [`+`](arithmetic.md) | `x + y` | The sum of `x` and `y`. | +| Arithmetic | [`-`](arithmetic.md) (binary) | `x - y` | The difference of `x` and `y`. | +| Arithmetic | [`*`](arithmetic.md) | `x * y` | The product of `x` and `y`. | +| Arithmetic | [`/`](arithmetic.md) | `x / y` | `x` divided by `y`, or the quotient thereof. | +| Arithmetic | [`%`](arithmetic.md) | `x % y` | `x` modulo `y`. | +| Bitwise | [`&`](bitwise.md) | `x & y` | The bitwise AND of `x` and `y`. | +| Bitwise | [`\|`](bitwise.md) | `x \| y` | The bitwise OR of `x` and `y`. | +| Bitwise | [`^`](bitwise.md) (binary) | `x ^ y` | The bitwise XOR of `x` and `y`. | +| Bitwise | [`<<`](bitwise.md) | `x << y` | `x` bit-shifted left `y` places. | +| Bitwise | [`>>`](bitwise.md) | `x >> y` | `x` bit-shifted right `y` places. | +| Conversion | [`as`](as_expressions.md) | `x as T` | Converts the value `x` to the type `T`. | +| Comparison | [`==`](comparison_operators.md) | `x == y` | Equality: `true` if `x` is equal to `y`. | +| Comparison | [`!=`](comparison_operators.md) | `x != y` | Inequality: `true` if `x` is not equal to `y`. | +| Comparison | [`<`](comparison_operators.md) | `x < y` | Less than: `true` if `x` is less than `y`. | +| Comparison | [`<=`](comparison_operators.md) | `x <= y` | Less than or equal: `true` if `x` is less than or equal to `y`. | +| Comparison | [`>`](comparison_operators.md) | `x > y` | Greater than: `true` if `x` is greater than `y`. | +| Comparison | [`>=`](comparison_operators.md) | `x >= y` | Greater than or equal: `true` if `x` is greater than or equal to `y`. | +| Logical | [`and`](logical_operators.md) | `x and y` | A short-circuiting logical AND: `true` if both operands are `true`.
| +| Logical | [`or`](logical_operators.md) | `x or y` | A short-circuiting logical OR: `true` if either operand is `true`. | +| Logical | [`not`](logical_operators.md) | `not x` | Logical NOT: `true` if the operand is `false`. | The binary arithmetic and bitwise operators also have [compound assignment](/docs/design/assignment.md) forms. These are statements diff --git a/docs/design/expressions/indexing.md b/docs/design/expressions/indexing.md index 16a657491165c..aab8b0928f98e 100644 --- a/docs/design/expressions/indexing.md +++ b/docs/design/expressions/indexing.md @@ -23,14 +23,17 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception ## Overview Carbon supports indexing using the conventional `a[i]` subscript syntax. When -`a` is an l-value, the result of subscripting is always an l-value, but when `a` -is an r-value, the result can be an l-value or an r-value, depending on which +`a` is a +[durable reference expression](/docs/design/values.md#durable-reference-expressions), +the result of subscripting is also a durable reference expression, but when `a` +is a [value expression](/docs/design/values.md#value-expressions), the result +can be a durable reference expression or a value expression, depending on which interface the type implements: -- If subscripting an r-value produces an r-value result, as with an array, the - type should implement `IndexWith`. -- If subscripting an r-value produces an l-value result, as with C++'s - `std::span`, the type should implement `IndirectIndexWith`. +- If subscripting a value expression produces a value expression, as with an + array, the type should implement `IndexWith`. +- If subscripting a value expression produces a durable reference expression, + as with C++'s `std::span`, the type should implement `IndirectIndexWith`. 
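As an illustration of the two cases above, here is a hypothetical function (`IndexCategories` is an illustrative name, and the array syntax `[i32; 3]` is assumed from the arrays design) showing how the category of `a` determines the category of `a[i]` for an array, which implements `IndexWith`:

```carbon
// Hypothetical function used only to illustrate indexing categories.
fn IndexCategories(values: [i32; 3]) {
  // `arr` is a `var` binding, so it names a durable reference expression.
  // `arr[0]` is then also a durable reference expression and can be assigned.
  var arr: [i32; 3] = (1, 2, 3);
  arr[0] = 5;

  // `values` is a `let`-style parameter, so it is a value expression.
  // Arrays implement `IndexWith`, so `values[1]` is a value expression:
  // it can be read, but not assigned to or have its address taken.
  let second: i32 = values[1];
}
```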
`IndirectIndexWith` is a subtype of `IndexWith`, and subscript expressions are rewritten to method calls on `IndirectIndexWith` if the type is known to @@ -39,6 +42,19 @@ implement that interface, or to method calls on `IndexWith` otherwise. `IndirectIndexWith` provides a final blanket `impl` of `IndexWith`, so a type can implement at most one of those two interfaces. +The `Addr` methods of these interfaces, which are used to form durable reference +expressions on indexing, must return a pointer and work similarly to the +[pointer dereference customization interface](/docs/design/values.md#dereferencing-customization). +The returned pointer is then dereferenced by the language to form the reference +expression referring to the pointed-to object. These methods must return a raw +pointer, and do not automatically chain with customized dereference interfaces. + +**Open question:** It's not clear that the lack of chaining is necessary, and it +might be more expressive for the pointer type returned by the `Addr` methods to +be an associated type with a default to allow types to produce custom +pointer-like types on their indexing boundary and have them still be +automatically dereferenced. + ## Details A subscript expression has the form "_lhs_ `[` _index_ `]`". As in C++, this @@ -61,13 +77,15 @@ interface IndirectIndexWith(SubscriptType:! type) { ``` A subscript expression where _lhs_ has type `T` and _index_ has type `I` is -rewritten based on the value category of _lhs_ and whether `T` is known to +rewritten based on the expression category of _lhs_ and whether `T` is known to implement `IndirectIndexWith(I)`: - If `T` implements `IndirectIndexWith(I)`, the expression is rewritten to "`*((` _lhs_ `).(IndirectIndexWith(I).Addr)(` _index_ `))`". -- Otherwise, if _lhs_ is an l-value, the expression is rewritten to "`*((` - _lhs_ `).(IndexWith(I).Addr)(` _index_ `))`". 
+- Otherwise, if _lhs_ is a + [_durable reference expression_](/docs/design/values.md#durable-reference-expressions), + the expression is rewritten to "`*((` _lhs_ `).(IndexWith(I).Addr)(` _index_ + `))`". - Otherwise, the expression is rewritten to "`(` _lhs_ `).(IndexWith(I).At)(` _index_ `)`". @@ -136,3 +154,5 @@ Carbon API. - Proposal [#2274: Subscript syntax and semantics](https://github.com/carbon-language/carbon-lang/pull/2274) +- Proposal + [#2006: Values, variables, and pointers](https://github.com/carbon-language/carbon-lang/pull/2006) diff --git a/docs/design/expressions/member_access.md b/docs/design/expressions/member_access.md index d020035585e23..5a53a763ca026 100644 --- a/docs/design/expressions/member_access.md +++ b/docs/design/expressions/member_access.md @@ -29,9 +29,12 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception ## Overview A _qualified name_ is a [word](../lexical_conventions/words.md) that is preceded -by a period. The name is found within a contextually determined entity: +by a period or a rightward arrow. The name is found within a contextually +determined entity: - In a member access expression, this is the entity preceding the period. +- In a pointer member access expression, this is the entity pointed to by the + pointer preceding the rightward arrow. - For a designator in a struct literal, the name is introduced as a member of the struct type. @@ -43,10 +46,12 @@ A member access expression is either a _simple_ member access expression of the form: - _member-access-expression_ ::= _expression_ `.` _word_ +- _member-access-expression_ ::= _expression_ `->` _word_ or a _compound_ member access of the form: - _member-access-expression_ ::= _expression_ `.` `(` _expression_ `)` +- _member-access-expression_ ::= _expression_ `->` `(` _expression_ `)` Compound member accesses allow specifying a qualified member name. 
@@ -66,14 +71,26 @@ class Cog { fn GrowSomeCogs() { var cog1: Cog = Cog.Make(1); var cog2: Cog = cog1.Make(2); + var cog_pointer: Cog* = &cog2; let cog1_size: i32 = cog1.size; cog1.Grow(1.5); cog2.(Cog.Grow)(cog1_size as f64); cog1.(Widget.Grow)(1.1); cog2.(Widgets.Cog.(Widgets.Widget.Grow))(1.9); + cog_pointer->Grow(0.75); + cog_pointer->(Widget.Grow)(1.2); } ``` +Pointer member access expressions are those using a `->` instead of a `.` and +their semantics are exactly what would result from first dereferencing the +expression preceding the `->` and then forming a member access expression using +a `.`. For example, a simple pointer member access expression _expression_ `->` +_word_ becomes `(` `*` _expression_ `)` `.` _word_. More details on this syntax +and semantics can be found in the [pointers](/docs/design/values.md#pointers) +design. The rest of this document describes the semantics using `.` alone for +simplicity. + A member access expression is processed using the following steps: - First, the word or parenthesized expression to the right of the `.` is diff --git a/docs/design/expressions/pointer_operators.md b/docs/design/expressions/pointer_operators.md new file mode 100644 index 0000000000000..a6b997d501abc --- /dev/null +++ b/docs/design/expressions/pointer_operators.md @@ -0,0 +1,57 @@ +# Pointer operators + + + + + +## Table of contents + +- [Overview](#overview) +- [Details](#details) + - [Precedence](#precedence) +- [Alternatives considered](#alternatives-considered) +- [References](#references) + + + +## Overview + +Carbon provides the following operators related to pointers: + +- `&` as a prefix unary operator takes the address of an object, forming a + pointer to it. +- `*` as a prefix unary operator dereferences a pointer. + +Note that [member access expressions](member_access.md) include an `->` form +that implicitly performs a dereference in the same way as the `*` operator. 
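As a rough analogy, C++ defines the same equivalence for raw pointers: `p->m` means `(*p).m`. The sketch below borrows the `Cog` and `Grow` names from the member access example, but it is a hypothetical C++ class with an integer size, not Carbon code. It shows prefix `&` forming a pointer, and `->` and `*` followed by `.` reaching the same object.

```cpp
#include <cassert>

// Hypothetical C++ stand-in for the Carbon `Cog` example; not Carbon code.
struct Cog {
  int size = 1;
  void Grow(int factor) { size *= factor; }
};

// Grows `cog` through a pointer, once with `->` and once with an explicit
// dereference followed by `.`, which C++ defines as equivalent for raw pointers.
int GrowThroughPointer(Cog& cog) {
  Cog* cog_pointer = &cog;  // Prefix `&` forms a pointer to the object.
  cog_pointer->Grow(2);     // Pointer member access.
  (*cog_pointer).Grow(3);   // Prefix `*` dereferences, then `.` accesses.
  return cog_pointer->size;
}

void ExamplePointerMemberAccess() {
  Cog cog;
  assert(GrowThroughPointer(cog) == 6);
  assert(cog.size == 6);  // Both calls mutated the original object.
}
```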
+ +## Details + +The semantic details of pointer operators are collected in the main +[pointers](/docs/design/values.md#pointers) design. The syntax and precedence +details are covered here. + +The syntax tries to remain as similar as possible to C++ pointers as they are +commonly written in code, since pointers are expected to be extremely common and +a key anchor of syntactic similarity between the languages. + +### Precedence + +These operators have high precedence. Only [member access](member_access.md) +expressions can be used as an unparenthesized operand to them. + +The two prefix operators `&` and `*` are generally above the other unary and +binary operators and can appear inside them as unparenthesized operands. For the +full details, see the [precedence graph](README.md#precedence). + +## Alternatives considered + +- [Alternative pointer syntaxes](/proposals/p2006.md#alternative-pointer-syntaxes) + +## References + +- [Proposal #2006: Values, variables, and pointers](/proposals/p2006.md) diff --git a/docs/design/expressions/type_operators.md b/docs/design/expressions/type_operators.md new file mode 100644 index 0000000000000..b849756c69f9c --- /dev/null +++ b/docs/design/expressions/type_operators.md @@ -0,0 +1,67 @@ +# Type operators + + + + + +## Table of contents + +- [Overview](#overview) +- [Details](#details) + - [Precedence](#precedence) +- [Alternatives considered](#alternatives-considered) +- [References](#references) + + + +## Overview + +Carbon provides the following operators to transform types: + +- `const` as a prefix unary operator produces a `const`-qualified type. +- `*` as a postfix unary operator produces a pointer _type_ to some other + type. + +The pointer type operator is also covered as one of the +[pointer operators](pointer_operators.md).
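For C++ readers, here is a rough correspondence for the two pointer shapes these type operators produce. Carbon's `const i32*` matches C++'s `const int*` (a pointer to `const`), and Carbon's parenthesized `const (i32*)` matches C++'s `int* const` (a `const` pointer). The sketch relies only on standard `<type_traits>`. The alias and function names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical aliases spelling out C++ equivalents of two Carbon types.
using PointerToConstInt = const int*;  // Carbon: `const i32*`
using ConstPointerToInt = int* const;  // Carbon: `const (i32*)`

// Pointer to const: the pointee is const, the pointer itself is not.
static_assert(std::is_const<std::remove_pointer_t<PointerToConstInt>>::value,
              "pointee is const");
static_assert(!std::is_const<PointerToConstInt>::value,
              "pointer can be reseated");

// Const pointer: the pointer is const, the pointee is not.
static_assert(std::is_const<ConstPointerToInt>::value, "pointer is const");
static_assert(!std::is_const<std::remove_pointer_t<ConstPointerToInt>>::value,
              "pointee is mutable");

void ExampleConstPointers() {
  int a = 1;
  int b = 2;
  PointerToConstInt reseatable = &a;
  reseatable = &b;  // Allowed: only the pointee is const.
  ConstPointerToInt fixed = &a;
  *fixed = 10;  // Allowed: only the pointer is const.
  assert(*reseatable == 2);
  assert(a == 10);
}
```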
+ +## Details + +The semantic details of both `const`-qualified types and pointer types are +provided as part of the [values](/docs/design/values.md) design: + +- [`const`-qualified types](/docs/design/values.md#const-qualified-types) +- [Pointers](/docs/design/values.md#pointers) + +The syntax of these operators tries to mimic the most common appearance of +`const` types and pointer types in C++. + +### Precedence + +Because these are type operators, they don't have many precedence relationships +with non-type operators. + +- `const` binds more tightly than `*` and can appear unparenthesized in an + operand, despite being both a unary operator and having whitespace + separating it. + - This allows the syntax of a pointer to a `const i32` to be `const i32*`, + which is intended to be familiar to C++ developers. + - Forming a `const` pointer type requires parentheses: `const (i32*)`. +- All type operators bind more tightly than `as` so they can be used in its + type operand. + - This also allows a desirable transitive precedence with `if`: + `if condition then T* else U*`. + +## Alternatives considered + +- [Alternative pointer syntaxes](/proposals/p2006.md#alternative-pointer-syntaxes) +- [Alternative syntaxes for locals](/proposals/p2006.md#alternative-syntaxes-for-locals) +- [Make `const` a postfix rather than prefix operator](/proposals/p2006.md#make-const-a-postfix-rather-than-prefix-operator) + +## References + +- [Proposal #2006: Values, variables, and pointers](/proposals/p2006.md) diff --git a/docs/design/generics/terminology.md b/docs/design/generics/terminology.md index d959eb0eb6fab..dd1742d8aa008 100644 --- a/docs/design/generics/terminology.md +++ b/docs/design/generics/terminology.md @@ -287,7 +287,7 @@ _Binding patterns_ associate a name with a type and a value. This is used to declare function parameters, in `let` and `var` declarations, as well as to declare [generic parameters](#generic-means-compile-time-parameterized). There are three kinds of binding patterns, corresponding to
There are three kinds of binding patterns, corresponding to -[the three value phases](/docs/design/README.md#value-categories-and-value-phases): +[the three value phases](/docs/design/README.md#value-phases): - A _runtime binding pattern_ binds to a dynamic value at runtime, and is written using a `:`, as in `x: i32`. diff --git a/docs/design/type_inference.md b/docs/design/type_inference.md index 599eced4da135..06067ad705a36 100644 --- a/docs/design/type_inference.md +++ b/docs/design/type_inference.md @@ -22,7 +22,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception [Type inference](https://en.wikipedia.org/wiki/Type_inference) occurs in Carbon when the `auto` keyword is used. This may occur in -[variable declarations](variables.md) or [function declarations](functions.md). +[variable declarations](values.md#binding-patterns-and-local-variables-with-let-and-var) +or [function declarations](functions.md). At present, type inference is very simple: given the expression which generates the value to be used for type inference, the inferred type is the precise type @@ -30,7 +31,8 @@ of that expression. For example, the inferred type for `auto` in `fn Foo(x: i64) -> auto { return x; }` is `i64`. Type inference is currently supported for [function return types](functions.md) -and [declared variable types](variables.md). +and +[declared variable types](values.md#binding-patterns-and-local-variables-with-let-and-var). 
## Open questions diff --git a/docs/design/values.md b/docs/design/values.md new file mode 100644 index 0000000000000..1224b6bd5eb4d --- /dev/null +++ b/docs/design/values.md @@ -0,0 +1,1044 @@ +# Values, variables, and pointers + + + + + +## Table of contents + +- [Values, objects, and expressions](#values-objects-and-expressions) + - [Expression categories](#expression-categories) + - [Value binding](#value-binding) + - [Direct initialization](#direct-initialization) + - [Copy initialization](#copy-initialization) + - [Temporary materialization](#temporary-materialization) +- [Binding patterns and local variables with `let` and `var`](#binding-patterns-and-local-variables-with-let-and-var) + - [Local variables](#local-variables) + - [Consuming function parameters](#consuming-function-parameters) +- [Reference expressions](#reference-expressions) + - [Durable reference expressions](#durable-reference-expressions) + - [Ephemeral reference expressions](#ephemeral-reference-expressions) +- [Value expressions](#value-expressions) + - [Comparison to C++ parameters](#comparison-to-c-parameters) + - [Polymorphic types](#polymorphic-types) + - [Interop with C++ `const &` and `const` methods](#interop-with-c-const--and-const-methods) + - [Escape hatches for value addresses in Carbon](#escape-hatches-for-value-addresses-in-carbon) +- [Initializing expressions](#initializing-expressions) + - [Function calls and returns](#function-calls-and-returns) + - [Deferred initialization from values and references](#deferred-initialization-from-values-and-references) + - [Declared `returned` variable](#declared-returned-variable) +- [Pointers](#pointers) + - [Reference types](#reference-types) + - [Pointer syntax](#pointer-syntax) + - [Dereferencing customization](#dereferencing-customization) +- [`const`-qualified types](#const-qualified-types) +- [Lifetime overloading](#lifetime-overloading) +- [Value representation and customization](#value-representation-and-customization) +- 
[Alternatives considered](#alternatives-considered) +- [References](#references) + + + +## Values, objects, and expressions + +Carbon has both abstract _values_ and concrete _objects_. Carbon _values_ are +things like `42`, `true`, and `i32` (a type value). Carbon _objects_ have +_storage_ where values can be read and written. Storage also allows taking the +address of an object in memory in Carbon. + +Both objects and values can be nested within each other. For example, +`(true, true)` is both a value and also contains two sub-values. When a +two-tuple is stored somewhere, it is both a tuple-typed object and contains two +subobjects. + +These terms are important components in describing the semantics of Carbon +code, but they aren't sufficient. We also need to explicitly and precisely talk +about the Carbon _expressions_ that produce or reference values and objects. +Categorizing the expressions themselves allows us to be more precise and +articulate important differences that would be missed without looking at the +expression itself. + +### Expression categories + +There are three expression categories in Carbon: + +- [_Value expressions_](#value-expressions) produce abstract, read-only + _values_ that cannot be modified or have their address taken. +- [_Reference expressions_](#reference-expressions) refer to _objects_ with + _storage_ where a value may be read or written and the object's address can + be taken. +- [_Initializing expressions_](#initializing-expressions) require storage to + be provided implicitly when evaluating the expression. The expression then + initializes an object in that storage. These are used to model function + returns, which can construct the returned value directly in the caller's + storage. + +Expressions in one category can be converted to any other category when needed.
+The primitive conversion steps used are: + +- [_Value binding_](#value-binding) forms a value expression from the current + value of the object referenced by a reference expression. +- [_Direct initialization_](#direct-initialization) converts a value + expression into an initializing expression. +- [_Copy initialization_](#copy-initialization) converts a reference + expression into an initializing expression. +- [_Temporary materialization_](#temporary-materialization) converts an + initializing expression into a reference expression. + +These conversion steps combine to provide the transitive conversion table: + +| From: | value | reference | initializing | +| ------------------: | ------------------------- | --------- | ------------------ | +| to **value** | == | bind | materialize + bind | +| to **reference** | direct init + materialize | == | materialize | +| to **initializing** | direct init | copy init | == | + +Reference expressions formed through temporary materialization are called +[_ephemeral reference expressions_](#ephemeral-reference-expressions) and have +restrictions on how they are used. In contrast, reference expressions that refer +to declared storage are called +[_durable reference expressions_](#durable-reference-expressions). Beyond the +restrictions on what is valid, there is no distinction in their behavior or +semantics. + +#### Value binding + +We call forming a value expression from a reference expression _value binding_. +This forms a value expression that will evaluate to the value of the object in +the referenced storage of the reference expression. It may do this by eagerly +reading that value into a machine register, lazily reading that value on-demand +into a machine register, or in some other way modeling that abstract value. + +See the [value expressions](#value-expressions) section for more details on the +semantics of value expressions. 
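C++ has no exact counterpart to these categories. For readers coming from C++, though, each conversion has a loose analogue:

- A `const&` parameter reading a caller's named object resembles value binding.
- Copying from a named object resembles copy initialization.
- A function returning a prvalue resembles an initializing expression, because since C++17 the result initializes the destination directly.
- Calling a member function on such a result materializes a temporary.

The names below are hypothetical, and the sketch shows C++ behavior, not Carbon semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type used only to illustrate C++ analogues of Carbon's
// expression-category conversions.
struct Widget {
  std::string name;
  // Mutating member function; callable on a materialized temporary, loosely
  // like a Carbon `addr` method on an ephemeral reference.
  Widget& Rename(const std::string& new_name) {
    name = new_name;
    return *this;
  }
};

// Returns a prvalue: since C++17 this initializes the caller's destination
// directly, loosely like a Carbon initializing expression.
Widget MakeWidget() { return Widget{"fresh"}; }

// Reads through `const&` without copying, loosely like value binding.
std::size_t NameLength(const Widget& widget) { return widget.name.size(); }

void ExampleCategories() {
  Widget named{"abc"};             // `named` is an lvalue (a reference).
  assert(NameLength(named) == 3);  // Reference -> read-only access.
  Widget copy = named;             // Reference -> copy initialization.
  assert(copy.name == "abc");
  Widget direct = MakeWidget();    // Initializes `direct` in place.
  assert(direct.name == "fresh");
  // prvalue -> materialized temporary, mutated before it is destroyed.
  assert(MakeWidget().Rename("temp").name == "temp");
  assert(NameLength(MakeWidget()) == 5);  // Materialize, then bind.
}
```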
+ +#### Direct initialization + +This is the first way we have of initializing storage of an object. There may +not be storage for the source of this initialization, as the value expression +used for the initialization may be in a machine register or simply be abstractly +modeled like a source literal. A canonical example here is zeroing an object. + +#### Copy initialization + +This initializes storage for an object based on some other object which already +has initialized storage. A classic example is a type that can be copied +trivially, where this is implemented as a `memcpy` of its underlying bytes. + +#### Temporary materialization + +We use temporary materialization when we need to initialize an object by way of +storage, but weren't provided dedicated storage and can simply bind the result to +a value afterward. + +> **Open question:** The lifetimes of temporaries are not yet specified. + +## Binding patterns and local variables with `let` and `var` + +[_Binding patterns_](/docs/design/README.md#binding-patterns) introduce names +that are [_value expressions_](#value-expressions) by default and are called +_value bindings_. This is the desired default for many pattern contexts, +especially function parameters. Values are a good model for "input" function +parameters which are the dominant and default style of function parameters: + +```carbon +fn Sum(x: i32, y: i32) -> i32 { + // `x` and `y` are value expressions here. We can use their value, but not + // modify them or take their address. + return x + y; +} +``` + +Value bindings require the matched expression to be a _value expression_, +converting it into one as necessary. + +A _variable pattern_ can be introduced with the `var` keyword to create an +object with storage when matched.
Every binding pattern name introduced within a +variable pattern is called a _variable binding_ and forms a +[_durable reference expression_](#durable-reference-expressions) to an object +within the variable pattern's storage when used. Variable patterns require their +matched expression to be an _initializing expression_ and provide their storage +to it to be initialized. + +```carbon +fn MutateThing(ptr: i64*); + +fn Example() { + // `1` starts as a value expression, which is what a `let` binding expects. + let x: i64 = 1; + + // `2` also starts as a value expression, but the variable binding requires it + // to be converted to an initializing expression by using the value `2` to + // initialize the provided variable storage that `y` will refer to. + var y: i64 = 2; + + // Allowed to take the address and mutate `y` as it is a durable reference + // expression. + MutateThing(&y); + + // ❌ This would be an error though due to trying to take the address of the + // value expression `x`. + MutateThing(&x); +} +``` + +### Local variables + +A local binding pattern can be introduced with either the `let` or `var` +keyword. The `let` introducer begins a value pattern which works the same as the +default patterns in other contexts. The `var` introducer immediately begins a +variable pattern. + +- `let` _identifier_`:` _( expression |_ `auto` _)_ `=` _value_`;` +- `var` _identifier_`:` _( expression |_ `auto` _) [_ `=` _value ]_`;` + +These are just simple examples of binding patterns used directly in local +declarations. Local `let` and `var` declarations build on Carbon's general +[pattern matching](/docs/design/pattern_matching.md) design. `var` declarations +implicitly start off within a `var` pattern. `let` declarations introduce +patterns that bind values by default, the same as function parameters and most +other pattern contexts. + +The general pattern matching model also allows nesting `var` sub-patterns within +a larger pattern that defaults to matching values. 
For example, we can combine +the two local declarations above into one destructuring declaration with an +inner `var` pattern here: + +```carbon +fn DestructuringExample() { + // Both `1` and `2` start as value expressions. The `x` binding directly + // matches `1`. For `2`, the variable binding requires it to be converted to + // an initializing expression by using the value `2` to initialize the + // provided variable storage that `y` will refer to. + let (x: i64, var y: i64) = (1, 2); + + // Just like above, we can take the address and mutate `y`: + MutateThing(&y); + + // ❌ And this remains an error: + MutateThing(&x); +} +``` + +If `auto` is used in place of the type for a local binding pattern, +[type inference](type_inference.md) is used to automatically determine the +variable's type. + +These local bindings introduce names scoped to the code block in which they +occur, which will typically be marked by an open brace (`{`) and close brace +(`}`). + +### Consuming function parameters + +Just as part of a `let` binding can use a `var` prefix to become a variable +pattern and bind names that will form reference expressions to the variable's +storage, so can function parameters: + +```carbon +fn Consume(var x: SomeData) { + // We can mutate and use the variable that `x` refers to here. +} +``` + +This allows us to model an important special case of function inputs -- those +that are _consumed_ by the function, either through local processing or being +moved into some persistent storage. Marking these in the pattern and thus +signature of the function changes the expression category required for arguments +in the caller. These arguments are required to be _initializing expressions_, +potentially being converted into such an expression if necessary, that directly +initialize storage dedicated to and owned by the function parameter.
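For comparison, here is a minimal C++ sketch of the pass-by-value-and-move idiom that Carbon's `var` parameters are meant to cover. The `Inbox` class and its members are hypothetical. In C++ this by-value spelling is the ordinary way to write a parameter, whereas Carbon requires the explicit `var` marking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container used to show a consuming (by-value) parameter in C++.
class Inbox {
 public:
  // `message` is a by-value parameter: the function owns its storage and
  // moves the contents into persistent storage, consuming the argument.
  void Consume(std::string message) { messages_.push_back(std::move(message)); }

  std::size_t size() const { return messages_.size(); }
  const std::string& at(std::size_t i) const { return messages_.at(i); }

 private:
  std::vector<std::string> messages_;
};

void ExampleConsume() {
  Inbox inbox;
  inbox.Consume(std::string(64, 'x'));  // A temporary initializes the parameter.
  std::string kept = "kept";
  inbox.Consume(kept);                  // Copies: the caller still owns `kept`.
  assert(kept == "kept");
  inbox.Consume(std::move(kept));       // The caller explicitly gives up its value.
  assert(inbox.size() == 3);
  assert(inbox.at(0).size() == 64);
  assert(inbox.at(2) == "kept");
}
```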
+ +This pattern serves the same purpose as C++'s pass-by-value when used with types +that have non-trivial resources attached to pass ownership into the function and +consume the resource. But rather than that being the seeming _default_, Carbon +makes this a use case that requires a special marking on the declaration. + +## Reference expressions + +_Reference expressions_ refer to _objects_ with _storage_ where a value may be +read or written and the object's address can be taken. + +Calling a [method](/docs/design/classes.md#methods) on a reference expression +where the method's `self` parameter has an `addr` specifier can always +implicitly take the address of the referred-to object. This address is passed as +a [pointer](#pointers) to the `self` parameter for such methods. + +There are two sub-categories of reference expressions: _durable_ and +_ephemeral_. These refine the _lifetime_ of the underlying storage and provide +safety restrictions reflecting that lifetime. + +### Durable reference expressions + +_Durable reference expressions_ are those where the object's storage outlives +the full expression and the address could be meaningfully propagated out of it +as well. + +There are two contexts that require a durable reference expression in Carbon: + +- [Assignment statements](/docs/design/assignment.md) require the + left-hand-side of the `=` to be a durable reference. This stronger + requirement is enforced before the expression is rewritten to dispatch into + the `Carbon.Assign.Op` interface method. +- [Address-of expressions](#pointer-syntax) require their operand to be a + durable reference and compute the address of the referenced object. 
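C++ imposes a similar requirement: built-in `&` and built-in assignment need lvalue operands. Its lvalue-producing forms (named variables, `*p`, member access on an lvalue, `p->member`, built-in subscripting) line up closely with the forms Carbon treats as durable reference expressions. The sketch below is an approximate analogy, not a statement of Carbon rules. It uses the standard `decltype((e))` rule, which yields `T&` exactly when `e` is an lvalue. The `IS_LVALUE` macro, `Point`, and `MakePoint` are hypothetical helpers.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: true exactly when `decltype((e))` is an lvalue
// reference, which C++ specifies to happen when `e` is an lvalue.
#define IS_LVALUE(e) (std::is_lvalue_reference<decltype((e))>::value)

struct Point {
  int x;
  int y;
};

Point MakePoint() { return Point{1, 2}; }

void ExampleLvalues() {
  Point local{3, 4};
  Point* p = &local;  // OK: `local` is an lvalue, so its address can be taken.
  int array[3] = {0, 1, 2};

  assert(IS_LVALUE(local));     // Named variable.
  assert(IS_LVALUE(*p));        // Dereferenced pointer.
  assert(IS_LVALUE(local.x));   // Member of an lvalue.
  assert(IS_LVALUE(p->y));      // Pointer member access.
  assert(IS_LVALUE(array[1]));  // Built-in subscript.

  assert(!IS_LVALUE(local.x + 1));  // Arithmetic result: a prvalue.
  assert(!IS_LVALUE(MakePoint()));  // By-value return: a prvalue.
  // `&MakePoint()` and `(local.x + 1) = 5` would not compile, because built-in
  // `&` and built-in assignment require lvalue operands.
}
```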
+ +There are several kinds of expressions that produce durable references in +Carbon: + +- Names of objects introduced with a + [variable binding](#binding-patterns-and-local-variables-with-let-and-var): + `x` +- Dereferenced [pointers](#pointers): `*p` +- Names of subobjects through member access to some other durable reference + expression: `x.member` or `p->member` +- [Indexing](/docs/design/expressions/indexing.md) into a type similar to + C++'s `std::span` that implements `IndirectIndexWith`, or indexing into any + type with a durable reference expression such as `local_array[i]`. + +Durable reference expressions can only be produced _directly_ by one of these +expressions. They are never produced by converting one of the other expression +categories into a reference expression. + +### Ephemeral reference expressions + +We call the reference expressions formed through +[temporary materialization](#temporary-materialization) _ephemeral reference +expressions_. They still refer to an object with storage, but it may be storage +that will not outlive the full expression. Because the storage is only +temporary, we impose restrictions on where these reference expressions can be +used: their address can only be taken implicitly as part of a method call whose +`self` parameter is marked with the `addr` specifier. + +**Future work:** The current design allows directly requiring an ephemeral +reference for `addr`-methods because this replicates the flexibility in C++ -- +very few C++ methods are L-value-ref-qualified which would have a similar effect +to `addr`-methods requiring a durable reference expression. This is leveraged +frequently in C++ for builder APIs and other patterns. However, Carbon provides +more tools in this space than C++ already, and so it may be worth evaluating +whether we can switch `addr`-methods to the same restrictions as assignment and +`&`. 
Temporaries would never have their address escaped (in a safe way) in that +world and there would be fewer different kinds of entities. But this is reserved +for future work as we should be very careful about the expressivity hit being +tolerable both for native-Carbon API design and for migrated C++ code. + +## Value expressions + +A value cannot be mutated, cannot have its address taken, and may not have +storage at all or a stable address of storage. Values are abstract constructs +like function input parameters and constants. They can be formed in two ways -- +a literal expression like `42`, or by reading the value of some stored object. + +A core goal of values in Carbon is to provide a single model that can get both +the efficiency of passing by value when working with small types such as those +that fit into a machine register, but also the efficiency of minimal copies when +working with types where a copy would require extra allocations or other costly +resources. This directly helps programmers by providing a simpler model to +select the mechanism of passing function inputs. But it is also important to +enable generic code that needs a single type model that will have generically +good performance. + +When forming a value expression from a reference expression, Carbon +[binds](#value-binding) the referenced object to that value expression. This +allows immediately reading from the object's storage into a machine register or +a copy if desired, but does not require that. The read of the underlying object +can also be deferred until the value expression itself is used. Once an object +is bound to a value expression in this way, any mutation to the object or its +storage ends the lifetime of the value binding, and makes any use of the value +expression an error. + +> Note: this is _not_ intended to ever become "undefined behavior", but instead +> just "erroneous". 
We want to be able to detect and report such code as having +> a bug, but do not want unbounded UB and are not aware of important +> optimizations that this would inhibit. +> +> _Open issue:_ We need a common definition of erroneous behavior that we can +> use here (and elsewhere). Once we have that, we should cite it here. + +> Note: this restriction is also **experimental** -- we may want to strengthen +> or weaken it based on experience, especially with C++ interop and a more +> complete memory safety story. + +Even with these restrictions, we expect to make values in Carbon useful in +roughly the same places as `const &`s in C++, but with added efficiency in the +case where the values can usefully be kept in machine registers. We also +specifically encourage a mental model of a `const &` with extra efficiency. + +The actual _representation_ of a value when bound, especially across function +boundaries, is [customizable](#value-representation-and-customization) by the +type. The defaults are based around preserving the baseline efficiency of C++'s +`const &`, but potentially reading the value when that would be both correct and +reliably more efficient, such as into a machine register. + +### Comparison to C++ parameters + +While these are called "values" in Carbon, they are not related to "by-value" +parameters as they exist in C++. The semantics of C++'s by-value parameters are +defined to create a new local copy of the argument, although it may move into +this copy. + +Carbon's values are much closer to a `const &` in C++ with extra restrictions +such as allowing copies under "as-if" and preventing taking the address. +Combined, these restrictions allow implementation strategies such as in-register +parameters. + +### Polymorphic types + +Value expressions and value bindings can be used with +[polymorphic types](/docs/design/classes.md#inheritance), for example: + +```carbon +base class MyBase { ... } + +fn UseBase(b: MyBase) { ... 
} + +class Derived { + extend base: MyBase; + ... +} + +fn PassDerived() { + var d: Derived = ...; + // Allowed to pass `d` here: + UseBase(d); +} +``` + +This is still allowed to create a copy or to move, but it must not _slice_. Even +if a copy is created, it must be a `Derived` object, even though this may limit +the available implementation strategies. + +> **Future work:** The interaction between a +> [custom value representation](#value-representation-and-customization) and a +> value expression used with a polymorphic type needs to be fully captured. +> Either it needs to restrict to a `const Self*` style representation (to +> prevent slicing) or it needs to have a model for the semantics when a +> different value representation is used. + +### Interop with C++ `const &` and `const` methods + +While value expressions cannot have their address taken in Carbon, they should +be interoperable with C++ `const &`s and C++ `const`-qualified methods. This +will in-effect "pin" some object (potentially a copy or temporary) into memory +and allow C++ to take its address. Without supporting this, values would likely +create an untenable interop ergonomic barrier. However, this does create some +additional constraints on value expressions and a way that their addresses can +escape unexpectedly. + +Despite interop requiring an address to implement, C++ allows `const &` +parameters to bind to temporary objects where that address doesn't have much +meaning and might not be valid once the called function returns. As a +consequence, we don't expect C++ interfaces using a `const &` to misbehave in +practice. + +> **Future work:** when a type customizes its +> [value representation](#value-representation-and-customization), as currently +> specified this will break the use of `const &` C++ APIs with such a value. 
We +> should extend the rules around value representation customization to require +> that either the representation type can be converted to (a copy) of the +> customized type, or implements an interop-specific interface to compute a +> `const` pointer to the original object used to form the representation object. +> This will allow custom representations to either create copies for interop or +> retain a pointer to the original object and expose that for interop as +> desired. + +Another risk is exposing Carbon's value expressions to `const &` parameters in +this way, as C++ allows casting away `const`. However, in the absence of +`mutable` members, casting away `const` does not make it safe to _mutate_ +through a `const &` parameter (or a `const`-qualified method). C++ allows +`const &` parameters and `const` member functions to access objects that are +_declared_ `const`. These objects cannot be mutated, even if `const` is removed, +exactly the same as Carbon value expressions. In fact, these kinds of mutations +[break in real implementations](https://cpp.compiler-explorer.com/z/KMhTondaK). +The result is that Carbon's value expressions will work similarly to +`const`-declared objects in C++, and will interop with C++ code similarly well. + +### Escape hatches for value addresses in Carbon + +**Open question:** It may be necessary to provide some amount of escape hatch +for taking the address of values. The +[C++ interop](#interop-with-c-const--and-const-methods) above already takes +their address functionally. Currently, this is the extent of an escape hatch to +the restrictions on values. + +If a further escape hatch is needed, this kind of fundamental weakening of the +semantic model would be a good case for some syntactic marker like Rust's +`unsafe`, although rather than a region, it would seem better to tie it directly +to the operation in question. 
For example: + +```carbon +class S { + fn ValueMemberFunction[self: Self](); + fn AddrMemberFunction[addr self: const Self*](); +} + +fn F(s_value: S) { + // This is fine. + s_value.ValueMemberFunction(); + + // This requires an unsafe marker in the syntax. + s_value.unsafe AddrMemberFunction(); +} +``` + +The specific tradeoff here is covered in a proposal +[alternative](/proposals/p2006.md#value-expression-escape-hatches). + +## Initializing expressions + +Storage in Carbon is initialized using _initializing expressions_. Their +evaluation produces an initialized object in the storage, although that object +may still be _unformed_. + +**Future work:** More details on initialization and unformed objects should be +added to the design from the proposal +[#257](https://github.com/carbon-language/carbon-lang/pull/257), see +[#1993](https://github.com/carbon-language/carbon-lang/issues/1993). When added, +it should be linked from here for the details on the initialization semantics +specifically. + +The simplest initializing expressions are value or durable reference +expressions that are converted into an initializing expression. Value +expressions are written directly into the storage to form a new object. +Reference expressions have the object they refer to copied into a new object in +the provided storage. + +**Future work:** The design should be expanded to fully cover how copying is +managed and linked to from here. + +The first place where an initializing expression is _required_ is to satisfy +[_variable patterns_](#binding-patterns-and-local-variables-with-let-and-var). +These require the expression they match to be an initializing expression for the +storage they create. The simplest example is the expression after the `=` in a +local `var` declaration. + +The next place where a Carbon expression requires an initializing expression is +the expression operand to `return` statements.
We expand more completely on how +return statements interact with expressions, values, objects, and storage +[below](#function-calls-and-returns). + +The last path that requires forming an initializing expression in Carbon is when +attempting to convert a non-reference expression into an ephemeral reference +expression: the expression is first converted to an initializing expression if +necessary, and then temporary storage is materialized to act as its output, and +as the referent of the resulting ephemeral reference expression. + +### Function calls and returns + +Function calls in Carbon are modeled directly as initializing expressions -- +they require storage as an input and when evaluated cause that storage to be +initialized with an object. This means that when a function call is used to +initialize some variable pattern as here: + +```carbon +fn CreateMyObject() -> MyType { + return ; +} + +var x: MyType = CreateMyObject(); +``` + +The `` in the `return` statement actually initializes the +storage provided for `x`. There is no "copy" or other step. + +All `return` statement expressions are required to be initializing expressions +and in fact initialize the storage provided to the function's call expression. +This in turn causes the property to hold _transitively_ across an arbitrary +number of function calls and returns. The storage is forwarded at each stage and +initialized exactly once. + +Note that functions without a specified return type work exactly the same as +functions with a `()` return type for the purpose of expression categories. + +#### Deferred initialization from values and references + +Carbon also makes the evaluation of function calls and return statements tightly +linked in order to enable more efficiency improvements. 
It allows the actual +initialization performed by the `return` statement with its expression to be +deferred from within the body of the function to the caller's initializing +expression if it can simply propagate a value or reference expression to the +caller that is guaranteed to be alive and available to the caller. + +Consider the following code: + +```carbon +fn SelectSecond(first: Point, second: Point, third: Point) -> Point { + return second; +} + +fn UsePoint(p: Point); + +fn F(p1: Point, p2: Point) { + UsePoint(SelectSecond(p2, p1, p2)); +} +``` + +The call to `SelectSecond` must provide storage for a `Point` that can be +initialized. However, Carbon allows an implementation of the actual +`SelectSecond` function to not initialize this storage when it reaches +`return second`. The expression `second` is a name bound to the call's argument +value expression, and that value expression is necessarily valid in the caller. +Carbon in this case allows the implementation to merely communicate that the +returned expression is a name bound to a specific value expression argument to +the call, and the caller _if necessary_ should initialize the temporary storage. +This in turn allows the caller `F` to recognize that the value expression +argument (`p1`) is already valid to pass as the argument to `UsePoint` without +initializing the temporary storage from it and reading it back out of that +storage. + +None of this impacts the type system and so an implementation can freely select +specific strategies here based on concrete types without harming generic code. +There is always a generic fallback as well if monomorphization isn't desired. + +This freedom mirrors that of [input values](#value-expressions), which might be +implemented as either a reference or a copy without breaking genericity. Here +too, many small types will not need to be lazy and will simply eagerly initialize the +temporary, which is implemented as an actual machine register.
But for large +types or ones with associated allocated storage, this can reliably avoid +extraneous memory allocations and other costs. + +Note that this flexibility doesn't avoid the call expression materializing +temporary storage and providing it to the function. Whether the function needs +this storage is an implementation detail. It simply allows deferring an +important case of initializing that storage from a value or reference expression +already available in the caller to the caller so that it can identify cases +where that initialization is not necessary. + +**References:** This addresses an issue-for-leads about +[reducing the potential copies incurred by returns](https://github.com/carbon-language/carbon-lang/issues/828). + +#### Declared `returned` variable + +The model of initialization of returns also facilitates the use of +[`returned var` declarations](control_flow/return.md#returned-var). These +directly observe the storage provided for initialization of a function's return. + +## Pointers + +Pointers in Carbon are the primary mechanism for _indirect access_ to storage +containing some value. Dereferencing a pointer is one of the primary ways to +form a [_durable reference expression_](#durable-reference-expressions). + +Carbon pointers are heavily restricted compared to C++ pointers -- they cannot +be null and they cannot be indexed or have pointer arithmetic performed on them. +In some ways, this makes them more similar to references in C++, but they retain +the essential aspect of a pointer that they syntactically distinguish between +the point*er* and the point*ee*. + +Carbon will still have mechanisms to achieve the equivalent behaviors as C++ +pointers. Optional pointers are expected to serve nullable use cases. Slice or +view style types are expected to provide access to indexable regions. 
And even +raw pointer arithmetic is expected to be provided at some point, but through +specialized constructs given the specialized nature of these operations. + +**Future work:** Add explicit designs for these use cases and link to them here. + +### Reference types + +Unlike C++, Carbon does not currently have reference types. Pointers are the +only form of indirect access. There are a few aspects to this decision that need +to be separated carefully from each other as the motivations and considerations +are different. + +First, Carbon has only a single fundamental construct for indirection because +this gives it a single point that needs extension and configuration if and when +we want to add more powerful controls to the indirect type system such as +lifetime annotations or other safety or optimization mechanisms. The design +attempts to identify a single, core indirection tool and then layer other +related use cases on top. This is motivated by keeping the language scalable as +it evolves and reducing the huge explosion of complexity that C++ sees due to +having a large space here. For example, when there are N > 1 ways to express +indirection equivalently and APIs want to accept any one of them across M +different parameters they can end up with N<sup>M</sup> combinations. + +Second, with pointers, Carbon's indirection mechanism retains the ability to +refer distinctly to the point*er* and the point*ee* when needed. This is +critical for supporting rebinding, and without this property more permutations +of indirection would likely emerge. + +Third, Carbon doesn't provide a straightforward way to avoid the syntactic +distinction between indirect access and direct access.
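+
+As a minimal sketch of the second point, using the syntax described under "Pointer syntax" (the names `a`, `b`, and `p` are only illustrative), assigning through `*p` writes to the point*ee*, while assigning to `p` itself rebinds the point*er* to different storage:
+
+```carbon
+var a: i32 = 1;
+var b: i32 = 2;
+var p: i32* = &a;
+
+// Assign through the pointer: this writes to the pointee, `a`.
+*p = 10;
+
+// Rebind the pointer itself so that it now points at `b`.
+p = &b;
+
+// This write goes to `b`; `a` still holds `10`.
+*p = 20;
+```
+
+A C++ reference cannot express the rebinding step, because a reference cannot be reseated after it is initialized.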
+ +For a full discussion of the tradeoffs of these design decisions, see the +alternatives considered section of [P2006]: + +- [References in addition to pointers](/proposals/p2006.md#references-in-addition-to-pointers) +- [Syntax-free or automatic dereferencing](/proposals/p2006.md#syntax-free-or-automatic-dereferencing) +- [Exclusively using references](/proposals/p2006.md#exclusively-using-references) + +### Pointer syntax + +The type of a pointer to a type `T` is written with a postfix `*` as in `T*`. +Dereferencing a pointer forms a [_reference expression_] and is written with a +prefix `*` as in `*p`: + +```carbon +var i: i32 = 42; +var p: i32* = &i; + +// Form a reference expression `*p` and assign `13` to the referenced storage. +*p = 13; +``` + +This syntax is chosen specifically to remain as similar as possible to C++ +pointer types as they are commonly written in code. Pointers are expected to be +extremely common and a key anchor of syntactic similarity between the languages. +The different alternatives and tradeoffs for this syntax issue were discussed +extensively in [#523] and are summarized in the +[proposal](/proposals/p2006.md#alternative-pointer-syntaxes). + +[#523]: https://github.com/carbon-language/carbon-lang/issues/523 + +Carbon also supports an infix `->` operation, much like C++. However, Carbon +directly defines this as an exact rewrite to `*` and `.` so that `p->member` +becomes `(*p).member` for example. This means there is no overloaded or +customizable `->` operator in Carbon the way there is in C++. Instead, +customizing the behavior of `*p` in turn customizes the behavior of `p->`. + +**Future work:** As [#523] discusses, one of the primary challenges of the C++ +syntax is the composition of a prefix dereference operation and other postfix or +infix operations, especially when chained together, such as the classic C++ +frustration of mixing dereference and indexing: `(*(*p)[42])[13]`.
Where +these compositions are sufficiently common to create ergonomic problems, the +plan is to introduce custom syntax analogous to `->` that rewrites down to the +grouped dereference. However, nothing beyond `->` itself is currently provided. +Extending this, including the exact design and scope of extension desired, is a +future work area. + +### Dereferencing customization + +Carbon should support user-defined pointer-like types such as _smart pointers_ +using a pattern similar to operator overloading or other expression syntax. That +is, it should rewrite the expression into a member function call on an +interface. Types can then implement this interface to expose pointer-like +_user-defined dereference_ syntax. + +The interface might look like: + +```carbon +interface Pointer { + let ValueT:! Type; + fn Dereference[self: Self]() -> ValueT*; +} +``` + +Here is an example using a hypothetical `TaggedPtr` that carries some extra +integer tag next to the pointer it emulates: + +```carbon +class TaggedPtr(T:! Type) { + var tag: i32; + var ptr: T*; +} +external impl [T:! Type] TaggedPtr(T) as Pointer { + let ValueT:! Type = T; + fn Dereference[self: Self]() -> T* { return self.ptr; } +} + +fn Test[T:! Type](arg: TaggedPtr(T), dest: TaggedPtr(TaggedPtr(T))) { + **dest = *arg; + *dest = arg; +} +``` + +There is one tricky aspect of this. The function in the interface which +implements a pointer-like dereference must return a raw pointer which the +language then actually dereferences to form a reference expression similar to +that formed by `var` declarations. This interface is implemented for normal +pointers as a no-op: + +```carbon +impl [T:! Type] T* as Pointer { + let ValueT:! Type = T; + fn Dereference[self: Self]() -> T* { return self; } +} +``` + +Dereference expressions such as `*x` are syntactically rewritten to use this +interface to get a raw pointer and then that raw pointer is dereferenced.
If we +imagine this language level dereference to form a reference expression as a +unary `deref` operator, then `(*x)` becomes +`(deref (x.(Pointer.Dereference)()))`. + +Carbon will also use a simple syntactic rewrite for implementing `x->Method()` +as `(*x).Method()` without separate or different customization. + +## `const`-qualified types + +Carbon provides the ability to qualify a type `T` with the keyword `const` to +get a `const`-qualified type: `const T`. This is exclusively an API-subsetting +feature in Carbon -- for more fundamentally "immutable" use cases, value +expressions and bindings should be used instead. Pointers to `const`-qualified +types in Carbon provide access to an object with an API subset that can help +model important requirements like ensuring usage is exclusively by way of a +_thread-safe_ interface subset of an otherwise _thread-compatible_ type. + +Note that `const T` is a type qualification and is generally orthogonal to +expression categories or what form of pattern is used, including for object +parameters. Notionally, it can occur both with `addr` and value object +parameters. However, on value patterns, it is redundant as there is no +meaningful distinction between a value expression of type `T` and type +`const T`. For example, given a type and methods: + +```carbon +class X { + fn Method[self: Self](); + fn ConstMethod[self: const Self](); + fn AddrMethod[addr self: Self*](); + fn AddrConstMethod[addr self: const Self*](); +} +``` + +The methods can be called on different kinds of expressions according to the +following table: + +| Expression category: | `let x: X`<br>(value) | `let x: const X`<br>(const value) | `var x: X`<br>(reference) | `var x: const X`<br>(const reference) | +| --------------------: | ------------------------ | ------------------------------------ | ---------------------------- | ---------------------------------------- | +| `x.Method();` | ✅ | ✅ | ✅ | ✅ | +| `x.ConstMethod();` | ✅ | ✅ | ✅ | ✅ | +| `x.AddrMethod();` | ❌ | ❌ | ✅ | ❌ | +| `x.AddrConstMethod();` | ❌ | ❌ | ✅ | ✅ | + +The `const T` type has the same representation as `T` with the same field names, +but all of its field types are also `const`-qualified. Other than fields, all +other members of `T` are also members of `const T`, and impl lookup ignores the +`const` qualification. There is an implicit conversion from `T` to `const T`, +but not the reverse. Conversion of reference expressions to value expressions is +defined in terms of `const T` reference expressions to `T` value expressions. + +It is expected that `const T` will largely occur as part of a +[pointer](#pointers), as the express purpose is to form reference expressions. +The precedence rules are even designed for this common case: `const T*` means +`(const T)*`, a pointer-to-const. Carbon will support conversions between +pointers to `const`-qualified types that follow the same rules as used in C++ to +avoid inadvertent loss of `const`-qualification. + +The syntax details of `const` are also covered in the +[type operators](/docs/design/expressions/type_operators.md) documentation. + +## Lifetime overloading + +One potential use case that is not obviously or fully addressed by these designs +in Carbon is overloading function calls by observing the lifetime of arguments. +The use case here would be selecting different implementation strategies for the +same function or operation based on whether an argument lifetime happens to be +ending and viable to move-from. + +Carbon currently intentionally leaves this use case unaddressed.
There is a +fundamental scaling problem in this style of overloading: it creates a +combinatorial explosion of possible overloads similar to other permutations of +indirection models. Consider a function with N parameters that would benefit +from lifetime overloading. If each parameter benefits _independently_ from the +others, as is commonly the case, we would need 2<sup>N</sup> overloads to +express all the possibilities. + +Carbon will initially see if code can be designed without this facility. Some of +the tools needed to avoid it are suggested above such as the +[consuming](#consuming-function-parameters) input pattern. But it is possible +that more will be needed in practice. It would be good to identify the specific +and realistic Carbon code patterns that cannot be expressed with the tools in +this design in order to motivate a minimal extension. Some candidates based on +functionality already proposed here or for [classes](/docs/design/classes.md): + +- Allow overloading between `addr self` and `self` in methods. This is among the + most appealing as it _doesn't_ have the combinatorial explosion. But it is + also very limited as it only applies to the implicit object parameter. +- Allow overloading between `var` and non-`var` parameters. +- Expand the `addr` technique from object parameters to all parameters, and + allow overloading based on it. + +Perhaps more options will emerge as well. Again, the goal isn't to completely +preclude pursuing this direction, but instead to try to ensure it is only +pursued based on a real and concrete need, and that the minimal extension is adopted. + +## Value representation and customization + +The representation of a value expression is especially important because it +forms the calling convention used for the vast majority of function parameters +-- function inputs. Because of this, it is important that the representation is +predictable and customizable by the value's type.
Similarly, while Carbon code must be +correct with either a copy or a reference-based implementation, we want the +implementation strategy that is used to be a predictable and customizable property of +the type of a value. + +A type can optionally control its value representation using a custom syntax +similar to customizing its [destructor](/docs/design/classes.md#destructors). +This syntax, which sets the representation to some type, uses a keyword `value_rep` and +can appear where a member declaration would be valid within the type: + +```carbon +class SomeType { + value_rep = RepresentationType; +} +``` + +**Open question:** The syntax for this is just a placeholder, using a placeholder +keyword. It isn't final at all and likely will need to change to read well. + +The provided representation type must be one of the following: + +- `const Self` -- this forces the use of a _copy_ of the object. +- `const Self*` -- this forces the use of a [_pointer_](#pointers) to the + original object. +- A custom type that is not `Self`, `const Self`, or a pointer to either. + +If the representation is `const Self` or `const Self*`, then the type's fields +will be accessible as [_value expressions_](#value-expressions) using the normal +member access syntax for value expressions of a type. These will be implemented +by either accessing a copy of the object in the non-pointer case or a pointer to +the original object in the pointer case. A representation of `const Self` +requires copying to be valid for the type. This provides the builtin +functionality but allows explicitly controlling which representation should be +used. + +If no customization is provided, the implementation will select one based on a +set of heuristics. Some examples: + +- Non-copyable types and polymorphic types would use a `const Self*`. +- Small objects that are trivially copied in a machine register would use + `const Self`.
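+
+For illustration, the first two forms might be spelled as follows for two hypothetical types, `Point` and `Registry`, which are not part of this design. This is only a sketch: it uses the placeholder `value_rep` syntax above, whose final spelling is an open question.
+
+```carbon
+// Hypothetical small type: values are represented as copies.
+class Point {
+  value_rep = const Self;
+  var x: i32;
+  var y: i32;
+}
+
+// Hypothetical type whose values are represented as a pointer to the
+// original object, avoiding a copy.
+class Registry {
+  value_rep = const Self*;
+  var count: i64;
+  var generation: i64;
+}
+```
+
+In both cases, fields such as `x` and `count` remain readable through value expressions; only the underlying representation differs.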
+ +When a custom type is provided, it must not be `Self`, `const Self`, or a +pointer to either. The type provided will be used on function call boundaries +and as the implementation representation for `let` bindings and other value +expressions referencing an object of the type. A specifier of `value_rep = T;` +will require that the type containing that specifier satisfies the constraint +`impls ReferenceImplicitAs where .T = T` using the following interface: + +```carbon +interface ReferenceImplicitAs { + let T:! type; + fn Convert[addr self: const Self*]() -> T; +} +``` + +Converting a reference expression into a value expression for such a type calls +this customization point to form a representation object from the original +reference expression. + +When using a custom representation type in this way, no fields are accessible +through a value expression. Instead, only methods can be called using member +access, as they simply bind the value expression to the `self` parameter. +However, one important method can be called -- `.(ImplicitAs(T).Convert)()`. +This implicitly converts a value expression for the type into its custom +representation type. The customization of the representation above and +`impls ReferenceImplicitAs where .T = T` causes the class to have a builtin +`impl as ImplicitAs(T)` which converts to the representation type as a no-op, +exposing the object created by calling `ReferenceImplicitAs.Convert` on the +original reference expression, and preserved as a representation of the value +expression. + +Here is a more complete example of code using these features: + +```carbon +class StringView { + private var data_ptr: Char*; + private var size: i64; + + fn Create(data_ptr: Char*, size: i64) -> StringView { + return {.data_ptr = data_ptr, .size = size}; + } + + // A typical readonly view of a string API... + fn ExampleMethod[self: Self]() { ... } +} + +class String { + // Customize the value representation to be `StringView`.
+ value_rep = StringView; + + private var data_ptr: Char*; + private var size: i64; + + private var capacity: i64; + + impl as ReferenceImplicitAs where .T = StringView { + fn Convert[addr self: const Self*]() -> StringView { + // Because this is called on the String object prior to it becoming + // a value, we can access an SSO buffer or other interior pointers + // of `self`. + return StringView.Create(self->data_ptr, self->size); + } + } + + // We can directly declare methods that take `self` as a `StringView` which + // will cause the caller to implicitly convert value expressions to + // `StringView` prior to calling. + fn ExampleMethod[self: StringView]() { self.ExampleMethod(); } + + // Or we can use a value binding for `self` much like normal, but the + // implementation will be constrained because of the custom value rep. + fn ExampleMethod2[self: String]() { + // Error due to custom value rep: + self.data_ptr; + + // Fine, this uses the builtin `ImplicitAs(StringView)`. + (self as StringView).ExampleMethod(); + } + + // Note that even though the `Self` type is `const` qualified here, this + // cannot be called on a `String` value! That would require us to convert to a + // `StringView` that does not track the extra data member. + fn Capacity[addr self: const Self*]() -> i64 { + return self->capacity; + } +} +``` + +It is important to note that the _representation_ type of a value expression is +just its representation and does not impact the name lookup or type. Name lookup +and `impl` search occur for the same type regardless of the expression category. +But once a particular method or function is selected, an implicit conversion can +occur from the original type to the representation type as part of the parameter +or receiver type. In fact, this conversion is the _only_ operation that can +occur for a value whose type has a customized value representation.
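+
+To show how callers observe these rules, here is a sketch of two hypothetical functions using the `String` type from the example above. `MakeString` is an assumed helper that returns a `String`; it is not part of this design.
+
+```carbon
+// `s` is a value binding, so it is represented as a `StringView`.
+fn UseValue(s: String) {
+  // OK: `ExampleMethod` takes `self: StringView`, reached through the builtin
+  // `ImplicitAs(StringView)` conversion.
+  s.ExampleMethod();
+
+  // Error if uncommented: `Capacity` takes `addr self`, which cannot bind to
+  // a value expression such as `s`.
+  // s.Capacity();
+}
+
+fn UseVariable() {
+  // `MakeString` is hypothetical.
+  var owned: String = MakeString();
+
+  // OK: `owned` is a durable reference expression, so `addr self` applies.
+  let capacity: i64 = owned.Capacity();
+
+  // Converting `owned` to a value calls `ReferenceImplicitAs.Convert`.
+  UseValue(owned);
+}
+```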
+ +The example above also demonstrates the fundamental tradeoff made by customizing +the value representation of a type in this way. While it provides a great deal +of control, it may result in some surprising limitations. Above, a method that +is classically available on a C++ `const std::string&` like querying the +capacity cannot be implemented with the customized value representation because +it loses access to this additional state. Carbon allows type authors to make an +explicit choice about whether they want to work with a restricted API and +leverage a custom value representation or not. + +**Open question:** Beyond the specific syntax used where we currently have a +placeholder `value_rep = T;`, we need to explore exactly what the best +relationship is with the customization point. For example, should this syntax +immediately forward declare `impl as ReferenceImplicitAs where .T = T`, thereby +allowing an out-of-line definition of the `Convert` method and `... where _` to +pick up the associated type from the syntax. Alternatively, the syntactic marker +might be integrated into the `impl` declaration for `ReferenceImplicitAs` +itself. 
+ +## Alternatives considered + +- [No `var` introducer keyword](/proposals/p0339.md#no-var-introducer-keyword) +- [Name of the `var` statement introducer](/proposals/p0339.md#name-of-the-var-statement-introducer) +- [Colon between type and identifier](/proposals/p0339.md#colon-between-type-and-identifier) +- [Type elision](/proposals/p0339.md#type-elision) +- [Type ordering](/proposals/p0618.md#type-ordering) +- [Elide the type instead of using `auto`](/proposals/p0851.md#elide-the-type-instead-of-using-auto) +- [Value expression escape hatches](/proposals/p2006.md#value-expression-escape-hatches) +- [References in addition to pointers](/proposals/p2006.md#references-in-addition-to-pointers) +- [Syntax-free or automatic dereferencing](/proposals/p2006.md#syntax-free-or-automatic-dereferencing) +- [Exclusively using references](/proposals/p2006.md#exclusively-using-references) +- [Alternative pointer syntaxes](/proposals/p2006.md#alternative-pointer-syntaxes) +- [Alternative syntaxes for locals](/proposals/p2006.md#alternative-syntaxes-for-locals) + +## References + +- [Proposal #257: Initialization of memory and values][p0257] +- [Proposal #339: `var` statement][p0339] +- [Proposal #618: `var` ordering][p0618] +- [Proposal #851: auto keyword for vars][p0851] +- [Proposal #2006: Values, variables, and pointers][p2006] + +[p0257]: /proposals/p0257.md +[p0339]: /proposals/p0339.md +[p0618]: /proposals/p0618.md +[p0851]: /proposals/p0851.md +[p2006]: /proposals/p2006.md diff --git a/docs/design/variables.md b/docs/design/variables.md deleted file mode 100644 index 109f522a3d0bd..0000000000000 --- a/docs/design/variables.md +++ /dev/null @@ -1,76 +0,0 @@ -# Variables - - - - - -## Table of contents - -- [Overview](#overview) -- [Notes](#notes) - - [Global variables](#global-variables) -- [Alternatives considered](#alternatives-considered) -- [References](#references) - - - -## Overview - -Carbon's local variable syntax is: - -- `var` _identifier_`:` _< expression |_ 
`auto` _> [_ `=` _value ]_`;` - -Blocks introduce nested scopes and can contain local variable declarations that -work similarly to function parameters. - -For example: - -``` -fn Foo() { - var x: i32 = 42; -} -``` - -This introduces a local variable named `x` into the block's scope. It has the -type `Int` and is initialized with the value `42`. These variable declarations -(and function declarations) have a lot more power than what we're covering just -yet, but this gives you the basic idea. - -If `auto` is used in place of the type, [type inference](type_inference.md) is -used to automatically determine the variable's type. - -While there can be global constants, there are no global variables. - -## Notes - -> TODO: Constant syntax is an ongoing discussion. - -### Global variables - -We are exploring several different ideas for how to design less bug-prone -patterns to replace the important use cases programmers still have for global -variables. We may be unable to fully address them, at least for migrated code, -and be forced to add some limited form of global variables back. We may also -discover that their convenience outweighs any improvements afforded. 
- -## Alternatives considered - -- [No `var` introducer keyword](/proposals/p0339.md#no-var-introducer-keyword) -- [Name of the `var` statement introducer](/proposals/p0339.md#name-of-the-var-statement-introducer) -- [Colon between type and identifier](/proposals/p0339.md#colon-between-type-and-identifier) -- [Type elision](/proposals/p0339.md#type-elision) -- [Type ordering](/proposals/p0618.md#type-ordering) -- [Elide the type instead of using `auto`](/proposals/p0851.md#elide-the-type-instead-of-using-auto) - -## References - -- Proposal - [#339: `var` statement](https://github.com/carbon-language/carbon-lang/pull/339) -- Proposal - [#618: `var` ordering](https://github.com/carbon-language/carbon-lang/pull/618) -- Proposal - [#851: auto keyword for vars](https://github.com/carbon-language/carbon-lang/pull/851) diff --git a/proposals/p2006.md b/proposals/p2006.md new file mode 100644 index 0000000000000..728b3f2f7634d --- /dev/null +++ b/proposals/p2006.md @@ -0,0 +1,1020 @@ +# Values, variables, pointers, and references + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/2006) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) + - [Conceptual integrity between local variables and parameters](#conceptual-integrity-between-local-variables-and-parameters) +- [Background](#background) +- [Proposal](#proposal) + - [Values, objects, and expressions](#values-objects-and-expressions) + - [Patterns, `var`, `let`, and local variables](#patterns-var-let-and-local-variables) + - [Pointers, dereferencing, and references](#pointers-dereferencing-and-references) + - [Indexing](#indexing) + - [`const`-qualified types](#const-qualified-types) + - [Interop with `const &` and `const` methods](#interop-with-const--and-const-methods) + - [Customization](#customization) +- [Rationale](#rationale) + - [Contrasting the rationale for pointers with `goto`](#contrasting-the-rationale-for-pointers-with-goto) +- [Alternatives 
considered](#alternatives-considered) + - [Value expression escape hatches](#value-expression-escape-hatches) + - [References in addition to pointers](#references-in-addition-to-pointers) + - [Syntax-free or automatic dereferencing](#syntax-free-or-automatic-dereferencing) + - [Syntax-free address-of](#syntax-free-address-of) + - [Exclusively using references](#exclusively-using-references) + - [Alternative pointer syntaxes](#alternative-pointer-syntaxes) + - [Pointer type alternative syntaxes](#pointer-type-alternative-syntaxes) + - [Pointer dereference syntax alternatives](#pointer-dereference-syntax-alternatives) + - [Pointer syntax conclusion](#pointer-syntax-conclusion) + - [Alternative syntaxes for locals](#alternative-syntaxes-for-locals) + - [Make `const` a postfix rather than prefix operator](#make-const-a-postfix-rather-than-prefix-operator) +- [Appendix: background, context, and use cases from C++](#appendix-background-context-and-use-cases-from-c) + - [`const` references versus `const` itself](#const-references-versus-const-itself) + - [Pointers](#pointers) + - [References](#references) + - [Special but critical case of `const T&`](#special-but-critical-case-of-const-t) + - [R-value references and forwarding references](#r-value-references-and-forwarding-references) + - [Mutable operands to user-defined operators](#mutable-operands-to-user-defined-operators) + - [User-defined dereference and indexed access syntax](#user-defined-dereference-and-indexed-access-syntax) + - [Member and subobject accessors](#member-and-subobject-accessors) + - [Non-null pointers](#non-null-pointers) + - [Syntax-free dereference](#syntax-free-dereference) + + + +## Abstract + +Introduce a concrete design for how Carbon values, objects, storage, variables, +and pointers will work. This includes fleshing out the design for: + +- The expression categories used in Carbon to represent values and objects, + how they interact, and terminology that anchors on their expression nature. 
+- An expression category model for readonly, abstract values that can + efficiently support input to functions. +- A customization system for value expression representations, especially as + seen on function boundaries in the calling convention. +- An expression category model for references instead of a type system model. +- How patterns match different expression categories. +- How initialization works in conjunction with function returns. +- Specific pointer syntax, semantics, and library customization mechanisms. +- A `const` type qualifier for use when the value expression category system + is too abstracted from the underlying objects in storage. + +## Problem + +Carbon needs a design for how values, variables, objects, pointers, and +references work within the language. These designs are heavily interdependent +and so they are presented together here. The design also needs to provide a +compelling solution for a wide range of use cases in the language: + +- Easy to use, and use correctly, function input and in/out parameters. +- Clear separation of read-only use cases like function inputs from mutable + use cases. +- Local names bound both to fixed values and mutable variables. +- Indirect (and mutable) access to objects (by way of pointers or references). +- Extensible models for dereferencing and indexing. +- A delineated method subset which forms the API for read-only or input + objects. +- A delineated method subset which forms the APIs for indirect access to + shared objects in thread-compatible ways. +- Interoperation between function input parameters and C++ `const &` + parameters. +- Complex type designs which expose both interior pointers and exterior + pointers transparently for extended dereferencing or indexing. + +### Conceptual integrity between local variables and parameters + +An additional challenge that this design attempts to address is retaining the +conceptual integrity between local variables and parameters. 
Two of the most +fundamental refactorings in software engineering are _inlining_ and _outlining_ +of regions of code. These operations introduce or collapse one of the most basic +abstraction boundaries in the language: functions. These refactorings translate +between local variables and parameters in both directions. In order to ensure +these translations are unsurprising and don't face significant expressive gaps +or behavioral differences, it is important to have strong semantic consistency +between local variables and function parameters. While there are some places +that these need to differ, there should be a strong overlap of the core +facilities, design, and behavior. + +## Background + +Much of this is informed by the experience of working with increasingly complex +"value categories" (actually categorizing expressions) and parameter passing in +C++ and how the language arrived there. Some background references on this area +of C++ and the problems encountered: + +- https://en.cppreference.com/w/cpp/language/value_category +- http://wg21.link/basic.lval +- https://www.scs.stanford.edu/~dm/blog/decltype.html +- https://medium.com/@barryrevzin/value-categories-in-c-17-f56ae54bccbe +- http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-conventional +- https://www.youtube.com/watch?v=Tz5drzXREW0 + +I've also written up a detailed walk-through of the different use cases and +considerations that touch on the space of values, references, function inputs, +and more across C++ in an +[appendix](#appendix-background-context-and-use-cases-from-c). + +Leads questions which informed the design proposed here: + +- [What syntax should we use for pointer types? (#523)][#523] + +It also builds on the design of the proposal ["Initialization of memory and +variables"][p0257] ([#257]), implementing part of [#1993]. 

+
+[#257]: https://github.com/carbon-language/carbon-lang/pull/257
+[#523]: https://github.com/carbon-language/carbon-lang/issues/523
+[#1993]: https://github.com/carbon-language/carbon-lang/issues/1993
+[p0257]: /proposals/p0257.md
+
+## Proposal
+
+This section provides a condensed overview of the proposal. The details are
+covered in the updated content in the design, and each section links into the
+relevant content there. While this overview both duplicates and summarizes
+content, it isn't intended to differ from the updated design content, and the
+design content should be considered authoritative as it will also continue to
+be maintained going forward.
+
+### Values, objects, and expressions
+
+Carbon has both abstract _values_ and concrete _objects_. Carbon _values_ are
+things like `42`, `true`, and `i32` (a type value). Carbon _objects_ have
+_storage_ where values can be read and written. Storage also allows taking the
+address of an object in memory in Carbon.
+
+Both objects and values can be nested within each other. For example,
+`(true, true)` is a value that also contains two sub-values. When a two-tuple
+is stored somewhere, it is a tuple-typed object that contains two subobjects.
+
+> Details:
+> [Values, objects, and expressions](/docs/design/values.md#values-objects-and-expressions)
+
+Expressions are categorized in a way that explains how they produce values or
+refer to objects:
+
+- [_Value expressions_](/docs/design/values.md#value-expressions) produce
+  abstract, read-only _values_ that cannot be modified or have their address
+  taken.
+- [_Reference expressions_](/docs/design/values.md#reference-expressions)
+  refer to _objects_ with _storage_ where a value may be read or written and
+  the object's address can be taken.
+  - [_Durable reference expressions_](/docs/design/values.md#durable-reference-expressions)
+    are reference expressions which cannot refer to _temporary_ storage, but
+    must refer to some storage that outlives the full expression.
+  - [_Ephemeral reference expressions_](/docs/design/values.md#ephemeral-reference-expressions)
+    are reference expressions which _can_ refer to temporary storage.
+- [_Initializing expressions_](/docs/design/values.md#initializing-expressions)
+  require storage to be provided implicitly when evaluating the expression.
+  The expression then initializes an object in that storage. These are used
+  to model function returns, which can construct the returned value directly
+  in the caller's storage.
+
+> Details: [Expression categories](/docs/design/values.md#expression-categories)
+
+### Patterns, `var`, `let`, and local variables
+
+Patterns are by default _value patterns_ and match _value expressions_, but can
+be introduced with the `var` keyword to create a _variable pattern_ that has
+_storage_ and matches _initializing expressions_. Names bound in value patterns
+become value expressions, and names bound in a variable pattern become _durable
+reference expressions_ referring to an object in that pattern's storage.
+
+Local patterns can be introduced with `let` to get the default behavior of a
+readonly pattern, or they can be directly introduced with `var` to form a
+variable pattern and declare mutable local variables.
+
+> Details:
+> [Binding patterns and local variables with `let` and `var`](/docs/design/values.md#binding-patterns-and-local-variables-with-let-and-var)
+
+### Pointers, dereferencing, and references
+
+Pointers in Carbon are the primary mechanism for _indirect access_ to storage
+containing some object. Dereferencing a pointer forms a
+[_durable reference expression_](/docs/design/values.md#durable-reference-expressions)
+to the object.
+ +Carbon pointers are heavily restricted compared to C++ pointers -- they cannot +be null and they cannot be indexed or have pointer arithmetic performed on them. +Carbon will have dedicated mechanisms that still provide this functionality, but +those are future work. + +> Details: [Pointers](/docs/design/values.md#pointers) + +The syntax for working with pointers is similar to C++: + +```carbon +var i: i32 = 42; +var p: i32* = &i; + +// Form a reference expression `*p` and assign `13` to the referenced storage. +*p = 13; +``` + +> Details: +> +> - [Pointer syntax](/docs/design/values.md#pointer-syntax) +> - [Pointer operators](/docs/design/expressions/pointer_operators.md) + +Carbon doesn't have reference types, just reference expressions. API designs in +C++ that use references (outside of a few common cases like `const &` function +parameters) will typically use pointers in Carbon. The goal is to simplify and +focus the type system on a primary model of indirect access to an object. + +> Details: [Reference types](/docs/design/values.md#reference-types) + +### Indexing + +Carbon supports indexing that both accesses directly contained storage like an +array and indirect storage like C++'s `std::span`. As a result, the exact +interfaces used for indexing reflect the expression category of the indexed +operand and the specific interface its type implements. This proposal just +updates and refines this design with the new terminology. + +> Details: [Indexing](/docs/design/expressions/indexing.md) + +### `const`-qualified types + +Carbon provides the ability to qualify a type `T` with the keyword `const` to +get a `const`-qualified type: `const T`. This is exclusively an API-subsetting +feature in Carbon -- for more fundamentally "immutable" use cases, value +expressions and bindings should be used instead. 
Pointers to `const`-qualified +types in Carbon provide a way to reference an object with an API subset that can +help model important requirements like ensuring usage is exclusively by way of a +_thread-safe_ interface subset of an otherwise _thread-compatible_ type. + +> Details: +> [`const`-qualified types](/docs/design/values.md#const-qualified-types) + +### Interop with `const &` and `const` methods + +Carbon makes value expressions interoperable with `const &` function parameters. +There are two dangers of this approach: + +- Those constructs in C++ _do_ allow the address to be taken, so Carbon will + synthesize an object and address for value expressions for the duration of + the C++ call as needed. C++ code already cannot rely on the address of + `const &` parameters continuing to exist once the call completes. +- C++ also allows casting away `const`. However, this doesn't allow a method + to safely _mutate_ a `const &` parameter, except its `mutable` members. + +In both cases, correct C++ code should already respect the limitations Carbon +needs for this interop. The same applies to the implicit `this` parameter of +`const` methods in C++. + +> Details: +> [Interop with C++ `const &` and `const` methods](/docs/design/values.md#interop-with-c-const--and-const-methods) + +### Customization + +Carbon will provide a way to customize the implementation representation of a +value expression by nominating some other type to act as its representation. +This will be indicated through an explicit syntactic marker in the class +definition, and will require the type to `impl` a customization interface +`ReferenceImplicitAs` to provide the needed functionality. The result will be to +form this custom type when forming a value expression from a reference +expression, and will restrict the operations on such a value expression to +implicitly converting to the nominated type. 

+
+> Details:
+> [Value representation and customization](/docs/design/values.md#value-representation-and-customization)
+
+Carbon will also allow customizing the behavior of a dereference operation on a
+type to allow building "smart pointers" or other pointer-like types in the
+library. This will be done in a similar fashion to overloading other operators,
+where the overloaded operation returns some other pointer that is then
+dereferenced to form a reference expression.
+
+> Details:
+> [Dereferencing customization](/docs/design/values.md#dereferencing-customization)
+
+## Rationale
+
+- Pointers are a fundamental component of all modern computer hardware --
+  they are abstractly random-access machines -- and being able to directly
+  model and manipulate this is necessary for
+  [performance-critical software](/docs/project/goals.md#performance-critical-software).
+- Simplifying the type system by not having both pointers and references is
+  expected to make
+  [code easier to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
+- Creating space in both the syntax and type system to introduce ownership and
+  lifetime information is important to be able to address long-term
+  [safety](/docs/project/goals.md#practical-safety-and-testing-mechanisms)
+  needs.
+- Pointers are expected to be deeply familiar to C++ programmers and easily
+  [interoperate with C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code).
+
+### Contrasting the rationale for pointers with `goto`
+
+There is an understandable concern about Carbon deeply incorporating pointers
+into its design -- pointers in C and C++ have been the root of systematic
+security and correctness bugs. There is such a wealth of "gotchas" tied to
+pointers that it can seem unreasonable to build Carbon on top of these
+foundations.
By analogy, the `goto` control-flow construct is similarly
+considered _deeply_ problematic and there is a wealth of literature in computer
+science and programming language design on how and why to build languages on top
+of structured control flow instead. It would be very concerning to build `goto`
+into the fundamental design of Carbon as an integral and pervasive component of
+its control flow. However, there are two important distinctions between pointers
+and `goto`.
+
+First, C and C++ pointers are _very_ different from Carbon pointers, and their
+rightly earned reputation as a source of bugs and confusion stems precisely from
+these differences:
+
+- C pointers can be null (the ["billion dollar mistake"][null-mistake]), but
+  Carbon's cannot be. Carbon requires null-ness to be explicitly modeled in
+  the type system and dealt with prior to dereference.
+- C pointers can be indexed even when not pointing to an array. Carbon's
+  pointers are to singular objects. Carbon handles indexing with a separate
+  construct and types to distinguish arrays where indexing is semantically
+  meaningful from the cases where it is not.
+- C pointers support deeply surprising indexing patterns such as `3[pointer]`.
+  Carbon restricts indexing to unsurprising forms.
+- C pointers don't enforce lifetime or any other memory safety properties.
+  While Carbon does not yet support these deeper protections, tackling memory
+  safety is an explicit plan of Carbon's and may yet result in changes to the
+  pointer type to reflect the safety risks posed by them.
+
+[null-mistake]:
+    https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/
+
+While these are described in terms of C's pointers, C++ inherits these without
+meaningful improvement from C. The core point here is that while Carbon is
+building pointers into its foundations, it is building in the least problematic
+and most useful aspect of pointers and leaving almost all of the legacy and risk
+behind.
That is a critical part of what makes pointers viable in Carbon.
+
+The second important contrast with `goto` is the lack of a comprehensive
+alternative to pointers. Structured control flow has been thoroughly studied and
+shown to address the practical needs of expressing control flow. As a
+consequence, we have _good_ tools to replace `goto` within languages, and we
+should use them. In contrast, Carbon programs are still expected to map onto
+[von Neumann architecture](https://en.wikipedia.org/wiki/Von_Neumann_architecture)
+machines and need to model the fundamental construct of a pointer to data. We
+have no comprehensive alternative to solve all of the practical needs
+surrounding indirect access to data on such an architecture.
+
+So despite including pointers as one of the building blocks of Carbon, we don't
+need `goto` to be the surfaced and visible building block of Carbon's control
+flow. Even if we decide to support some limited forms of `goto` to handle edge
+cases where structured constructs end up suboptimal, we can still build the
+foundations in a structured way without impairing the use of Carbon to work
+with the low-level hardware effectively.
+
+## Alternatives considered
+
+### Value expression escape hatches
+
+We could provide escape hatches for value expressions that (unsafely) take the
+address or even perform mutations through a value expression. This would more
+easily match patterns like `const_cast` in C++. However, there seem to be
+effective ways of rewriting the code to avoid this need, so this proposal
+suggests not adding these escape hatches now. We will instead provide a more
+limited escape exclusively for
+[interop](#interop-with-const--and-const-methods). We can add more later if
+experience proves this is an important pattern to support without the
+contortions of manually creating a local copy (or changing to pointers).
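For readers coming from C++, the sketch below shows the kind of pattern such an escape hatch would mirror, next to the two rewrites mentioned above: mutating a manually created local copy, or switching to a pointer. It is illustrative C++ only, not Carbon, and all function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Escape-hatch style: the callee receives a read-only view but casts away
// `const` to mutate the caller's object anyway. This only has defined
// behavior when the caller's object was not itself declared `const`, which
// the callee cannot check. This is the kind of hatch the proposal suggests
// not providing for Carbon value expressions.
void TrimTrailingSpaceViaConstCast(const std::string& text) {
  std::string& mutable_text = const_cast<std::string&>(text);
  if (!mutable_text.empty() && mutable_text.back() == ' ') {
    mutable_text.pop_back();
  }
}

// Rewrite 1: mutate a manually created local copy and return the result.
// The caller's object is left untouched.
std::string TrimTrailingSpaceCopy(const std::string& text) {
  std::string local = text;
  if (!local.empty() && local.back() == ' ') {
    local.pop_back();
  }
  return local;
}

// Rewrite 2: take a pointer, so the mutation is visible at the call site as
// `TrimTrailingSpace(&s)`.
void TrimTrailingSpace(std::string* text) {
  assert(text != nullptr);  // C++ pointers may be null; Carbon's may not.
  if (!text->empty() && text->back() == ' ') {
    text->pop_back();
  }
}
```

Both rewrites keep the read-only input genuinely read-only; the cost is either an explicit copy or an explicit `&` at the call site.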
+ +### References in addition to pointers + +The primary and most obvious alternative to the design proposed here is the one +used by C++: have _references_ in addition to pointers in the type system. This +initially allows zero-syntax modeling of L-values, which can in turn address +many use cases here much as they do in C++. Similarly, adding different kinds of +references can allow modeling more complex situations such as different lifetime +semantics. + +However, this approach has two fundamental downsides. First, it would add +overall complexity to the language as references don't form a superset of the +functionality provided by pointers -- there is still no way to distinguish +between the reference and the referenced object. This results in confusion where +references are understood to be syntactic sugar over a pointer, but cannot be +treated as such in several contexts. + +Second, this added complexity would reside exactly in the position of the type +system where additional safety complexity may be needed. We would like to leave +this area (pointers and references to non-local objects) as simple and minimal +as possible to ease the introduction of important safety features going forward +in Carbon. + +### Syntax-free or automatic dereferencing + +One way to make pointers behave very nearly the same as references without +adding complexity to the type system is to automatically dereference them in the +relevant contexts. This can, if done carefully, preserve the ability to +distinguish between the pointer and the pointed-to object while still enabling +pointers to be seamlessly used without syntactic overhead as L-values. + +This proposal does not currently provide a way to dereference with zero syntax, +even on function interface boundaries. The presence of a clear level of +indirection can be an important distinction for readability. 
It helps surface
+that an object that may appear local to the caller is in fact escaped and
+referenced externally to some degree. However, it can also harm readability by
+forcing code that doesn't _need_ to look different to do so anyway. In the worst
+case, this can potentially interfere with writing generic code. Currently,
+Carbon prioritizes making the distinction here visible.
+
+Reasonable judgement calls about which direction to prefer may differ, but
+Carbon's principle of
+[preferring lower context sensitivity](/docs/project/principles/low_context_sensitivity.md)
+leans (slightly) toward explicit dereferencing instead. That is the current
+proposed direction.
+
+It may prove desirable in the future to provide an ergonomic aid to reduce
+dereferencing syntax within function bodies, but this proposal suggests
+deferring that in order to better understand the extent and importance of that
+use case. If and when it is considered, a direction based around a way to bind a
+name to a reference expression in a pattern appears to be a promising technique.
+Alternatively, there are various languages with implicit- or
+automatic-dereference designs that might be considered in the future, such as
+Rust.
+
+#### Syntax-free address-of
+
+A closely related concern to syntax-free dereference is syntax-free address-of.
+Here, Carbon supports one very narrow form of this: implicitly taking the
+address of the implicit object parameter of member functions. Currently that is
+the only place with such an implicit affordance. It is designed to be
+syntactically sound to extend to other parameters, but currently that is not
+planned as we don't yet have enough experience to motivate it and it may prove
+surprising.
+
+### Exclusively using references
+
+While framed differently, this is essentially equivalent to automatic
+dereferencing of pointers.
The key is that it does not add both options to the
+type system but addresses the syntactic differences separately and uses
+different operations to distinguish between the reference and the referenced
+object when necessary.
+
+The same core arguments against automatic dereferencing apply equally to this
+alternative -- this would remove the explicit visual marker for where non-local
+memory is accessed and potentially mutated.
+
+### Alternative pointer syntaxes
+
+The syntax both for declaring a pointer type and dereferencing a pointer has
+been extensively discussed in the leads question [#523].
+
+[#523]: https://github.com/carbon-language/carbon-lang/issues/523
+
+The primary sources of concern over a C++-based syntax:
+
+1. A prefix dereference operator composes poorly with postfix and infix
+   operations.
+
+   - This is highlighted by the use of `->` for member access due to the poor
+     composition with `.`: `(*pointer).member`.
+
+2. Even without replicating the ["inside out"][inside-out] C/C++ challenges, we
+   would end up with a prefix, postfix, and infix operator `*`.
+
+[inside-out]:
+    https://faculty.cs.niu.edu/~mcmahon/CS241/Notes/reading_declarations.html
+
+The second issue was resolved in [#520], giving us at least the flexibility to
+consider `*` both for dereference and pointer type, but we still considered
+numerous alternatives given the first concern. These were discussed in detail
+in a [document][pointer-syntax-doc], and the key syntax alternatives are
+extracted below using modern syntax.
+
+[pointer-syntax-doc]:
+    https://docs.google.com/document/d/1gsP74fLykZBCWZKQua9VP0GnQcADCkn5-TIC1JbUvdA/edit?resourcekey=0-8MsUybUvHDCuejrzadrbMg
+
+#### Pointer type alternative syntaxes
+
+**Postfix `*`:**
+
+```
+var p: i32* = &i;
+```
+
+Advantages:
+
+- Most familiar to C/C++ programmers.
+  - Doesn't collide with prefix `*` for dereference, so it can have that
+    familiarity as well.

+
+Disadvantages:
+
+- Requires the [#520] logic to resolve parsing ambiguities.
+- Visually, `*` is used for both pointers and multiplication.
+- Interacts poorly with prefix array type syntaxes.
+- Looks like a C++ pointer but likely has different semantics.
+
+**Prefix `*`:**
+
+```
+var p: *i32 = &i;
+```
+
+Advantages:
+
+- Used by Rust, Go, Zig, and
+  [Jai](https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md#memory-management).
+- Go in particular has had good success transitioning C/C++ programmers to
+  this syntax.
+- Allows reading types left-to-right.
+
+Disadvantages:
+
+- "Pointer to T" and "Dereference T" would not be distinguished grammatically,
+  which could be a particular problem in template code.
+- `*` is used for both pointers and multiplication.
+- Uncanny valley -- it is very close to C++'s syntax but not quite the same.
+
+**Prefix `&`:**
+
+```
+var p: &i32 = &i;
+```
+
+Advantages:
+
+- "Type of a pointer is a pointer to the type"
+- Allows reading types left-to-right.
+
+Disadvantages:
+
+- Visual ambiguity:
+
+  ```
+  let X:! type = i32;
+  // Can't actually write this, but there is a visual collision between
+  // whether this is the address of `X` or pointer-to-`i32`.
+  var y: auto = &X;
+  ```
+
+**Prefix `^`:**
+
+```
+var p: ^i32 = &i;
+```
+
+Advantages:
+
+- Used by the Pascal family of languages like
+  [Delphi](http://rosettacode.org/wiki/Pointers_and_references#Delphi), and a
+  few others like
+  [BBC_BASIC](http://rosettacode.org/wiki/Pointers_and_references#BBC_BASIC).
+- `^` looks pointy.
+- The `^` operator is not heavily used otherwise (as a binary op it could be
+  bit-xor or raise-to-power).
+- Allows reading types left-to-right.
+
+**`Ptr(T)`:**
+
+```
+var p: Ptr(i32) = &i;
+```
+
+Advantages:
+
+- It is simple and unambiguous.
+- Better supports a variety of pointer types.
+- Allows reading types left-to-right.
+
+Disadvantages:
+
+- Ends up making common code verbose.
+- Not widely used in other languages.

+
+#### Pointer dereference syntax alternatives
+
+**Prefix `*`:**
+
+```
+*p = *p * *p + (*q)[3] + r->x;
+```
+
+Advantages:
+
+- Matches C/C++, and will be familiar to our initial users.
+- When _not_ composed with other operations, prefix may be easier to recognize
+  visually than postfix.
+
+Disadvantages:
+
+- `*` is used for both pointers and multiplication.
+- Needs parentheses or operator `->` to resolve precedence issues when
+  composing with postfix and infix operations, which is common.
+
+**Postfix `^`:**
+
+```
+p^ = p^ * p^ + q^[3] + r^.x;
+```
+
+Advantages:
+
+- Generally fewer precedence issues.
+- No need for a `->` operator.
+- Used by the Pascal family of languages.
+
+Disadvantages:
+
+- Not familiar to C/C++ users.
+- Potential conflict with `^` as xor or exponentiation.
+- Non-ergonomic: it would be typed very frequently, yet requires a stretching
+  motion to type.
+
+**Postfix `[]`:**
+
+```
+p[] = p[] * p[] + q[][3] + r[].x;
+```
+
+Advantages:
+
+- Generally fewer precedence issues.
+- No need for a `->` operator.
+- Used by Swift.
+
+Disadvantages:
+
+- Two characters is a bit heavier visually.
+- Possibly confusing given that plain pointers are used for non-array
+  pointees.
+
+This could perhaps be paired with prefix `[]` for pointer types, though that
+would also need to distinguish slice types and possibly dynamically-sized
+array types.
+
+#### Pointer syntax conclusion
+
+Ultimately, the decision in [#523] was to keep Carbon's syntax familiar to
+C++'s. The challenges that this presents, and the advantages of changing it,
+weren't sufficient to overcome the advantage of familiarity, and we have
+effective solutions for the specific challenges.
+
+### Alternative syntaxes for locals
+
+Several other options for declaring locals were considered, but none ended up
+outweighing the proposed option on balance.

+
+- `let` and `let mut`, based on the Rust names
+  - While nicely non-inventive, there was concern both that `mut` didn't
+    communicate the requirement of storage as effectively, and around `let`
+    not being as obviously or unambiguously a good default. Put
+    differently, the `mut` keyword feels more fitting in its use with
+    mutable borrows than as a declaration introducer the way it would work
+    in Carbon.
+  - There was a desire, for locals especially, to have both cases be
+    similarly easy to spell. While mutation being visible does seem helpful,
+    this specific approach seemed to go too far into being a tax.
+- `val` and `var`
+  - Appealing to use `val` instead of `let` given that these form value
+    expression bindings. These are also likely to be taught and discussed as
+    "local values" which would align well with the `val` introducer.
+  - Not completely inventive (Kotlin uses it, for example), but rare
+    overall.
+  - However, `val` is _very_ visually similar to `var`, which makes code
+    harder to read at a glance.
+  - The distinction is also difficult to pronounce when speaking and to hear
+    when listening.
+- `const` and `var`
+  - Some familiarity for C++ developers given the use of `const` there.
+  - `const` is used by other languages in a similar way to Carbon's `let`.
+  - However, there was significant concern about re-using `const` but having
+    it mean something fairly different from C++ as a declaration introducer.
+    For example, nesting a `var` pattern within `const` might be especially
+    surprising.
+
+Ultimately, the overwhelmingly most popular introducer for immutable local
+values across popular languages that have such a distinction is `let`. Using
+that makes Carbon unsurprising in the world of programming languages, which is
+a good thing. `var` helps signify the allocation of storage, and it also has
+widespread usage across popular languages.

+
+### Make `const` a postfix rather than prefix operator
+
+Using a prefix operator for `const` and a postfix operator for `*` causes them
+to require parentheses for complex nesting. We could avoid that by using a
+postfix `const`, and this would also allow more combinations in Carbon to be
+valid in C++ with the same meaning as in Carbon, such as `T* const`.
+
+This direction isn't pursued because:
+
+- We expect pointers-to-`const` to be significantly more common in code than
+  `const`-pointers-to-non-`const`, even more so than is already the case in
+  C++. And for pointers-to-`const`, we lean towards matching the more
+  widespread convention of `const T*` rather than `T const*`.
+  - We are aware that some developers have a stylistic preference for "East
+    `const`", which would write `T const*` in C++. However, that preference
+    and advocacy for this style haven't yet made it a widespread or widely
+    adopted style.
+- We don't expect the parenthesized form of `const (T*)` to be _confusing_ to
+  C++ developers even though C++ doesn't allow this approach to forming a
+  `const`-pointer-to-non-`const`.
+
+This alternative was only lightly considered and can be revisited if we get
+evidence that motivates a change here.
+
+## Appendix: background, context, and use cases from C++
+
+This appendix provides an examination of C++'s fundamental facilities in the
+space involving `const` qualification, references (including R-value
+references), and pointers. Beyond the expression categorization needed to
+provide a complete model for the language, these use cases help inform the space
+that should be covered by the proposed design.
+
+### `const` references versus `const` itself
+
+C++ provides overlapping but importantly separable semantic models which
+interact with `const` references:
+
+1. An _immutable view_ of a value
+2.
A _thread-safe interface_ of a [thread-compatible type][]
+
+[thread-compatible type]:
+    https://abseil.io/blog/20180531-regular-types#:~:text=restrictions%20or%20both,No%20concurrent%20call
+
+Some examples of the immutable view use case are provided below. These include
+`const` reference parameters and locals, as well as `const`-declared local and
+static objects.
+
+```cpp
+void SomeFunction(const int &id) {
+  // Here `id` is an immutable view of some value provided by the caller.
+}
+
+void OtherFunction(...) {
+  // ...
+
+  const int &other_id = /* some runtime expression */;
+
+  // Cannot mutate `other_id` here either; it is just a view of the result of
+  // the initializing expression above. But we can pass it along to another
+  // function accepting an immutable view:
+  SomeFunction(other_id);
+
+  // We can also pass ephemeral values:
+  SomeFunction(other_id + 2);
+
+  // Or values that may be backed by read-only memory:
+  static const int fixed_id = 42;
+  SomeFunction(fixed_id);
+}
+```
+
+The _immutable view_ `id` in `SomeFunction` can be thought of as requiring that
+the semantics of the program be exactly the same whether it is implemented in
+terms of a view of the initializing expression or a copy of that value, perhaps
+in a register.
+
+The implications of the semantic equivalence help illustrate the requirements:
+
+- The input value must not change while the view is visible, or else a copy
+  would hide those changes.
+- The view must not be used to mutate the value, or those mutations would be
+  lost if made to a copy.
+- The identity of the object must not be relevant, or else inspection of its
+  address would reveal whether a copy was used.
+
+Put differently, these restrictions make a copy valid under the
+[as-if rule](https://en.cppreference.com/w/cpp/language/as_if).
+
+The _thread-safe interface_ use case is the more prevalent use of `const` in
+APIs. It is most commonly seen with code that looks like:
+
+```cpp
+class MyThreadCompatibleType {
+ public:
+  // ...
+ + int Size() const { return size; } + + private: + int size; + + // ... +}; + +void SomeFunction(const MyThreadCompatibleType *thing) { + // .... + + // Users can expect calls to `Size` here to be correct even if running on + // multiple threads with a shared `thing`. + int thing_size = thing->Size(); + + // ... +} +``` + +The first can seem like a subset of the second, but this isn't really true. +There are cases where `const` works for the first use case but doesn't work well +for thread-safety: + +```cpp +void SomeFunction(...) { + // ... + + // We never want to release or re-allocate `data` and `const` makes sure that + // doesn't happen. But the actual data is completely mutable! + const std::unique_ptr data = ComputeBigData(); + + // ... +} +``` + +These two use cases can also lead to tension between shallow const and deep +const: + +- Immutability use cases will tend towards shallow(-er) const, like pointers. +- Thread safety use cases will tend towards deep(-er) const. + +### Pointers + +The core of C++'s indirect access to an object stored somewhere else comes from +C and its lineage of explicit pointer types. These create an unambiguous +separate layer between the pointer object and the pointee object, and introduce +dereference syntax (both the unary `*` operator and the `->` operator). + +C++ makes an important extension to this model to represent _smart pointers_ by +allowing the dereference operators to be overloaded. This can be seen across a +wide range of APIs such as `std::unique_ptr`, `std::shared_ptr`, +`std::weak_ptr`, etc. These user-defined types preserve a fundamental property +of C++ pointers: the separation between the pointer object and the pointee +object. + +The distinction between pointer and pointee is made syntactically explicit in +C++ both when _dereferencing_ a pointer, and when _forming_ the pointer or +taking an object's address. These two sides can be best illustrated when +pointers are used for function parameters. 
The caller code must explicitly take
+the address of an object to pass it to the function, and the callee code must
+explicitly dereference the pointer to access the caller-provided object.
+
+### References
+
+C++ provides for indirection _without_ the syntactic separation of pointers:
+references. Because a reference provides no syntactic distinction between the
+reference and the referenced object--that is their point!--it is impossible to
+refer to the reference itself in C++. This creates a number of restrictions on
+their design:
+
+- They _must_ be initialized when declared.
+- They cannot be rebound or unbound.
+- Their address cannot be taken.
+
+References were introduced originally to enable operator overloading, but have
+been extended repeatedly and as a consequence fill a wide range of use cases.
+Separating these and understanding them is essential to forming a cohesive
+proposal for Carbon -- that is the focus of the rest of our analysis of
+references here.
+
+#### Special but critical case of `const T&`
+
+As mentioned above, one form of reference in C++ has unique properties:
+`const T&` for some type `T`, or a _`const` reference_. The primary use for
+these is also the one that motivates their unique properties: a zero-copy way
+to provide an input function parameter without requiring the syntactic
+distinction in the caller and callee needed when using a pointer. The intent is
+to safely emulate passing by-value without the cost of copying. Provided the
+usage is immutable, this emulation can safely be done with a reference and so a
+`const` reference fits the bill here.
+
+However, to make zero-copy pass-by-value work in practice, it must be possible
+to pass a _temporary_ object. That works well with by-value parameters after
+all. To make this work, C++ allows a `const` reference to bind to a temporary.
+However, the rules for parameters and locals are the same in C++ and so this
+would create serious lifetime bugs.
This is fixed in C++ by applying +_lifetime extension_ to the temporary. The result is that `const` references are +quite different from other references, but they are also quite useful: they are +the primary tool used to fill the _immutable view_ use case of `const`. + +One significant disadvantage of `const` references is that they are observably +still references. When used for function parameters, they require the argument +to have an address in memory, which prevents passing the object's value +directly in registers. This complicates choosing the type of a read-only input +parameter, as both a `const` reference and a by-value parameter impose their +own form of overhead. Similarly, range-based `for` loops in C++ have to choose +between a reference and a value type when each would be preferable in different +situations. + +#### R-value references and forwarding references + +Another special set of use cases for references involves R-value and forwarding +references. These are used to capture _lifetime_ information in the type system +in addition to binding a reference. By doing so, they can allow overload +resolution to select C++'s _move semantics_ when appropriate for operations. + +The primary use case for move semantics at function boundaries was to model +_consuming input_ parameters. Because move semantics were being added to an +existing language and ecosystem that had evolved exclusively using copies, +modeling consumption by moving into a by-value parameter would have forced an +eager and potentially expensive copy in many cases. Adding R-value reference +parameters and overloading on them allowed code to gracefully degrade in the +absence of move semantics: their implementations could move what was movable +and copy only the rest. These overloads also helped reduce the total number of +moves by avoiding moving first into the parameter and then out of the parameter.
+This kind of micro-optimization of moves was seen as important because some +interesting data structures, especially in the face of exception safety +guarantees, either implemented moves as copies or in ways that required +non-trivial work like memory allocation. + +Using R-value references and overloading also provided a minor benefit to C++: +the lowest-level mechanics of move semantics such as move construction and +assignment easily fit into the function overloading model that already existed +for these special member functions. + +These special member functions are just a special case of a more general pattern +enabled by R-value references: designing interfaces that use _lifetime +overloading_ to _detect_ whether a move would be possible and change +implementation strategy based on how they are called. Both the move constructor +and the move-assignment operator in C++ work on this principle. However, other +use cases for this design pattern are so far rare. For example, Google's C++ +style +[forbids](https://google.github.io/styleguide/cppguide.html#Rvalue_references) +R-value references outside of an enumerated set of use cases, which has been +extended incrementally based on demonstrated need, and has now been stable for +some time. While overloading on lifetime is one of the allowed use cases, that +exemption was added almost four years after the initial exemption of move +constructors and move assignment operators. + +#### Mutable operands to user-defined operators + +C++ user-defined operators have their operands directly passed as parameters. +When these operators require _mutable operands_, references are used to avoid +the syntactic overhead and potential semantic confusion of taking their address +explicitly. This use case stems from the combined design decisions of having +operators that mutate their operands in-place and requiring the operand +expression to be directly passed as a normal function parameter. 
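As a small illustration of the mutable-operand case, consider a compound-assignment operator, which has to modify its left operand. The `Vec2` type and `UsageExample` function below are hypothetical and exist only for this sketch. Because `operator+=` takes its left operand by non-`const` reference, callers write `a += b` and the operand expression binds directly to the parameter, with no `&` at the call site.

```cpp
#include <cassert>

// A hypothetical two-component vector, used only for illustration.
struct Vec2 {
  int x = 0;
  int y = 0;
};

// The left operand is mutated in place, so it is taken by non-`const`
// reference; the right operand is only read. Returning `lhs` by reference
// follows the usual C++ convention and allows chaining like `(a += b) += c`.
Vec2& operator+=(Vec2& lhs, const Vec2& rhs) {
  lhs.x += rhs.x;
  lhs.y += rhs.y;
  return lhs;
}

// Usage: `a` is modified, yet nothing at the call site takes its address.
void UsageExample() {
  Vec2 a{1, 2};
  a += Vec2{3, 4};
  assert(a.x == 4 && a.y == 6);
}
```

If `operator+=` instead took a `Vec2*`, every use would need to spell out the address (`&a += b`), which is exactly the syntactic overhead the reference avoids.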
+ + #### User-defined dereference and indexed access syntax + +C++ also allows user-defined operators that model dereference (called +indirection in the C++ standard) and indexed access (`*` and `[]`). Because +these operators specifically model forming an L-value and because the +operator's return value is used directly as the result of the expression, they +must return a reference to the already-dereferenced object. Returning a pointer +would break genericity with built-in pointers and arrays, in addition to adding +significant syntactic overhead. + +#### Member and subobject accessors + +Another common use of references is in returns from member functions to provide +access to a member or subobject, whether const or mutable. This particular use +case is worth calling out specially as it has an interesting property: this is +often not a fully indirect access. Instead, it is often simply selecting a +particular member, field, or other subobject of the data structure. As a +consequence, making subsequent access transparent seems especially desirable. + +However, it is worth noting that this particular use case is also an especially +common source of lifetime bugs. A classic and pervasive example can be seen when +calling such a method on a temporary object: the temporary is destroyed at the +end of the full-expression, so the returned reference dangles almost +immediately. + +#### Non-null pointers + +A common reason for using mutable references outside of what has already been +described is to represent _non-null pointers_ with enforcement in the type +system. Because the canonical pointer types in C++ are allowed to be null, code +that wants to rule out null in the type system uses references instead, which +forces any null checks to happen as early as possible.
This causes a +"[shift left](https://en.wikipedia.org/wiki/Shift-left_testing)" in handling +null pointers: the error moves logically closer to its cause and, because +non-nullness becomes a static property enforced at compile time, it is more +likely to be caught earlier in the development process. + +References are imperfectly suited to modeling non-null pointers because they are +missing many of the fundamental properties of pointers such as being able to +rebind them, being able to take their address, etc. Also, references cannot be +safely made `const` in the same places that pointers can because that might +unintentionally change their semantics by allowing temporaries or extending +lifetimes. + +#### Syntax-free dereference + +Beyond serving as a non-null pointer, the other broad use case for references is +to remove the syntactic overhead of taking an address and dereferencing +pointers. In other words, they provide a way to have _syntax-free dereferences_. +Outside of function parameters, removing this distinction may provide a +genericity benefit, as it allows using the same syntax as would be used with +non-references. In theory, code could simply use pointers everywhere, but this +would add syntactic overhead compared to local variables and references. For +immutable accesses, the syntactic overhead seems unnecessary and unhelpful. +However, having distinct syntax for _mutable_ iteration, container access, and +so on often makes code more readable. + +There are several cases that have come up in the design of common data +structures where using distinct syntaxes for immutable and mutable operations +provides clear benefit: copy-on-write containers, where the costs of the two +kinds of access are dramatically different, and associative containers, which +need to distinguish between looking up an element and inserting an element. +This tension should be reflected in how we design _indexed access syntax_.
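The associative-container case can be seen in the C++ standard library's `std::map`, whose single `operator[]` spelling serves both lookup and insertion. The sketch below uses two hypothetical helper functions to compare it with `find`: a read written with `[]` silently inserts a value-initialized element when the key is missing, while `find` is a separate, lookup-only spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: reads `key` through `operator[]`. The map is taken by
// value so the caller's copy is untouched. If `key` is missing, `operator[]`
// inserts a value-initialized element (0 for `int`), so the map grows even
// though the code only meant to read.
std::size_t SizeAfterBracketRead(std::map<std::string, int> m,
                                 const std::string& key) {
  [[maybe_unused]] int value = m[key];
  assert(m.count(key) == 1);  // `operator[]` always leaves `key` present.
  return m.size();
}

// Hypothetical helper: reads `key` through `find`, a lookup-only spelling
// that never inserts. Unlike `operator[]`, it also works on a `const` map.
std::size_t SizeAfterFindRead(const std::map<std::string, int>& m,
                              const std::string& key) {
  auto it = m.find(key);
  [[maybe_unused]] int value = (it != m.end()) ? it->second : 0;
  return m.size();
}
```

Because `std::map::operator[]` has no `const` overload, code that only holds a `const` map is forced onto the distinct `find` (or `at`) spelling, which is one way the mutable and immutable operations end up with different syntax.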
+ +Using mutable references for parameters to reduce syntactic overhead also +doesn't seem particularly compelling. For passing parameters, the caller syntax +seems to provide significant benefit to readability. When using _non-local_ +objects in expressions, the fact that there is a genuine indirection into memory +seems to also have high value to readability. These syntactic differences do +make inline code and outlined code look different, but that reflects a behavior +difference in this case.
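The call-site readability difference can be sketched with two hypothetical helpers that do the same thing, one taking a pointer parameter and one taking a mutable reference parameter. Only the pointer version makes the mutation visible where the function is called.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration. Both append the same suffix to a
// caller-owned string; only the parameter-passing style differs.

// Pointer parameter: the caller must write `&name`, so the mutation is
// visible at the call site. A pointer may be null, so check it.
void AppendSuffixViaPointer(std::string* out) {
  assert(out != nullptr);
  out->append("_suffix");
}

// Mutable reference parameter: the call `AppendSuffixViaReference(name)`
// looks identical to passing `name` by value.
void AppendSuffixViaReference(std::string& out) { out.append("_suffix"); }

std::string UsePointer() {
  std::string name = "config";
  AppendSuffixViaPointer(&name);  // `&` signals that `name` may change.
  return name;
}

std::string UseReference() {
  std::string name = "config";
  AppendSuffixViaReference(name);  // Nothing here signals a mutation.
  return name;
}
```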