diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 0e2d7c0a4..24552a915 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -8,12 +8,17 @@ >    | [BYTE_LITERAL]\ >    | [BYTE_STRING_LITERAL]\ >    | [RAW_BYTE_STRING_LITERAL]\ ->    | [INTEGER_LITERAL]\ +>    | [INTEGER_LITERAL][^out-of-range]\ >    | [FLOAT_LITERAL]\ >    | [BOOLEAN_LITERAL] +> +> [^out-of-range]: A value ≥ 2<sup>128</sup> is not allowed. -A _literal expression_ consists of one of the [literal](../tokens.md#literals) forms described earlier. -It directly describes a number, character, string, or boolean value. +A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. + +A literal is a form of [constant expression], so is evaluated (primarily) at compile time. + +Each of the lexical [literal][literal tokens] forms described earlier can make up a literal expression. ```rust "hello"; // string type @@ -21,6 +26,146 @@ It directly describes a number, character, string, or boolean value. 5; // integer type ``` +## Character literal expressions + +A character literal expression consists of a single [CHAR_LITERAL] token. + +> **Note**: This section is incomplete. + +## String literal expressions + +A string literal expression consists of a single [STRING_LITERAL] or [RAW_STRING_LITERAL] token. + +> **Note**: This section is incomplete. + +## Byte literal expressions + +A byte literal expression consists of a single [BYTE_LITERAL] token. + +> **Note**: This section is incomplete. + +## Byte string literal expressions + +A byte string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_BYTE_STRING_LITERAL] token. + +> **Note**: This section is incomplete.
+ +## Integer literal expressions + +An integer literal expression consists of a single [INTEGER_LITERAL] token. + +If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type. + +If the token has no suffix, the expression's type is determined by type inference: + +* If an integer type can be _uniquely_ determined from the surrounding program context, the expression has that type. + +* If the program context under-constrains the type, it defaults to the signed 32-bit integer `i32`. + +* If the program context over-constrains the type, it is considered a static type error. + +Examples of integer literal expressions: + +```rust +123; // type i32 +123i32; // type i32 +123u32; // type u32 +123_u32; // type u32 +let a: u64 = 123; // type u64 + +0xff; // type i32 +0xff_u8; // type u8 + +0o70; // type i32 +0o70_i16; // type i16 + +0b1111_1111_1001_0000; // type i32 +0b1111_1111_1001_0000i64; // type i64 + +0usize; // type usize +``` + +The value of the expression is determined from the string representation of the token as follows: + +* An integer radix is chosen by inspecting the first two characters of the string, as follows: + + * `0b` indicates radix 2 + * `0o` indicates radix 8 + * `0x` indicates radix 16 + * otherwise the radix is 10. + +* If the radix is not 10, the first two characters are removed from the string. + +* Any underscores are removed from the string. + +* The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix. +If the value does not fit in `u128`, the expression is rejected by the parser. + +* The `u128` value is converted to the expression's type via a [numeric cast]. + +> **Note**: The final cast will truncate the value of the literal if it does not fit in the expression's type. 
+> `rustc` includes a [lint check] named `overflowing_literals`, defaulting to `deny`, which rejects expressions where this occurs. + +> **Note**: `-1i8`, for example, is an application of the [negation operator] to the literal expression `1i8`, not a single integer literal expression. + +## Floating-point literal expressions + +A floating-point literal expression consists of a single [FLOAT_LITERAL] token. + +If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type. + +If the token has no suffix, the expression's type is determined by type inference: + +* If a floating-point type can be _uniquely_ determined from the surrounding program context, the expression has that type. + +* If the program context under-constrains the type, it defaults to `f64`. + +* If the program context over-constrains the type, it is considered a static type error. + +Examples of floating-point literal expressions: + +```rust +123.0f64; // type f64 +0.1f64; // type f64 +0.1f32; // type f32 +12E+99_f64; // type f64 +5f32; // type f32 +let x: f64 = 2.; // type f64 +``` + +The value of the expression is determined from the string representation of the token as follows: + +* Any underscores are removed from the string. + +* The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`]. + +> **Note**: `-1.0`, for example, is an application of the [negation operator] to the literal expression `1.0`, not a single floating-point literal expression. + +> **Note**: `inf` and `NaN` are not literal tokens. +> The [`f32::INFINITY`], [`f64::INFINITY`], [`f32::NAN`], and [`f64::NAN`] constants can be used instead of literal expressions. +> In `rustc`, a literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check. 
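The integer-literal evaluation steps listed in this hunk (radix selection, prefix removal, underscore removal, conversion to `u128`, final numeric cast) can be sketched roughly as follows. This is only an illustration: the helper name `integer_literal_value` is hypothetical and is not part of the reference text or of `rustc`, and it assumes its input is already a valid, unsuffixed INTEGER_LITERAL token.

```rust
/// Hypothetical helper (not a rustc or std API): computes the `u128` value
/// of an unsuffixed integer literal token, following the steps in the text.
/// Assumes `token` is already a valid INTEGER_LITERAL without a suffix.
fn integer_literal_value(token: &str) -> Option<u128> {
    // Choose the radix from the first two characters and drop the prefix.
    let (radix, digits) = match token.get(..2) {
        Some("0b") => (2, &token[2..]),
        Some("0o") => (8, &token[2..]),
        Some("0x") => (16, &token[2..]),
        _ => (10, token),
    };
    // Remove any underscores.
    let cleaned: String = digits.chars().filter(|c| *c != '_').collect();
    // Convert as if by `u128::from_str_radix`; `None` stands in for the
    // literal being rejected because the value does not fit in `u128`.
    u128::from_str_radix(&cleaned, radix).ok()
}

fn main() {
    assert_eq!(integer_literal_value("123"), Some(123));
    assert_eq!(integer_literal_value("0xff"), Some(255));
    assert_eq!(integer_literal_value("0o70"), Some(56));
    assert_eq!(
        integer_literal_value("0b1111_1111_1001_0000"),
        Some(0b1111_1111_1001_0000)
    );
    // 2^128 is one past `u128::MAX`, so the conversion fails.
    assert_eq!(
        integer_literal_value("340282366920938463463374607431768211456"),
        None
    );
    // The final numeric cast truncates values that do not fit the target type.
    assert_eq!(integer_literal_value("256").unwrap() as u8, 0);
    assert_eq!(integer_literal_value("128").unwrap() as i8, -128);
}
```

The last two assertions model the truncating cast only; in real code `rustc` would reject `256_u8` or `128_i8` under the default `deny` level of `overflowing_literals`.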
+ +## Boolean literal expressions + +A boolean literal expression consists of a single [BOOLEAN_LITERAL] token. + +> **Note**: This section is incomplete. + +[constant expression]: ../const_eval.md#constant-expressions +[floating-point types]: ../types/numeric.md#floating-point-types +[lint check]: ../attributes/diagnostics.md#lint-check-attributes +[literal tokens]: ../tokens.md#literals +[numeric cast]: operator-expr.md#numeric-cast +[numeric types]: ../types/numeric.md +[suffix]: ../tokens.md#suffixes +[negation operator]: operator-expr.md#negation-operators +[`f32::from_str`]: ../../core/primitive.f32.md#method.from_str +[`f32::INFINITY`]: ../../core/primitive.f32.md#associatedconstant.INFINITY +[`f32::NAN`]: ../../core/primitive.f32.md#associatedconstant.NAN +[`f64::from_str`]: ../../core/primitive.f64.md#method.from_str +[`f64::INFINITY`]: ../../core/primitive.f64.md#associatedconstant.INFINITY +[`f64::NAN`]: ../../core/primitive.f64.md#associatedconstant.NAN +[`u128::from_str_radix`]: ../../core/primitive.u128.md#method.from_str_radix [CHAR_LITERAL]: ../tokens.md#character-literals [STRING_LITERAL]: ../tokens.md#string-literals [RAW_STRING_LITERAL]: ../tokens.md#raw-string-literals diff --git a/src/tokens.md b/src/tokens.md index 5516fb7b3..0db558330 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -18,11 +18,7 @@ table production] form, and appear in `monospace` font. ## Literals -A literal is an expression consisting of a single token, rather than a sequence -of tokens, that immediately and directly denotes the value it evaluates to, -rather than referring to it by name or some other evaluation rule. A literal is -a form of [constant expression](const_eval.md#constant-expressions), so is -evaluated (primarily) at compile time. +Literals are tokens used in [literal expressions]. ### Examples @@ -363,75 +359,56 @@ An _integer literal_ has one of four forms: (`0b`) and continues as any mixture (with at least one digit) of binary digits and underscores. 
-Like any literal, an integer literal may be followed (immediately, -without any spaces) by an _integer suffix_, which forcibly sets the -type of the literal. The integer suffix must be the name of one of the -integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, -`u128`, `i128`, `usize`, or `isize`. +Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]: +`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`. +See [literal expressions] for the effect of these suffixes. -The type of an _unsuffixed_ integer literal is determined by type inference: - -* If an integer type can be _uniquely_ determined from the surrounding - program context, the unsuffixed integer literal has that type. +Examples of integer literals of various forms: -* If the program context under-constrains the type, it defaults to the - signed 32-bit integer `i32`. +```rust +# #![allow(overflowing_literals)] +123; +123i32; +123u32; +123_u32; -* If the program context over-constrains the type, it is considered a - static type error. +0xff; +0xff_u8; +0x01_f32; // integer 7986, not floating-point 1.0 +0x01_e3; // integer 483, not floating-point 1000.0 -Examples of integer literals of various forms: +0o70; +0o70_i16; -```rust -123; // type i32 -123i32; // type i32 -123u32; // type u32 -123_u32; // type u32 -let a: u64 = 123; // type u64 +0b1111_1111_1001_0000; +0b1111_1111_1001_0000i64; +0b________1; -0xff; // type i32 -0xff_u8; // type u8 +0usize; -0o70; // type i32 -0o70_i16; // type i16 +// These are too big for their type, but are still valid tokens -0b1111_1111_1001_0000; // type i32 -0b1111_1111_1001_0000i64; // type i64 -0b________1; // type i32 +128_i8; +256_u8; -0usize; // type usize ``` +Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. 
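A small sketch of the tokenization points made in the added lines above, assuming a token-counting macro (`count_tts`) and a helper function (`wrapped_values`), both of which are hypothetical names introduced only for this illustration. It checks that `-1i8` reaches a macro as two tokens, that `0x01_f32` and `0x01_e3` are plain hexadecimal integers, and that the oversized literals are still accepted once the lint is allowed.

```rust
// Hypothetical helper macro: counts the token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// These literals are valid tokens even though they do not fit their type.
// The lint is allowed here so that the truncated values can be observed.
#[allow(overflowing_literals)]
fn wrapped_values() -> (i8, u8) {
    (128_i8, 256_u8)
}

fn main() {
    // `1i8` is a single literal token; `-1i8` is `-` followed by `1i8`.
    assert_eq!(count_tts!(1i8), 1);
    assert_eq!(count_tts!(-1i8), 2);

    // `f32` and `e3` consist of hexadecimal digits here, so these are
    // single integer tokens rather than floating-point literals.
    assert_eq!(count_tts!(0x01_f32), 1);
    assert_eq!(0x01_f32, 7986);
    assert_eq!(0x01_e3, 483);

    // With the lint allowed, the values are truncated by the numeric cast.
    assert_eq!(wrapped_values(), (-128, 0));
}
```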
+ Examples of invalid integer literals: ```rust,compile_fail -// invalid suffixes - -0invalidSuffix; - // uses numbers of the wrong base -123AFB43; 0b0102; 0o0581; -// integers too big for their type (they overflow) - -128_i8; -256_u8; - // bin, hex, and octal literals must have at least one digit 0b_; 0b____; ``` -Note that the Rust syntax considers `-1i8` as an application of the [unary minus -operator] to an integer literal `1i8`, rather than -a single integer literal. - -[unary minus operator]: expressions/operator-expr.md#negation-operators - #### Tuple index > **Lexer**\ @@ -464,60 +441,124 @@ let horse = example.0b10; // ERROR no field named `0b10` > **Lexer**\ > FLOAT_LITERAL :\ >       DEC_LITERAL `.` -> _(not immediately followed by `.`, `_` or an [identifier]_)\ +> _(not immediately followed by `.`, `_` or an XID_Start character)_\ >    | DEC_LITERAL FLOAT_EXPONENT\ >    | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT?\ >    | DEC_LITERAL (`.` DEC_LITERAL)? > FLOAT_EXPONENT? FLOAT_SUFFIX > > FLOAT_EXPONENT :\ ->    (`e`|`E`) (`+`|`-`)? +>    (`e`|`E`) (`+`|`-`)? > (DEC_DIGIT|`_`)\* DEC_DIGIT (DEC_DIGIT|`_`)\* > > FLOAT_SUFFIX :\ >    `f32` | `f64` -A _floating-point literal_ has one of two forms: +A _floating-point literal_ has one of three forms: * A _decimal literal_ followed by a period character `U+002E` (`.`). This is optionally followed by another decimal literal, with an optional _exponent_. * A single _decimal literal_ followed by an _exponent_. +* A single _decimal literal_ (in which case a suffix is required). Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with `U+002E` (`.`). -The suffix forcibly sets the type of the literal. There are two valid -_floating-point suffixes_, `f32` and `f64` (the 32-bit and 64-bit floating point -types), which explicitly determine the type of the literal. 
- -The type of an _unsuffixed_ floating-point literal is determined by -type inference: - -* If a floating-point type can be _uniquely_ determined from the - surrounding program context, the unsuffixed floating-point literal - has that type. - -* If the program context under-constrains the type, it defaults to `f64`. - -* If the program context over-constrains the type, it is considered a - static type error. +There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]). +See [literal expressions] for the effect of these suffixes. Examples of floating-point literals of various forms: ```rust -123.0f64; // type f64 -0.1f64; // type f64 -0.1f32; // type f32 -12E+99_f64; // type f64 -5f32; // type f32 -let x: f64 = 2.; // type f64 +123.0f64; +0.1f64; +0.1f32; +12E+99_f64; +5f32; +let x: f64 = 2.; ``` This last example is different because it is not possible to use the suffix syntax with a floating point literal ending in a period. `2.f64` would attempt to call a method named `f64` on `2`. -The representation semantics of floating-point numbers are described in -["Machine Types"][machine types]. +Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. + +#### Number pseudoliterals + +> **Lexer**\ +> NUMBER_PSEUDOLITERAL :\ +>       DEC_LITERAL ( . DEC_LITERAL )? FLOAT_EXPONENT\ +>          ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\ +>    | DEC_LITERAL . 
DEC_LITERAL\ +>          ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER_SUFFIX )\ +>    | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\ +>    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\ +>          ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX ) +> +> NUMBER_PSEUDOLITERAL_SUFFIX :\ +>    IDENTIFIER_OR_KEYWORD _not matching INTEGER_SUFFIX or FLOAT_SUFFIX_ +> +> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\ +>    NUMBER_PSEUDOLITERAL_SUFFIX _not beginning with `e` or `E`_ + +Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above. +These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments. + +Examples of such tokens: +```rust,compile_fail +0invalidSuffix; +123AFB43; +0b010a; +0xAB_CD_EF_GH; +2.0f80; +2e5f80; +2e5e6; +2.0e5e6; +1.3e10u64; +0b1111_f32; +``` + +#### Reserved forms similar to number literals + +> **Lexer**\ +> RESERVED_NUMBER :\ +>       BIN_LITERAL \[`2`-`9`​]\ +>    | OCT_LITERAL \[`8`-`9`​]\ +>    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \ +>          _(not immediately followed by `.`, `_` or an XID_Start character)_\ +>    | ( BIN_LITERAL | OCT_LITERAL ) `e`\ +>    | `0b` `_`\* _end of input or not BIN_DIGIT_\ +>    | `0o` `_`\* _end of input or not OCT_DIGIT_\ +>    | `0x` `_`\* _end of input or not HEX_DIGIT_\ +>    | DEC_LITERAL ( . DEC_LITERAL)? (`e`|`E`) (`+`|`-`)? _end of input or not DEC_DIGIT_ + +The following lexical forms similar to number literals are _reserved forms_. +Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens. + +* An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix.
+ +* An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals). + +* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`. + +* Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits). + +* Input which has the form of a floating-point literal with no digits in the exponent. + +Examples of reserved forms: + +```rust,compile_fail +0b0102; // this is not `0b010` followed by `2` +0o1279; // this is not `0o127` followed by `9` +0x80.0; // this is not `0x80` followed by `.` and `0` +0b101e; // this is not a pseudoliteral, or `0b101` followed by `e` +0b; // this is not a pseudoliteral, or `0` followed by `b` +0b_; // this is not a pseudoliteral, or `0` followed by `b_` +2e; // this is not a pseudoliteral, or `2` followed by `e` +2.0e; // this is not a pseudoliteral, or `2.0` followed by `e` +2em; // this is not a pseudoliteral, or `2` followed by `em` +2.0em; // this is not a pseudoliteral, or `2.0` followed by `em` +``` ### Boolean literals @@ -542,8 +583,6 @@ Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in macros. -[loop labels]: expressions/loop-expr.md - ## Punctuation Punctuation symbol tokens are listed here for completeness. Their individual @@ -609,6 +648,41 @@ them are referred to as "token trees" in [macros]. 
The three types of brackets | `[` `]` | Square brackets | | `(` `)` | Parentheses | +## Reserved prefixes + +> **Lexer 2021+**\ +> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b` or `r` or `br`_ | `_` ) `"`\ +> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b`_ | `_` ) `'`\ +> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _Except `r` or `br`_ | `_` ) `#` + +Some lexical forms known as _reserved prefixes_ are reserved for future use. + +Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. + +Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix. + +Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes. + +> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros). +> +> Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token). +> +> Examples accepted in all editions: +> ```rust +> macro_rules! lexes {($($_:tt)*) => {}} +> lexes!{a #foo} +> lexes!{continue 'foo} +> lexes!{match "..." {}} +> lexes!{r#let#foo} // three tokens: r#let # foo +> ``` +> +> Examples accepted before the 2021 edition but rejected later: +> ```rust,edition2018 +> macro_rules! lexes {($($_:tt)*) => {}} +> lexes!{a#foo} +> lexes!{continue'foo} +> lexes!{match"..." {}} +> ``` [Inferred types]: types/inferred.md [Range patterns]: patterns.md#range-patterns @@ -629,6 +703,7 @@ them are referred to as "token trees" in [macros].
The three types of brackets [extern crates]: items/extern-crates.md [extern]: items/external-blocks.md [field]: expressions/field-expr.md +[floating-point types]: types/numeric.md#floating-point-types [function pointer type]: types/function-pointer.md [functions]: items/functions.md [generics]: items/generics.md @@ -636,12 +711,14 @@ them are referred to as "token trees" in [macros]. The three types of brackets [if let]: expressions/if-expr.md#if-let-expressions [keywords]: keywords.md [lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators -[machine types]: types/numeric.md +[literal expressions]: expressions/literal-expr.md +[loop labels]: expressions/loop-expr.md [macros]: macros-by-example.md [match]: expressions/match-expr.md [negation]: expressions/operator-expr.md#negation-operators [negative impls]: items/implementations.md [never type]: types/never.md +[numeric types]: types/numeric.md [paths]: paths.md [patterns]: patterns.md [question]: expressions/operator-expr.md#the-question-mark-operator @@ -656,42 +733,7 @@ them are referred to as "token trees" in [macros]. The three types of brackets [tuple structs]: items/structs.md [tuple variants]: items/enumerations.md [tuples]: types/tuple.md +[unary minus operator]: expressions/operator-expr.md#negation-operators [use declarations]: items/use-declarations.md [use wildcards]: items/use-declarations.md [while let]: expressions/loop-expr.md#predicate-pattern-loops - -## Reserved prefixes - -> **Lexer 2021+**\ -> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b` or `r` or `br`_ | `_` ) `"`\ -> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b`_ | `_` ) `'`\ -> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _Except `r` or `br`_ | `_` ) `#` - -Some lexical forms known as _reserved prefixes_ are reserved for future use. 
- -Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. - -Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix. - -Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes. - -> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros). -> -> Before the 2021 edition, a reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token). -> -> Examples accepted in all editions: -> ```rust -> macro_rules! lexes {($($_:tt)*) => {}} -> lexes!{a #foo} -> lexes!{continue 'foo} -> lexes!{match "..." {}} -> lexes!{r#let#foo} // three tokens: r#let # foo -> ``` -> -> Examples accepted before the 2021 edition but rejected later: -> ```rust,edition2018 -> macro_rules! lexes {($($_:tt)*) => {}} -> lexes!{a#foo} -> lexes!{continue'foo} -> lexes!{match"..." {}} -> ```