diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md
index 0e2d7c0a4..24552a915 100644
--- a/src/expressions/literal-expr.md
+++ b/src/expressions/literal-expr.md
@@ -8,12 +8,17 @@
> | [BYTE_LITERAL]\
> | [BYTE_STRING_LITERAL]\
> | [RAW_BYTE_STRING_LITERAL]\
-> | [INTEGER_LITERAL]\
+> | [INTEGER_LITERAL][^out-of-range]\
> | [FLOAT_LITERAL]\
> | [BOOLEAN_LITERAL]
+>
+> [^out-of-range]: A value ≥ 2<sup>128</sup> is not allowed.
-A _literal expression_ consists of one of the [literal](../tokens.md#literals) forms described earlier.
-It directly describes a number, character, string, or boolean value.
+A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule.
+
+A literal is a form of [constant expression], so is evaluated (primarily) at compile time.
+
+Each of the lexical [literal][literal tokens] forms described earlier can make up a literal expression.
```rust
"hello"; // string type
@@ -21,6 +26,146 @@ It directly describes a number, character, string, or boolean value.
5; // integer type
```
+## Character literal expressions
+
+A character literal expression consists of a single [CHAR_LITERAL] token.
+
+> **Note**: This section is incomplete.
+
+## String literal expressions
+
+A string literal expression consists of a single [STRING_LITERAL] or [RAW_STRING_LITERAL] token.
+
+> **Note**: This section is incomplete.
+
+## Byte literal expressions
+
+A byte literal expression consists of a single [BYTE_LITERAL] token.
+
+> **Note**: This section is incomplete.
+
+## Byte string literal expressions
+
+A byte string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_BYTE_STRING_LITERAL] token.
+
+> **Note**: This section is incomplete.
+
+## Integer literal expressions
+
+An integer literal expression consists of a single [INTEGER_LITERAL] token.
+
+If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type.
+
+If the token has no suffix, the expression's type is determined by type inference:
+
+* If an integer type can be _uniquely_ determined from the surrounding program context, the expression has that type.
+
+* If the program context under-constrains the type, it defaults to the signed 32-bit integer `i32`.
+
+* If the program context over-constrains the type, it is considered a static type error.
+
+Examples of integer literal expressions:
+
+```rust
+123; // type i32
+123i32; // type i32
+123u32; // type u32
+123_u32; // type u32
+let a: u64 = 123; // type u64
+
+0xff; // type i32
+0xff_u8; // type u8
+
+0o70; // type i32
+0o70_i16; // type i16
+
+0b1111_1111_1001_0000; // type i32
+0b1111_1111_1001_0000i64; // type i64
+
+0usize; // type usize
+```
+
+The value of the expression is determined from the string representation of the token as follows:
+
+* An integer radix is chosen by inspecting the first two characters of the string, as follows:
+
+ * `0b` indicates radix 2
+ * `0o` indicates radix 8
+ * `0x` indicates radix 16
+ * otherwise the radix is 10.
+
+* If the radix is not 10, the first two characters are removed from the string.
+
+* Any underscores are removed from the string.
+
+* The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix.
+If the value does not fit in `u128`, the expression is rejected by the parser.
+
+* The `u128` value is converted to the expression's type via a [numeric cast].
+
+> **Note**: The final cast will truncate the value of the literal if it does not fit in the expression's type.
+> `rustc` includes a [lint check] named `overflowing_literals`, defaulting to `deny`, which rejects expressions where this occurs.
+
+> **Note**: `-1i8`, for example, is an application of the [negation operator] to the literal expression `1i8`, not a single integer literal expression.
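The value-determination steps above can be sketched in ordinary Rust. This is only an illustration under stated assumptions, not a description of how `rustc` is implemented: the helper name `integer_literal_value` is hypothetical, and it assumes the suffix has already been stripped from the token text.

```rust
/// Hypothetical helper: computes the `u128` value of an integer literal
/// token whose suffix (if any) has already been removed.
/// Returns `None` where the literal would be rejected.
fn integer_literal_value(text: &str) -> Option<u128> {
    // Choose the radix by inspecting the first two characters,
    // and drop the prefix when the radix is not 10.
    let (radix, digits) = match text.get(..2) {
        Some("0b") => (2, &text[2..]),
        Some("0o") => (8, &text[2..]),
        Some("0x") => (16, &text[2..]),
        _ => (10, text),
    };
    // Remove any underscores.
    let digits: String = digits.chars().filter(|&c| c != '_').collect();
    // Convert as if by `u128::from_str_radix` with the chosen radix.
    u128::from_str_radix(&digits, radix).ok()
}

fn main() {
    assert_eq!(integer_literal_value("123"), Some(123));
    assert_eq!(integer_literal_value("0xff"), Some(255));
    assert_eq!(integer_literal_value("0o70"), Some(56));
    assert_eq!(integer_literal_value("0b1111_1111_1001_0000"), Some(0xff90));
    // 2^128 does not fit in `u128`, so there is no value.
    assert_eq!(
        integer_literal_value("340282366920938463463374607431768211456"),
        None
    );
    // The final step is a numeric cast, which truncates.
    assert_eq!(integer_literal_value("256").unwrap() as u8, 0);
    assert_eq!(integer_literal_value("128").unwrap() as i8, -128);
}
```

The last two assertions mirror the truncation that the `overflowing_literals` lint exists to catch.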
+
+## Floating-point literal expressions
+
+A floating-point literal expression consists of a single [FLOAT_LITERAL] token.
+
+If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type.
+
+If the token has no suffix, the expression's type is determined by type inference:
+
+* If a floating-point type can be _uniquely_ determined from the surrounding program context, the expression has that type.
+
+* If the program context under-constrains the type, it defaults to `f64`.
+
+* If the program context over-constrains the type, it is considered a static type error.
+
+Examples of floating-point literal expressions:
+
+```rust
+123.0f64; // type f64
+0.1f64; // type f64
+0.1f32; // type f32
+12E+99_f64; // type f64
+5f32; // type f32
+let x: f64 = 2.; // type f64
+```
+
+The value of the expression is determined from the string representation of the token as follows:
+
+* Any underscores are removed from the string.
+
+* The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`].
+
+> **Note**: `-1.0`, for example, is an application of the [negation operator] to the literal expression `1.0`, not a single floating-point literal expression.
+
+> **Note**: `inf` and `NaN` are not literal tokens.
+> The [`f32::INFINITY`], [`f64::INFINITY`], [`f32::NAN`], and [`f64::NAN`] constants can be used instead of literal expressions.
+> In `rustc`, a literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check.
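The two conversion steps for floating-point literals can be sketched the same way. The helper name `float_literal_value` is hypothetical, it handles only the `f64` case, and it assumes its input is already a valid token with the suffix removed (`f64::from_str` on its own also accepts strings such as `inf` that are not literal tokens).

```rust
/// Hypothetical helper: computes the `f64` value of a floating-point
/// literal token whose suffix (if any) has already been removed.
fn float_literal_value(text: &str) -> Option<f64> {
    // Remove any underscores.
    let cleaned: String = text.chars().filter(|&c| c != '_').collect();
    // Convert as if by `f64::from_str`.
    cleaned.parse::<f64>().ok()
}

fn main() {
    assert_eq!(float_literal_value("123.0"), Some(123.0));
    assert_eq!(float_literal_value("1_000.5"), Some(1000.5));
    assert_eq!(float_literal_value("12E+99"), Some(12e99));
    assert_eq!(float_literal_value("2."), Some(2.0));
    // A value too large for `f64` converts to infinity.
    assert_eq!(float_literal_value("1e999"), Some(f64::INFINITY));
}
```

The final assertion corresponds to the case in which `rustc` reports the `overflowing_literals` lint.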
+
+## Boolean literal expressions
+
+A boolean literal expression consists of a single [BOOLEAN_LITERAL] token.
+
+> **Note**: This section is incomplete.
+
+[constant expression]: ../const_eval.md#constant-expressions
+[floating-point types]: ../types/numeric.md#floating-point-types
+[lint check]: ../attributes/diagnostics.md#lint-check-attributes
+[literal tokens]: ../tokens.md#literals
+[numeric cast]: operator-expr.md#numeric-cast
+[numeric types]: ../types/numeric.md
+[suffix]: ../tokens.md#suffixes
+[negation operator]: operator-expr.md#negation-operators
+[`f32::from_str`]: ../../core/primitive.f32.md#method.from_str
+[`f32::INFINITY`]: ../../core/primitive.f32.md#associatedconstant.INFINITY
+[`f32::NAN`]: ../../core/primitive.f32.md#associatedconstant.NAN
+[`f64::from_str`]: ../../core/primitive.f64.md#method.from_str
+[`f64::INFINITY`]: ../../core/primitive.f64.md#associatedconstant.INFINITY
+[`f64::NAN`]: ../../core/primitive.f64.md#associatedconstant.NAN
+[`u128::from_str_radix`]: ../../core/primitive.u128.md#method.from_str_radix
[CHAR_LITERAL]: ../tokens.md#character-literals
[STRING_LITERAL]: ../tokens.md#string-literals
[RAW_STRING_LITERAL]: ../tokens.md#raw-string-literals
diff --git a/src/tokens.md b/src/tokens.md
index 5516fb7b3..0db558330 100644
--- a/src/tokens.md
+++ b/src/tokens.md
@@ -18,11 +18,7 @@ table production] form, and appear in `monospace` font.
## Literals
-A literal is an expression consisting of a single token, rather than a sequence
-of tokens, that immediately and directly denotes the value it evaluates to,
-rather than referring to it by name or some other evaluation rule. A literal is
-a form of [constant expression](const_eval.md#constant-expressions), so is
-evaluated (primarily) at compile time.
+Literals are tokens used in [literal expressions].
### Examples
@@ -363,75 +359,56 @@ An _integer literal_ has one of four forms:
(`0b`) and continues as any mixture (with at least one digit) of binary digits
and underscores.
-Like any literal, an integer literal may be followed (immediately,
-without any spaces) by an _integer suffix_, which forcibly sets the
-type of the literal. The integer suffix must be the name of one of the
-integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`,
-`u128`, `i128`, `usize`, or `isize`.
+Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]:
+`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`.
+See [literal expressions] for the effect of these suffixes.
-The type of an _unsuffixed_ integer literal is determined by type inference:
-
-* If an integer type can be _uniquely_ determined from the surrounding
- program context, the unsuffixed integer literal has that type.
+Examples of integer literals of various forms:
-* If the program context under-constrains the type, it defaults to the
- signed 32-bit integer `i32`.
+```rust
+# #![allow(overflowing_literals)]
+123;
+123i32;
+123u32;
+123_u32;
-* If the program context over-constrains the type, it is considered a
- static type error.
+0xff;
+0xff_u8;
+0x01_f32; // integer 7986, not floating-point 1.0
+0x01_e3; // integer 483, not floating-point 1000.0
-Examples of integer literals of various forms:
+0o70;
+0o70_i16;
-```rust
-123; // type i32
-123i32; // type i32
-123u32; // type u32
-123_u32; // type u32
-let a: u64 = 123; // type u64
+0b1111_1111_1001_0000;
+0b1111_1111_1001_0000i64;
+0b________1;
-0xff; // type i32
-0xff_u8; // type u8
+0usize;
-0o70; // type i32
-0o70_i16; // type i16
+// These are too big for their type, but are still valid tokens
-0b1111_1111_1001_0000; // type i32
-0b1111_1111_1001_0000i64; // type i64
-0b________1; // type i32
+128_i8;
+256_u8;
-0usize; // type usize
```
+Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`.
+
Examples of invalid integer literals:
```rust,compile_fail
-// invalid suffixes
-
-0invalidSuffix;
-
// uses numbers of the wrong base
-123AFB43;
0b0102;
0o0581;
-// integers too big for their type (they overflow)
-
-128_i8;
-256_u8;
-
// bin, hex, and octal literals must have at least one digit
0b_;
0b____;
```
-Note that the Rust syntax considers `-1i8` as an application of the [unary minus
-operator] to an integer literal `1i8`, rather than
-a single integer literal.
-
-[unary minus operator]: expressions/operator-expr.md#negation-operators
-
#### Tuple index
> **Lexer**\
@@ -464,60 +441,124 @@ let horse = example.0b10; // ERROR no field named `0b10`
> **Lexer**\
> FLOAT_LITERAL :\
> DEC_LITERAL `.`
-> _(not immediately followed by `.`, `_` or an [identifier]_)\
+> _(not immediately followed by `.`, `_` or an XID_Start character)_\
> | DEC_LITERAL FLOAT_EXPONENT\
> | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT?\
> | DEC_LITERAL (`.` DEC_LITERAL)?
> FLOAT_EXPONENT? FLOAT_SUFFIX
>
> FLOAT_EXPONENT :\
-> (`e`|`E`) (`+`|`-`)?
+> (`e`|`E`) (`+`|`-`)?
> (DEC_DIGIT|`_`)\* DEC_DIGIT (DEC_DIGIT|`_`)\*
>
> FLOAT_SUFFIX :\
> `f32` | `f64`
-A _floating-point literal_ has one of two forms:
+A _floating-point literal_ has one of three forms:
* A _decimal literal_ followed by a period character `U+002E` (`.`). This is
optionally followed by another decimal literal, with an optional _exponent_.
* A single _decimal literal_ followed by an _exponent_.
+* A single _decimal literal_ (in which case a suffix is required).
Like integer literals, a floating-point literal may be followed by a
suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
-The suffix forcibly sets the type of the literal. There are two valid
-_floating-point suffixes_, `f32` and `f64` (the 32-bit and 64-bit floating point
-types), which explicitly determine the type of the literal.
-
-The type of an _unsuffixed_ floating-point literal is determined by
-type inference:
-
-* If a floating-point type can be _uniquely_ determined from the
- surrounding program context, the unsuffixed floating-point literal
- has that type.
-
-* If the program context under-constrains the type, it defaults to `f64`.
-
-* If the program context over-constrains the type, it is considered a
- static type error.
+There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]).
+See [literal expressions] for the effect of these suffixes.
Examples of floating-point literals of various forms:
```rust
-123.0f64; // type f64
-0.1f64; // type f64
-0.1f32; // type f32
-12E+99_f64; // type f64
-5f32; // type f32
-let x: f64 = 2.; // type f64
+123.0f64;
+0.1f64;
+0.1f32;
+12E+99_f64;
+5f32;
+let x: f64 = 2.;
```
This last example is different because it is not possible to use the suffix
syntax with a floating point literal ending in a period. `2.f64` would attempt
to call a method named `f64` on `2`.
-The representation semantics of floating-point numbers are described in
-["Machine Types"][machine types].
+Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`.
+
+#### Number pseudoliterals
+
+> **Lexer**\
+> NUMBER_PSEUDOLITERAL :\
+> DEC_LITERAL ( `.` DEC_LITERAL )? FLOAT_EXPONENT\
+> ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\
+> | DEC_LITERAL `.` DEC_LITERAL\
+> ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER_SUFFIX )\
+> | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\
+> | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\
+> ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX )
+>
+> NUMBER_PSEUDOLITERAL_SUFFIX :\
+> IDENTIFIER_OR_KEYWORD _not matching INTEGER_SUFFIX or FLOAT_SUFFIX_
+>
+> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\
+> NUMBER_PSEUDOLITERAL_SUFFIX _not beginning with `e` or `E`_
+
+Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above.
+These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments.
+
+Examples of such tokens:
+```rust,compile_fail
+0invalidSuffix;
+123AFB43;
+0b010a;
+0xAB_CD_EF_GH;
+2.0f80;
+2e5f80;
+2e5e6;
+2.0e5e6;
+1.3e10u64;
+0b1111_f32;
+```
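As a rough illustration of "valid tokens, but not valid literal expressions", the sketch below passes pseudoliterals to a macro that only counts token trees. The macro name `count_tts` is hypothetical, and the example assumes that `rustc` accepts these tokens as `tt` macro input, as the text above describes.

```rust
// Hypothetical helper macro: counts the token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // Each pseudoliteral is a single token, so a single `tt` matches it.
    assert_eq!(count_tts!(0invalidSuffix), 1);
    assert_eq!(count_tts!(2.0f80), 1);
    // By contrast, `-1i8` is two tokens: `-` followed by `1i8`.
    assert_eq!(count_tts!(-1i8), 2);
}
```

Because the macro discards its input, the pseudoliterals are never interpreted as literal expressions, so no suffix error is reported.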
+
+#### Reserved forms similar to number literals
+
+> **Lexer**\
+> RESERVED_NUMBER :\
+> BIN_LITERAL \[`2`-`9`​]\
+> | OCT_LITERAL \[`8`-`9`​]\
+> | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \
+> _(not immediately followed by `.`, `_` or an XID_Start character)_\
+> | ( BIN_LITERAL | OCT_LITERAL ) `e`\
+> | `0b` `_`\* _end of input or not BIN_DIGIT_\
+> | `0o` `_`\* _end of input or not OCT_DIGIT_\
+> | `0x` `_`\* _end of input or not HEX_DIGIT_\
+> | DEC_LITERAL ( `.` DEC_LITERAL )? (`e`|`E`) (`+`|`-`)? _end of input or not DEC_DIGIT_
+
+The following lexical forms similar to number literals are _reserved forms_.
+Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens.
+
+* An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix.
+
+* An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).
+
+* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`.
+
+* Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).
+
+* Input which has the form of a floating-point literal with no digits in the exponent.
+
+Examples of reserved forms:
+
+```rust,compile_fail
+0b0102; // this is not `0b010` followed by `2`
+0o1279; // this is not `0o127` followed by `9`
+0x80.0; // this is not `0x80` followed by `.` and `0`
+0b101e; // this is not a pseudoliteral, or `0b101` followed by `e`
+0b; // this is not a pseudoliteral, or `0` followed by `b`
+0b_; // this is not a pseudoliteral, or `0` followed by `b_`
+2e; // this is not a pseudoliteral, or `2` followed by `e`
+2.0e; // this is not a pseudoliteral, or `2.0` followed by `e`
+2em; // this is not a pseudoliteral, or `2` followed by `em`
+2.0em; // this is not a pseudoliteral, or `2.0` followed by `em`
+```
### Boolean literals
@@ -542,8 +583,6 @@ Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any
LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in
macros.
-[loop labels]: expressions/loop-expr.md
-
## Punctuation
Punctuation symbol tokens are listed here for completeness. Their individual
@@ -609,6 +648,41 @@ them are referred to as "token trees" in [macros]. The three types of brackets
| `[` `]` | Square brackets |
| `(` `)` | Parentheses |
+## Reserved prefixes
+
+> **Lexer 2021+**\
+> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b` or `r` or `br`_ | `_` ) `"`\
+> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b`_ | `_` ) `'`\
+> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _Except `r` or `br`_ | `_` ) `#`
+
+Some lexical forms known as _reserved prefixes_ are reserved for future use.
+
+Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.
+
+Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.
+
+Similarly, the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.
+
+> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
+>
+> Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token).
+>
+> Examples accepted in all editions:
+> ```rust
+> macro_rules! lexes {($($_:tt)*) => {}}
+> lexes!{a #foo}
+> lexes!{continue 'foo}
+> lexes!{match "..." {}}
+> lexes!{r#let#foo} // three tokens: r#let # foo
+> ```
+>
+> Examples accepted before the 2021 edition but rejected later:
+> ```rust,edition2018
+> macro_rules! lexes {($($_:tt)*) => {}}
+> lexes!{a#foo}
+> lexes!{continue'foo}
+> lexes!{match"..." {}}
+> ```
[Inferred types]: types/inferred.md
[Range patterns]: patterns.md#range-patterns
@@ -629,6 +703,7 @@ them are referred to as "token trees" in [macros]. The three types of brackets
[extern crates]: items/extern-crates.md
[extern]: items/external-blocks.md
[field]: expressions/field-expr.md
+[floating-point types]: types/numeric.md#floating-point-types
[function pointer type]: types/function-pointer.md
[functions]: items/functions.md
[generics]: items/generics.md
@@ -636,12 +711,14 @@ them are referred to as "token trees" in [macros]. The three types of brackets
[if let]: expressions/if-expr.md#if-let-expressions
[keywords]: keywords.md
[lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators
-[machine types]: types/numeric.md
+[literal expressions]: expressions/literal-expr.md
+[loop labels]: expressions/loop-expr.md
[macros]: macros-by-example.md
[match]: expressions/match-expr.md
[negation]: expressions/operator-expr.md#negation-operators
[negative impls]: items/implementations.md
[never type]: types/never.md
+[numeric types]: types/numeric.md
[paths]: paths.md
[patterns]: patterns.md
[question]: expressions/operator-expr.md#the-question-mark-operator
@@ -656,42 +733,7 @@ them are referred to as "token trees" in [macros]. The three types of brackets
[tuple structs]: items/structs.md
[tuple variants]: items/enumerations.md
[tuples]: types/tuple.md
+[unary minus operator]: expressions/operator-expr.md#negation-operators
[use declarations]: items/use-declarations.md
[use wildcards]: items/use-declarations.md
[while let]: expressions/loop-expr.md#predicate-pattern-loops
-
-## Reserved prefixes
-
-> **Lexer 2021+**\
-> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b` or `r` or `br`_ | `_` ) `"`\
-> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b`_ | `_` ) `'`\
-> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _Except `r` or `br`_ | `_` ) `#`
-
-Some lexical forms known as _reserved prefixes_ are reserved for future use.
-
-Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.
-
-Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.
-
-Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.
-
-> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
->
-> Before the 2021 edition, a reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token).
->
-> Examples accepted in all editions:
-> ```rust
-> macro_rules! lexes {($($_:tt)*) => {}}
-> lexes!{a #foo}
-> lexes!{continue 'foo}
-> lexes!{match "..." {}}
-> lexes!{r#let#foo} // three tokens: r#let # foo
-> ```
->
-> Examples accepted before the 2021 edition but rejected later:
-> ```rust,edition2018
-> macro_rules! lexes {($($_:tt)*) => {}}
-> lexes!{a#foo}
-> lexes!{continue'foo}
-> lexes!{match"..." {}}
-> ```