From 6ad0b1d9b2134620a29a5137515cb960a4985154 Mon Sep 17 00:00:00 2001 From: Mark Juggurnauth-Thomas Date: Sun, 21 Feb 2021 22:59:56 +0000 Subject: [PATCH 1/6] declarative macro metavariable expressions --- text/0000-macro-metavar-expr.md | 332 ++++++++++++++++++++++++++++++++ 1 file changed, 332 insertions(+) create mode 100644 text/0000-macro-metavar-expr.md diff --git a/text/0000-macro-metavar-expr.md b/text/0000-macro-metavar-expr.md new file mode 100644 index 00000000000..7abdfc12533 --- /dev/null +++ b/text/0000-macro-metavar-expr.md @@ -0,0 +1,332 @@ +- Feature Name: `macro_metavar_expr` +- Start Date: 2021-01-23 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Add new syntax to declarative macros to give their authors easy access to +additional metadata about macro metavariables, such as the index, length, or +count of macro repetitions. + +# Motivation +[motivation]: #motivation + +Macros with repetitions often expand to code that needs to know or could +benefit from knowing how many repetitions there are, or which repetition is +currently being expanded. Consider the example macro used in the guide to +introduce the concept of macro repetitions: building a vector, recreating the +`vec!` macro from the standard library: + +``` +macro_rules! vec { + ( $( $x:expr ),* ) => { + { + let mut temp_vec = Vec::new(); + $( + temp_vec.push($x); + )* + temp_vec + } + }; +} +``` + +This would be more efficient if it could use `Vec::with_capacity` to +preallocate the vector with the correct length. However, there is no standard +facility in declarative macros to achieve this, as there is no way to obtain +the *number* of repetitions of `$x`. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +The [example `vec` macro defininition in the guide][guide-vec] could be made +more efficient if it could use `Vec::with_capacity` to pre-allocate a vector +with the correct capacity. To do this, we need to know the number of +repetitions. + +[guide-vec]: https://doc.rust-lang.org/book/ch19-06-macros.html#declarative-macros-with-macro_rules-for-general-metaprogramming + +Metadata about metavariables, like the number of repetitions, can be accessed +using **metavariable expressions**. The metavariable expression for the +count of the number of repetitions of a metavariable `x` is `${count(x)}`, so +we can improve the `vec` macro as follows: + +``` +#[macro_export] +macro_rules! vec { + ( $( $x:expr ),* ) => { + { + let mut temp_vec = Vec::with_capacity(${count(x)}); + $( + temp_vec.push($x); + )* + temp_vec + } + }; +} +``` + +The following metavariable expressions are available: + +| Expression | Meaning | +|----------------------------|------------| +| `${count(ident)}` | The number of times `$ident` repeats in total. | +| `${count(ident, depth)}` | The number of times `$ident` repeats at up to `depth` nested repetition depths. | +| `${index()}` | The current index of the inner-most repetition. | +| `${index(depth)}` | The current index of the nested repetition at `depth` steps out. | +| `${length()}` | The length of the inner-most repetition. | +| `${length(depth)}` | The length of the nested repetition at `depth` steps out. | +| `${ignore(ident)}` | Binds `$ident` for repetition, but expands to nothing. | +| `$$` | Expands to a single `$`, for removing ambiguity in recursive macro definitions. | + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +Metavariable expressions in declarative macros provide expansions for +information about metavariables that are otherwise not easily obtainable. + +This is a backwards-compatible change as both `$$` and `${ .. }` are not +currently accepted as valid. + +The metavariable expressions added in this RFC are concerned with declarative +macro metavariable repetitions, and obtaining the information that the +compiler knows about the repetitions that are being processed. + +## Count + +The `${count(x)}` metavariable expression shown in the `vec` example in the +previous section counts the number of repetitions that will occur if the +identifier is used in a repetition at this depth. This means that in a macro +expansion like: + +``` + ${count(x)} $( $x )* +``` + +the expression `${count(x)}` will expand to the number of times the `$( $x )*` +repetition will repeat. + +If repetitions are nested, then an optional depth parameter can be used to +limit the number of nested repetitions that are counted. For example, a macro +expansion like: + +``` + ${count(x, 1)} ${count(x, 2)} ${count(x, 3)} $( a $( b $( $x )* )* )* +``` + +The three values this expands to are the number of outer-most repetitions (the +number of times `a` would be generated), the sum of the number of middle +repetitions (the number of times `b` would be generated), and the total number +of repetitions of `$x`. + +## Index and length + +Within a repetition, the `${index()}` and `${length()}` metavariable +expressions give the index of the current repetition and the length of the +repetition (i.e., the number of times it will repeat). The index value ranges +from `0` to `length - 1`. + +For nested repetitions, the `${index()}` and `${length()}` metavariable +expressions expand to the inner-most index and length respectively. +If the `depth` parameter is specified, then the metavariable expression +expands to the index or length of the surrounding nested repetition, counting +outwards from the inner-most repetition. The expressions `${index()}` and +`${index(0)}` are equivalent. + +For example in the expression: + +``` + $( a $( b $( c $x ${index()}/${length()} ${index(1)}/${length(1)} ${index(2)}/${length(2)} )* )* )* +``` + +the first pair of values are the index and length of the inner-most +repetition, the second pair are the index and length of the middle +repetition, and the third pair are the index and length of the outer-most +repetition. + +## Ignore + +Sometimes it is desired to repeat an expansion the same number of times as a +metavariable repeats but without actually expanding the metavariable. It may +be possible to work around this by expanding the metavariable in an expression +like `{ $x ; 1 }`, where the expanded value of `$x` is ignored, but this +is only possible if what `$x` expands to is valid in this kind of expression. + +The `${ignore(ident)}` metavariable acts as if `ident` was used for the purposes +of repetition, but expands to nothing. This means a macro expansion like: + +``` + $( ${ignore(x)} a )* +``` + +will expand to a sequence of `a` tokens repeated the number of times that `x` repeats. + +## Dollar dollar + +Since metavariable expressions always apply during the expansion of the macro, +they cannot be used in recursive macro definitions. To allow recursive macro +definitions to use metavariable expressions, the `$$` expression expands to a +single `$` token. + +This is also necessary for unambiguously defining repetitions in nested +macros. For example, this resolves [issue 35853], as the example in +that issue can be expressed as: + +``` +macro_rules! foo { + () => { + macro_rules! bar { + ( $$( $$any:tt )* ) => { $$( $$any )* }; + } + }; +} + +fn main() { foo!(); } +``` + +[issue 35853]: https://github.com/rust-lang/rust/issues/35853 + +## Larger example + +For a larger example of these metavariable expressions in use, consider the +following macro that operates over three nested repetitions: + +``` +macro_rules! example { + ( $( [ $( ( $( $x:ident )* ) )* ] )* ) => { + counts = (${count(x, 1)}, ${count(x, 2)}, ${count(x)}) + nested: + $( + indexes = (${index()}/${length()}) + counts = (${count(x, 1)}, ${count(x)}) + nested: + $( + indexes = (${index(1)}/${length(1)}, ${index()}/${length()}) + counts = (${count(x)}) + nested: + $( + indexes = (${index(2)}/${length(2)}, ${index(1)}/${length(1)}, ${index()}/${length()}) + ${ignore(x)} + )* + )* + )* + }; +} +``` + +Given this input: +``` + example! { + [ ( A B C D ) ( E F G H ) ( I J ) ] + [ ( K L M ) ] + } +``` + +The macro would expand to: +``` + counts = (2, 4, 13) + nested: + indexes = (0/2) + counts = (3, 10) + nested: + indexes = (0/2, 0/3) + counts = (4) + nested: + indexes = (0/2, 0/3, 0/4) + indexes = (0/2, 0/3, 1/4) + indexes = (0/2, 0/3, 2/4) + indexes = (0/2, 0/3, 3/4) + indexes = (0/2, 1/3) + counts = (4) + nested: + indexes = (0/2, 1/3, 0/4) + indexes = (0/2, 1/3, 1/4) + indexes = (0/2, 1/3, 2/4) + indexes = (0/2, 1/3, 3/4) + indexes = (0/2, 2/3) + counts = (2) + nested: + indexes = (0/2, 2/3, 0/2) + indexes = (0/2, 2/3, 1/2) + indexes = (1/2) + counts = (1, 3) + nested: + indexes = (1/2, 0/1) + counts = (3) + nested: + indexes = (1/2, 0/1, 0/3) + indexes = (1/2, 0/1, 1/3) + indexes = (1/2, 0/1, 2/3) +``` + + +# Drawbacks +[drawbacks]: #drawbacks + +This adds additional syntax to the language, that program authors must learn +and understand. We may not want to add more syntax. + +The author believes it is worth the overhead of new syntax, as even though +there exist workarounds for obtaining the information if it's really needed, +these workarounds are sometimes difficult to discover and naive +implementations can significantly harm compiler performance. + +Furthermore, the additional syntax is limited to declarative macros, and its +use should be limited to specific circustances where it is more understandable +than the alternatives. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This RFC proposes a modest but powerful extension to macro syntax that makes +it possible to obtain information that the compiler already knows, but +requires inefficient and complex techniques to obtain in the macro. + +The original proposal was for a shorter syntax to provide the count of +repetitions: `$#ident`. During discussions of this syntax, it became clear +that it was not obvious as to which number this referred to: the count of +repetitions at this level, or the length of the current repetition. It also +does not provide a way to discover counts or lengths for other repetition +depths. There was also interest in being able to discover the index of the +current repetition, and the `#` character had been used in similar proposals +for that. There was some reservation expressed for the use of the `#` token +because of the cognitive burden of another sigil, and its common use in the +`quote!` macro. + +The meaning of the `depth` parameter in `index` and `count` originally +counted inwards from the outer-most nesting. This was changed to count +outwards from the inner-most nesting so that expressions can be copied +to a different nesting depth without needing to change them. + +# Prior art +[prior-art]: #prior-art + +Declarative macros with repetition are commonly used in Rust for things that +are implemented using variadic functions in other languages. Usually these +other languages provide mechanisms for finding the number of variadic +arguments, and it is a notable limitation that Rust does not. + +Scripting languages, like Bash, which use `$var` for variables, often use +similar `${...}` syntax for values based on variables: for example `${#var}` +is used for the length of `$var`. This means `${...}` expressions should not +seem too weird to developers familiar with these scripting languages. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +No unresolved questions at present. + +While more expressions are possible, expressions beyond those defined in this RFC are out-of-scope. + +# Future possibilities +[future-possibilities]: #future-possibilities + +The metavariable expression syntax (`${...}`) is purposefully generic, and may +be extended in future RFCs to anything that may be useful for the macro +expander to produce. + +The syntax `$[...]` is still invalid, and so remains available for any other +extensions which may come in the future and don't fit in with metavariable +expression syntax. From abfc9f295314c8c88f55787c2cec91c761e8434c Mon Sep 17 00:00:00 2001 From: Mark Juggurnauth-Thomas Date: Sat, 13 Mar 2021 11:05:39 +0000 Subject: [PATCH 2/6] clarify that count, index and length expand to integer literals --- text/0000-macro-metavar-expr.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/text/0000-macro-metavar-expr.md b/text/0000-macro-metavar-expr.md index 7abdfc12533..f2108af677e 100644 --- a/text/0000-macro-metavar-expr.md +++ b/text/0000-macro-metavar-expr.md @@ -105,8 +105,10 @@ expansion like: ${count(x)} $( $x )* ``` -the expression `${count(x)}` will expand to the number of times the `$( $x )*` -repetition will repeat. +the expression `${count(x)}` will expand to an integer literal equal to the +number of times the `$( $x )*` repetition will repeat. For example, if +the metavariable `$x` repeats four times then it will expand to the integer +literal `4`. If repetitions are nested, then an optional depth parameter can be used to limit the number of nested repetitions that are counted. For example, a macro @@ -126,7 +128,7 @@ of repetitions of `$x`. Within a repetition, the `${index()}` and `${length()}` metavariable expressions give the index of the current repetition and the length of the repetition (i.e., the number of times it will repeat). The index value ranges -from `0` to `length - 1`. +from `0` to `length - 1`, and the expanded values are integer literals. For nested repetitions, the `${index()}` and `${length()}` metavariable expressions expand to the inner-most index and length respectively. From 7ef6540c982868efbb145466f725bc20feb57005 Mon Sep 17 00:00:00 2001 From: Mark Juggurnauth-Thomas Date: Sat, 13 Mar 2021 11:18:18 +0000 Subject: [PATCH 3/6] add section about RFC 88 --- text/0000-macro-metavar-expr.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/text/0000-macro-metavar-expr.md b/text/0000-macro-metavar-expr.md index f2108af677e..e8d69e2fa25 100644 --- a/text/0000-macro-metavar-expr.md +++ b/text/0000-macro-metavar-expr.md @@ -315,6 +315,20 @@ similar `${...}` syntax for values based on variables: for example `${#var}` is used for the length of `$var`. This means `${...}` expressions should not seem too weird to developers familiar with these scripting languages. +A proposal for counting sequence repetitions was made in [RFC 88]. That RFC +proposed several options for additional syntax, however the issue was +postponed to after the 1.0 release. This RFC addresses the needs of RFC 88, +and also goes further, as it proposes a more general syntax useful for more +than just counting repetitions, such as obtaing the index of the current +repetition. Since the generated values are integer literals, it also +addresses the ability to index tuples in repetitions (using `tup.${index()}`), +which was noted as an omission in RFC 88. It's also not possible to implement +efficiently as a procedural macro, as the procedural macro would not have +access to the repetition counts without generating a sequence and then +counting it again. + +[RFC 88]: https://github.com/rust-lang/rfcs/pull/88 + # Unresolved questions [unresolved-questions]: #unresolved-questions From fb506f28c141dcb6c0a12097508c8da2b05b2ab5 Mon Sep 17 00:00:00 2001 From: Mark Juggurnauth-Thomas Date: Sat, 13 Mar 2021 11:29:41 +0000 Subject: [PATCH 4/6] Add description of alternative syntax options --- text/0000-macro-metavar-expr.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/text/0000-macro-metavar-expr.md b/text/0000-macro-metavar-expr.md index e8d69e2fa25..344b010338b 100644 --- a/text/0000-macro-metavar-expr.md +++ b/text/0000-macro-metavar-expr.md @@ -302,6 +302,15 @@ counted inwards from the outer-most nesting. This was changed to count outwards from the inner-most nesting so that expressions can be copied to a different nesting depth without needing to change them. +This RFC proposes using `${ ... }` as the delimiter for metavariable +expressions. Available alternatives are: +* `$[ ... ]`, e.g.: `$[count(value)]` +* `$:`, e.g. `$:count(value)` +* `$@`, e.g. `$@count(value)` +* `$!`, e.g. `$!count(value)` +* Another sigil, although `#` should be avoided to avoid clashes with the + `quote!` macro. + # Prior art [prior-art]: #prior-art @@ -345,4 +354,5 @@ expander to produce. The syntax `$[...]` is still invalid, and so remains available for any other extensions which may come in the future and don't fit in with metavariable -expression syntax. +expression syntax. Additionally, any symbol after `$` is also invalid, so +other sequences, such as `$@`, are available. From 843b9f82ecc31ac12da3ccc270adc06ea7314a6c Mon Sep 17 00:00:00 2001 From: Mark Juggurnauth-Thomas Date: Sat, 13 Mar 2021 11:48:43 +0000 Subject: [PATCH 5/6] Add section describing why new syntax is needed --- text/0000-macro-metavar-expr.md | 43 +++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/text/0000-macro-metavar-expr.md b/text/0000-macro-metavar-expr.md index 344b010338b..f3280afbb3a 100644 --- a/text/0000-macro-metavar-expr.md +++ b/text/0000-macro-metavar-expr.md @@ -311,6 +311,49 @@ expressions. Available alternatives are: * Another sigil, although `#` should be avoided to avoid clashes with the `quote!` macro. +## Why not a proc macro or built-in macro? + +To avoid extending the language with new syntax, we could consider writing +something that looks like a macro invocation, such as `count!(value)`, which +would be implemented as a procedural macro or built-in to the compiler. + +While this is compelling from a language simplicity perspective, it creates +some problems due to the way macro expansions are processed. During macro +transcription, other macro invocations are not evaluated, so in the macro: + +``` +macro_rules! example { + ($($x:ident),*) => count!(x) +} +``` + +During transcription, `example!(a, b, c)` would expand to `count!(x)`. At this +point, the knowledge of the metavariable `x` and its repetition is lost, and +no procedural macro or built-in macro would be able to work out the count. + +To workaround this we would need to re-expand the repetition +(`count!($($x),*)`, forcing the `count!` macro to re-parse and count the +repetitions. This is additional unnecessary work that this RFC seeks to +address. + +Another way to think of metavariable expressions is as "macro transcriber +directives". You can think of the macro transcriber as performing the +following operations: + +* `$var` => the value of `var` +* `$( ... ) ...` => a repetition + +This RFC adds two more: + +* `${ directive(args) }` => a special transcriber directive +* `$$` => `$` + +We could special-case certain macro invocations like `count!` during +transcription, but that feels like a worse solution. It would make it harder +to understand what the macro transcriber is going to do with arbitrary code +without remembering all of the special macros that don't work like other +macros. + # Prior art [prior-art]: #prior-art From b3b5009d1fb19c64405916128019481d83505a4c Mon Sep 17 00:00:00 2001 From: Mark Juggurnauth-Thomas Date: Tue, 16 Mar 2021 19:24:35 +0000 Subject: [PATCH 6/6] clarify that expanded literals are unsuffixed --- text/0000-macro-metavar-expr.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/text/0000-macro-metavar-expr.md b/text/0000-macro-metavar-expr.md index f3280afbb3a..0f3e8db7a83 100644 --- a/text/0000-macro-metavar-expr.md +++ b/text/0000-macro-metavar-expr.md @@ -105,10 +105,10 @@ expansion like: ${count(x)} $( $x )* ``` -the expression `${count(x)}` will expand to an integer literal equal to the -number of times the `$( $x )*` repetition will repeat. For example, if -the metavariable `$x` repeats four times then it will expand to the integer -literal `4`. +the expression `${count(x)}` will expand to an unsuffixed integer literal +equal to the number of times the `$( $x )*` repetition will repeat. For +example, if the metavariable `$x` repeats four times then it will expand to +the integer literal `4`. If repetitions are nested, then an optional depth parameter can be used to limit the number of nested repetitions that are counted. For example, a macro @@ -128,7 +128,8 @@ of repetitions of `$x`. Within a repetition, the `${index()}` and `${length()}` metavariable expressions give the index of the current repetition and the length of the repetition (i.e., the number of times it will repeat). The index value ranges -from `0` to `length - 1`, and the expanded values are integer literals. +from `0` to `length - 1`, and the expanded values are unsuffixed integer +literals so they are also suitable for tuple indexing. For nested repetitions, the `${index()}` and `${length()}` metavariable expressions expand to the inner-most index and length respectively.