From 1572d66b408791394aa3b330efb99bc194107697 Mon Sep 17 00:00:00 2001
From: Scott McMurray
Date: Sat, 6 Mar 2021 01:01:27 -0800
Subject: [PATCH 1/5] raw_keywords: `k#yeet` without an edition

---
 text/0000-raw-keywords.md | 255 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 255 insertions(+)
 create mode 100644 text/0000-raw-keywords.md

diff --git a/text/0000-raw-keywords.md b/text/0000-raw-keywords.md
new file mode 100644
index 00000000000..2e8adce42da
--- /dev/null
+++ b/text/0000-raw-keywords.md
@@ -0,0 +1,255 @@
+- Feature Name: raw_keywords
+- Start Date: 2021-03-05
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+
+
+Reserve `k#keyword` in edition 2021 and beyond as a general syntax for adding keywords mid-edition instead of needing speculative reservations.
+
+# Motivation
+[motivation]: #motivation
+
+
+
+There were a few attempts to reserve keywords for the 2018 edition. Some of those proved controversial, and the language team eventually [decided](https://github.com/rust-lang/rfcs/pull/2441#issuecomment-395256368) not to accept any reservations for not-yet-approved features:
+
+> [...] felt particularly strongly that up-front reservations are wrong and a mistake in the initial Edition proposal, basically for the reasons I've already outlined in the thread: they force up-front decisions about surface issues of features that are not yet fully proposed, let alone accepted or implemented. That just seems totally backwards and is going to keep leading to unworkable discussions. We both feel that the role of Editions here is that they can absorb any keyword-flags that have accumulated in the meantime.
+>
+> In all, there is certainly no consensus to merge this RFC as-is, and I think there are no objections to instead closing it, under the assumption that we'll add a keyword-flag mechanism (or something like it) as needed later.
+
+This RFC is thus a proposal to add that general mechanism.
+
+The other thing learned with the 2018 edition is that the period between editions is long enough that the normal "stability without stagnation" principle of "it can just wait for the next train" doesn't work. Instead, it encouraged rushing to get things in on time, which had negative quality-of-life consequences for many contributors. As such, it's important that an alternative mechanism be made available so that missing an edition train doesn't mean having to wait another 3 years -- even if that alternative has syntax that's slightly less nice until the next train.
+
+As an additional bonus, this gives a space in which experimental syntax can be implemented on nightly without risking breakage. In the past, this was sometimes done in conjunction with other keywords, for example `do catch { ... }` instead of just `catch { ... }` to avoid the grammar conflict with a struct initializer. With this RFC, it could instead have been implemented as `k#catch { ... }` directly without worry.
+
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+
+
+*Pretend the year is 2023 and Rust has just stabilized `trust_me { ... }` blocks as a clearer syntax for `unsafe { ... }` blocks. The blog post announcing their stabilization might say something like this:*
+
+This release stabilizes "trust me" blocks! Newcomers to rust are often confused by the difference between `unsafe` functions and `unsafe` blocks, as they do very different things. `trust_me` blocks do a better job of emphasizing that they are the place in which you can call unsafe code.
+
+Because of Rust's commitment to its stability guarantees, these are available to edition 2021 code using the syntax `k#trust_me { ... do unsafe things here ... }`, to avoid breaking hypothetical code that uses `trust_me` as a function, type, or other name. In another year, when the next edition comes out on its usual train, `trust_me` will be a reserved keyword in it and the edition migration will remove the `k#` for you. But for now you'll need to keep it.
+
+*(This RFC is, of course, not actually proposing "trust me" blocks.)*
+
+## What code could I have written that this breaks?
+
+`k#keyword` is never valid rust code on its own, so this is only relevant inside calls to macros, where it will affect tokenization.
+
+For example, consider [this code](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f50aea0afcd1f65896335b6aa5cae88a) in the 2018 edition:
+```rust
+macro_rules! demo {
+    ( $x:tt ) => { "one" };
+    ( $a:tt $b:tt $c:tt ) => { "three" };
+}
+
+fn main() {
+    dbg!(demo!(k#keyword));
+    dbg!(demo!(r#keyword));
+    dbg!(demo!(k#struct));
+    dbg!(demo!(r#struct));
+    dbg!(demo!(k #struct));
+    dbg!(demo!(r #struct));
+}
+```
+
+It produces the following output:
+```text
+[src/main.rs:7] demo!(k # keyword) = "three"
+[src/main.rs:8] demo!(r#keyword) = "one"
+[src/main.rs:9] demo!(k # struct) = "three"
+[src/main.rs:10] demo!(r#struct) = "one"
+[src/main.rs:11] demo!(k # struct) = "three"
+[src/main.rs:12] demo!(r # struct) = "three"
+```
+
+In the 2021 edition and beyond, it will instead be:
+```text
+[src/main.rs:7] demo!(k#keyword) = "one"
+[src/main.rs:8] demo!(r#keyword) = "one"
+[src/main.rs:9] demo!(k#struct) = "one"
+[src/main.rs:10] demo!(r#struct) = "one"
+[src/main.rs:11] demo!(k # struct) = "three"
+[src/main.rs:12] demo!(r # struct) = "three"
+```
+
+So it will only affect you if you're making calls with all three of those tokens *directly* adjacent.
+The edition pre-migration fix will update such calls to add spaces around the `#` so that the called macro continues to see three tokens.
+
+## How do I implement a feature that needs a new keyword?
+
+For a feature using a new keyword `foo`, follow these steps:
+
+1. Implement it in nightly as `k#foo`, ensuring that all uses of `k#foo` are feature-gated in the parsing code.
+2. Test and debug the feature as you would any other feature.
+3. Pause here until ready to stabilize.
+4. Add an edition pre-migration fix to replace all uses of `foo` with `r#foo`.
+5. Make it parse as both `foo` and `k#foo` in edition vNext.
+6. Add an edition post-migration fix to replace all uses of `k#foo` with `foo`.
+7. Be sure to reference the tests for those steps in the stabilization report for FCP.
+
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+
+
+A new tokenizer rule is introduced:
+
+> RAW_KEYWORD : `k#` IDENTIFIER_OR_KEYWORD
+
+Unlike RAW_IDENTIFIER, this doesn't need the `crate`/`self`/`super`/`Self` exclusions, as those are all keywords anyway.
+
+Analogously to [raw identifiers](https://rust-lang.github.io/rfcs/2151-raw-identifiers.html#reference-level-explanation),
+raw keywords are always interpreted as keywords and never as plain identifiers, regardless of context. They are also treated as equivalent to a keyword that wasn't raw.
+
+For contextual keywords, that means that a raw keyword is only accepted where it's being used as a keyword, not as an identifier. For example, `k#union Foo { x: i32, y: u32 }` is valid, but `fn k#union() {}` is not.
+
+## Edition migration support
+
+The pre-migration fix will look for the tokens "`k` `#` ident" in a macro call without whitespace between either pair, and will add a single space on either side of the `#`.
+
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+
+
+- This adds more ways of writing the same thing.
+- This makes macro token rules even more complicated than they already were.
+- This only works for keywords that will match the existing IDENTIFIER_OR_KEYWORD category.
+- This is more complicated than just telling people to wait for the next edition.
+- This cannot be done in the 2015 and 2018 editions with the proposed syntax.
+
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+
+
+There are a few fundamental differences between raw keywords and raw identifiers:
+
+- **It was important that old editions support raw identifiers, but old editions do not need to support raw keywords.** \
+  Raw identifiers in 2015 were needed so that pre-migration fixes could be applied to rename `async` -> `r#async` separately from updating the edition number. There's no 2015 or 2018 edition code that *needs* raw keywords, however. [Editions are meant to be adopted](https://github.com/nikomatsakis/rfcs/blob/edition-2021-or-bust/text/0000-edition-2021.md#editions-are-meant-to-be-adopted), so it's fine to expect actively-developed code that wants to write (necessarily) *new* code using new features to move to a new edition in order to do so.
+
+- **Raw identifiers can be forced on you by another crate, but raw keywords are up to you.** \
+  If a 2015-edition crate you're using has a method named `async`, then from 2018 code you're stuck calling it as `r#async` (unless you fork the crate). But nothing going on in an external crate can force you to use a feature that needs a raw keyword. If you want to only use things once they're available in the new edition as full keywords, you can do that.
+
+- **We hope that code won't need raw identifiers, but expect people will use raw keywords.** \
+  Part of the decision process for a new keyword involves looking at the impact it would have.
+  That's not to say it's a controlling factor -- we don't need to pick [a](https://en.cppreference.com/w/cpp/keyword/co_await) suboptimal keyword just to avoid breakage -- but the goal is that it not create a pervasive issue. Accepting a new feature, by contrast, implies that it's useful enough that many people will likely wish to use it immediately, despite the extra lexical wart.
+
+In concert, these push for a particular tradeoff:
+
+> **It's better for raw keywords to be nice on 2021 than for them to be supported on 2018**
+
+There *is* lexical space available even in 2015 that could be used: `r#$keyword` was brought up, for example. But the extra noise of that isn't worth it. (And while it's easy enough to type on a standard US keyboard, that's no longer true on others, such as Linux's UK international keyboard layout.)
+
+
+# Prior art
+[prior-art]: #prior-art
+
+
+
+This is patterned on [RFC #2151, `raw_identifiers`](https://rust-lang.github.io/rfcs/2151-raw-identifiers.html).
+
+Some scripting languages take the opposite approach and essentially reserve all unprefixed identifiers as keywords, requiring a sigil (such as `$foo`) for a name to be interpreted as an identifier in an expression. This is clearly infeasible for rust, due to the extraordinary churn it would require.
+
+C reserves identifiers that begin with an underscore followed by an uppercase letter or another underscore, and uses that along with `#define` to add features. For example, it added `_Bool`, and made that available as `bool` only when `#include <stdbool.h>` is specified. Rust doesn't need this for types (as `i32` and friends are not keywords), but could add new syntax constructs as macros.
+
+C# releases new versions [irregularly](https://en.wikipedia.org/wiki/C_Sharp_%28programming_language%29#Versions), and major versions may include source-breaking changes such as new keywords. Rust could decide to just roll editions more often instead of introducing features in the middle of them.
+
+C# also leverages contextual keywords heavily.
+For example, `await` is only a keyword inside functions using the `async` contextual keyword, so both could be introduced without breaking existing code. This kind of contextual behaviour is more awkward for rust, which needs to be able to parse an `expr` to pass it to a macro.
+
+Python uses [*future statements*](https://docs.python.org/3/reference/simple_stmts.html#future) to allow use of new features on a per-module basis before those features become standard. Rust's `#![feature(foo)]` on nightly is similar here.
+
+Haskell has the [`LANGUAGE` pragma](https://ghc.readthedocs.io/en/8.0.2/glasgow_exts.html#language-pragma), which `ghc` also supports as command-line flags. This is again similar to Rust's `#![feature(foo)]` on nightly.
+
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+
+
+- What I'd put is probably ignorant of compiler implementation realities.
+- I've probably missed something that needs specifying around macros.
+
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+
+
+- Since an edition fix that can do it is required anyway, it may be good to have a lint on by default that suggests removing superfluous `k#`s.
+

From 81774b5a0e3dcddeaeecefa74ac28a114ff64d31 Mon Sep 17 00:00:00 2001
From: Scott McMurray
Date: Mon, 29 Mar 2021 14:05:33 -0700
Subject: [PATCH 2/5] Address 2015 & 2018 further

Only the 2021 change is time-critical -- anything using already-reserved syntax space can be decided upon later.
---
 text/0000-raw-keywords.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/text/0000-raw-keywords.md b/text/0000-raw-keywords.md
index 2e8adce42da..5a53dcc1375 100644
--- a/text/0000-raw-keywords.md
+++ b/text/0000-raw-keywords.md
@@ -178,7 +178,18 @@ In concert, these push for a particular tradeoff:
 
 > **It's better for raw keywords to be nice on 2021 than for them to be supported on 2018**
 
-There *is* lexical space available even in 2015 that could be used: `r#$keyword` was brought up, for example. But the extra noise of that isn't worth it. (And while it's easy enough to type on a standard US keyboard, that's no longer true on others, such as Linux's UK international keyboard layout.)
+There *is* lexical space available even in 2015 and 2018 that could be used: `r#$keyword` was brought up, for example. But the extra noise of that isn't worth it. (And while it's easy enough to type on a standard US keyboard, that's no longer true on others, such as Linux's UK international keyboard layout.)
+
+
+## Something for the 2015 and 2018 editions
+
+As mentioned, it would be possible to support `r#$keyword` in 2015 and 2018 (or in 2021+) without it being a breaking change.
+
+This RFC, however, doesn't include that, as it's not urgent for the edition.
+
+It can be added in future, either for those editions only or for all editions, should experience with this change demonstrate that there are important-enough situations where code *needs* to use a new feature despite not having migrated to a modern edition.
+
+This is also a problem that lessens over time. Once we reach the year 2029, any code still using the 2021 edition will be ancient, but would still be able to use `k#foo` to use new features which will only be true keywords in the 2030 edition.
 
 
 # Prior art

From a1d590c2d5b18309502cea129685b2f24bc635d6 Mon Sep 17 00:00:00 2001
From: Scott McMurray
Date: Mon, 29 Mar 2021 21:16:02 -0700
Subject: [PATCH 3/5] Add details about `k#pineapple` errors

---
 text/0000-raw-keywords.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/text/0000-raw-keywords.md b/text/0000-raw-keywords.md
index 5a53dcc1375..accbd494c4a 100644
--- a/text/0000-raw-keywords.md
+++ b/text/0000-raw-keywords.md
@@ -135,6 +135,8 @@ raw keywords are always interpreted as keywords and never as plain identifiers, regardless of context. They are also treated as equivalent to a keyword that wasn't raw.
 
 For contextual keywords, that means that a raw keyword is only accepted where it's being used as a keyword, not as an identifier. For example, `k#union Foo { x: i32, y: u32 }` is valid, but `fn k#union() {}` is not.
 
+In a rust version where `k#pineapple` is not a known keyword, it causes a tokenization error. (Like using [`r#$pineapple` does today](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=04f2f8d52487b03c93e2caa00446594e), and like how [`r#pineapple` did before raw identifiers were a thing](https://rust.godbolt.org/z/eeGvzMq8r).)
+
 ## Edition migration support
 
 The pre-migration fix will look for the tokens "`k` `#` ident" in a macro call without whitespace between either pair, and will add a single space on either side of the `#`.
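The RAW_KEYWORD tokenizer rule from patch 1, together with the unknown-keyword error added in patch 3, can be illustrated with a small self-contained sketch.

This is not rustc's lexer. `Token`, `tokenize`, and the `KNOWN_KEYWORDS` list are hypothetical names invented for illustration, and the lexer only understands identifiers, `#`, and whitespace. It shows the property the `demo!` example relies on:

- `k#struct` becomes a single raw-keyword token only when the prefix, the `#`, and the identifier are directly adjacent.
- `k #struct` stays three tokens.
- An unknown raw keyword such as `k#pineapple` is rejected.

```rust
// Hypothetical sketch of the proposed RAW_KEYWORD rule; not rustc's lexer.

/// Tokens produced by this toy tokenizer.
#[derive(Debug, PartialEq, Eq)]
pub enum Token {
    /// A plain identifier or keyword, e.g. `k` or `struct`.
    Ident(String),
    /// A raw identifier, e.g. `r#struct`.
    RawIdent(String),
    /// A raw keyword, e.g. `k#struct`.
    RawKeyword(String),
    /// A `#` that is not part of a `k#` or `r#` prefix.
    Pound,
}

/// Assumed set of keywords known to this hypothetical compiler version.
const KNOWN_KEYWORDS: &[&str] = &["async", "await", "fn", "struct", "union"];

fn is_ident_start(c: char) -> bool {
    c == '_' || c.is_ascii_alphabetic()
}

fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_ascii_alphanumeric()
}

/// Reads an identifier starting at `*i` and advances `*i` past it.
fn read_ident(chars: &[char], i: &mut usize) -> String {
    let start = *i;
    while *i < chars.len() && is_ident_continue(chars[*i]) {
        *i += 1;
    }
    chars[start..*i].iter().collect()
}

/// Tokenizes `src`. `k#ident` and `r#ident` form a single token only when
/// the prefix, the `#`, and the identifier are directly adjacent.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '#' {
            tokens.push(Token::Pound);
            i += 1;
        } else if is_ident_start(c) {
            let word = read_ident(&chars, &mut i);
            let prefixed = (word == "k" || word == "r")
                && i + 1 < chars.len()
                && chars[i] == '#'
                && is_ident_start(chars[i + 1]);
            if !prefixed {
                tokens.push(Token::Ident(word));
                continue;
            }
            i += 1; // consume the `#`
            let name = read_ident(&chars, &mut i);
            if word == "r" {
                tokens.push(Token::RawIdent(name));
            } else if KNOWN_KEYWORDS.contains(&name.as_str()) {
                tokens.push(Token::RawKeyword(name));
            } else {
                // Mirrors patch 3: an unknown `k#pineapple` is a tokenization error.
                return Err(format!("unknown raw keyword `k#{}`", name));
            }
        } else {
            return Err(format!("unexpected character {:?}", c));
        }
    }
    Ok(tokens)
}
```

For example, `tokenize("k#struct")` yields a single `Token::RawKeyword`, while `tokenize("k #struct")` yields `Ident`, `Pound`, `Ident`. The only text whose tokenization changes is therefore the fully adjacent form.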
From 9ec4f51defc2be85c911933f4aaf613dadca50b1 Mon Sep 17 00:00:00 2001
From: Scott McMurray
Date: Tue, 30 Mar 2021 12:38:35 -0700
Subject: [PATCH 4/5] Update unresolved questions

---
 text/0000-raw-keywords.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/text/0000-raw-keywords.md b/text/0000-raw-keywords.md
index accbd494c4a..ce5739f7b90 100644
--- a/text/0000-raw-keywords.md
+++ b/text/0000-raw-keywords.md
@@ -237,8 +237,9 @@ Haskell has the [`LANGUAGE` pragma](https://ghc.readthedocs.io/en/8.0.2/glasgow_
 - What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
 -->
 
-- What I'd put is probably ignorant of compiler implementation realities.
-- I've probably missed something that needs specifying around macros.
+To be decided in nightly:
+- [ ] Is it worth adding `r#$foo` or similar to 2015 and 2018 to allow this on those editions? \
+  (This isn't a breaking change, so can be decided at any point.)
 
 
 # Future possibilities

From 6f4e762537a093951f98d0854c3eb2a2c737b978 Mon Sep 17 00:00:00 2001
From: Scott McMurray
Date: Tue, 30 Mar 2021 20:54:13 -0700
Subject: [PATCH 5/5] Josh wants `r#$pineapple`

---
 text/0000-raw-keywords.md | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/text/0000-raw-keywords.md b/text/0000-raw-keywords.md
index ce5739f7b90..479f905672d 100644
--- a/text/0000-raw-keywords.md
+++ b/text/0000-raw-keywords.md
@@ -141,6 +141,20 @@ In a rust version where `k#pineapple` is not a known keyword, it causes a tokeni
 
 The pre-migration fix will look for the tokens "`k` `#` ident" in a macro call without whitespace between either pair, and will add a single space on either side of the `#`.
 
+## Support for past editions
+
+A new tokenizer rule is introduced:
+
+> RAW_KEYWORD : `r#$` IDENTIFIER_OR_KEYWORD
+
+This is supported for use in 2015 and 2018, as well as in 2021 for edition migration purposes. In 2024 and beyond, this will no longer be supported.
+
+However, it's strongly recommended that everyone migrate to a current edition rather than use `r#$`. For example, code wanting to use `async`/`.await` should just move to the 2018 edition rather than write `.r#$await`.
+
+Semantically, it will do the same as the equivalent `k#`, just with different syntax.
+
+There is a warn-by-default lint against using `r#$pineapple` in 2021, which will be included as a post-migration `--fix` lint, so that code using `foo.r#$await` in 2018 will be changed to using `foo.k#await` in 2021.
+
 
 # Drawbacks
 [drawbacks]: #drawbacks
@@ -178,20 +192,9 @@ There are a few fundamental differences between raw keywords and raw identifiers
 
 In concert, these push for a particular tradeoff:
 
-> **It's better for raw keywords to be nice on 2021 than for them to be supported on 2018**
-
-There *is* lexical space available even in 2015 and 2018 that could be used: `r#$keyword` was brought up, for example. But the extra noise of that isn't worth it. (And while it's easy enough to type on a standard US keyboard, that's no longer true on others, such as Linux's UK international keyboard layout.)
-
-
-## Something for the 2015 and 2018 editions
-
-As mentioned, it would be possible to support `r#$keyword` in 2015 and 2018 (or in 2021+) without it being a breaking change.
-
-This RFC, however, doesn't include that, as it's not urgent for the edition.
-
-It can be added in future, either for those editions only or for all editions, should experience with this change demonstrate that there are important-enough situations where code *needs* to use a new feature despite not having migrated to a modern edition.
+> **It's better for raw keywords to be nice on 2021 than for them to be consistent with 2015**
 
-This is also a problem that lessens over time. Once we reach the year 2029, any code still using the 2021 edition will be ancient, but would still be able to use `k#foo` to use new features which will only be true keywords in the 2030 edition.
+Arguably they never *should* be used in 2015 (or even in 2018, since there are no features planned to use this before 2021 stabilizes), as it's always better to move to the newest-available edition before adopting new features, but they're available with a worse syntax there for completeness.
 
 
 # Prior art
@@ -237,9 +240,7 @@ Haskell has the [`LANGUAGE` pragma](https://ghc.readthedocs.io/en/8.0.2/glasgow_
 - What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
 -->
 
-To be decided in nightly:
-- [ ] Is it worth adding `r#$foo` or similar to 2015 and 2018 to allow this on those editions? \
-  (This isn't a breaking change, so can be decided at any point.)
+None
 
 
 # Future possibilities
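The two edition fixes described in this series can be sketched as text rewrites:

- the pre-migration fix from patch 1, which turns an adjacent `k#ident` into `k # ident`;
- the post-migration fix from patch 5, which turns `r#$ident` into `k#ident`.

This is a minimal sketch under stated assumptions. The function names are hypothetical. Unlike a real `cargo fix` lint, it works on raw text rather than tokens, so it does not restrict itself to macro calls and does not skip string literals or comments.

```rust
// Hypothetical, text-level sketch of the two edition fixes. A real fix would
// operate on tokens inside macro calls and skip strings and comments.

fn is_ident_char(c: char) -> bool {
    c == '_' || c.is_ascii_alphanumeric()
}

/// True if `chars[i]` is `letter` and is not the tail of a longer identifier.
fn standalone(chars: &[char], i: usize, letter: char) -> bool {
    chars[i] == letter && (i == 0 || !is_ident_char(chars[i - 1]))
}

/// Pre-migration fix (2018 -> 2021): rewrite adjacent `k#ident` as
/// `k # ident` so that a macro keeps receiving three separate tokens.
pub fn pre_migrate_k_hash(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::with_capacity(src.len());
    let mut i = 0;
    while i < chars.len() {
        if standalone(&chars, i, 'k')
            && i + 2 < chars.len()
            && chars[i + 1] == '#'
            && (chars[i + 2] == '_' || chars[i + 2].is_ascii_alphabetic())
        {
            out.push_str("k # ");
            i += 2; // resume at the identifier itself
        } else {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

/// Post-migration fix (patch 5): rewrite `r#$ident` as `k#ident`.
pub fn post_migrate_r_hash_dollar(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::with_capacity(src.len());
    let mut i = 0;
    while i < chars.len() {
        if standalone(&chars, i, 'r') && chars[i + 1..].starts_with(&['#', '$']) {
            out.push_str("k#");
            i += 3; // skip `r#$`
        } else {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}
```

For example, `pre_migrate_k_hash("demo!(k#struct)")` returns `"demo!(k # struct)"`, and `post_migrate_r_hash_dollar("foo.r#$await")` returns `"foo.k#await"`. The check that the preceding character is not an identifier character leaves text such as `kk#struct` unchanged, since it already lexes as three tokens.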