-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of RFC 2151, Raw Identifiers #48942
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @pnkfelix (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
3d9ecae
to
d644e65
Compare
☔ The latest upstream changes (presumably #48811) made this pull request unmergeable. Please resolve the merge conflicts. |
I don't think there's much else worth adding to this, unless it'd be better to also modify rustdoc in this PR? I don't even know where to begin with diagonstics, so, better to leave that to someone else. |
☔ The latest upstream changes (presumably #49051) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (presumably #48097) made this pull request unmergeable. Please resolve the merge conflicts. |
Help with this conflict? I'm not sure what's the right thing to do with this. |
|
src/libsyntax/feature_gate.rs
Outdated
@@ -455,6 +455,9 @@ declare_features! ( | |||
|
|||
// Parentheses in patterns | |||
(active, pattern_parentheses, "1.26.0", None, None), | |||
|
|||
// Raw identifiers allowing items with keyword names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allowing keyword names to be used in arbitrary identifier positions, not only item names
src/libsyntax/parse/parser.rs
Outdated
@@ -793,6 +793,9 @@ impl<'a> Parser<'a> { | |||
return Err(err); | |||
} | |||
} | |||
if is_raw { | |||
self.sess.raw_identifier_spans.borrow_mut().push(self.span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure all identifiers go through parse_ident_common
, it's probably better to do in the lexer when an identifier is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh huh. I didn't catch that ParseSess
was available in the lexer too. Definitely a better place for it.
src/libsyntax/parse/token.rs
Outdated
@@ -273,17 +310,32 @@ impl Token { | |||
} | |||
} | |||
|
|||
pub fn ident(&self) -> Option<ast::Ident> { | |||
fn ident_common(&self, allow_raw: bool) -> Option<ast::Ident> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be just pub fn ident(&self) -> Option<(ast::Ident, /* is_raw */ bool)>
IMO.
It will avoid introducing all those many little helper functions.
(The fn nonraw_ident
vs fn ident
distinction also looks error-prone.)
20d0962bfe851db9ce3e19da0c7c7674a091c266 updates the LLVM submodule unintendedly. |
src/libsyntax/parse/token.rs
Outdated
@@ -203,6 +233,11 @@ impl Token { | |||
Token::Interpolated(Lrc::new((nt, LazyTokenStream::new()))) | |||
} | |||
|
|||
/// Recovers a `Token` from an `ast::Ident`. This creates a raw identifier if necessary. | |||
pub fn from_ast_ident(ident: ast::Ident) -> Token { | |||
Ident(ident, is_reserved_ident(ident)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function and all its uses are very suspicious.
It's used when we previously lost the "rawness" property and then we are trying to guess it - if it's a keyword, then it was a raw identifier otherwise not. We shouldn't do this guessing, we need to keep the bool
instead or use the "r#{}"
string encoding like in proc macros if keeping bool
is not possible for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This problem exists when we go from tokens to AST (including MetaItem
) and then have to back to tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the AST should this information be encoded? MetaItem
can be handled with a new struct field, but in the other use of this method was impl ToTokens for ast::Ident
(how is ToToken
used anyway?). In the issue, you mentioned that ast::Ident would be a bad place for an is_raw
parameter, because of how size could be an issue for it.
Putting it in the Symbol
doesn't work in this case, because Symbol
comparison is simply an integer comparison, and r#foo
should match foo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to leave this as is for now, it's possible that we won't have this roundtrip "Tokens -> AST -> Tokens" in the future and will just keep the original tokens.
I need to think about this a bit more.
src/libsyntax/diagnostics/plugin.rs
Outdated
@@ -82,10 +82,10 @@ pub fn expand_register_diagnostic<'cx>(ecx: &'cx mut ExtCtxt, | |||
token_tree.get(1), | |||
token_tree.get(2) | |||
) { | |||
(1, Some(&TokenTree::Token(_, token::Ident(ref code))), None, None) => { | |||
(1, Some(&TokenTree::Token(_, token::Ident(ref code, false))), None, None) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the rawness property is not relevant to diagnostics/plugin.rs, these false
s should probably be _
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a huge deal either way? These are internal macros, aren't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a huge deal either way?
No :)
src/libsyntax/parse/lexer/mod.rs
Outdated
return Ok(self.with_str_from(start, |string| { | ||
// FIXME: perform NFKC normalization here. (Issue #2253) | ||
if string == "_" { | ||
token::Underscore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with r#_
? It should be an error.
Could you add a test for it?
src/libsyntax/parse/lexer/mod.rs
Outdated
if is_raw_ident && ( | ||
ident.name == keywords::SelfValue.name() || | ||
ident.name == keywords::SelfType.name() || | ||
ident.name == keywords::Super.name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full list is Self
, self
, super
, extern
and crate
, see fn is_path_segment_keyword
.
(Well, there's also $crate
, but it's never produced by lexer.)
Questions from #48589:
Raw identifiers can appear in any context normal identifiers can appear. It looks like this PR already implements this behavior.
macro_rules! test_macro {
(a) => { ... };
(r#a) => { ... };
} This is an interesting question. |
src/libsyntax/feature_gate.rs
Outdated
@@ -452,6 +452,9 @@ declare_features! ( | |||
|
|||
// `use path as _;` and `extern crate c as _;` | |||
(active, underscore_imports, "1.26.0", Some(48216), None), | |||
|
|||
// Raw identifiers allowing keyword names to be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentence seems unfinished.
r=me after addressing the new comments |
@@ -268,7 +268,7 @@ impl<'a> base::Resolver for Resolver<'a> { | |||
if k > 0 { | |||
tokens.push(TokenTree::Token(path.span, Token::ModSep).into()); | |||
} | |||
let tok = Token::Ident(segment.identifier); | |||
let tok = Token::from_ast_ident(segment.identifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this in from_ast_ident
, since it seems this is the best behavior in all these cases.
@bors r=petrochenkov |
@Lymia: 🔑 Insufficient privileges: Not in reviewers |
@petrochenkov I'm not sure what you want me to do here. :D I clearly can't do that. |
@bors r+ |
📌 Commit 57f9c4d has been approved by |
@bors: r- This unfortunately failed in a rollup, but fear not! I'm preemptively r-'ing this in case the rollup fails and we have to abandon it, but otherwise I've already fixed the issue. If the rollup lands it's best to not rebase/push new commits here so this PR will automatically get closed, but if the rollup ends up getting closed then those fixes can be folded into this PR and we can re-r+ |
What was the issue that |
Oh the builders run more tests than the default ones locally (typically those test suites fail very rarely), but I think this one could in theory have been detected by |
See issue #48589.
This implements most of the actual compiler part, though diagnostics and documentation are somewhat lacking. Namely,
rustdoc
and errors do not yet refer to items with keyword names using raw identifiers.