-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(ra_syntax.validation): removed code duplication from validate_literal() #2834
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,55 +111,46 @@ pub(crate) fn validate(root: &SyntaxNode) -> Vec<SyntaxError> { | |
errors | ||
} | ||
|
||
// FIXME: kill duplication | ||
fn validate_literal(literal: ast::Literal, acc: &mut Vec<SyntaxError>) { | ||
fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> { | ||
text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end)) | ||
} | ||
|
||
let token = literal.token(); | ||
let text = token.text().as_str(); | ||
|
||
let mut push_err = |prefix_len, (off, err): (usize, unescape::EscapeError)| { | ||
let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len); | ||
acc.push(SyntaxError::new(err.into(), off)); | ||
}; | ||
Comment on lines
+122
to
+125
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not really a big fan of this, but see my comments below for a possible solution. |
||
|
||
match token.kind() { | ||
BYTE => { | ||
if let Some(end) = text.rfind('\'') { | ||
if let Some(without_quotes) = text.get(2..end) { | ||
if let Err((off, err)) = unescape::unescape_byte(without_quotes) { | ||
let off = token.text_range().start() + TextUnit::from_usize(off + 2); | ||
acc.push(SyntaxError::new(err.into(), off)) | ||
} | ||
} | ||
if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) { | ||
push_err(2, e); | ||
} | ||
Comment on lines
+129
to
131
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is quite good already, but would it make sense to pull it out into a seperate function: fn validate_char(token: &SyntaxToken, prefix_len: usize, end_delimiter: char, acc: &mut Vec<SyntaxError>) -> {
let text = token.text().as_str();
if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
acc.push(SyntaxError::new(err.into(), off));
}
} and then the match case just becomes: BYTE => validate_char(&token, 2, '\'', acc), This would avoid the the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, only one addition, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, hadn't noticed they used different functions, but that makes sense :) This might require a type parameter for the function, are you comfortable with that or do you need a hint? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I am comfortable with that, just wanted our approval) |
||
} | ||
CHAR => { | ||
if let Some(end) = text.rfind('\'') { | ||
if let Some(without_quotes) = text.get(1..end) { | ||
if let Err((off, err)) = unescape::unescape_char(without_quotes) { | ||
let off = token.text_range().start() + TextUnit::from_usize(off + 1); | ||
acc.push(SyntaxError::new(err.into(), off)) | ||
} | ||
} | ||
if let Some(Err(e)) = unquote(text, 1, '\'').map(unescape::unescape_char) { | ||
push_err(1, e); | ||
} | ||
} | ||
BYTE_STRING => { | ||
if let Some(end) = text.rfind('\"') { | ||
if let Some(without_quotes) = text.get(2..end) { | ||
unescape::unescape_byte_str(without_quotes, &mut |range, char| { | ||
if let Err(err) = char { | ||
let off = range.start; | ||
let off = token.text_range().start() + TextUnit::from_usize(off + 2); | ||
acc.push(SyntaxError::new(err.into(), off)) | ||
} | ||
}) | ||
} | ||
if let Some(without_quotes) = unquote(text, 2, '"') { | ||
unescape::unescape_byte_str(without_quotes, &mut |range, char| { | ||
if let Err(err) = char { | ||
push_err(2, (range.start, err)); | ||
} | ||
}) | ||
Comment on lines
+139
to
+144
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could do the same here as described for char, e.g. pull into a separate function. I think that would be nice :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've stumbled upon a language limitation here. fn validate_byte_or_char_str_literal(..., impl FnOnce(&str, impl FnMut(...))) {} Notice the nested There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be quite a hack to do it with a dynamic dispatch here. fn validate_str_or_byte_str_literal<T>(
token: &SyntaxToken,
prefix_len: usize,
suffix_len: usize,
acc: &mut Vec<SyntaxError>,
unescape_fn: impl FnOnce(&str, &mut dyn FnMut(Range<usize>, Result<T, unescape::EscapeError>)),
) {
let text = token.text().as_str();
if let Some(without_quotes) = unquote(text, prefix_len, suffix_len) {
unescape_fn(without_quotes, &mut |range, char| {
if let Err(err) = char {
let off = token.text_range().start() + TextUnit::from_usize(range.start + prefix_len);
acc.push(SyntaxError::new(err.into(), off));
}
});
}
}
validate_str_or_byte_str_literal(
&token,
2,
1,
acc,
// we need to explicitly wrap this into two lambdas because cb is `&mut dyn FnMut(...)`
|s, cb| unescape::unescape_byte_str(s, &mut |range, byte| cb(range, byte)),
); There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've been unable make the above function reasonable, e.g. less lines of code than what it replaces, so lets just keep it at de-duplicating char and byte for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same opinion here 😏 |
||
} | ||
} | ||
STRING => { | ||
if let Some(end) = text.rfind('\"') { | ||
if let Some(without_quotes) = text.get(1..end) { | ||
unescape::unescape_str(without_quotes, &mut |range, char| { | ||
if let Err(err) = char { | ||
let off = range.start; | ||
let off = token.text_range().start() + TextUnit::from_usize(off + 1); | ||
acc.push(SyntaxError::new(err.into(), off)) | ||
} | ||
}) | ||
} | ||
if let Some(without_quotes) = unquote(text, 1, '"') { | ||
unescape::unescape_str(without_quotes, &mut |range, char| { | ||
if let Err(err) = char { | ||
push_err(1, (range.start, err)); | ||
} | ||
}) | ||
} | ||
} | ||
_ => (), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to pull this one out of the function, so it's ready for re-use when we get to #540 at some point :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad, @kiljacken I have one concern about this function. Why is it so that we do a linear search for the quote character backwards in the token text string but make a constant offset from its beginning instead?
I guess we should either do a constant offset from the begging and the end of the string or search for a quote character from both ends of the string but not the mixed approach...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to check the lexer source to be sure, bit using a constant offset, for everything that's not a raw string (due to the hash matching) should be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kiljacken I do think that constant offset from both ends should be viable too.
Though, I'd like to write tests for this.
Yes, raw strings should have a separate
unquote_raw_string()
logic.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked the supposition that the offset from the end of the string is constant.
If fact it is not (maybe it is a bug, I am not sure).
But there is one test that fails when I put a debug assertion to check the precondition about that, which is this one:
https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_assists/src/assists/raw_string.rs#L245
That partial string actually becomes a STRING token where there is only one starting qoute and no ending one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad maybe you could clarify on whether this behaviour is intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Veetaha it's true that string literals are not guaranteed to be syntactically valid. The code should not panic for any string, but it doesn't have to report errors if the string itself is not a valid string literal.
This is because we intentionally make the parser and the lexer robust and capable of parsing completely invalid code, so something like
r##"
might be parsed as a raw string literal.