-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(ra_syntax.validation): removed code duplication from validate_literal() #2834
Conversation
…e_literal() function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
While already a nice cleanup, I left a few comments that could improve it, especially with regards to future use :)
fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> { | ||
text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to pull this one out of the function, so it's ready for re-use when we get to #540 at some point :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad, @kiljacken I have one concern about this function. Why is it so that we do a linear search for the quote character backwards in the token text string but make a constant offset from its beginning instead?
I guess we should either do a constant offset from the begging and the end of the string or search for a quote character from both ends of the string but not the mixed approach...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to check the lexer source to be sure, bit using a constant offset, for everything that's not a raw string (due to the hash matching) should be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kiljacken I do think that constant offset from both ends should be viable too.
Though, I'd like to write tests for this.
Yes, raw strings should have a separate unquote_raw_string()
logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked the supposition that the offset from the end of the string is constant.
If fact it is not (maybe it is a bug, I am not sure).
But there is one test that fails when I put a debug assertion to check the precondition about that, which is this one:
https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_assists/src/assists/raw_string.rs#L245
That partial string actually becomes a STRING token where there is only one starting qoute and no ending one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad maybe you could clarify on whether this behaviour is intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Veetaha it's true that string literals are not guaranteed to be syntactically valid. The code should not panic for any string, but it doesn't have to report errors if the string itself is not a valid string literal.
This is because we intentionally make the parser and the lexer robust and capable of parsing completely invalid code, so something like r##"
might be parsed as a raw string literal.
let mut push_err = |prefix_len, (off, err): (usize, unescape::EscapeError)| { | ||
let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len); | ||
acc.push(SyntaxError::new(err.into(), off)); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a big fan of this, but see my comments below for a possible solution.
if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) { | ||
push_err(2, e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite good already, but would it make sense to pull it out into a seperate function:
fn validate_char(token: &SyntaxToken, prefix_len: usize, end_delimiter: char, acc: &mut Vec<SyntaxError>) -> {
let text = token.text().as_str();
if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
acc.push(SyntaxError::new(err.into(), off));
}
}
and then the match case just becomes:
BYTE => validate_char(&token, 2, '\'', acc),
This would avoid the the push_err
lambda, and it would play nicely with #540.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, only one addition, unescape::unescape_byte()
should also be forwarded as a callback function to validate_char()
this way the function will become generic and will have 5 parameters. If you are okay with that, tho?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, hadn't noticed they used different functions, but that makes sense :) This might require a type parameter for the function, are you comfortable with that or do you need a hint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am comfortable with that, just wanted our approval)
if let Some(without_quotes) = unquote(text, 2, '"') { | ||
unescape::unescape_byte_str(without_quotes, &mut |range, char| { | ||
if let Err(err) = char { | ||
push_err(2, (range.start, err)); | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do the same here as described for char, e.g. pull into a separate function. I think that would be nice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've stumbled upon a language limitation here.
Since here we would have a second level higher order function
something like
fn validate_byte_or_char_str_literal(..., impl FnOnce(&str, impl FnMut(...))) {}
Notice the nested impl FnMut(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be quite a hack to do it with a dynamic dispatch here.
unescape::unescape_str(&str, impl FnMut(Range<usize>, Result<u8, EscapeError>))
This function expects the second argument (callback) to be Sized
.
This means we would end up with the following hack:
fn validate_str_or_byte_str_literal<T>(
token: &SyntaxToken,
prefix_len: usize,
suffix_len: usize,
acc: &mut Vec<SyntaxError>,
unescape_fn: impl FnOnce(&str, &mut dyn FnMut(Range<usize>, Result<T, unescape::EscapeError>)),
) {
let text = token.text().as_str();
if let Some(without_quotes) = unquote(text, prefix_len, suffix_len) {
unescape_fn(without_quotes, &mut |range, char| {
if let Err(err) = char {
let off = token.text_range().start() + TextUnit::from_usize(range.start + prefix_len);
acc.push(SyntaxError::new(err.into(), off));
}
});
}
}
validate_str_or_byte_str_literal(
&token,
2,
1,
acc,
// we need to explicitly wrap this into two lambdas because cb is `&mut dyn FnMut(...)`
|s, cb| unescape::unescape_byte_str(s, &mut |range, byte| cb(range, byte)),
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been unable make the above function reasonable, e.g. less lines of code than what it replaces, so lets just keep it at de-duplicating char and byte for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same opinion here 😏
bors r=@kiljacken Thanks @Veetaha and @kiljacken ! |
2834: refactor(ra_syntax.validation): removed code duplication from validate_literal() r=kiljacken a=Veetaha Hi! This is my first ever contribution to this project. I've taken some dirty job from issue #223 This is a simple atomic PR to remove code duplication according to FIXME comment in the function that is the main focus of the further development. I just didn't want to mix refactoring with the implementation of new features... I am not sure whether you prefer such atomic PRs here or you'd rather have a single PR that contains all atomic commits inside of it? So if you want me to add all that validation in one PR I'll mark this one as WIP and update it when the work is finished, otherwise, I'll go with the option of creating separate PRs per each feature of validation of strings, numbers, and comments respectively. ### Comments about refactoring Yeah, reducing the duplication is quite hard here, extracting into stateless functions could be another option but the number of their arguments would be very big and repeated across char and string implementations so that just writing their types and names would become cumbersome. I tried the option of having everything captured implicitly in the closure but failed since rust doesn't have templated (or generic) closures as C++ does, this is needed because `unescape_byte*()` and `unescape_char|str()` have different return types... Maybe I am missing something here? I may be wrong because I am not enough experienced in Rust... Well, I am awaiting any kind of feedback! Co-authored-by: Veetaha <gerzoh1@gmail.com>
Build succeeded
|
@kiljacken, I was taking some time to study this project codebase and architecture. I didn't expect that this will be merged so soon, though I will create a new PR with amendments according to the review soon as you suggested |
I guess I needed to mark this one as [WIP], didn't I? |
No need to apologize, I just think bors mis-parsed Aleksey's command, so it got merged instead of allowing me to merge it (which wasn't really necessary as he already gave me r+ rights haha) |
Hi! This is my first ever contribution to this project.
I've taken some dirty job from issue #223
This is a simple atomic PR to remove code duplication according to FIXME comment in the function that is the main focus of the further development.
I just didn't want to mix refactoring with the implementation of new features...
I am not sure whether you prefer such atomic PRs here or you'd rather have a single PR that contains all atomic commits inside of it?
So if you want me to add all that validation in one PR I'll mark this one as WIP and update it when the work is finished, otherwise, I'll go with the option of creating separate PRs per each feature of validation of strings, numbers, and comments respectively.
Comments about refactoring
Yeah, reducing the duplication is quite hard here, extracting into stateless functions could be another option but the number of their arguments would be very big and repeated across char and string implementations so that just writing their types and names would become cumbersome.
I tried the option of having everything captured implicitly in the closure but failed since rust doesn't have templated (or generic) closures as C++ does, this is needed because
unescape_byte*()
andunescape_char|str()
have different return types...Maybe I am missing something here? I may be wrong because I am not enough experienced in Rust...
Well, I am awaiting any kind of feedback!