Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(ra_syntax.validation): removed code duplication from validate_literal() #2834

Merged
merged 1 commit into from
Jan 14, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 26 additions & 35 deletions crates/ra_syntax/src/validation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -111,55 +111,46 @@ pub(crate) fn validate(root: &SyntaxNode) -> Vec<SyntaxError> {
errors
}

// FIXME: kill duplication
fn validate_literal(literal: ast::Literal, acc: &mut Vec<SyntaxError>) {
fn unquote(text: &str, prefix_len: usize, end_delimiter: char) -> Option<&str> {
text.rfind(end_delimiter).and_then(|end| text.get(prefix_len..end))
}
Comment on lines +115 to +117
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to pull this one out of the function, so it's ready for re-use when we get to #540 at some point :)

Copy link
Contributor Author

@Veetaha Veetaha Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad, @kiljacken I have one concern about this function. Why is it so that we do a linear search for the quote character backwards in the token text string but make a constant offset from its beginning instead?
I guess we should either do a constant offset from the begging and the end of the string or search for a quote character from both ends of the string but not the mixed approach...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to check the lexer source to be sure, bit using a constant offset, for everything that's not a raw string (due to the hash matching) should be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiljacken I do think that constant offset from both ends should be viable too.
Though, I'd like to write tests for this.
Yes, raw strings should have a separate unquote_raw_string() logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the supposition that the offset from the end of the string is constant.
If fact it is not (maybe it is a bug, I am not sure).
But there is one test that fails when I put a debug assertion to check the precondition about that, which is this one:
https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_assists/src/assists/raw_string.rs#L245

That partial string actually becomes a STRING token where there is only one starting qoute and no ending one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad maybe you could clarify on whether this behaviour is intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Veetaha it's true that string literals are not guaranteed to be syntactically valid. The code should not panic for any string, but it doesn't have to report errors if the string itself is not a valid string literal.

This is because we intentionally make the parser and the lexer robust and capable of parsing completely invalid code, so something like r##" might be parsed as a raw string literal.


let token = literal.token();
let text = token.text().as_str();

let mut push_err = |prefix_len, (off, err): (usize, unescape::EscapeError)| {
let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
acc.push(SyntaxError::new(err.into(), off));
};
Comment on lines +122 to +125
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a big fan of this, but see my comments below for a possible solution.


match token.kind() {
BYTE => {
if let Some(end) = text.rfind('\'') {
if let Some(without_quotes) = text.get(2..end) {
if let Err((off, err)) = unescape::unescape_byte(without_quotes) {
let off = token.text_range().start() + TextUnit::from_usize(off + 2);
acc.push(SyntaxError::new(err.into(), off))
}
}
if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
push_err(2, e);
}
Comment on lines +129 to 131
Copy link
Contributor

@kiljacken kiljacken Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite good already, but would it make sense to pull it out into a seperate function:

fn validate_char(token: &SyntaxToken, prefix_len: usize, end_delimiter: char, acc: &mut Vec<SyntaxError>) -> {
    let text = token.text().as_str();
    if let Some(Err(e)) = unquote(text, 2, '\'').map(unescape::unescape_byte) {
        let off = token.text_range().start() + TextUnit::from_usize(off + prefix_len);
        acc.push(SyntaxError::new(err.into(), off));
    }
}

and then the match case just becomes:

BYTE => validate_char(&token, 2, '\'', acc),

This would avoid the the push_err lambda, and it would play nicely with #540.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, only one addition, unescape::unescape_byte() should also be forwarded as a callback function to validate_char() this way the function will become generic and will have 5 parameters. If you are okay with that, tho?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, hadn't noticed they used different functions, but that makes sense :) This might require a type parameter for the function, are you comfortable with that or do you need a hint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am comfortable with that, just wanted our approval)

}
CHAR => {
if let Some(end) = text.rfind('\'') {
if let Some(without_quotes) = text.get(1..end) {
if let Err((off, err)) = unescape::unescape_char(without_quotes) {
let off = token.text_range().start() + TextUnit::from_usize(off + 1);
acc.push(SyntaxError::new(err.into(), off))
}
}
if let Some(Err(e)) = unquote(text, 1, '\'').map(unescape::unescape_char) {
push_err(1, e);
}
}
BYTE_STRING => {
if let Some(end) = text.rfind('\"') {
if let Some(without_quotes) = text.get(2..end) {
unescape::unescape_byte_str(without_quotes, &mut |range, char| {
if let Err(err) = char {
let off = range.start;
let off = token.text_range().start() + TextUnit::from_usize(off + 2);
acc.push(SyntaxError::new(err.into(), off))
}
})
}
if let Some(without_quotes) = unquote(text, 2, '"') {
unescape::unescape_byte_str(without_quotes, &mut |range, char| {
if let Err(err) = char {
push_err(2, (range.start, err));
}
})
Comment on lines +139 to +144
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do the same here as described for char, e.g. pull into a separate function. I think that would be nice :)

Copy link
Contributor Author

@Veetaha Veetaha Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've stumbled upon a language limitation here.
Since here we would have a second level higher order function
something like

fn validate_byte_or_char_str_literal(..., impl FnOnce(&str, impl FnMut(...))) {}

Notice the nested impl FnMut(...)

Copy link
Contributor Author

@Veetaha Veetaha Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be quite a hack to do it with a dynamic dispatch here.
unescape::unescape_str(&str, impl FnMut(Range<usize>, Result<u8, EscapeError>))
This function expects the second argument (callback) to be Sized.
This means we would end up with the following hack:

fn validate_str_or_byte_str_literal<T>(
    token: &SyntaxToken,
    prefix_len: usize,
    suffix_len: usize,
    acc: &mut Vec<SyntaxError>,
    unescape_fn: impl FnOnce(&str, &mut dyn FnMut(Range<usize>, Result<T, unescape::EscapeError>)),
) {
    let text = token.text().as_str();
    if let Some(without_quotes) = unquote(text, prefix_len, suffix_len) {
        unescape_fn(without_quotes, &mut |range, char| {
            if let Err(err) = char {
                let off = token.text_range().start() + TextUnit::from_usize(range.start + prefix_len);
                acc.push(SyntaxError::new(err.into(), off));
            }
        });
    }
}


validate_str_or_byte_str_literal(
    &token,
    2,
    1,
    acc,
    // we need to explicitly wrap this into two lambdas because cb is `&mut dyn FnMut(...)`
    |s, cb| unescape::unescape_byte_str(s, &mut |range, byte| cb(range, byte)),
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been unable make the above function reasonable, e.g. less lines of code than what it replaces, so lets just keep it at de-duplicating char and byte for now.

Copy link
Contributor Author

@Veetaha Veetaha Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same opinion here 😏

}
}
STRING => {
if let Some(end) = text.rfind('\"') {
if let Some(without_quotes) = text.get(1..end) {
unescape::unescape_str(without_quotes, &mut |range, char| {
if let Err(err) = char {
let off = range.start;
let off = token.text_range().start() + TextUnit::from_usize(off + 1);
acc.push(SyntaxError::new(err.into(), off))
}
})
}
if let Some(without_quotes) = unquote(text, 1, '"') {
unescape::unescape_str(without_quotes, &mut |range, char| {
if let Err(err) = char {
push_err(1, (range.start, err));
}
})
}
}
_ => (),
Expand Down