Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error for too many trailing #s in raw string literals could be improved #60762

Closed
fbstj opened this issue May 12, 2019 · 18 comments · Fixed by #70522
Closed

error for too many trailing #s in raw string literals could be improved #60762

fbstj opened this issue May 12, 2019 · 18 comments · Fixed by #70522
Assignees
Labels
A-diagnostics Area: Messages for errors, warnings, and lints C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.

Comments

@fbstj
Copy link
Contributor

fbstj commented May 12, 2019

for example (play)

let foo = r##"bar"###;

results in

error: expected one of `.`, `;`, `?`, or an operator, found `#`
 --> src/main.rs:2:25
  |
2 |     let foo = r##"bar"###;
  |                         ^ expected one of `.`, `;`, `?`, or an operator here

while too few has a much nicer error:

error: unterminated raw string
 --> src/main.rs:3:15
  |
3 |     let baz = r##"quxx"#;
  |               ^ unterminated raw string
  |
  = note: this raw string should be terminated with `"##`

This issue has been assigned to @rcoh via this comment.

@estebank estebank added A-diagnostics Area: Messages for errors, warnings, and lints C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. labels May 12, 2019
@hellow554
Copy link
Contributor

hellow554 commented May 13, 2019

Gonna work on it (my first work on the compiler internals, so please bear with me :) )

@estebank
Copy link
Contributor

estebank commented May 13, 2019

Feel free to ask for help!


I feel both of these errors could be tweaked slightly:

error: expected one of `.`, `;`, `?`, or an operator, found `#`
 --> src/main.rs:2:25
  |
2 |     let foo = r##"bar"###;
  |                --       ^
  |                |        |
  |                |        the raw string needs 2 trailing `#`, but found 3
  |                |        help: remove this `#` (this should be a hidden suggestion for rustfix' sake)
  |                the raw string has 2 leading `#`

and

error: unterminated raw string
 --> src/main.rs:3:15
  |
3 |     let baz = r##"quxx"#;
  |               ^        -
  |               |        |
  |               |        the raw string needs two trailing `#`, but found 1
  |               |        help: close the raw string: `##`
  |               unterminated raw string

The later might be harder to accomplish, needing to keep track of possible but incorrect closing spans (probably just looking for "# in the string being processed). This seems like a different enough task, and harder enough to be worthy of a separate PR.

@hellow554
Copy link
Contributor

hellow554 commented May 14, 2019

I indeed have a question @estebank. The fatal error itself is emitted at

pub fn expect_one_of(
but my logic is currently somewhere around
'r' => {
let start_bpos = self.pos;
self.bump();
let mut hash_count: u16 = 0;
while self.ch_is('#') {
if hash_count == 65535 {
let bpos = self.next_pos;
self.fatal_span_(start_bpos,
bpos,
"too many `#` symbols: raw strings may be \
delimited by up to 65535 `#` symbols").raise();
}
self.bump();
hash_count += 1;
}
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
} else if !self.ch_is('"') {
let last_bpos = self.pos;
let curr_char = self.ch.unwrap();
self.fatal_span_char(start_bpos,
last_bpos,
"found invalid character; only `#` is allowed \
in raw string delimitation",
curr_char).raise();
}
self.bump();
let content_start_bpos = self.pos;
let mut content_end_bpos;
let mut valid = true;
'outer: loop {
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
}
// if self.ch_is('"') {
// content_end_bpos = self.pos;
// for _ in 0..hash_count {
// self.bump();
// if !self.ch_is('#') {
// continue 'outer;
let c = self.ch.unwrap();
match c {
'"' => {
content_end_bpos = self.pos;
for _ in 0..hash_count {
self.bump();
if !self.ch_is('#') {
continue 'outer;
}
}
break;
}
'\r' => {
if !self.nextch_is('\n') {
let last_bpos = self.pos;
self.err_span_(start_bpos,
last_bpos,
"bare CR not allowed in raw string, use \\r \
instead");
valid = false;
}
}
_ => (),
}
self.bump();
}
self.bump();
let id = if valid {
self.name_from_to(content_start_bpos, content_end_bpos)
} else {
Symbol::intern("??")
};
let suffix = self.scan_optional_raw_name();
Ok(token::Literal(token::StrRaw(id, hash_count), suffix))
}

I don't see how I can add labels to that error message without actually creating my own fatal which would most likely will end up in code duplication.

Solutions I see:

  1. Emit my own fatal which is very similar to the one already emitted
    * downside would be code duplication and increased complexity
    * plus would be, same error message as before
  2. Emit a warning with the suggested solution
    * downside, more text,
    * plus, very little additional code needed
  3. Emit a different error than we do now
    * downside, another error, which might ne not suitable, because the current one is already very good
    * plus, very little additional code needed

This is what my solution currently looks like.

error: The raw string needs 2 trailing `#`, but found 3
  --> /home/op/me/rust2/src/test/ui/parser/raw/raw-literal-too-many.rs:2:26
   |
LL |     let _foo = r##"bar"###;
   |                          ^ remove this `#`

error: expected one of `.`, `;`, `?`, or an operator, found `#`
  --> /home/op/me/rust2/src/test/ui/parser/raw/raw-literal-too-many.rs:2:26
   |
LL |     let _foo = r##"bar"###;
   |                          ^ expected one of `.`, `;`, `?`, or an operator here

error: aborting due to 2 previous errors

@hellow554
Copy link
Contributor

hellow554 commented May 14, 2019

In addition to too many #, I was looking at the case where there are less # than expected, but not in the same line, but over a lot of lines (in my newly added testcase there are 18 lines now).
I would like to improve that error message as well, something like:

error: unterminated raw string
  --> $DIR/raw-string-long.rs:2:13
   |
LL |     let a = r##"This
   |             ^
   |             started here with 2 `#`
...
   |
LL |        ends"#;
   |             ^ ended here with only 1 `#`
   |             help: close the raw string with `"##`

error: aborting due to previous error

I'm not sure about how to do that, if I just can use a span from the very beginning to the end and it will leave out the intermediate lines automatically or do I have to do something about that?

On the other hand, is that even possible to detect? I don't know if one can find the end of this, except there an EOF. What do you think about this?

@hellow554
Copy link
Contributor

I looked at the first implementation of this cc #48546 @GuillaumeGomez . What do you think about this?

@GuillaumeGomez
Copy link
Member

That might make things more clear, indeed. 👍

@estebank
Copy link
Contributor

estebank commented May 14, 2019

I'm not sure about how to do that, if I just can use a span from the very beginning to the end and it will leave out the intermediate lines automatically or do I have to do something about that?

@hellow554, you should be able to synthesize a span that covers the entire closing sigil ("####) where we look for it and break if we don't find it. We should hold a vec of spans that we add to every time we find them, because it is possible that we indeed have a raw string that contains "# multiple times and we can't assume that the first we find is correct. We could have some extra smarts in case we were writing something like r###"r##"r#""#"##"### or r###"r##"r#""#"#"# to avoid suggesting all three closing spans, but for now the naïve way should be fine.

When supplying a span that covers multiple lines, the cli output will show the first 5(?) and last two lines of the covered span and draw the ascii art pointing at the start and end. The output you envision would happen by providing independent spans, and it's what I would prefer.

For the too many # at the end, you could proactively look for them before the break, by following case 1 and using something like

let lo = self.span;
let mut too_many = false;
while self.ch_is('#') {
    too_many = true;
    self.bump();
}
if too_many {
    let sp = lo.to(self.span);
    let mut err = self.sess.span_diagnostic.struct_span_err(sp, "too many `#` when terminating raw string");
    err.span_label(sp, "too many `#` for raw string");
    err.hidden_span_suggestion(
        sp,
        "remove the unneeded `#`"
        String::new(),
        Applicability::MachineApplicable,
    );
    err.emit();
}

By doing this, the parser (the lexer, rather) will continue on its merry way and parse the rest of the file as if nothing had happened.

@hellow554
Copy link
Contributor

hellow554 commented May 15, 2019

Thanks estebank. That was very helpful indeed. I haven't thought of consuming the bad characters so that the lexer can do the rest for me. Very nice.
I'm a bit undecided where the journey goes. Currently I have this:

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |               ^^^^^^^^^^--
  |               |         |
  |               |         help: remove the unneeded `#`
  |               The raw string needs 2 trailing `#`, but found 4

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs 
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |  _____________^
3  | |     is
4  | |     a
5  | |     very
...  |
19 | |     lines
20 | |     "###;
   | |        ^
   | |        |
   | |________help: remove the unneeded `#`
   |          The raw string needs 2 trailing `#`, but found 3

This is nice and looks very neat :) This is not a hidden suggestion as you see, is that okay for you as well? I mean, yes, it's kind of obvious which ones to remove, but I also like the expressiveness here.

@hellow554
Copy link
Contributor

I also tried to add the text "The raw string has {} leading #", but it looks very crowded to me, when the raw string only is one line.

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |               ^-- The raw string has 2 leading `#`
   |  _____________|
   | |
3  | |     is
4  | |     a
5  | |     very
...  |
19 | |     lines
20 | |     "###;
   | |        ^
   | |        |
   | |________help: remove the unneeded `#`
   |          The raw string needs 2 trailing `#`, but found 3

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |               ^--^^^^^^^--
  |               ||        |
  |               ||        help: remove the unneeded `#`
  |               |The raw string has 2 leading `#`
  |               The raw string needs 2 trailing `#`, but found 4

Should I add an check if the raw string only spans over one line? If yes, how to do that? ^^

@estebank
Copy link
Contributor

estebank commented May 15, 2019

There are two options, either we don't use the full string's span (showing only the start and the end), or use is_multiline to provide different output depending on the visual result.

I think a good output using the former approach would be:

error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |                --     ^^^^
  |                |      |
  |                |      ...but it's closed with 4
  |                the raw string has 2 leading `#`...

With a hidden suggestion pointing at the last two # to avoid clutter.

This is not a hidden suggestion as you see, is that okay for you as well? I mean, yes, it's kind of obvious which ones to remove, but I also like the expressiveness here.

I'm ok with using visible suggestions when possible, but for borderline cases where the course of action is obvious and the output would be too cluttered, I'd lean towards hiding the suggestion. They will still be visible in VSCode and actionable by rustfix.

@hellow554
Copy link
Contributor

hellow554 commented May 16, 2019

That looks very promising!

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |               ^-- The raw string has 2 leading `#`...
   |  _____________|
   | |
3  | |     is
4  | |     a
5  | |     very
...  |
19 | |     lines
20 | |     "###;
   | |______--^
   |        |
   |        ...but is closed with 3.
   = help: remove the unneeded `#`

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |               ^--^^^^^----
  |                |      |
  |                |      ...but is closed with 4.
  |                The raw string has 2 leading `#`...
  = help: remove the unneeded `#`

The only thing is (that bugs me a little bit) are the ^^^ between the ----.

grafik
but I haven't found a way to get rid of them. I think it's okay, but they are not needed IMHO. Time to work on the other suggestion!

Reducing the error span to only the ending `# symbols does not look nice for multi line string literals.

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:23
  |
2 |     let foo = r##"bar"####; //~ERROR too many `#` when terminating raw string
  |                --     ^^^^ ...but is closed with 4.
  |                |
  |                The raw string has 2 leading `#`...
  = help: remove the unneeded `#`

error: aborting due to previous error

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs 
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:20:6
   |
2  |     let a = r##"This
   |              -- The raw string has 2 leading `#`...
...
20 |     "###; //~ERROR too many `#` when terminating raw string"
   |      ^^^ ...but is closed with 3.
   = help: remove the unneeded `#`

error: aborting due to previous error

Therefore I'm afraid that's not an option. Any other way to get a nice result?

@hellow554
Copy link
Contributor

Wow.. TIL the code for a raw string, aka r#"foo"# and a raw byte string aka br#foo"# are actually two different code paths... Let me try to merge them :/ (

fn scan_raw_byte_string(&mut self) -> token::Lit {
let start_bpos = self.pos;
self.bump();
let mut hash_count = 0;
while self.ch_is('#') {
if hash_count == 65535 {
let bpos = self.next_pos;
self.fatal_span_(start_bpos,
bpos,
"too many `#` symbols: raw byte strings may be \
delimited by up to 65535 `#` symbols").raise();
}
self.bump();
hash_count += 1;
}
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
} else if !self.ch_is('"') {
let pos = self.pos;
let ch = self.ch.unwrap();
self.fatal_span_char(start_bpos,
pos,
"found invalid character; only `#` is allowed in raw \
string delimitation",
ch).raise();
}
self.bump();
let content_start_bpos = self.pos;
let mut content_end_bpos;
'outer: loop {
match self.ch {
None => {
self.fail_unterminated_raw_string(start_bpos, hash_count);
}
Some('"') => {
content_end_bpos = self.pos;
for _ in 0..hash_count {
self.bump();
if !self.ch_is('#') {
continue 'outer;
}
}
break;
}
Some(c) => {
if c > '\x7F' {
let pos = self.pos;
self.err_span_char(pos, pos, "raw byte string must be ASCII", c);
}
}
}
self.bump();
}
self.bump();
token::ByteStrRaw(self.name_from_to(content_start_bpos, content_end_bpos), hash_count)
}
vs
'r' => {
let start_bpos = self.pos;
self.bump();
let mut hash_count: u16 = 0;
while self.ch_is('#') {
if hash_count == 65535 {
let bpos = self.next_pos;
self.fatal_span_(start_bpos,
bpos,
"too many `#` symbols: raw strings may be \
delimited by up to 65535 `#` symbols").raise();
}
self.bump();
hash_count += 1;
}
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
} else if !self.ch_is('"') {
let last_bpos = self.pos;
let curr_char = self.ch.unwrap();
self.fatal_span_char(start_bpos,
last_bpos,
"found invalid character; only `#` is allowed \
in raw string delimitation",
curr_char).raise();
}
self.bump();
let content_start_bpos = self.pos;
let mut content_end_bpos;
let mut valid = true;
'outer: loop {
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
}
// if self.ch_is('"') {
// content_end_bpos = self.pos;
// for _ in 0..hash_count {
// self.bump();
// if !self.ch_is('#') {
// continue 'outer;
let c = self.ch.unwrap();
match c {
'"' => {
content_end_bpos = self.pos;
for _ in 0..hash_count {
self.bump();
if !self.ch_is('#') {
continue 'outer;
}
}
break;
}
'\r' => {
if !self.nextch_is('\n') {
let last_bpos = self.pos;
self.err_span_(start_bpos,
last_bpos,
"bare CR not allowed in raw string, use \\r \
instead");
valid = false;
}
}
_ => (),
}
self.bump();
}
self.bump();
let id = if valid {
self.name_from_to(content_start_bpos, content_end_bpos)
} else {
Symbol::intern("??")
};
let suffix = self.scan_optional_raw_name();
Ok(token::Literal(token::StrRaw(id, hash_count), suffix))
}
)
I'm really going forward in this issue and I really like the architecture of the compiler (although it's confusing sometimes, because sometimes you need to pass a &str but sometimes a String which is kind of weird and has no reason, but meh... 🙄)

If you could me assist further with the error messages I would be happy.

@estebank
Copy link
Contributor

If you could me assist further with the error messages I would be happy.

@hellow554 Of course! Feel free to reach out with whatever problem you might face.

As for the output, I think the following should be ok for all cases (even though it hides a big of content in between that isn't strictly relevant):

error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |                -- this raw string has 2 leading `#`
...
20 |      "###;
   |       ^^^ this raw string is closed with 3 trailing `#`, but should be 2
   = help: remove the unneeded `#`

@hellow554
Copy link
Contributor

@rustbot claim

@rustbot rustbot self-assigned this May 23, 2019
@hellow554
Copy link
Contributor

hellow554 commented May 23, 2019

Ummm.... not what I expected, but okay.... @pietroalbini can you take a look at this? ^^

@pietroalbini
Copy link
Member

@hellow554 that's the correct behavior: you can't be assigned to the issue directly because GitHub only allows organization members to be assigned to issues. The only way we have to get around that is to assign the bot itself and add a note in the top message that you're actually the one assigned.

@hellow554
Copy link
Contributor

@hellow554 that's the correct behavior: you can't be assigned to the issue directly because GitHub only allows organization members to be assigned to issues. The only way we have to get around that is to assign the bot itself and add a note in the top message that you're actually the one assigned.

Thanks for the clarification. I haven't seen the "This issue has been assigned to @hellow554 via this comment." in the top post. Thanks for that.

@rcoh
Copy link
Contributor

rcoh commented Mar 28, 2020

@rustbot claim

Working on this now, should have a diff in a few days

@rustbot rustbot self-assigned this Mar 28, 2020
rcoh added a commit to rcoh/rust that referenced this issue Mar 29, 2020
This diff improves error messages around raw strings in a few ways:
- Catch extra trailing `#` in the parser. This can't be handled in the lexer because we could be in a macro that actually expects another # (see test)
- Refactor & unify error handling in the lexer between ByteStrings and RawByteStrings
- Detect potentially intended terminators (longest sequence of "#*" is suggested)
Centril added a commit to Centril/rust that referenced this issue Mar 30, 2020
…rochenkov

Improve error messages for raw strings (rust-lang#60762)

This diff improves error messages around raw strings in a few ways:
- Catch extra trailing `#` in the parser. This can't be handled in the lexer because we could be in a macro that actually expects another # (see test)
- Refactor & unify error handling in the lexer between ByteStrings and RawByteStrings
- Detect potentially intended terminators (longest sequence of "#*" is suggested)

Fixes rust-lang#60762
cc @estebank who reviewed the original (abandoned) PR for the same ticket.
r? @Centril
Centril added a commit to Centril/rust that referenced this issue Mar 31, 2020
…rochenkov

Improve error messages for raw strings (rust-lang#60762)

This diff improves error messages around raw strings in a few ways:
- Catch extra trailing `#` in the parser. This can't be handled in the lexer because we could be in a macro that actually expects another # (see test)
- Refactor & unify error handling in the lexer between ByteStrings and RawByteStrings
- Detect potentially intended terminators (longest sequence of "#*" is suggested)

Fixes rust-lang#60762
cc @estebank who reviewed the original (abandoned) PR for the same ticket.
r? @Centril
Centril added a commit to Centril/rust that referenced this issue Apr 1, 2020
…rochenkov

Improve error messages for raw strings (rust-lang#60762)

This diff improves error messages around raw strings in a few ways:
- Catch extra trailing `#` in the parser. This can't be handled in the lexer because we could be in a macro that actually expects another # (see test)
- Refactor & unify error handling in the lexer between ByteStrings and RawByteStrings
- Detect potentially intended terminators (longest sequence of "#*" is suggested)

Fixes rust-lang#60762
cc @estebank who reviewed the original (abandoned) PR for the same ticket.
r? @Centril
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 1, 2020
Rollup of 6 pull requests

Successful merges:

 - rust-lang#70511 (Add `-Z dump-mir-dataflow` flag for dumping dataflow results visualization)
 - rust-lang#70522 (Improve error messages for raw strings (rust-lang#60762))
 - rust-lang#70547 (Add `can_unwind` field to `FnAbi`)
 - rust-lang#70591 (Ensure LLVM is in the link path for "fulldeps" tests)
 - rust-lang#70627 (Use place directly its copy)
 - rust-lang#70652 (Add git repo address to unstable book)

Failed merges:

 - rust-lang#70634 (Remove some reexports in `rustc_middle`)

r? @ghost
@bors bors closed this as completed in c739465 Apr 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
Projects
None yet
7 participants