\xHH escaping suggestions are wrong for characters that are too big #87397

SkiFire13 · 2021-07-23T08:38:40Z

Consider this code:

fn main() {
    b'字';
    b"字";
}


The current output is:

error: non-ASCII character in byte constant
 --> src/main.rs:2:7
  |
2 |     b'字';
  |       ^^
  |       |
  |       byte constant must be ASCII
  |       help: use a \xHH escape for a non-ASCII byte: `\x5B57`

error: non-ASCII character in byte constant
 --> src/main.rs:3:7
  |
3 |     b"字";
  |       ^^
  |       |
  |       byte constant must be ASCII
  |       help: use a \xHH escape for a non-ASCII byte: `\x5B57`

error: could not compile `playground` due to 2 previous errors

The suggestions are incorrect: \x5B57 is not a single escape. Only \x5B is interpreted as an escape (the byte 0x5B, i.e. '['), while 57 is interpreted as two ordinary characters.

For byte character literals, applying the suggestion produces a literal with more than one character, which leads to another compile error. For byte strings, however, it compiles silently (b"\x5B57" is the three bytes of b"[57"), even though that is probably not what the user wanted or expected.
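The silent miscompile can be checked directly. This is a minimal sketch that relies only on the fact that a `\xHH` escape consumes exactly two hex digits; the function name `suggested_literal` is illustrative, not from the compiler.

```rust
/// Returns the bytes produced by applying the suggested escape `\x5B57`.
/// The escape `\x5B` consumes only two hex digits (giving the byte for '['),
/// so '5' and '7' follow as ordinary characters: three bytes in total.
fn suggested_literal() -> &'static [u8] {
    b"\x5B57"
}
```

The literal is three bytes long, not the single character the user wrote.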

This happens in both the current stable and nightly compilers.


nagisa · 2021-07-23T10:03:10Z

For the string variant we can still suggest a sequence of escapes representing the UTF-8 encoding: \xE5\xAD\x97 in this case.
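A sketch of how such an escape sequence could be generated from a `char`, assuming the UTF-8 encoding is what the user wants. `utf8_escapes` is a hypothetical helper for illustration, not a compiler function.

```rust
/// Hypothetical helper: renders `c` as a sequence of `\xHH` escapes,
/// one per byte of its UTF-8 encoding (the form valid inside b"...").
fn utf8_escapes(c: char) -> String {
    let mut buf = [0u8; 4]; // UTF-8 needs at most 4 bytes per char
    let encoded: &str = c.encode_utf8(&mut buf);
    encoded
        .bytes()
        .map(|b| format!("\\x{:02X}", b)) // two uppercase hex digits per byte
        .collect()
}
```

For '字' this yields \xE5\xAD\x97, matching the escapes above.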

The behaviour is also pretty poor for Unicode code points that do fit in a byte:

error: non-ASCII character in byte constant
 --> src/main.rs:2:15
  |
2 |     let x = b"µ";
  |               ^
  |               |
  |               byte constant must be ASCII
  |               help: use a \xHH escape for a non-ASCII byte: `\xB5`

but \xB5 on its own is not valid UTF-8 (µ, U+00B5, is encoded as \xC2\xB5), so the suggestion is likely not what was intended.
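To make the point concrete: a lone `\xHH` byte equals a character's UTF-8 encoding only when the character is ASCII. The helper below (`lone_byte_is_utf8`, a hypothetical name) compares the two for a given `char`.

```rust
/// Illustrative check: is the single byte whose value is `c`'s code point
/// identical to `c`'s UTF-8 encoding? True only for U+0000..=U+007F.
fn lone_byte_is_utf8(c: char) -> bool {
    let code = c as u32;
    if code > 0xFF {
        return false; // cannot be written as a single \xHH byte at all
    }
    let mut buf = [0u8; 4];
    let encoded: &str = c.encode_utf8(&mut buf);
    encoded.as_bytes() == &[code as u8][..]
}
```

For 'µ' the comparison fails because its UTF-8 encoding is two bytes, and the lone byte 0xB5 is rejected by `std::str::from_utf8`.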

5225225 · 2021-07-23T10:16:36Z

You probably don't want to assume UTF-8 here. (If the string were pure UTF-8, why would they use a byte string?) Or at least mention that the suggestion is only correct if the UTF-8 encoding is what's wanted.

You'd also want to lower the applicability of the suggestion to MaybeIncorrect, because it's impossible to tell what they actually wanted.

FabianWolff · 2021-07-26T18:13:16Z

Yes, I guess it makes sense to suggest b"\xE5\xAD\x97" (i.e., the UTF-8 encoding) if the user wrote b"字", with Applicability::MaybeIncorrect and some text that questions whether this really is what was intended.
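One possible way to choose the replacement text, sketched under the assumptions discussed in this thread (this is not the code from the eventual fix): characters up to U+00FF get a single \xHH escape, larger characters get their UTF-8 bytes inside byte strings, and byte character literals get no suggestion because one byte cannot hold them. Each suggestion would still be emitted with Applicability::MaybeIncorrect, since the intended encoding is a guess. `suggest_escape` is a hypothetical name.

```rust
/// Hypothetical sketch: replacement text for a non-ASCII `c` found in a byte
/// literal. `in_byte_string` distinguishes b"..." from b'...'.
/// Any returned suggestion is meant to carry MaybeIncorrect applicability.
fn suggest_escape(c: char, in_byte_string: bool) -> Option<String> {
    let code = c as u32;
    if code <= 0xFF {
        // Fits in one byte: well-formed, but the user may have meant UTF-8.
        return Some(format!("\\x{:02X}", code));
    }
    if !in_byte_string {
        // A byte char literal holds exactly one byte: nothing valid to suggest.
        return None;
    }
    // Too big for one byte: suggest the UTF-8 bytes as a run of escapes.
    let mut buf = [0u8; 4];
    let encoded: &str = c.encode_utf8(&mut buf);
    Some(encoded.bytes().map(|b| format!("\\x{:02X}", b)).collect())
}
```

Returning `None` for b'字' reflects that no single escape is correct there; a diagnostic could instead explain why the character does not fit.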

@rustbot claim

SkiFire13 added the labels A-diagnostics (Area: Messages for errors, warnings, and lints) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue) on Jul 23, 2021

rustbot assigned FabianWolff Jul 26, 2021

FabianWolff mentioned this issue Jul 31, 2021

Fix invalid suggestions for non-ASCII characters in byte constants #87659 (merged)

bors closed this as completed in 4380056 Aug 2, 2021
