\xHH escaping suggestions are wrong for characters that are too big #87397

Closed
SkiFire13 opened this issue Jul 23, 2021 · 3 comments · Fixed by #87659
Labels
A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@SkiFire13
Contributor

SkiFire13 commented Jul 23, 2021

Consider this code:

fn main() {
    b'字';
    b"字";
}

playground

The current output is:

error: non-ASCII character in byte constant
 --> src/main.rs:2:7
  |
2 |     b'字';
  |       ^^
  |       |
  |       byte constant must be ASCII
  |       help: use a \xHH escape for a non-ASCII byte: `\x5B57`

error: non-ASCII character in byte constant
 --> src/main.rs:3:7
  |
3 |     b"字";
  |       ^^
  |       |
  |       byte constant must be ASCII
  |       help: use a \xHH escape for a non-ASCII byte: `\x5B57`

error: could not compile `playground` due to 2 previous errors

The suggestions are incorrect: \x5B57 is not a valid escape. A \xHH escape takes exactly two hex digits, so only the \x5B part is interpreted as an escape, while the 57 part is interpreted as two ordinary characters.

In the case of byte characters, the suggestion produces a byte character literal with more than one character, and thus another compile error. In the case of byte strings, however, it compiles silently, even though that is probably not what the user wanted or expected.
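A minimal sketch of the silent miscompile described above. The function name suggested_bytes is hypothetical and exists only for illustration; it returns the byte string literal that the faulty suggestion would have the user write.

```rust
/// Returns the bytes of the literal produced by applying the faulty
/// suggestion to a byte string (hypothetical helper for illustration).
fn suggested_bytes() -> &'static [u8] {
    // `\x5B` consumes exactly two hex digits and yields the byte 0x5B (`[`).
    // The trailing `5` and `7` are ordinary ASCII characters (0x35 and 0x37).
    b"\x5B57"
}
```

The literal compiles without any warning, but it holds three bytes rather than one, and none of them relate to the UTF-8 encoding of 字.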

This happens in both the current stable and nightly compilers.

@SkiFire13 SkiFire13 added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 23, 2021
@nagisa
Member

nagisa commented Jul 23, 2021

For the string variant we can still suggest a sequence of escapes representing the UTF-8 encoding of the character: \xE5\xAD\x97 in this case.
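As a rough sketch of how such a sequence could be derived (this is not the compiler's actual implementation, and the helper name utf8_escapes is an assumption made here), the standard library's char::encode_utf8 yields the bytes, which can then be formatted one escape per byte:

```rust
/// Formats the UTF-8 encoding of `c` as a sequence of `\xHH` escapes.
/// Hypothetical helper; not taken from rustc.
fn utf8_escapes(c: char) -> String {
    // A single `char` encodes to at most four bytes in UTF-8.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .bytes()
        // Two uppercase hex digits per byte, zero-padded.
        .map(|b| format!("\\x{:02X}", b))
        .collect()
}
```

For 字 (U+5B57) this produces the three escapes mentioned above, one for each byte of the encoding.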

The behaviour is also pretty poor for unicode code points that do fit inside a byte:

error: non-ASCII character in byte constant
 --> src/main.rs:2:15
  |
2 |     let x = b"µ";
  |               ^
  |               |
  |               byte constant must be ASCII
  |               help: use a \xHH escape for a non-ASCII byte: `\xB5`

but \xB5 on its own is not valid UTF-8 (the UTF-8 encoding of µ is \xC2\xB5), so the resulting byte string is likely not what the user intended.
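The mismatch can be checked with the standard library alone. In this sketch the helper name is_valid_utf8 is hypothetical; it simply wraps std::str::from_utf8.

```rust
/// Returns true if `bytes` is well-formed UTF-8.
/// Hypothetical helper wrapping `std::str::from_utf8`.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

Calling is_valid_utf8(b"\xB5") returns false, because 0xB5 is a lone continuation byte, whereas is_valid_utf8(b"\xC2\xB5") returns true and matches the bytes of the string "µ". The single byte 0xB5 is the code point value of µ (and its Latin-1 encoding), which is why the suggestion is only right if that encoding was intended.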

@5225225
Contributor

5225225 commented Jul 23, 2021

You probably don't want to assume UTF-8 here. (If the string were pure UTF-8, why would they be using a byte string?) Or at least mention that the suggestion is only correct if you want the UTF-8 encoding.

You'd also want to lower the applicability of the suggestion to MaybeIncorrect, because it's impossible to tell what they actually wanted.

@FabianWolff
Contributor

Yes, I guess it makes sense to suggest b"\xE5\xAD\x97" (i.e., the UTF-8 encoding) if the user wrote b"字", with Applicability::MaybeIncorrect and some text that questions whether this really is what was intended.

@rustbot claim
