Lint against non-NFC items? #120723

Open
workingjubilee opened this issue Feb 6, 2024 · 1 comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. A-unicode Area: Unicode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@workingjubilee
Member

Code

struct Mask();
const FAÇADE: Mask = Mask();
const FAÇADE: Mask = Mask();

Current output

error[E0428]: the name `FAÇADE` is defined multiple times
 --> src/lib.rs:3:1
  |
2 | const FAÇADE: Mask = Mask();
  | ---------------------------- previous definition of the value `FAÇADE` here
3 | const FAÇADE: Mask = Mask();
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `FAÇADE` redefined here
  |
  = note: `FAÇADE` must be defined only once in the value namespace of this module

Desired output

warning[EIEIO]: the name `FAÇADE` was normalized to `FAÇADE`
3 | const FAÇADE: Mask = Mask();
  |       ^^^^^^ `FAÇADE` defined here
  |
  = note: rustc applies Normalization Form C to identifiers

error[E0428]: the name `FAÇADE` is defined multiple times
 --> src/lib.rs:3:1
  |
2 | const FAÇADE: Mask = Mask();
  | ---------------------------- previous definition of the value `FAÇADE` here
3 | const FAÇADE: Mask = Mask();
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `FAÇADE` redefined here
  |
  = note: `FAÇADE` must be defined only once in the value namespace of this module

Rationale and extra context

Spinoff of #120697

Not the same as uncommon_codepoints! We NFC-normalize idents, as described in RFC #2457. In the (admittedly unlikely) case where someone includes an ident that normalizes to match another ident, the result is "you wrote (or, more likely, mechanically emitted) something, a nuance of it was silently ignored by the compiler, and now you get compiler errors". Only a very, very unusual source file would even try to include both the NFD and NFC forms of an ident separately, and I think this would only happen with machine-generated code or multiple splat-includes, so in practice little harm is done.

Still, "the byte strings for two idents can differ, yet they resolve to the same value" can be surprising. That is especially true in the machine-generation case, where a programmer is likely to reason heavily about equating identifiers and strings. So we could probably emit a warning when normalizing an ident in a source file actually changes it, much like we emit warnings when other data goes unused.
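
To make the byte-level difference concrete, here is a minimal sketch. It assumes the two `FAÇADE` spellings above differ in the usual way: one uses the precomposed U+00C7, the other `C` followed by the combining cedilla U+0327. The function `toy_nfc` is a hypothetical stand-in that composes only that one pair; it is not how rustc normalizes, and real NFC covers far more cases.

```rust
/// Hypothetical toy stand-in for NFC: composes only `C` + U+0327
/// (combining cedilla) into U+00C7. Real NFC handles far more.
fn toy_nfc(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == 'C' && chars.peek() == Some(&'\u{0327}') {
            chars.next(); // consume the combining cedilla
            out.push('\u{00C7}');
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    let composed = "FA\u{00C7}ADE"; // precomposed form (already NFC)
    let decomposed = "FAC\u{0327}ADE"; // decomposed form (NFD)

    // The two spellings render alike but are different byte strings.
    assert_ne!(composed.as_bytes(), decomposed.as_bytes());

    // After normalization they are the same identifier.
    assert_eq!(toy_nfc(decomposed), composed);

    // The proposed lint would fire exactly when normalization changed the text.
    for ident in [composed, decomposed] {
        let changed = toy_nfc(ident) != ident;
        println!("{} bytes, normalization changed it: {}", ident.len(), changed);
    }
}
```

The `changed` check is the condition described above: warn only when normalizing an ident results in an actual difference.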

Other cases

No response

Rust Version

rustc 1.75.0 (82e1608df 2023-12-21)
binary: rustc
commit-hash: 82e1608dfa6e0b5569232559e3d385fea5a93112
commit-date: 2023-12-21
host: x86_64-unknown-linux-gnu
release: 1.75.0
LLVM version: 17.0.6

Anything else?

No response

@workingjubilee workingjubilee added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. A-unicode Area: Unicode labels Feb 6, 2024
@Jules-Bertholet
Contributor

One issue that was brought up in the non_ascii_idents RFC discussion thread is that certain Vietnamese (and maybe other languages/scripts) input methods output non-NFC text.
