Add new lint: Mixed locale ident #7376

popzxc · 2021-06-19T11:00:59Z

This PR adds a new lint that checks whether an identifier name mixes multiple locales.

I don't think this should happen in normal code: it makes hand-writing the code much harder, and it can lead to confusing errors (rustc's built-in lint mixed_script_confusables can be implicitly shadowed, which makes it not fully reliable).

stderr example:

error: multiple locales used in identifier `Blоck`: Cyrillic, Latin
  --> $DIR/mixed_locale_idents.rs:5:12
   |
LL | pub struct Blоck;
   |            ^^^^^
   |
   = note: `-D clippy::mixed-locale-idents` implied by `-D warnings`

error: multiple locales used in identifier `black_чёрный_黒い_काला`: Devanagari, Cyrillic, Hiragana, Han, Latin
  --> $DIR/mixed_locale_idents.rs:8:9
   |
LL |     let black_чёрный_黒い_काला = "good luck hand-writing it";
   |         ^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to 2 previous errors
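The check behind these diagnostics can be approximated as follows. This is a sketch, not the PR's code: the PR relies on a Unicode script-detection dependency, whereas the hypothetical `script_of` below hard-codes a few Unicode block ranges as a rough stand-in for the Unicode Script property. An identifier is flagged when its characters (ignoring digits and `_`) fall into more than one script.

```rust
use std::collections::BTreeSet;

/// Coarse script classes; a stand-in for the Unicode Script property.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Script {
    Common,
    Latin,
    Cyrillic,
    Devanagari,
    Hiragana,
    Han,
    Other,
}

/// Hypothetical classifier based on a few Unicode block ranges.
/// Real script detection (via Unicode data) is far more precise.
fn script_of(c: char) -> Script {
    match c {
        '0'..='9' | '_' => Script::Common,
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Script::Latin,
        '\u{0400}'..='\u{04FF}' => Script::Cyrillic,
        '\u{0900}'..='\u{097F}' => Script::Devanagari,
        '\u{3040}'..='\u{309F}' => Script::Hiragana,
        '\u{4E00}'..='\u{9FFF}' => Script::Han,
        _ => Script::Other,
    }
}

/// Collects the distinct non-Common scripts used in an identifier.
fn scripts_in(ident: &str) -> BTreeSet<Script> {
    ident
        .chars()
        .map(script_of)
        .filter(|&s| s != Script::Common)
        .collect()
}

/// The original proposal: flag any identifier that uses more than one script.
fn uses_mixed_scripts(ident: &str) -> bool {
    scripts_in(ident).len() > 1
}
```

With this approximation, `Bl\u{043E}ck` (Latin letters plus a Cyrillic `о`) is flagged, while `nutzer_zähler` is not, because `ä` (U+00E4) falls in the Latin range.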

Please write a short comment explaining your change (or "none" for internal only changes)

changelog: [`mixed_locale_idents`]

rust-highfive · 2021-06-19T11:01:02Z

r? @phansch

(rust-highfive has picked a reviewer for you, use r? to override)

xFrednet · 2021-06-19T18:28:13Z

What would the lint say about a value name like nutzer_zähler (German for user_counter)? Does the included library figure out that both words can originate from the same alphabet?

popzxc · 2021-06-19T18:40:44Z

What would the lint say about a value name like nutzer_zähler (German for user_counter)?

Good question and great test case! I added it, and yes, it does not trigger the lint (I'm not a specialist in Unicode by any means, but AFAIK the diaeresis doesn't move ä into a different locale).

flip1995

Ping @Manishearth since you maintain the unicode crates in question here. Do you think adding this dep to Clippy (with the same configuration as in rustc) is fine?

clippy_lints/Cargo.toml

Co-authored-by: Philipp Krones <hello@philkrones.com>

Manishearth · 2021-06-21T15:04:26Z

Strongly against this: I designed the lints that are in rustc for this and they were pretty carefully designed, over months of discussion. This particular thing was not considered to be a case worth addressing because idents like try_看 actually do make sense to have; and a core principle behind the design of those lints was to avoid malicious and confusing situations without actually harming legitimate use.

I don't understand why "handwriting the code is hard" is a concern at all; nor do I understand what you mean by the builtin getting implicitly shadowed.

Manishearth · 2021-06-21T15:08:52Z

Oh, I see what you mean by "shadowing". Yes, this was also considered when designing the rustc lint, and also determined to not be worth it: if someone has chosen to use Cyrillic in their code it's not worth it to try and nitpick "good usage" from "bad usage" because those can be pretty linked.

There are some other potential designs here: e.g. warning about mixed locales only when they involve mixed-script confusables, and only when the scripts are not separated by underscores. But the current proposed design will catch too many legitimate use cases, probably even more than illegitimate ones; this kind of situation is super rare to trigger by accident.

It does sound like a reasonable style guideline to not mix scripts across underscores in a way that one of the scripts is purely using confusables.
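The alternative design suggested here can be sketched like this, restricted to Latin and Cyrillic for brevity: split the identifier on underscores and flag a part only if it mixes both scripts and one side consists purely of lookalikes of the other. The `LOOKALIKES` table is a hypothetical seven-pair subset chosen for illustration; a real implementation would draw on Unicode's confusables data (UTS #39), and `script_of` is again a block-range approximation.

```rust
/// Coarse script classes for this sketch (hypothetical, block-range based).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Cyrillic,
    Other,
}

fn script_of(c: char) -> Script {
    match c {
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Script::Latin,
        '\u{0400}'..='\u{04FF}' => Script::Cyrillic,
        _ => Script::Other,
    }
}

/// Hypothetical, tiny table of (Latin, Cyrillic) lookalike pairs.
const LOOKALIKES: [(char, char); 7] = [
    ('a', '\u{0430}'),
    ('c', '\u{0441}'),
    ('e', '\u{0435}'),
    ('o', '\u{043E}'),
    ('p', '\u{0440}'),
    ('x', '\u{0445}'),
    ('y', '\u{0443}'),
];

/// A part is suspicious if it mixes Latin and Cyrillic and one side
/// consists only of characters that look like the other script.
fn segment_is_suspicious(segment: &str) -> bool {
    let latin: Vec<char> = segment.chars().filter(|&c| script_of(c) == Script::Latin).collect();
    let cyrillic: Vec<char> = segment.chars().filter(|&c| script_of(c) == Script::Cyrillic).collect();
    if latin.is_empty() || cyrillic.is_empty() {
        return false;
    }
    let latin_all_lookalikes = latin.iter().all(|&c| LOOKALIKES.iter().any(|&(l, _)| l == c));
    let cyrillic_all_lookalikes = cyrillic.iter().all(|&c| LOOKALIKES.iter().any(|&(_, k)| k == c));
    latin_all_lookalikes || cyrillic_all_lookalikes
}

/// Checks each underscore-separated part on its own, so different scripts
/// in different parts (e.g. `black_чёрный`, `try_看`) are accepted.
fn suspicious_segments(ident: &str) -> Vec<&str> {
    ident.split('_').filter(|s| segment_is_suspicious(s)).collect()
}
```

Under this rule, `Bl\u{043E}ck` is still caught, but `try_看`, `black_чёрный`, and a part such as `dataч` (no lookalike involved) pass.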

popzxc · 2021-06-21T16:11:41Z

That's why it's in clippy and not in the compiler, isn't it? It's completely OK to 'allow' lints you find not suiting your style.

I guess not every project will be happy with the approach in rustc (at least I personally plan to actively dogfood this lint in projects I participate in).

Maybe just put this lint into 'pedantic' category?

Manishearth · 2021-06-21T16:54:49Z

That's why it's in clippy and not in the compiler, isn't it? It's completely OK to 'allow' lints you find not suiting your style.

Yes, but adding it to clippy still makes it a value judgement, and I do not consider this a good value judgement in this case. I'm not making this comment as a matter of personal style; I'm making it as a clippy maintainer who thinks that we need to be certain of the value judgements we make when we add lints, and as the person who did the research and design of the non-ASCII idents RFC. The lint as currently proposed would likely warn on a lot of good code, more than on bad code.

Also 99% of the people who would enable this by default probably would also be okay with #[warn(non_ascii_idents)] IMO. Otherwise I would think this makes some sense as a restriction lint.

There's a way to do this better that I already proposed, which would work as a pedantic lint, though it's trickier to implement.

Manishearth · 2021-06-21T16:55:21Z

Another thing that I would be fine with adding would be a restriction lint that lets you allowlist the set of scripts allowed in your codebase based on a clippy config.
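Such an allowlist-based restriction lint could boil down to a set difference between the scripts an identifier uses and the scripts the project allows. In this sketch, the `allowed` set stands in for whatever the Clippy config would supply, and `script_of` is a hypothetical block-range approximation rather than real Unicode script data.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Script {
    Common,
    Latin,
    Cyrillic,
    Han,
    Other,
}

/// Hypothetical classifier based on a few Unicode block ranges.
fn script_of(c: char) -> Script {
    match c {
        '0'..='9' | '_' => Script::Common,
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Script::Latin,
        '\u{0400}'..='\u{04FF}' => Script::Cyrillic,
        '\u{4E00}'..='\u{9FFF}' => Script::Han,
        _ => Script::Other,
    }
}

/// Returns the scripts used in `ident` that are not in `allowed`, sorted.
/// An empty result means the identifier passes the check.
fn disallowed_scripts(ident: &str, allowed: &BTreeSet<Script>) -> Vec<Script> {
    let used: BTreeSet<Script> = ident
        .chars()
        .map(script_of)
        .filter(|&s| s != Script::Common)
        .collect();
    used.difference(allowed).copied().collect()
}
```

A project that only allows Latin would then get a report for `Bl\u{043E}ck` (Cyrillic) and `try_看` (Han), while a project that also allows Cyrillic would accept `black_чёрный`. Unlike the mixed-script check, this makes the value judgement explicit and per-project.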

popzxc · 2021-06-21T17:16:07Z

Well, actually it makes sense.

Gonna implement.

popzxc · 2021-06-22T06:26:06Z

I think it's done.
You can see updated test examples here.

Does it look better now?

flip1995 · 2021-06-22T08:33:50Z

r? @Manishearth (reassigning since @phansch is taking a break)

Manishearth

this should probably be a pedantic lint

still kinda feel like a restriction lint with a list of scripts is the more extensible way to go.

Manishearth · 2021-06-22T16:20:33Z

clippy_lints/src/mixed_locale_idents.rs

+#[derive(Debug)]
+enum Case {
+    /// E.g. `SomeStruct`, delimiter is uppercase letter.
+    Camel,


Okay so a problem with this is that not all writing systems have uppercase. More thought needs to be put into how that will work.

But perhaps flagging only mixed-script confusables that are not separated by underscores within an identifier is enough for us for now. Idk.
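The uppercase problem can be made concrete with a naive CamelCase splitter. This is a hypothetical illustration, not the PR's `Case` handling: it splits before every character for which `char::is_uppercase` is true. Han and Hiragana characters are neither uppercase nor lowercase, so a caseless script can never contribute a boundary.

```rust
/// Naive CamelCase splitter: starts a new part before each uppercase char.
fn split_camel(ident: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut start = 0;
    for (i, c) in ident.char_indices() {
        // A boundary exists only where an uppercase letter appears.
        if c.is_uppercase() && i != start {
            parts.push(&ident[start..i]);
            start = i;
        }
    }
    if start < ident.len() {
        parts.push(&ident[start..]);
    }
    parts
}
```

`SomeStruct` splits into `Some` and `Struct`, and `黒いBox` splits into `黒い` and `Box`, but `Box黒い` stays a single part: the switch from Latin to Han is invisible to a case-based delimiter.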

popzxc · 2021-06-22T16:35:28Z

still kinda feel like a restriction lint with a list of scripts is the more extensible way to go.

I was going to implement it after this one gets merged, since the script-detection dependency is introduced here.

I don't think that having two lints in one PR is going to make things easier, given that it's already kinda sensitive 😅

popzxc · 2021-06-22T16:42:47Z

However, this is drifting further and further from my original intent and becoming more and more complicated.
I don't think it makes sense to build a hard-to-use Frankenstein with blurry use cases (initially I wanted to provide a simple lint that is pretty restrictive, but now it's heading toward an alternative version of mixed_script_confusables, which is definitely not what I was aiming for).

I'll close this PR and will implement a restriction lint for locales instead.

Manishearth · 2021-06-22T16:44:40Z

Thank you!

New lint: `disallowed_script_idents`

This PR implements a new lint to restrict locales that can be used in the code, as proposed in #7376.

Current concerns / unresolved questions:

- ~~Mixed usage of `script` (as a Unicode term) and `locale` (as something that is easier to understand for the broad audience). I'm not sure whether these terms are fully interchangeable and whether in the current form it is more confusing than helpful.~~ `script` is now used everywhere.
- ~~Having to mostly copy-paste `AllowedScript`. Probably it's not a big problem, as the list of scripts is standardized and is unlikely to change, and even if we'd stick to the `unicode_script::Script`, we'll still have to implement custom deserialization, and I don't think that it will be shorter in terms of the amount of LoC.~~ `unicode::Script` is used together with a filtering deserialize function.
- Should we stick to the list of "recommended scripts" from [UAX #31](http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts) in the configuration?

*Please write a short comment explaining your change (or "none" for internal only changes)*

changelog: ``[`disallowed_script_idents`]``

r? `@Manishearth`

popzxc added 7 commits June 19, 2021 13:07

Implement mixed locale ident lint

a30a38a

Add test for mixed locale ident lint

2c4968d

Rename mixed-locale-ident to mixed-locale-idents

2ffbc1d

Fix failing tests

46b16c9

'cargo dev' chore

2b9fa0a

Add ticks around identifier name in lint message

213c948

Improve wording

ee81a3d

rust-highfive assigned phansch Jun 19, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Jun 19, 2021

Add german example to UI test

b474a08

flip1995 reviewed Jun 21, 2021


clippy_lints/Cargo.toml (outdated)

Update clippy_lints/Cargo.toml

0b2bd3e

Co-authored-by: Philipp Krones <hello@philkrones.com>

popzxc added 7 commits June 22, 2021 07:42

Update implementation to check for identifier parts only

c9a42b3

Add test cases for the new lint behavior

9bed7d5

Add more test cases

7a86665

Update stderr file

c426816

Add complex example

d2449e5

Fix clippy issues

6984e1c

Add a few more tests

61d4792

rust-highfive assigned Manishearth and unassigned phansch Jun 22, 2021

Manishearth reviewed Jun 22, 2021


popzxc closed this Jun 22, 2021

popzxc deleted the mixed-locale-ident branch June 25, 2021 09:37

popzxc mentioned this pull request Jun 25, 2021

New lint: disallowed_script_idents #7400

Merged
