-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect confusing unicode characters and show the alternative... #29837
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
@Manishearth So, here's some rough artwork for you :) "rough", because I haven't checked it yet (just wanted to show the progress). I suppose we should also add tests for this? |
Ah! limited to 100 chars (I thought it shared Servo's 120 chars limit)... |
@@ -0,0 +1,156 @@ | |||
const ASCII_ARRAY: &'static [(char, &'static str)] = &[('_', "Low Line"), ('-', "Hyphen-Minus"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs the MPL header
Also, yes, there should be some parse-fail tests for this. |
d5a4945
to
25a86fa
Compare
('}', "Right Curly Brace"), | ||
('*', "Asterisk"), | ||
('/', "Slash"), | ||
('\\', "Back Slash"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single word
25a86fa
to
39e6bfa
Compare
@Manishearth I've made the changes and now I've gone for a variation of what you'd suggested (I've used |
|
||
fn main() { | ||
let y = 0; | ||
//~^ ERROR unknown start of token: \u{37e} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the help message here too
compile-fail tests don't require all helps and notes to be listed, but if you do list a help or note, and the program fails to emit it, the test will fail.
39e6bfa
to
56647c3
Compare
@Manishearth r? |
.map(|idx| { | ||
let (_, u_name, ascii_char) = UNICODE_ARRAY[idx]; | ||
let span = make_span(reader.last_pos, reader.pos); | ||
match ASCII_ARRAY.iter().position(|&(c, _)| c == ascii_char) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use .find
, not .position
.
LGTM, small nits involving style. For future reference, you should be rarely indexing arrays and things in Rust. Most of the time you should use iterators ( |
This patch looks pretty decent. I second @Manishearth's suggestions. Also, note that there is a tidy error because some of the lines in the |
56647c3
to
c2c416c
Compare
@nikomatsakis @Manishearth Agreed, thanks! (and done). r? |
@@ -174,6 +174,9 @@ impl SpanHandler { | |||
self.handler.emit(Some((&self.cm, sp)), msg, Bug); | |||
panic!(ExplicitBug); | |||
} | |||
pub fn span_bug_no_panic(&self, sp: Span, msg: &str) { | |||
self.handler.emit(Some((&self.cm, sp)), msg, Bug); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention, can we add self.handler.bump_err_count();
here too?
c2c416c
to
7f63c7c
Compare
@bors r+ thanks! |
📌 Commit 7f63c7c has been approved by |
@Manishearth Thank you! :) |
❤️ 💓 ❤️ |
😀 Congrats on your first PR! |
Nice polish. |
Is there a reason this doesn't include U+201C LEFT DOUBLE QUOTATION MARK |
No reason. It contains the single quotes. I did Ctrl-F for those, but didn't bother to check the double quotes. Go ahead! |
It's again worth mentioning that this still doesn't have all the substitutions - only the printable ones from http://www.unicode.org/Public/security/revision-06/confusables.txt. So, feel free to add more :) |
Oh, I think I see why QUOTATION MARK was missed: things (including that) are considered confusable with APOSTROPHE, APOSTROPHE rather than ". |
The universe is starting to hit and appreciate this feature https://twitter.com/joeranweiler/status/678691374292590593 :D |
Add more aliases for Unicode confusable chars Building upon #29837, this PR: * added aliases for space characters, * distinguished square brackets from parens, and * added common CJK punctuation characters as aliases. This will especially help CJK users who may have forgotten to switch off IME when coding.
Add more aliases for Unicode confusable chars Building upon #29837, this PR: * added aliases for space characters, * distinguished square brackets from parens, and * added common CJK punctuation characters as aliases. This will especially help CJK users who may have forgotten to switch off IME when coding.
It's unclear why this is used here. All entries in the third column of `UNICODE_ARRAY` are covered by `ASCII_ARRAY`, so if the lookup fails it's a genuine compiler bug. It was added way back in rust-lang#29837, for no clear reason. This commit changes it to `span_bug`, which is more typical.
It's unclear why this is used here. All entries in the third column of `UNICODE_ARRAY` are covered by `ASCII_ARRAY`, so if the lookup fails it's a genuine compiler bug. It was added way back in rust-lang#29837, for no clear reason. This commit changes it to `span_bug`, which is more typical.
fixes #25957