Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new lint: unicode_not_nfc #249

Closed
wants to merge 20 commits into from
Closed

new lint: unicode_not_nfc #249

wants to merge 20 commits into from

Conversation

llogiq
Copy link
Contributor

@llogiq llogiq commented Aug 28, 2015

This closes #85 – I just renamed the lint to make it more clear. Wiki entry will follow once pulled.

I also want to note that once this gets pulled, we will have reached the 50 lints mark! Yay! 😄

@Manishearth r?

@llogiq
Copy link
Contributor Author

llogiq commented Aug 30, 2015

Let this be a testament to my ineptitude for working with anything more complex than SVN. 😄

@llogiq
Copy link
Contributor Author

llogiq commented Aug 31, 2015

Since Manish seems to be awfully busy, @birkenfeld r?

@Manishearth
Copy link
Member

(I wasn't awfully busy, my github emails were being sent to spam for some reason. But now I am busy 😛 )

@llogiq
Copy link
Contributor Author

llogiq commented Aug 31, 2015

I was referring to your work week 😜

@birkenfeld
Copy link
Contributor

Well, it seems to work as advertised, however I wonder if it's a good idea to emit a warning for each such sequence.

(Similar for the "non-ASCII char" lint, I think that should be restricted to at most once per string literal).

If you just do a pass that checks for the first mismatch between string and string.nfc().collect() that might be less intrusive, and need less convoluted control flow. If you want to keep the current code, I would request better names for "last" and "ustart" :)

@Manishearth
Copy link
Member

Once per stringlit sounds good to me (emit the list of chars if you want)

@llogiq
Copy link
Contributor Author

llogiq commented Aug 31, 2015

I'd really like to show where in the string literal the problematic characters are. Can we report multiple spans in one line?

@birkenfeld Thanks, I'll think about the variable names. I'd really like to keep the convoluted control flow, because I believe showing the locations where things go wrong (which btw. is a bit broken because zero-width whitespace throw off the formatting) is a really useful feature.

@Manishearth
Copy link
Member

No, but multispan reporting is a desired feature for rustc (e.g. for unused imports)

@birkenfeld
Copy link
Contributor

BTW, does "showing the locations" do useful things in the case of multiline string literals?

@llogiq
Copy link
Contributor Author

llogiq commented Aug 31, 2015

It should. Until now I had no problems with reporting on multiline snippets.

Since we don't have the means to make the report uncluttered, I'll report the whole string, along with pairs of sequences to search/replace.

@Manishearth
Copy link
Member

Make sure to add special handling for the language when you have a single bad char!

@llogiq
Copy link
Contributor Author

llogiq commented Aug 31, 2015

Will do. Alas, it'll have to wait – I'm a bit busy ATM.

@llogiq
Copy link
Contributor Author

llogiq commented Sep 1, 2015

Now I've special-cased the one-range case so that the respective part of the string is spanned. In the other cases, a newline-joined list of ranges + suggested replacements is reported with the whole string as span.

@Manishearth @birkenfeld r?

@llogiq
Copy link
Contributor Author

llogiq commented Sep 4, 2015

Since the solution to the problem does not involve seeing the actual ranges, I created a far less complex version that directly lints the strings and suggests replacements. I'm closing this in favor of the other solution.

@llogiq llogiq closed this Sep 4, 2015
@llogiq llogiq deleted the unicode branch September 4, 2015 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Various unicode warnings [zero-width-space, unicode_canon, ascii_only]
4 participants