fix(internet): userName, email and slugify return only ascii #1554
…locales, revert slugify to ascii only
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## next #1554 +/- ##
==========================================
- Coverage 99.64% 99.63% -0.01%
==========================================
Files 2221 2222 +1
Lines 239460 239816 +356
Branches 1047 1056 +9
==========================================
+ Hits 238604 238950 +346
- Misses 835 845 +10
Partials 21 21
…locales, revert slugify to ascii only - add test
…locales, revert slugify to ascii only - fix test
Co-authored-by: Shinigami <chrissi92@hotmail.de>
…locales, revert slugify to ascii only - allow stripping diacritics
I added a further refinement to slugify which strips simple diacritics. This helps in languages like French or Vietnamese, where you have names like
now instead you'd get which seems nicer. I also added some additional tests for slugify.
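To illustrate the diacritic stripping described above, here is a minimal sketch, not the code from this PR. The function name `slugifySketch` and the exact set of characters it keeps are assumptions made for the example; only the idea of stripping simple diacritics comes from the comment.

```typescript
// Hypothetical helper for illustration only; not the faker implementation.
function slugifySketch(input: string): string {
  return input
    .normalize('NFD') // decompose "é" into "e" + a combining acute accent
    .replace(/[\u0300-\u036f]/g, '') // drop the combining diacritical marks
    .replace(/ /g, '-') // spaces become hyphens
    .replace(/[^\w.-]+/g, ''); // remove anything that is still not ASCII
}

console.log(slugifySketch('Hélène Dupont')); // "Helene-Dupont"
console.log(slugifySketch('Nguyễn')); // "Nguyen"
```

Letters without a canonical decomposition (for example the Vietnamese "đ") are not affected by the NFD step and are simply removed by the last replacement in this sketch.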
Team Decision:
"We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way. How that is done is an implementation detail and in the hand of the PR author"
So I tried a new approach in the latest commit, basically reimplementing the userName function with a few strategies:
ASCII names work as before.
Simple accents are stripped using Unicode NFD normalization.
A simple mapping can be used to transliterate alphabetic scripts like Cyrillic (Greek and Thai, say, could be added too).
As a final fallback, the hex Unicode char codes of each character are simply concatenated.
This works tolerably well, but the final fallback can lead to some very long usernames.
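The strategies listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `toAsciiSketch` and `CHAR_MAP` are hypothetical names, and the mapping table holds only a handful of lowercase Cyrillic letters.

```typescript
// Tiny illustrative mapping; hypothetical, not the table used in the PR.
const CHAR_MAP: Record<string, string> = { и: 'i', в: 'v', а: 'a', н: 'n' };

// Hypothetical helper for illustration only.
function toAsciiSketch(input: string): string {
  // Strategy 2: strip simple accents via NFD decomposition.
  const stripped = input.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(stripped)
    .map((char) => {
      // Strategy 1: ASCII characters pass through unchanged.
      if (char.charCodeAt(0) < 0x80) return char;
      // Strategy 3: transliterate via the simple mapping.
      const mapped = CHAR_MAP[char];
      if (mapped !== undefined) return mapped;
      // Strategy 4: final fallback, the hex code point of the character.
      return char.codePointAt(0)!.toString(16);
    })
    .join('');
}

console.log(toAsciiSketch('José')); // "Jose"
console.log(toAsciiSketch('иван')); // "ivan"
console.log(toAsciiSketch('中文')); // "4e2d6587"
```

The last call shows why the fallback produces long results: each unmapped character expands to four or more hex digits.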
Impl wise this looks good to me.
Could you add an example to the jsdocs that uses non ascii input?
Also I think it might be better if you move the char mapping to a separate file and import it. Not sure whether it should be ts or json. But that is optional for now IMO.
We could do a substring if it gets too long but that's a different issue.
We also discussed that we want
…se base36 as fallback
Co-authored-by: ST-DDT <ST-DDT@gmx.de>
Co-authored-by: Eric Cheng <ericcheng9316@gmail.com>
tentative fix for #1105 and #1437
before this PR is applied:
after this PR is applied: