Weird email and username in Chinese locale package #1105
Comments
@shtse8 Is there a "trivial" way to translate/transcribe Chinese words to English/Latin letters? Are usernames also in Latin letters? I think I have seen mostly Chinese usernames (display names) in Chinese forums (In the few I have ever visited). |
That is not the case. In theory, and according to the standards, Chinese can be used in domains, usernames and email addresses, but in practice it is not. Chinese is much harder to type than other languages.
Most sites do not allow Chinese in email addresses or usernames, because it is technically difficult to handle: parsing Chinese is relatively hard. It is also much easier to type English, which can be entered directly from the keyboard, one character at a time.
In the Chinese-speaking world there are many ways to transcribe a Chinese name into Latin letters. In Hong Kong, we use our English name or the Cantonese phonetic spelling of our name on our ID card. For example, surname
Many of us have a real English name that we chose ourselves, like
In Mainland China, Taiwan and other Mandarin-speaking places such as Singapore and Malaysia, people use Pinyin (Mandarin phonetic spelling), for example, surname
Let's take a look at DouYin (抖音), the Chinese version of TikTok: https://www.douyin.com/user/MS4wLjABAAAApDszKVp0whQtJRUaaDmKnrshCmZ5gwZwcXXnvYsAUFE
Hope it can help to be more |
Just my opinion and idea: I feel like this is out of scope for faker itself. Right now it uses a simple algorithm where a first name and a last name are just inserted into the email. So my proposal (and we can freely discuss it) would be: create or use a package that converts Chinese names to their English counterparts, and pass those into faker's email function. |
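A minimal sketch of the proposal above, under stated assumptions. The `ROMANIZED` table, `romanize`, `emailForChineseName` and the `EmailFn` type are hypothetical names made up for illustration. A real project would replace the hand-written table with a dedicated romanization package and pass in an adapter around faker's own email function, whose exact signature depends on the faker version.

```typescript
// Hypothetical lookup table (Mandarin pinyin without tone marks).
// A real project would use a romanization package instead.
const ROMANIZED: Record<string, string> = {
  "张": "zhang",
  "伟": "wei",
  "王": "wang",
};

// Assumed shape of the email generator we delegate to.
type EmailFn = (firstName: string, lastName: string) => string;

// Convert each character via the table; keep ASCII letters, drop the rest.
function romanize(name: string): string {
  return Array.from(name)
    .map((ch) => ROMANIZED[ch] ?? (/[a-z]/i.test(ch) ? ch.toLowerCase() : ""))
    .join("");
}

// Romanize both name parts first, then hand them to the email generator.
function emailForChineseName(
  firstName: string,
  lastName: string,
  makeEmail: EmailFn
): string {
  return makeEmail(romanize(firstName), romanize(lastName));
}

// Stand-in for the real generator, used only to show the data flow.
const stubEmail: EmailFn = (first, last) => `${first}.${last}@example.com`;

console.log(emailForChineseName("伟", "张", stubEmail));
```

Note that a per-character table like this cannot handle characters whose reading depends on context, which is one reason to prefer an existing package.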
IMO we could probably add a locale like
However, the user would have to select this locale explicitly, because technically it is not Chinese anymore, and phonetically converting the text probably takes more than 50 lines of code. Some users might also explicitly want Chinese usernames and email addresses, because they have to verify that their system works with those as well. (In Germany, it is possible to use Umlaute
export function createRandomUser(): User {
return {
userId: fakerZH.datatype.uuid(),
username: fakerEN_CN.internet.userName(),
email: fakerEN_CN.internet.email(),
avatar: fakerZH.image.avatar(),
password: fakerZH.internet.password(),
birthdate: fakerZH.date.birthdate(),
registeredAt: fakerZH.date.past(),
}
}
If we add some kind of internal workaround to delegate to the English Faker ourselves, then we won't be able to split faker into individual locale modules anymore. @shtse8 What do you think about the |
There is a romanization system for Chinese characters called "pinyin" as @shtse8 said, but I'm not sure if there's an easy way to transliterate characters into it. I'll look into it. Edit: Problem is, some Chinese characters have multiple ways to pronounce them based on context :/ |
And just one Google search away, typing in and there are even alternative packages, so I think this is the best workaround for now, according to this answer on Stack Overflow: https://stackoverflow.com/a/760151/6897682 |
Today another "affected" method and locale showed up: We might have to add an option |
Especially with From what I understand, not all TLDs even accept internationalized domain names (wiki), so I think it is out of scope for faker to determine which ones do and to keep track of that. IMO, domain words should just not include non-ASCII characters, to keep it simple. |
Perhaps locales that don't use an ASCII script could optionally provide an alternative set of ASCII first names and last names, to be used in contexts that require ASCII, like email addresses? For example zh_CN, ar, el |
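One way the suggestion above could look, as a rough sketch rather than faker's actual locale format. The `NameData` interface, the `asciiFirstNames` / `asciiLastNames` fields, and the `emailLocalPart` helper are all hypothetical. The idea is that a locale may supply ASCII alternatives, and ASCII-only contexts prefer them when they are present.

```typescript
// Hypothetical locale shape: the ascii* fields are optional additions.
interface NameData {
  firstNames: string[];
  lastNames: string[];
  asciiFirstNames?: string[];
  asciiLastNames?: string[];
}

// Pick one element using an injectable random source in [0, 1).
function pick<T>(items: T[], random: () => number): T {
  return items[Math.floor(random() * items.length)];
}

// True when every character is in the 7-bit ASCII range.
const isAscii = (s: string): boolean => /^[\x00-\x7F]*$/.test(s);

// Build an email local part, preferring the ASCII name lists if provided.
function emailLocalPart(
  data: NameData,
  random: () => number = Math.random
): string {
  const first = pick(data.asciiFirstNames ?? data.firstNames, random);
  const last = pick(data.asciiLastNames ?? data.lastNames, random);
  return `${first}.${last}`.toLowerCase();
}

// Illustrative data only, not taken from any real locale file.
const zhExample: NameData = {
  firstNames: ["伟"],
  lastNames: ["张"],
  asciiFirstNames: ["Wei"],
  asciiLastNames: ["Zhang"],
};

console.log(emailLocalPart(zhExample));
```

Locales that do not define the optional lists keep their current behaviour, so the change would be opt-in per locale.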
Sample output for
I note there are two groups of locales with slightly different problems
|
The difference seems to come down to the fact that faker.helpers.slugify has some exceptions for Japanese and Chinese characters https://github.com/faker-js/faker/blame/next/src/modules/helpers/index.ts#L37
Note the Chinese and Japanese characters here are not stripped but Cyrillic, Arabic, Korean are:
|
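To illustrate the difference described above, here is a sketch of two slugify variants. This is not a copy of faker's code: the function names and the exact character ranges are assumptions chosen for the example. The only difference between the two is whether the final "strip everything else" step exempts Chinese and Japanese characters.

```typescript
// Variant 1: strip everything that is not an ASCII word char, dot or hyphen.
function slugifyAsciiOnly(input: string): string {
  return input
    .normalize("NFKD") // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/ /g, "-") // spaces become hyphens
    .replace(/[^\w.\-]+/g, ""); // remove all remaining non-ASCII-ish chars
}

// Variant 2: same steps, but CJK ideographs and kana are exempted
// (ranges assumed for this sketch: U+4E00-U+9FFF and U+3040-U+30FF).
function slugifyKeepingCjk(input: string): string {
  return input
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/ /g, "-")
    .replace(/[^\u4e00-\u9fff\u3040-\u30ff\w.\-]+/g, "");
}

for (const name of ["张伟", "山田", "Иван", "김"]) {
  console.log(JSON.stringify(name), {
    asciiOnly: slugifyAsciiOnly(name),
    keepingCjk: slugifyKeepingCjk(name),
  });
}
```

With the exemption, Chinese and Japanese names pass through unchanged while Cyrillic and Korean names become empty strings, which would explain two groups of locales with different symptoms.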
... and that was originally introduced here: It seems to have caused more problems than it solved, so perhaps that could be reverted, and a more general solution found for all the non-ascii-ish locales. |
I don't think that |
As a simple solution, in non-ASCII locales you could just generate a purely random localPart for email addresses, like two letters followed by 5-8 digits, e.g.
... at least it would be a valid email address. |
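A minimal sketch of that idea, assuming lowercase ASCII letters and an injectable random source. `randomInt` and `randomLocalPart` are hypothetical helper names, not faker APIs.

```typescript
// Inclusive random integer in [min, max]; `random` returns a value in [0, 1).
function randomInt(min: number, max: number, random: () => number): number {
  return min + Math.floor(random() * (max - min + 1));
}

// Two lowercase letters followed by 5 to 8 digits, e.g. for use before the "@".
function randomLocalPart(random: () => number = Math.random): string {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let result = "";
  for (let i = 0; i < 2; i++) {
    result += letters[randomInt(0, letters.length - 1, random)];
  }
  const digitCount = randomInt(5, 8, random);
  for (let i = 0; i < digitCount; i++) {
    result += String(randomInt(0, 9, random));
  }
  return result;
}

console.log(`${randomLocalPart()}@example.com`);
```

The result is always ASCII, at the cost of no longer resembling the generated person's name.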
I created #1554 as a tentative solution for this. I'm not sure it would be the best long-term solution, but at least it means that all locales return valid ASCII email addresses. |
At least for email addresses, the same goes for Japan.
I think this fix will help! |
Describe the bug
Email and username should not use Chinese, even in the Chinese locale package.
No one uses Chinese for an email address or username, even among Chinese users.
Reproduction
code
output
Additional Info
No response