Improve locale loading and global locale data #823

Closed
ST-DDT opened this issue Apr 9, 2022 · 8 comments
Labels
c: feature Request for new feature c: locale Permutes locale definitions c: refactor PR that affects the runtime behavior, but doesn't add new features or fixes bugs s: needs decision Needs team/maintainer decision

Comments

@ST-DDT
Member

ST-DDT commented Apr 9, 2022

Clear and concise description of the problem

The current system of loading locale data has some drawbacks that need to be addressed:

  1. It heavily depends on the en locale, even though some of the data (emojis, IBAN, country codes) isn't related to it
  2. It is not possible to base region/country-specific locales on more than a single fallback (which is en by default)
  3. Using custom locale entries is currently not supported (via faker.definitions)

Suggested solution

  1. Introduce a global locale with all the locale data that aren't bound to a specific locale
  2. Replace locale and fallback locale with a locale[] that is searched in order (feat!: multi locale fallback #858)
  3. Provide all locale data dynamically via faker.definitions (feat: dynamic definitions tree #822)

Alternative

  1. Remove these data entirely from the locales, making it impossible to change/filter them.
  2. Require the users to manually build their locales, but this is the exact opposite of our existing faker.definitions
  3. Let them access the locale data via the locale, which once again is the exact opposite of our existing faker.definitions

Additional context

  1. Could be introduced in v6.x, while keeping the data in en until v7 (when no. 2 is implemented).
  2. Is definitely a breaking change and can only be in v7+
  3. Can be added now, as it is a non-breaking improvement
@ST-DDT ST-DDT added c: feature Request for new feature s: needs decision Needs team/maintainer decision c: refactor PR that affects the runtime behavior, but doesn't add new features or fixes bugs c: locale Permutes locale definitions labels Apr 9, 2022
@ST-DDT ST-DDT added this to the v7 - Next Major milestone Apr 9, 2022
@ST-DDT ST-DDT moved this to Todo in Faker Roadmap Apr 9, 2022
@ST-DDT
Member Author

ST-DDT commented Apr 9, 2022

I already have an outdated branch with a potential solution/implementation for 2: main...ST-DDT:feature/locales/multi-fallback

@pkuczynski
Member

We should definitely have a space where we could put all locale-independent definitions. Not sure we should call it global locale though as it's not connected with anything i18n. I would rather see two folders here:

  • definitions (or data)
  • i18n (or locales)

Being able to create a path of multilevel fallbacks sounds like a good idea too! I can see at least 3 levels: dialect (e.g. de-AT), language (de), default (always en). But we should let the user construct it the way they want, including a dynamic set created on their project side:

faker.locale = ['de-AT', {
  // custom user locale as Partial<FakerLocale>
}];

We should also define default fallbacks, so en-GB should always fall back to en. This way it could have a small subset of data specific to British English. What I mean here by fallback is that when the user specifies en-GB as their language, it should actually be a deep merge of the definitions from en-GB with en. This way GB can be a super small file with only the things that are specific to British English. Does this make sense?
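
To make the deep-merge idea above concrete, here is a minimal sketch. It is not faker's implementation: the `LocaleData` type, the `deepMerge` helper and the sample data (including the postcode patterns) are all hypothetical and only illustrate how a small en-GB file could sit on top of en.

```typescript
// Hypothetical, simplified shape of a locale: modules -> entries -> values.
type LocaleData = { [key: string]: LocaleData | string[] };

function isPlainObject(value: unknown): value is LocaleData {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Deep-merges `override` onto `base` without mutating either argument.
// Arrays are treated as leaf values and replaced as a whole.
function deepMerge(base: LocaleData, override: LocaleData): LocaleData {
  const result: LocaleData = { ...base };
  for (const key of Object.keys(override)) {
    const existing = result[key];
    const value = override[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(value)
        ? deepMerge(existing, value)
        : value;
  }
  return result;
}

// Made-up sample data, not the real faker definitions.
const en: LocaleData = {
  address: { postcode: ['#####'], country: ['United Kingdom', 'United States'] },
  name: { prefix: ['Mr.', 'Mrs.'] },
};
const enGB: LocaleData = {
  address: { postcode: ['??# #??'] },
};

// en-GB only carries what differs; everything else comes from en.
const merged = deepMerge(en, enGB);
```

With this approach `merged.address.postcode` comes from en-GB, while `merged.address.country` and `merged.name` are still taken from en.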

@ST-DDT
Member Author

ST-DDT commented Apr 10, 2022

We should definitely have a space where we could put all locale-independent definitions.

I specifically used a locale for this data to allow for easy and consistent overwriting of the data.
E.g. limit the IBANs/country codes to only a specific set of countries.

The locale order will be like this by default: de-AT, de, en, global
The user may create their own faker instances using any combination of locales they want or don't want. They are explicitly able to omit global from their fallbacks. They are also able to add zero or more custom locales.
These custom locales work exactly like normal locales.

The localeOrder is independent of the available locales in a faker instance. The locale order uses the locale names/keys, not the locale objects.
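
A minimal sketch of such a name-based lookup, under stated assumptions: the types, the `resolveEntry` helper and the sample data are hypothetical, and silently skipping a name that has no registered locale is an assumption of this sketch, not something decided in this issue.

```typescript
// Hypothetical, simplified shapes; not the real faker types.
type LocaleEntries = Record<string, string[]>;
type LocaleDefinition = Record<string, LocaleEntries>;

// Made-up sample data. Note that 'fr' is not registered here.
const availableLocales: Record<string, LocaleDefinition> = {
  'de-AT': { address: { state: ['Wien', 'Tirol'] } },
  de: { address: { state: ['Bayern'], city_suffix: ['stadt'] } },
  en: { name: { prefix: ['Mr.', 'Ms.'] } },
  global: { finance: { currency_code: ['EUR', 'USD'] } },
};

// Walks the locale names in order and returns the first matching entry.
// Names without a registered locale are skipped (assumption of this sketch).
function resolveEntry(
  available: Record<string, LocaleDefinition>,
  localeOrder: string[],
  moduleName: string,
  entryName: string,
): string[] | undefined {
  for (const localeName of localeOrder) {
    const entry = available[localeName]?.[moduleName]?.[entryName];
    if (entry !== undefined) {
      return entry;
    }
  }
  return undefined;
}

const defaultOrder = ['de-AT', 'de', 'en', 'global'];
const state = resolveEntry(availableLocales, defaultOrder, 'address', 'state');
const citySuffix = resolveEntry(availableLocales, defaultOrder, 'address', 'city_suffix');
const currency = resolveEntry(availableLocales, defaultOrder, 'finance', 'currency_code');

// Omitting 'global' from the order hides the locale-independent data.
const withoutGlobal = resolveEntry(
  availableLocales,
  ['fr', 'de-AT', 'de', 'en'],
  'finance',
  'currency_code',
);
```

Here `state` is found in de-AT, `citySuffix` falls through to de, `currency` is only found in global, and `withoutGlobal` stays undefined because global is not part of that order.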

@ST-DDT
Member Author

ST-DDT commented Apr 10, 2022

What I mean here by fallback is that when the user specifies en-GB as their language, it should actually be a deep merge of the definitions from en-GB with en. This way GB can be a super small file with only the things that are specific to British English. Does this make sense?

We would use the same fallback mechanism as now, just with more fallback levels. If it is not in en-GB then we will check en, then global.
We virtually merge the locale modules, but not their entries.

In en-GB you can overwrite en.finance.credit_card as a whole.
You cannot statically overwrite only en.finance.credit_card.visa.
You may clone and filter/adjust the subsections though. E.g. edit visa by overwriting credit_card with an adjusted copy.
If you want to do more than that, we can discuss that in a separate discussion.
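
A small sketch of the "adjusted copy" approach described above. The types, the `getEntry` helper and the card patterns are hypothetical and only for illustration; the point is that the first locale that defines the entry provides it as a whole, with no merging inside the entry.

```typescript
// Hypothetical shapes and illustrative patterns, not the real faker data.
type Entry = Record<string, string[]>;
type LocaleDefinition = Record<string, Record<string, Entry>>;

const en: LocaleDefinition = {
  finance: {
    credit_card: {
      visa: ['4###########L', '4###-####-####-###L'],
      mastercard: ['5[1-5]##-####-####-###L'],
    },
  },
};

// Clone the whole entry, then adjust only the sub-section that differs.
const adjustedCreditCard: Entry = {
  ...en['finance']['credit_card'],
  visa: ['4###-####-####-###L'],
};

const enGB: LocaleDefinition = {
  finance: { credit_card: adjustedCreditCard },
};

// Entry-level fallback: the first locale that defines the entry wins,
// and its entry is used as a whole.
function getEntry(
  order: LocaleDefinition[],
  moduleName: string,
  entryName: string,
): Entry | undefined {
  for (const locale of order) {
    const entry = locale[moduleName]?.[entryName];
    if (entry !== undefined) {
      return entry;
    }
  }
  return undefined;
}

const creditCard = getEntry([enGB, en], 'finance', 'credit_card');
```

If `adjustedCreditCard` did not spread the en entry first, `mastercard` would be missing from the result, because en is never consulted once en-GB defines `credit_card`.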

@ejcheng
Copy link
Member

ejcheng commented Apr 10, 2022

We should definitely have a space where we could put all locale-independent definitions. Not sure we should call it global locale though as it's not connected with anything i18n. I would rather see two folders here:

  • definitions (or data)
  • i18n (or locales)

Being able to create a path of multilevel fallbacks sounds like a good idea too! I can see at least 3 levels: dialect (e.g. de-AT), language (de), default (always en). But we should let the user construct it the way they want, including a dynamic set created on their project side:

faker.locale = ['de-AT', {
  // custom user locale as Partial<FakerLocale>
}];

We should also define default fallbacks, so en-GB should always fall back to en. This way it could have a small subset of data specific to British English. What I mean here by fallback is that when the user specifies en-GB as their language, it should actually be a deep merge of the definitions from en-GB with en. This way GB can be a super small file with only the things that are specific to British English. Does this make sense?

Just some suggestions for the name of the locale-independent definitions: nolocale, independent, or something like that.

@pkuczynski
Member

I specifically used a locale for this data to allow for easy and consistent overwriting of the data.
E.g. limit the IBANs/country codes to only a specific set of countries.

This is not a good example for IBAN: I might be on a Polish locale, yet want to generate an Italian IBAN... IBAN imho is not connected with the locale. The country of origin might be a param for it, but otherwise the format is the same for all countries, right?

I really don't think global, as you call it, should be part of the locales. It has nothing to do with locales... A good example was the recent discussion around colors. IBAN is another good example. Credit card numbers or credit card organisations too...

I really see no reason why anyone would want to exclude global, as you call it, or why someone might want to override it. It makes no sense. We should follow what makes the most sense from an architecture point of view, not abstract ideas without practical application.

@ST-DDT
Member Author

ST-DDT commented Apr 22, 2022

You have a point.

What about mime file types?
I assume you will hardly ever want most of them. If they are (global) locale data, you can easily filter and overwrite them.
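
As a rough sketch of what "filter and overwrite" could look like if the MIME types lived in (global) locale data: the `MimeTypes` shape, the `filterMimeTypes` helper and the `system.mimeTypes` path of the custom locale are assumptions made for this example only.

```typescript
// Hypothetical shape; the sample entries are well-known MIME types.
type MimeTypes = Record<string, { extensions: string[] }>;

const globalMimeTypes: MimeTypes = {
  'application/pdf': { extensions: ['pdf'] },
  'image/png': { extensions: ['png'] },
  'image/jpeg': { extensions: ['jpeg', 'jpg'] },
  'video/mp4': { extensions: ['mp4'] },
};

// Returns a filtered copy; the shared data itself is left untouched.
function filterMimeTypes(
  source: MimeTypes,
  keep: (mimeType: string) => boolean,
): MimeTypes {
  const result: MimeTypes = {};
  for (const mimeType of Object.keys(source)) {
    if (keep(mimeType)) {
      result[mimeType] = source[mimeType];
    }
  }
  return result;
}

const imagesOnly = filterMimeTypes(
  globalMimeTypes,
  (mimeType) => mimeType.indexOf('image/') === 0,
);

// A hypothetical custom locale that would be placed before the shared data
// in the locale order, so its filtered entry is found first.
const customLocale = { system: { mimeTypes: imagesOnly } };
```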

@ST-DDT
Member Author

ST-DDT commented Sep 8, 2022

Superseded by #1340.

@ST-DDT ST-DDT closed this as completed Sep 8, 2022
Repository owner moved this from In Progress to Done in Faker Roadmap Sep 8, 2022
@ST-DDT ST-DDT removed this from Faker Roadmap Nov 4, 2022