Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Design for standalone module functions #2667

Open
ST-DDT opened this issue Feb 14, 2024 · 16 comments
Open

Proposal: Design for standalone module functions #2667

ST-DDT opened this issue Feb 14, 2024 · 16 comments
Labels
breaking change Cannot be merged when next version is not a major release c: feature Request for new feature p: 1-normal Nothing urgent s: awaiting more info Additional information are requested
Milestone

Comments

@ST-DDT
Copy link
Member

ST-DDT commented Feb 14, 2024

Clear and concise description of the problem

Currently the faker methods are organized in modules and these are joined to the whole Faker class.
Most methods in the modules reference a function in a different module, by invoking it via the embedded Faker reference.
This makes it near impossible for building tools to drop unused parts of code package.

Changing the implementation to use standalone methods, that only call other standalone methods, mostly solves this issue.
Old code/syntax (e.g. faker.person.firstName) must remain valid.

Suggested solution

1. Extract the functions from the module classes into their own files (the same applies to internal helper methods)

- modules
  - airline
    - airline.ts
    - airport.ts
    - ...
  - helpers
    - arrayElement.ts
    - arrayElements.ts
    - multiple.ts
    - ...
  - ..

2. Change the signature of the methods, so that it replaces it the reference to the main Faker instance e.g. by passing it as argument.

function firstName(fakerCore: FakerCore, ...remainingArgs) {
  return arrayElement(fakerCore, fakerCore.definitions.person.firstName);
}

fakerCore ≈> Faker without any modules (only definitions, randomizer, config)

3. Call the standalone functions from the modules.

export function newPersonModule(fakerCore: FakerCore): PersonModule {
  return ({
    firstName: (..args) => firstName(fakerCore, ...args);
    ...
  });
}

4. Optionally, "process" the function to make it easier to use with the same/a fakerCore.

export function fakerize<T extends (fakerCore: FakerCore, ...args: any[]) => any>( fn: T ): FakerizedMethod<T> {
  const fakerized = fn as FakerizedMethod<T>;
  fakerized.withCore = (fakerCore: FakerCore) => (...args: FakerizedParameters<T>) => fn(fakerCore, ...args);
  // Potentially more methods
  return fakerized;
}

// We need to check whether this negatively impacts readability of the type if not inlined
type FakerizedMethod<T extends () => any> = T & Readonly<{
    withCore: ( fakerCore: FakerCore ) => (...args: FakerizedParameters<T>) => ReturnType<T>;
  }>;

type FakerizedParameters<T extends (...args: any[]) => any> =
  Parameters<T> extends [FakerCore, ...infer U] ? U : never;

Usage:

// in firstName.ts
export const firstName = fakerize(firstName);

// by users or maybe pre-built per locale
export const enFirstName = firstName.withCore(fakerCoreEN);

export function newPersonModule(fakerCore: FakerCore): PersonModule {
  return ({
    firstName: firstName.withCore(fakerCore);
    ...
  });
}

// by users
enFirstName() // Henry
Usage nano bound standalone module function
// in locales/en.ts
const nanoCoreFirstName: FakerCore = { randomizer, config, locale: { person: { first_name }}}; // Contains only the required locale data for firstName
export const enFirstName = firstName.withCore(nanoCoreFirstName);

// by users
enFirstName() // Henry

The nano bound standalone module function contain only the data that are needed for that specific function and nothing else.
So they are tiny after treeshaking 1-10KB instead of the full 600KB for en.

Pros:

  • Allows adding meta functions, such as firstName.isAvailable = (fakerCore) => fakerCore.rawDefinitions.person.firstName != null;

Cons:

  • The hover-able API docs are slightly less beautiful because it is a export const instead of export function.

grafik
vs
grafik

Alternative

I also considered making the fakerCore parameter the last parameter and optional by defaulting to e.g. fakerCoreEN.

However that has a few disadvantages:

  • The default in the signature or otherwise is likely to force all of the English locale to be bundled, even if it is never used.
  • The optional nature of the parameter makes it easy to forget especially when creating complex objects in a function with a fakerCore parameter itself.
// This method can be used by multiple languages
function mockPerson(fakerCore?: FakerCore, static?: Partial<Person> = {}): Person {
  return ({
    id: static.id ?? uuid(fakerCore),
    firstName: static.firstName ?? firstName(),
    lastName: static.lastName ?? lastName(),
  });
}
// Did you spot the missing parameters?
// Would you if I didn't tell you?
// If it was more complicated?
// Or if this method is only a nested function within the `mockLeasingAgreement()` function?
  • If the parameter is the last one, you will likely end with some undefined parameters in between.
firstName(undefined (SexType), fakerCoreFR)
  • If the parameter is last, there is also additional work to do when processing the function.
export function fakerize<T extends (...args: any[], fakerCore: FakerCore) => any>(
    fn: T,
    fakerCoreParameterIndex: number // <-- probably not compile time guarded => hard to use by our users
    ): FakerizedMethod<T> {

  const fakerized = fn as FakerizedMethod<T>;
  fakerized.withCore = (fakerCore: FakerCore) => (...args: FakerizedParameters<T>) => {
    args[fakerCoreParameterIndex] = fakerCore;
    fn(...args);
  }
  return fakerized;
}
  • Having the fakerCore parameter last makes it very hard for methods like multiple to pass the argument on.
    This basically prevents using derive in combination with pass-through parameters (IIRC currently only supported by unique)
Functions with pass-through parameters might result in easier to read code, but that belongs into a different proposal
function multiple(source: (...args: A[], fakerCore) => T, options, fakerCore: FakerCore, ...args: A[]) {
  // Impossible
}

multiple(firstName, { count: 5}, fakerCoreEN, 'male');

Additional context

This is an attempt at detailing the proposal made in #1250

It was made with a change like #2664 in mind to retain optimal performance.

This proposal will not remove or deprecate the well known faker.person.firstName syntax.


Adopting this proposal in combination with #2664 has side effects for methods that accept other methods as parameters, as they have to pass on the fakerCore instance.
function multiple(fakerCore: FakerCore, source: (fakerCore: FakerCore) => T, options) {
  cosnt derived = fakerCore.derive();
  return Array.from({length: options.count}, () => source(derived));
}

If the user does not use the derived instance, then they use more seeds from their main instance and might even use different fakerCores for the multiple function and the source function.

// BAD
multiple(fakerCoreEN, () => firstName(fakerCoreFR), { count: 3 });

// GOOD
multiple(fakerCoreEN, (core) => firstName(core), { count: 3 });

// BETTER?
multiple(fakerCoreEN, firstName, { count: 3 });

This also affects other functions such as uniqueArray and unique.
Aka that extends the scope of faker to some (helper) functions that call faker functions indirectly.

Note: I cut quite a few related/derived proposals from this issue entirely, because it is long enough as is:
  • Nano-Localized-Functions for min bundle sizes
  • Most of the Fakerize Meta-Framework
  • Code-Transformer
  • Most of the extension framework
  • Implementation details
  • generateComplex(fakerCore, { key: keyof T: (fakerCore) => T[key] }): T
  • ...

I just though about these while writing the ticket, thus it does not imply that I actually plan on adding any of these.
(I likely forgot a few as well, because I only wrote them down at the end😉.)

Usage Example

const usersEN = mockUsers(fakerCoreEN);
const usersFR = mockUsers(fakerCoreFR);

function mockUsers(fakerCore: FakerCore) {
  return multiple(fakerCore, mockUser, { count: 3 });
}

function mockUser(fakerCore: FakerCore): User{
  return ({
    id: uuid(fakerCore),
    firstName: firstName(fakerCore),
    lastName: lastName(fakerCore),
  });
}
@ST-DDT ST-DDT added c: feature Request for new feature p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision labels Feb 14, 2024
@ST-DDT ST-DDT added this to the v9.0 milestone Feb 14, 2024
@matthewmayer
Copy link
Contributor

Sorry I don't totally understand if this is a internal change or would affect consumers.

If I had existing code like

const { faker } = require("@faker-js/faker")
const email = faker.internet.email()

Would that need to change?

@ST-DDT
Copy link
Member Author

ST-DDT commented Feb 15, 2024

Would that need to change?

Old code does not need to change

Could/as proposed:

const { fakerCoreEN, email } = require("@faker-js/faker")
const email = email(fakerCoreEN)
// This is a few KB smaller after bundling than the old code (~620KB? vs ~600KB?)

At full "capacity" (probably not part of the first implementation):

const { email } = require("@faker-js/faker/locale/en")
const email = email()
// This could be tiny (~5KB?)

@matthewmayer
Copy link
Contributor

Provided it doesn't need to change that's fine. Anything which would require consumers to change the syntax of every call to faker they are currently making in existing code would be a big con.

But if this is only for new code, or if you want to take advantage of tree shaking, that's OK.

@matthewmayer
Copy link
Contributor

i think its also important to make sure that tree-shaking actually works after making this change!

This is going to require a LOT of changes to the code; we don't want to spend a long time doing that but then find out that there is some unrelated reason that causes tree-shaking to not be possible with this architecture. Maybe we could start with a proof of concept that this actually allows for tree-shaking to work before committing to changing this, maybe just for a single module or method?

@Shinigami92
Copy link
Member

At first: thanks very much for taking the effort and proposing this draft via a GitHub issue 🙏

I already know a little bit more behind the scenes via private conversations and faker meetings, but I also want to write this down here to be more clear and referenceable in the future:
I'm 1_000% with @matthewmayer and want to be able to stick to the current usage of faker and so this proposal / implementation is just an addition / opt-in if wanted to benefit from. Also that means that there will no breaking changes introduced by this implementation and all current tests (except of the deprecation removals for sure) will still work afterwards.
However, I might be fine if we need something as a workaround then to support tree-shaking like import { email } from '@faker-js/faker/tree-shaked'.


Furthermore, like @matthewmayer I would really love to see a working implementation that results in <<< 600 KiB like you proposed here, before actually rewriting the whole src code base.

At full "capacity" (probably not part of the first implementation):

const { email } = require("@faker-js/faker/locale/en")
const email = email()
// This could be tiny (~5KB?)

And one last wish: I would prefer if we could do all the deprecation removals first, so we have more clean test files and also less to convert.

Tracking PR is mostly here (it got slightly more a tracking issue right now 😅)

@ST-DDT
Copy link
Member Author

ST-DDT commented Feb 15, 2024

  • POC: Will add it when I (or someone) have some free time.
  • Deprecation: -> See tracking issue

@ST-DDT
Copy link
Member Author

ST-DDT commented Feb 29, 2024

Team Decision

@ST-DDT
Copy link
Member Author

ST-DDT commented Feb 29, 2024

@ST-DDT ST-DDT added the s: on hold Blocked by something or frozen to avoid conflicts label Feb 29, 2024
@matthewmayer
Copy link
Contributor

const { fakerCoreEN, email } = require("@faker-js/faker")
const email = email(fakerCoreEN)
// This is a few KB smaller after bundling than the old code (~620KB? vs ~600KB?)

Regarding the new syntax for this what happens to the module names? (Ie "internet" for email?)

@ST-DDT
Copy link
Member Author

ST-DDT commented Mar 22, 2024

I dont have a final idea for this, but I thought about omitting it in the export names unless required for disambiguation.

  • Internet.email as email
  • Animal.type as animalType
  • String.sample as string or stringSample

Assigning the aliases/export names might be a manual task.

@Shinigami92
Copy link
Member

It would be absolutely cool if it is somehow possible to call these functions without passing a specific faker instance, and it falls back to a default faker instance.
But not sure yet if that's possible.


I would be in favor for sampleString() because this reads a bit more naturally 🤔
But I have just put 2 seconds of brain power into that 😅

@ST-DDT
Copy link
Member Author

ST-DDT commented Mar 22, 2024

It would be absolutely cool if it is somehow possible to call these functions without passing a specific faker instance, and it falls back to a default faker instance.
But not sure yet if that's possible.

I thought about exposing all methods bound to a specific locale (as minimal dataset) e.g. from the locale/en folder.

Note: I cut quite a few related/derived proposals from this issue entirely, because it is long enough as is:

  • Nano-Localized-Functions for min bundle sizes

I consider that off-topic for the discussion, as the topic is already big enough as is.

@Shinigami92
Copy link
Member

It would be absolutely cool if it is somehow possible to call these functions without passing a specific faker instance, and it falls back to a default faker instance.

But not sure yet if that's possible.

I thought about exposing all methods bound to a specific locale (as minimal dataset) e.g. from the locale/en folder.

Oh, I absolutely forgot 🤦, many of the functions do not need a localized faker instance at all and will just work more or less out of the box. So my request/idea was more related to just these non-locale functions. The localized functions can / should depend on a requested faker instance anyway.

Anyway, if something is confusing, I can also just shut up and do vacation if you want 😂

@ST-DDT
Copy link
Member Author

ST-DDT commented Mar 22, 2024

many of the functions do not need a localized faker instance

But they still need the randomizer from it. We can discuss where to put them, after we have the capability to actually do so.

@ST-DDT ST-DDT removed the s: on hold Blocked by something or frozen to avoid conflicts label Apr 4, 2024
@ST-DDT
Copy link
Member Author

ST-DDT commented Apr 20, 2024

So here my plan to implement this feature.

  1. Introduce FakerCore the container that stores all mutable state/data.
    1. PR that moves randomizer, locale data and "config" to its own thing
    2. Issue/PR with considerations regarding behavioral! changes (should the pre-built instances share the config/randomizer with each other), can be done in parallel with 2
  2. Actually change the methods to be standalone/take the FakerCore as param (instead of using it as field).

Is that OK for you?

@matthewmayer
Copy link
Contributor

As this is a big change and will likely need several preparatory pull requests before it gets to the point where you can actually use individual tree-shakable functions, perhaps it would make sense to merge these into a dedicated branch rather than merge directly into next? That way if there are unanticipated problems along the way, there is the freedom to postpone all the changes and release them together in a later release than 9.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Cannot be merged when next version is not a major release c: feature Request for new feature p: 1-normal Nothing urgent s: awaiting more info Additional information are requested
Projects
None yet
Development

No branches or pull requests

3 participants