Customize the PRNG used by faker #2195

Closed
dubzzz opened this issue Jun 3, 2023 · 4 comments · Fixed by #2284
Labels
c: feature Request for new feature s: accepted Accepted feature / Confirmed bug

Comments

@dubzzz

dubzzz commented Jun 3, 2023

Clear and concise description of the problem

Faker currently lets users customize the PRNG only through the seed (or seeds). But the seed is a limited aspect of a PRNG. For instance, in a Mersenne Twister (like the one used by Faker), the PRNG state is made of 600+ 32-bit integers, while a single seed represents only one of them (seeding via an array addresses part of the problem). In addition, the Mersenne Twister is not ideal in every case: it is not cryptographically secure and may not be fast enough. So offering the ability to swap it out would be a good option.
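
To make the seed-versus-state gap concrete, here is a minimal sketch of the reference MT19937 seeding routine, which expands one 32-bit seed into the full 624-word state. It assumes the standard `init_genrand` recurrence; the function name `mt19937InitialState` is made up for this illustration, and Faker's internal implementation may differ in detail.

```typescript
// Sketch only: expands a single 32-bit seed into an MT19937 state array.
// `mt19937InitialState` is a hypothetical helper name, not a Faker API.
function mt19937InitialState(seed: number): Uint32Array {
  const state = new Uint32Array(624);
  state[0] = seed >>> 0;
  for (let i = 1; i < 624; i++) {
    const prev = state[i - 1] ^ (state[i - 1] >>> 30);
    // Math.imul keeps the low 32 bits of the product; the typed array
    // wraps the final sum modulo 2^32.
    state[i] = Math.imul(1812433253, prev) + i;
  }
  return state;
}

const initial = mt19937InitialState(5489);
console.log(initial.length); // 624 words of state, all derived from one seed
```

Whatever seed is chosen, a single 32-bit seed can reach at most 2^32 distinct starting states, which is why handing over a ready-made generator carries more information than handing over a seed.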

My case is linked to the property-based testing framework fast-check. In a nutshell, it lets users test their code against randomized values while offering replay and shrink capabilities, but it does not aim to generate fake data (there are libraries tailored for that need). In this framework, we suggest that users rely on fake data libraries such as Faker if they want fake data (which is definitely a common need). But the current approach, which consists of wrapping Faker and passing it a seed, is not ideal. Internally we already have PRNGs (Mersenne, xoroshiro and others) that are seeded and offset, and passing one directly instead of a seed would be better.

My question is: Is there a plan to offer the ability to customize the PRNG used by Faker?

Suggested solution

My suggestion is still a bit of a draft at the moment. The idea is to add a .prng method that lets users of Faker change the PRNG it uses. As Faker already offers .seed, and in order not to break that function or make it useless alongside the new one, I would suggest one of the two options below:

  • .prng receives an instance of a PRNG, but if we call .seed we go back to the default PRNG of Faker
  • .prng has to be called with an object fitting the current schema

From a caller's point of view, changing the default random generator used by Faker would be done as follows:

import { faker } from '@faker-js/faker';
import prand from 'pure-rand';

let currentPrng = prand.xoroshiro128plus(0);
faker.prng({
  next: () => {
    return (currentPrng.unsafeNext() >>> 0) / 0x1_0000_0000;
  },
  seed: (value) => {
    const seedValue = typeof value === "number" ? value : value.reduce((a, b) => a ^ b, 0);
    currentPrng = prand.xoroshiro128plus(seedValue);
  },
});

Example replacing the PRNG with one from the pure-rand library
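
For readers without pure-rand at hand, the same object shape can be sketched with a dependency-free generator. The block below is an illustration under assumptions: the interface name `CustomPrng` and the factory `createXorshift32` are hypothetical, and xorshift32 is swapped in only because it fits in a few lines, not because the proposal calls for it.

```typescript
// Object shape taken from the proposal above; `CustomPrng` is a hypothetical name.
interface CustomPrng {
  next: () => number; // float in [0, 1)
  seed: (value: number | number[]) => void;
}

// Hypothetical factory built on xorshift32 (shift constants 13, 17, 5).
function createXorshift32(initialSeed: number): CustomPrng {
  // xorshift32 must never hold a zero state, so fall back to a fixed constant.
  const normalize = (s: number): number => (s >>> 0) || 0x9e3779b9;
  let state = normalize(initialSeed);
  return {
    next: () => {
      state ^= state << 13;
      state ^= state >>> 17;
      state ^= state << 5;
      // Reinterpret as unsigned 32-bit, then scale into [0, 1).
      return (state >>> 0) / 0x100000000;
    },
    seed: (value) => {
      // Fold an array of seeds into one number, as in the snippet above.
      const folded =
        typeof value === 'number' ? value : value.reduce((a, b) => a ^ b, 0);
      state = normalize(folded);
    },
  };
}

const prng = createXorshift32(42);
const first = prng.next();
prng.seed(42);
console.log(prng.next() === first); // true: reseeding replays the same sequence
```

Dividing the unsigned 32-bit value by 2^32 keeps every result strictly below 1, which is the range a `next` function is expected to return in the proposal.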

Alternative

No response

Additional context

Thank you so much for this awesome library. Feel free to close the issue, if the feature is definitely not part of the plans 👍
If the feature makes sense, I can give a hand with it!

@dubzzz dubzzz added c: feature Request for new feature s: pending triage Pending Triage s: waiting for user interest Waiting for more users interested in this feature labels Jun 3, 2023
@github-actions
Contributor

github-actions bot commented Jun 3, 2023

Thank you for your feature proposal.

We marked it as "waiting for user interest" for now to gather some feedback from our community:

  • If you would like to see this feature be implemented, please react to the description with an up-vote (:+1:).
  • If you have a suggestion or want to point out some special cases that need to be considered, please leave a comment, so we are aware of them.

We would also like to hear about other community members' use cases for the feature to give us a better understanding of their potential implicit or explicit requirements.

We will start the implementation based on:

  • the number of votes (:+1:) and comments
  • the relevance for the ecosystem
  • availability of alternatives and workarounds
  • and the complexity of the requested feature

We do this because:

  • There are plenty of languages/countries out there and we would like to ensure that every method can cover all or almost all of them.
  • Every feature we add to faker has "costs" associated with it:
    • initial costs: design, implementation, reviews, documentation
    • running costs: awareness of the feature itself, more complex module structure, increased bundle size, more work during refactors

View more issues which are waiting for user interest

@matthewmayer
Contributor

Minor comment: perhaps it should just be "RNG", not "PRNG"? We would make no assumptions about which algorithm or hardware device is used to generate the values, as long as it follows the signature.
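
As a rough illustration of that point, a source that is not seedable at all can still satisfy the same signature. This is a sketch under assumptions: the names `Rng`, `nonDeterministicRng` and `pickOne` are hypothetical, and treating `seed` as a no-op is one possible choice, not something the thread settles.

```typescript
// Hypothetical interface mirroring the next/seed shape discussed in this issue.
interface Rng {
  next: () => number; // float in [0, 1)
  seed: (value: number | number[]) => void;
}

// Backed by Math.random, so there is no algorithm state to seed.
const nonDeterministicRng: Rng = {
  next: () => Math.random(),
  seed: () => {
    // Intentionally a no-op: this source cannot be replayed.
  },
};

// Consumer code depends only on the signature, not on the algorithm behind it.
function pickOne<T>(rng: Rng, items: readonly T[]): T {
  return items[Math.floor(rng.next() * items.length)];
}

console.log(pickOne(nonDeterministicRng, ['Ada', 'Grace', 'Linus']));
```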

@ST-DDT
Member

ST-DDT commented Jun 3, 2023

We will consider it for v9. Currently our roadmap for v9 (tree-shakeability) would move the mersenne part out of the Faker constructor into a separate FakerCore containing all the state and data.

In my mind this would work somewhat like this:
RNG + enLocale(or a subset thereof) + config => enCore
firstName + enCore (e.g. with only firstname data) => enFirstName
firstName + other methods + enCore => enFaker

// Somewhere in Faker

exported firstName: FakerFn<(args...) => string> = fakerize((fakerCore, args...) => ...);

const enFirstNameCore  = new FakerCore({ rng, enFirstNameLocaleData, config });

exported enFirstName: (args...) => string = firstName.withCore(enFirstNameCore);

// Somewhere in the users test code

exported myNewPerson: FakerFn<() => Person> = fakerize((fakerCore) => { firstName: firstName(fakerCore), ...});

exported enMyNewPerson: () => Person = myNewPerson.withCore(enCore);

// alternatively if only a single locale is needed

exported enMyNewPerson: () => Person = () => { firstName: enFirstName(), ...};

// Usage in the users test code

const person: Person = enMyNewPerson()

Note: The actual types/implementation will look different. I just could not find the discussion now.
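
One way to read the pseudocode above as compilable TypeScript is sketched below. Everything here is an assumption made for illustration: the shape of `FakerCore`, the `Fakerized` type, the body of `fakerize`, and the fixed-value RNG are all hypothetical, and as noted the real types and implementation will look different.

```typescript
// Hypothetical minimal core: an RNG plus a slice of locale data.
interface Rng {
  next(): number; // float in [0, 1)
}
interface FakerCore {
  rng: Rng;
  firstNames: readonly string[];
}

// A core-taking function that can also be bound to a fixed core.
type Fakerized<A extends unknown[], R> = ((core: FakerCore, ...args: A) => R) & {
  withCore(core: FakerCore): (...args: A) => R;
};

function fakerize<A extends unknown[], R>(
  impl: (core: FakerCore, ...args: A) => R,
): Fakerized<A, R> {
  return Object.assign(
    (core: FakerCore, ...args: A): R => impl(core, ...args),
    { withCore: (core: FakerCore) => (...args: A): R => impl(core, ...args) },
  );
}

// "Somewhere in Faker": a method that only needs the core.
const firstName = fakerize(
  (core) => core.firstNames[Math.floor(core.rng.next() * core.firstNames.length)],
);

// "Somewhere in the user's test code": a composite built from fakerized parts.
interface Person {
  firstName: string;
}
const myNewPerson = fakerize((core): Person => ({ firstName: firstName(core) }));

// Bind both to one core. The RNG always returns 0.5 to keep the output fixed.
const enCore: FakerCore = {
  rng: { next: () => 0.5 },
  firstNames: ['Ada', 'Grace', 'Linus'],
};
const enFirstName = firstName.withCore(enCore);
const enMyNewPerson = myNewPerson.withCore(enCore);

console.log(enFirstName()); // 'Grace', since Math.floor(0.5 * 3) === 1
console.log(enMyNewPerson().firstName);
```

Because the RNG lives in the core rather than in a global Faker instance, swapping the generator becomes a matter of building a different core.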


We also have another big issue with generating complex objects, or more precisely with the side effects of changing them. If you add a property to such an object, the method consumes additional values from the seeded sequence, so the change bleeds out of that method and alters all subsequent data entries.

See also #1499
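
The bleed described above can be reproduced with a small dependency-free sketch. It assumes a simple linear congruential generator standing in for Faker's generator; `createLcg`, `personV1` and `personV2` are hypothetical names used only here.

```typescript
// Simple 32-bit LCG used as a stand-in generator for this illustration.
function createLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

type Next = () => number;
const names = ['Ada', 'Grace', 'Linus'];

// Version 1 of a generated object draws one value.
const personV1 = (next: Next) => ({
  firstName: names[Math.floor(next() * names.length)],
});

// Version 2 adds a property and therefore draws two values.
const personV2 = (next: Next) => ({
  firstName: names[Math.floor(next() * names.length)],
  age: Math.floor(next() * 80),
});

const a = createLcg(42);
const p1 = personV1(a);
const afterV1 = a(); // second draw of the sequence

const b = createLcg(42);
const p2 = personV2(b);
const afterV2 = b(); // third draw of the sequence

console.log(p1.firstName === p2.firstName); // true: both used the first draw
console.log(afterV1 === afterV2); // false: everything after the object shifted
```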


So in summary: The RNG should look somewhat like this:

interface RNG {
   next: () => number;
   seed: (?) => void;
   copy/clone/fork/derive/?: (?) => RNG;
}
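
A minimal sketch of how a fork-style method could contain that bleed is given below. The interface leaves the method name and parameters open, so everything here is an assumption: `ForkableRng`, `createRng`, the choice of the name `fork`, and deriving the child seed from one parent draw are all hypothetical. The LCG and the XOR constant are placeholders, and a real implementation would need a proper stream-splitting scheme.

```typescript
// Hypothetical concrete reading of the RNG interface sketched above.
interface ForkableRng {
  next(): number; // float in [0, 1)
  seed(value: number): void;
  fork(): ForkableRng;
}

function createRng(seed: number): ForkableRng {
  let state = seed >>> 0;
  const nextU32 = (): number => {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state;
  };
  return {
    next: () => nextU32() / 0x100000000,
    seed: (value) => {
      state = value >>> 0;
    },
    // The parent advances by exactly one draw per fork, no matter how many
    // values the child later consumes. The XOR only avoids handing the child
    // the parent's exact state; it is not a sound way to split streams.
    fork: () => createRng(nextU32() ^ 0x9e3779b9),
  };
}

// Draw `count` values from a child generator, then report the parent's next value.
function parentValueAfterChildDraws(count: number): number {
  const parent = createRng(7);
  const child = parent.fork();
  for (let i = 0; i < count; i++) child.next();
  return parent.next();
}

console.log(parentValueAfterChildDraws(1) === parentValueAfterChildDraws(5)); // true
```

With this shape, a method that builds a complex object could fork once and draw from the child, so adding a property changes only that object's values.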

@ST-DDT
Member

ST-DDT commented Jul 31, 2023

Potential implementation:

@ST-DDT ST-DDT self-assigned this Jul 31, 2023
@ST-DDT ST-DDT linked a pull request Jul 31, 2023 that will close this issue
@ST-DDT ST-DDT modified the milestones: v9.x, v8.x Oct 3, 2023
@xDivisionByZerox xDivisionByZerox added s: accepted Accepted feature / Confirmed bug and removed s: waiting for user interest Waiting for more users interested in this feature labels Oct 4, 2023

4 participants