Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve DX for methods that use all memory #2695

Closed
9 of 10 tasks
vgjenks opened this issue Feb 26, 2024 · 9 comments · Fixed by #3216
Closed
9 of 10 tasks

Improve DX for methods that use all memory #2695

vgjenks opened this issue Feb 26, 2024 · 9 comments · Fixed by #3216
Assignees
Labels
c: docs Improvements or additions to documentation c: feature Request for new feature m: helpers Something is referring to the helpers module m: string Something is referring to the string module p: 1-normal Nothing urgent
Milestone

Comments

@vgjenks
Copy link

vgjenks commented Feb 26, 2024

Pre-Checks

Describe the bug

Using a big integer as min or max in faker.string.numeric in length options, is not handled gracefully and causes Javascript heap out of memory crash in V8. It should at least validate the inputs and provide a graceful exception message. Even better; provide big int support in options.

Minimal reproduction code

const { faker } = require("@faker-js/faker");

const fake = {
    bigRange: faker.string.numeric({ length: { min: 1000000000, max: 2000000000 }})
}

console.log("bigRange:", fake.bigRange);

Additional Context

No response

Environment Info

System:
  OS: macOS (Apple silicon)
Binaries:
  Node: 20.9.0
  npm: 10.1.10
npmPackages:
  @faker-js/faker: 8.4.1

Which module system do you use?

  • CJS
  • ESM

Used Package Manager

npm

@vgjenks vgjenks added c: bug Something isn't working s: pending triage Pending Triage labels Feb 26, 2024
@vgjenks vgjenks changed the title Crashes app suddenly without error BigInt options values in string.numeric crashes v8 Feb 26, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Feb 26, 2024

Using a big integer as min or max in faker.string.numeric in length options, is not handled gracefully and causes Javascript heap out of memory crash in V8. It should at least validate the inputs and provide a graceful exception message.

The inputs are valid for faker, but they hit constraints of the underlying engine.

Any suggestion how to handle that?

What if the memory limit is increased?
What if the memory is already spend on different strings/other objects?
What if you use a different engine?


Previously, we limited the size arbitrarily (2 ** 20), but this also results in freeze like delays (6f977f6).

I have ideas how to make it 4-7 times faster without loosing much randomness, but generating these long strings (or arrays) still takes a long time.
If we use repeated sub-strings, then we still could go faster (like ~100-1000times?), but we would still hit the limit of the runtime eventually. Unfortunately these vary between systems:

JS Spec: 2^53-1
In V8 (used by Chrome and Node), the maximum length is 2^29 - 24 (~1GiB). On 32-bit systems, the maximum length is 2^28 - 16 (~512MiB).
In Firefox, the maximum length is 2^30 - 2 (~2GiB). Before Firefox 65, the maximum length was 2^28 - 1 (~512MiB).
In Safari, the maximum length is 2^31 - 1 (~4GiB).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length

We could limit the size again to like 2^25 to avoid the runtime limits:

  • min(length,limit) value: We would secretly hide the limitation from the user, would intransparently violate the expected behavior.
  • Throw error: In the end we would still throw an error, maybe with a better error message

In both scenarios we would limit the user arbritarly, even if their engine supports bigger values.

Just catching the out of memory error isn't an option either, because we would have to do that for every method then.

We could add a hint like If you generate more than 2 ^ 20 elements, you will likely run out of memory, but that feels very random for a documentation.


Even better; provide big int support in options.

Could you explain why that would be better or for what purpose do you need bigInt support?

@ST-DDT ST-DDT added s: awaiting more info Additional information are requested m: string Something is referring to the string module labels Feb 26, 2024
@vgjenks
Copy link
Author

vgjenks commented Feb 26, 2024

Could you explain why that would be better or for what purpose do you need bigInt support?

Sorry, admittedly...I misunderstood the usage of these options and made an assumption that "length" + "min" and "max" would generate a string between those numbers. It's a little misleading at first glance, to be fair.

I actually came here to delete this report and submit another to tend to the fact that it causes you to run out of memory. I think that's the crux of the issue here - it should handle it better and have some sort of safe limit - or even a reasonable, safe default with a configurable limit to allow you to blow your foot off, should you choose.

My bad for not reading the doc closely enough and making an assumption - bad form. However, I was testing this within the context of the aws-sdk and it was frustrating to have it fail w/ zero feedback. I had to eliminate almost field-by-field to spot where it was happening, then test it in isolation to see it run out of memory and blow v8 up.

@ST-DDT
Copy link
Member

ST-DDT commented Feb 27, 2024

Sorry, admittedly...I misunderstood the usage of these options and made an assumption that "length" + "min" and "max" would generate a string between those numbers. It's a little misleading at first glance, to be fair.

We had hoped that a link to the method generating a number would solve that issue.

Do you have a suggestion on how to improve this?


I actually came here to delete this report and submit another to tend to the fact that it causes you to run out of memory. I think that's the crux of the issue here - it should handle it better and have some sort of safe limit - or even a reasonable, safe default with a configurable limit to allow you to blow your foot off, should you choose.

Even if we limit it to 2^20, then you are still limited to ~1k of those strings on the heap. Less if we limit it to 2^25 etc.
Having a configurable length limit might be solution against accidental misuse. But it would affect all methods, returning strings, arrays and objects.
We will discuss this in the next team-meeting.
It might take a while for it to get implemented, probably after #2667


My bad for not reading the doc closely enough and making an assumption - bad form. However, I was testing this within the context of the aws-sdk and it was frustrating to have it fail w/ zero feedback. I had to eliminate almost field-by-field to spot where it was happening, then test it in isolation to see it run out of memory and blow v8 up.

I understand how frustrating this can be. Sadly node does a very poor job, logging where the OOM error occurred.
With a little stacktrace that would be easier.

For the future, I would like to recommend binary search for the error location or some kind of debug logging, so it is easier to detect where the code halts.


The issue was first about Crashes app suddenly without error.
Was that actually no error message or did you run into a timeout first?
I would like to understand whether we have to improve the implementation (for performance) or why there was no error message.

@matthewmayer
Copy link
Contributor

"The number or range of digits to generate." is perhaps a little hard to read

Maybe

"The length of the generated string. Should either be a fixed length, or a min and max length.

@ST-DDT ST-DDT added c: docs Improvements or additions to documentation p: 1-normal Nothing urgent labels Feb 27, 2024
@ST-DDT ST-DDT added this to the vAnytime milestone Feb 27, 2024
@ST-DDT ST-DDT added s: needs decision Needs team/maintainer decision and removed s: pending triage Pending Triage labels Feb 27, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Feb 27, 2024

FFR: This needs a decision regarding introducing a configurable limit on data (string,array,...) length.

@matthewmayer
Copy link
Contributor

I think the issue is that if a user passes
faker.string.numeric({ length: { min: 1000000000, max: 2000000000 }})
or
faker.string.numeric({ length: 1000000000}})

Do we think its more likely

  • They actually want to create a 1-2 billion long character numeric string
    or
  • It's programmer error because they misunderstood the parameters and they actually wanted e.g. faker.string.numeric({ length: 9 }})

How about, if a very large length is provided, we log a warning like:

> `faker.string.numeric({ length: { min: 1000000000, max: 2000000000 }})`
Warning: This will create a numeric string with length over 1 million which may be slow or cause out of memory errors. The length parameter controls the number of digits in the string, not the maximum value. To suppress this warning, pass `{allowLargeLength: true}`

@ST-DDT
Copy link
Member

ST-DDT commented Feb 29, 2024

Team Decision

  • We won't tackle this in v9.0
  • We will dedicate a separate feature release for improved DX, once we have a concept on how to do this consistently

@ST-DDT ST-DDT added c: feature Request for new feature and removed c: bug Something isn't working s: awaiting more info Additional information are requested s: needs decision Needs team/maintainer decision labels Feb 29, 2024
@ST-DDT ST-DDT changed the title BigInt options values in string.numeric crashes v8 Improve DX for methods that use all memory Feb 29, 2024
@ST-DDT ST-DDT added the m: helpers Something is referring to the helpers module label Feb 29, 2024
@xDivisionByZerox xDivisionByZerox added the s: on hold Blocked by something or frozen to avoid conflicts label Apr 27, 2024
@ST-DDT ST-DDT removed the s: on hold Blocked by something or frozen to avoid conflicts label Oct 20, 2024
@ST-DDT ST-DDT self-assigned this Oct 23, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Oct 23, 2024

I plan to tackle this soon.

  1. What should be the limit for the number of elements?
    10k? 100K?

  2. Suggestions for the parameter name:

  • bypassSafety: true | undefined
  • unsafeLength: true | undefined
  • allowHuge: true | undefined
  • restrictionsOff: true | undefined
  • restrictions: 'off' | undefined
  • allowUnlimited: true | undefined
  • unlimited: true | undefined
  • limits: 'off' | undefined
  1. Should this be a global config or a per method option?
function methodForBigData(options: { length: NumberOrRange, bypassSafety?: true})

faker.module.methodForBigData({ length: 1_000_000, bypassSafety: true });

// vs

faker.bypassSafety = true;
faker.module.methodForBigData({ length: 1_000_000 });
  1. Suggestion for the error message:

You are generating huge datasets, this is slow and might cause OutOfMemory issues. Requested: {}, Limit: {}. If you want to bypass this limit, please specify the ___ option.

or

Do you really want to generate datasets this huge? This is slow and might cause OutOfMemory issues. Requested: {}, Limit: {}. If you want to bypass this limit, please specify the ___ option.


Partially blocked by #3216

@ST-DDT ST-DDT reopened this Nov 3, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Nov 14, 2024

Team Decision

We will not add an allowHuge or similar flag.
JS's string.repeat and other similar methods don't have an allowHuge flag either.
The main performance impact is from the repeated nested method calls and not by the huge list itself.

We will try to improve the performance of methods generating strings though.

@ST-DDT ST-DDT closed this as not planned Won't fix, can't repro, duplicate, stale Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: docs Improvements or additions to documentation c: feature Request for new feature m: helpers Something is referring to the helpers module m: string Something is referring to the string module p: 1-normal Nothing urgent
Projects
None yet
4 participants