Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: implement NODE_COMPILE_CACHE for automatic on-disk code caching #52535

Closed
wants to merge 3 commits into from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Apr 15, 2024

This is the initial PR for implementing the idea proposed in #47472. It takes inspiration from ccache, the v8-compile-cache npm package (but faster, supports ESM and replaces CJS loader moneky-patching in user land) and v8 code caching in Blink. The first commit is backported from https://chromium-review.googlesource.com/c/v8/v8/+/5401780 which has already landed in the upstream (this is required for import() to work in deserialized code). For more motivation and background of this PR, check out #47472

Local numbers:

❯ hyperfine "./out/Release/node test/fixtures/snapshot/typescript.js" --warmup 5
Benchmark 1: ./out/Release/node test/fixtures/snapshot/typescript.js
  Time (mean ± σ):     124.3 ms ±  13.2 ms    [User: 109.6 ms, System: 11.6 ms]
  Range (min … max):   116.1 ms … 170.2 ms    21 runs

❯ export NODE_COMPILE_CACHE=/tmp/.node_compile_cache
❯ hyperfine "./out/Release/node test/fixtures/snapshot/typescript.js" --warmup 5
Benchmark 1: ./out/Release/node test/fixtures/snapshot/typescript.js
  Time (mean ± σ):      72.3 ms ±   3.2 ms    [User: 62.3 ms, System: 10.2 ms]
  Range (min … max):    70.5 ms …  87.2 ms    40 runs

There are more numbers from folks trying on an earlier iteration of this work (although this branch now hashes the code and the cache as an extra check, but crc32 should be fast enough that it's not too big of an overhead). For example from #47472 (comment) tsc --version can go from ~90ms to ~40ms, yarn --version can go from ~190ms to ~135ms, from #47472 (comment) npm run echo with a ncc-bundled npm may go from ~150ms to ~110ms


summary

This patch implements automatic on-disk code caching that can be enabled
via an environment variable NODE_COMPILE_CACHE.

When set, whenever Node.js compiles a CommonJS or a ECMAScript Module,
it will use on-disk V8 code cache persisted in the specified directory
to speed up the compilation. This may slow down the first load of a
module graph, but subsequent loads of the same module graph may get
a significant speedup if the contents of the modules do not change.
Locally, this speeds up loading of test/fixtures/snapshot/typescript.js
from ~130ms to ~80ms.

To clean up the generated code cache, simply remove the directory.
It will be recreated the next time the same directory is used for
NODE_COMPILE_CACHE.

Compilation cache generated by one version of Node.js may not be used
by a different version of Node.js. Cache generated by different versions
of Node.js will be stored separately if the same directory is used
to persist the cache, so they can co-exist.

Caveat: currently when using this with V8 JavaScript code coverage, the
coverage being collected by V8 may be less precise in functions that are
deserialized from the code cache. It's recommended to turn this off when
running tests to generate precise coverage.

Implementation details:

There is one cache file per module on disk. The directory layout
is:

  • Compile cache directory (from NODE_COMPILE_CACHE)
    • 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERESION
    • 2ea3424d:
      • 10860e5a: CRC32 hash of filename + module type
      • 431e9adc: ...
      • ...

Inside the cache file, there is a header followed by the actual
cache content:

[uint32_t] code size
[uint32_t] code hash
[uint32_t] cache size
[uint32_t] cache hash
... compile cache content ...

When reading the cache file, we'll also check if the code size
and code hash match the code that the module loader is loading
and whether the cache size and cache hash match the file content
read. If they don't match, or if V8 rejects the cache passed,
we'll ignore the mismatch cache, and regenerate the cache after
compilation succeeds and rewrite it to disk.

There are still a bunch of TODOs for this feature, some of them are documented as TODO comments. In addition we can look into idle-time code cache serialization/writing which is also done by Blink to reduce the impact on first load. But this should be good enough as an initial iteration.

Note: why CRC32? Because we already have it in the dependencies (in zlib.h), it's fast and can avoid enough collisions for this use case. We do have some existing hashing functions available via openssl but those won't be available on no-crypto builds. We can revisit if CRC32 turns out to be inadequate for this feature. We can also revisit the directory structure or use an actual db, but for now the file I/O cost doesn't really show up when I profile it locally. The remaining bottleneck is the repeated UTF8 encoding, which I plan to get rid of in a follow-up (because it needs to dance with monkey-patching).

Refs: #47472

Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
This patch implements automatic on-disk code caching that can be enabled
via an environment variable NODE_COMPILE_CACHE.

When set, whenever Node.js compiles a CommonJS or a ECMAScript Module,
it will use on-disk [V8 code cache][] persisted in the specified directory
to speed up the compilation. This may slow down the first load of a
module graph, but subsequent loads of the same module graph may get
a significant speedup if the contents of the modules do not change.
Locally, this speeds up loading of test/fixtures/snapshot/typescript.js
from ~130ms to ~80ms.

To clean up the generated code cache, simply remove the directory.
It will be recreated the next time the same directory is used for
`NODE_COMPILE_CACHE`.

Compilation cache generated by one version of Node.js may not be used
by a different version of Node.js. Cache generated by different versions
of Node.js will be stored separately if the same directory is used
to persist the cache, so they can co-exist.

Caveat: currently when using this with V8 JavaScript code coverage, the
coverage being collected by V8 may be less precise in functions that are
deserialized from the code cache. It's recommended to turn this off when
running tests to generate precise coverage.

Implementation details:

There is one cache file per module on disk. The directory layout
is:

- Compile cache directory (from NODE_COMPILE_CACHE)
  - 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERESION
  - 2ea3424d:
     - 10860e5a: CRC32 hash of filename + module type
     - 431e9adc: ...
     - ...

Inside the cache file, there is a header followed by the actual
cache content:

```
[uint32_t] code size
[uint32_t] code hash
[uint32_t] cache size
[uint32_t] cache hash
... compile cache content ...
```

When reading the cache file, we'll also check if the code size
and code hash match the code that the module loader is loading
and whether the cache size and cache hash match the file content
read. If they don't match, or if V8 rejects the cache passed,
we'll ignore the mismatch cache, and regenerate the cache after
compilation succeeds and rewrite it to disk.
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/loaders
  • @nodejs/security-wg
  • @nodejs/v8-update
  • @nodejs/vm

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 15, 2024
@joyeecheung joyeecheung added request-ci Add this label to start a Jenkins CI on a PR. and removed lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 15, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 15, 2024
@nodejs-github-bot
Copy link
Collaborator

of Node.js will be stored separately if the same directory is used
to persist the cache, so they can co-exist.

Caveat: currently when using this with [V8 JavaScript code coverage][], the
Copy link
Member Author

@joyeecheung joyeecheung Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When enabling NODE_COMPILE_CACHE and running tools/test.py parallel locally, three tests would fail:

Failed tests:
out/Release/node /Users/joyee/projects/node/test/parallel/test-runner-coverage.js
out/Release/node /Users/joyee/projects/node/test/parallel/test-runner-output.mjs
out/Release/node /Users/joyee/projects/node/test/parallel/test-v8-coverage.js

The failed tests are all related to less precise coverage which is currently a known caveat of code cache + v8 native coverage collection (e.g. see #51672). So I just documented that a bit in the docs that users should not use the two together for now. I could spare some time later to fix it in V8, but I don't think it's a blocker for this PR because this feature is currently opt-in. User-land code caching can run into this problem too, and it's technically a V8 bug.

@merceyz
Copy link
Member

merceyz commented Apr 15, 2024

There are more numbers from folks trying on an earlier iteration of this work

Some up-to-date results with f7806b3:

$ hyperfine -w 10 "NODE_COMPILE_CACHE='' ./node ./.yarn/releases/yarn-4.1.1.cjs --version" "NODE_COMPILE_CACHE='/tmp/node-cache' ./node ./.yarn/releases/yarn-4.1.1.cjs --version"
Benchmark 1: NODE_COMPILE_CACHE='' ./node ./.yarn/releases/yarn-4.1.1.cjs --version
  Time (mean ± σ):     165.5 ms ±   1.9 ms    [User: 156.5 ms, System: 25.4 ms]
  Range (min … max):   162.2 ms … 169.3 ms    18 runs

Benchmark 2: NODE_COMPILE_CACHE='/tmp/node-cache' ./node ./.yarn/releases/yarn-4.1.1.cjs --version
  Time (mean ± σ):     116.0 ms ±   1.3 ms    [User: 109.2 ms, System: 23.5 ms]
  Range (min … max):   113.8 ms … 119.7 ms    25 runs

Summary
  NODE_COMPILE_CACHE='/tmp/node-cache' ./node ./.yarn/releases/yarn-4.1.1.cjs --version ran
    1.43 ± 0.02 times faster than NODE_COMPILE_CACHE='' ./node ./.yarn/releases/yarn-4.1.1.cjs --version

@richardlau richardlau added the semver-minor PRs that contain new features and should be released in the next minor version. label Apr 15, 2024
src/compile_cache.cc Outdated Show resolved Hide resolved
@jdalton
Copy link
Member

jdalton commented Apr 15, 2024

This is so rad! Having had to manage and hook into systems for this having it built-in is very nice! ❤️ it.

src/compile_cache.cc Outdated Show resolved Hide resolved

> Stability: 1.1 - Active Development

When set, whenever Node.js compiles a CommonJS or a ECMAScript Module,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is very technical - I would use more layman terms for the first paragraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have specific suggestions? I don't want to make it sound too magical to avoid false advertisement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll do a follow up PR after this lands to not hold it but ideally something that focuses more on what it does and not the technical implementation

Copy link
Member

@benjamingr benjamingr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as experimental/active development. Have a few questions:

  • how does this interact with SEA (would it make sense or be possible to only eventually bundle the code cache and not the code?)
  • What happens on overflows or tampering with the cache file? Is there a security concern here regarding passing different code through the cache file (since the hash is just a crc32)?
  • How does this feature interact with loaders/require hooks that modify the file contents and in particular passing them in different order? (reading the code it should work but want to make sure)

// - <cache_file_1>: a hash of filename + module type
// - <cache_file_2>
// - <cache_file_3>
bool CompileCacheHandler::InitializeDirectory(const std::string& dir) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not blocking for this PR - we should have better user facing errors if there are permission issues creating/reading/writing the cache)

Copy link
Member Author

@joyeecheung joyeecheung Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not error at all, the idea is that if caching cannot be done, this is just a no-op. There can be some logging to help users figure out why the cache isn't hit to ensure better performance, but the cache not being hit only leads to "not speeding up module loading", and it's not an error (just like when you write code that V8 cannot optimize or have to deopt with, there is no error either).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's literally caching I get that but when it won't work for users and they'll want to figure out why they'll ask us. It would be useful to at least trace it as as. warning or add DEBUG* like logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already traces. The handful of tests here are testing against the traces .

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 16, 2024

how does this interact with SEA (would it make sense or be possible to only eventually bundle the code cache and not the code?)

Currently SEA only supports bundling a single script and it's not compiled using the normal CJS loader, but there's already a dedicated option in the SEA config to enable code cache for that one file ("useCodeCache": true). Neither static import nor dynamic import is not currently supported in SEA. Users can load on-disk CJS files via createRequire() and in that case if the environment variable is enabled, the cache of on-disk modules loaded through the created require() will be put in that directory. This is all separated from the SEA itself.

What happens on overflows or tampering with the cache file? Is there a security concern here regarding passing different code through the cache file (since the hash is just a crc32)?

The cache will be silently ignored if the hash doesn't match. Under the Node.js threat model, code loaded from the file system is already trusted. If you can gain access to the file system, you might as well just replace the source file on disk that'll be executed and do whatever you want in that replaced file, instead of wasting all your time figuring out how to mutate the code cache that meets all the requirement below:

  1. Can do what you need (this means you need to know how to hand-write V8 bytecode to do what you need. Also, it must be run with the right V8 version since the bytecode format changes all the time)
  2. The mutated code cache is still of the same length of the original code cache (so if you want to do a lot of additional stuff in that bytecode, this is unlikely to pass)
  3. It passes the additional checks done by V8, and V8 can actually run it just fine (and once the V8 version in Node.js is upgraded this is likely to stop working)
  4. The CRC32 of this new code cache must also conflict with the original code cache (finding CRC32 conflicts isn't that hard, but finding CRC32 conflicts that meet 1-3 is hard)

TBH it seems a bit silly for an attacker to waste all their effort on this when they can already just replace the source code on disk if they have access to the file system. Also, this is already what people have been doing in the wild via e.g. https://www.npmjs.com/package/v8-compile-cache with 12M downloads/week, so if there are any security concerns, well it's already out there for many years, and it's unlikely that the ecosystem will just stop relying on a capability like this.

How does this feature interact with loaders/require hooks that modify the file contents and in particular passing them in different order? (reading the code it should work but want to make sure)

If they are using the "official" hooks, e.g. module.register the modules will be cached and the resource name & module content cached match the results coming out of the loaders. If it's monkey-patching CJS hooks this will be ignored if they patch Module.wrap, Module.wrapper or Module.prototype._compile (without calling the original version, this means it's a no-op if v8-compile-cache is used since that package does this), but otherwise it should work fine.

@merceyz
Copy link
Member

merceyz commented Apr 16, 2024

Would it be feasible to add a way to enable this for the current process only, perhaps as a follow-up?

For context Corepack uses v8-compile-cache to speed up the loading of package managers but doesn't modify process.env to avoid affecting anything said package managers might spawn.

@joyeecheung
Copy link
Member Author

Would it be feasible to add a way to enable this for the current process only, perhaps as a follow-up?

Yeah I think we can discuss other ways to enable this (or a possible solution to enable it by default and pointing it to somewhere that's less likely to fill up the disk e.g. the tmpdir) in a follow-up. For the initial PR an opt-in environment variable should be enough.

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 17, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 17, 2024
@nodejs-github-bot
Copy link
Collaborator

@joyeecheung joyeecheung added the notable-change PRs with changes that should be highlighted in changelogs. label Apr 17, 2024
Copy link
Contributor

github-actions bot commented Apr 17, 2024

The notable-change PRs with changes that should be highlighted in changelogs. label has been added by @joyeecheung.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section.

Joyee: See the PR description

targos pushed a commit to targos/node that referenced this pull request Apr 18, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
@lirantal
Copy link
Member

Good discussion and a cool performance improvement @joyeecheung, congrats!

Expressing some of my security concerns thoughts:

What happens on overflows or tampering with the cache file? Is there a security concern here regarding passing different code through the cache file (since the hash is just a crc32)?

The cache will be silently ignored if the hash doesn't match

It would be helpful for security audit trail to record in some way (warning, trace, etc) that a cache entry failed matching.

Other aspects worth considering, even if only through mentions in the documentation so that these risks are well communicated:

  • Would child processes also inherit the enabled cache mode?
  • It seems like this shouldn't conflict with experimental policies but raising just in case. For example, would the experimental policy throw an error for loaded cached code because there's no integrity specified/calculated for it?
  • What are the security controls guarding against a denial of service? for example, attackers create large code files that would cause more cached code files to be created. Since the threat model already trusts the OS I get this is not in regular scope but potentially be in effect if child processes are inheriting code cache and in between an attacker creates this scenario (still out of scope I guess?)
  • Where are the code cache files stored and how are they protected? If they are created in a predictable path and a vulnerability such as path traversal exists in the application, it could be used to exfiltrate the compiled code.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 18, 2024

It would be helpful for security audit trail to record in some way (warning, trace, etc) that a cache entry failed matching.

It can be traced using NODE_DEBUG_NATIVE=COMPILE_CACHE, see the tests

Would child processes also inherit the enabled cache mode?

Only if they inherit the environment variables (child process env options are defaulted to process.env, but once you specify it you need to copy process.env into the new options).

For example, would the experimental policy throw an error for loaded cached code because there's no integrity specified/calculated for it?

We can just skip it #52577 - the cache should be quiet by default, failures would just lead to no-ops.

What are the security controls guarding against a denial of service? for example, attackers create large code files that would cause more cached code files to be created.

If they already have access to the file system they might as well just modify the original module to run an infinite loop?

Where are the code cache files stored and how are they protected?

This is up to the user, it's currently opt-in via the environment variable NODE_COMPILE_CACHE, and the names of subdirectories change depending on the Node.js version and the filename and the module type. If an attacker can already get the filename of the module, they might as well just...modify the module?

@lirantal
Copy link
Member

Thanks Joyee, appreciate all the input.

nodejs-github-bot pushed a commit that referenced this pull request Apr 19, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52293
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
targos pushed a commit to targos/node that referenced this pull request Apr 19, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
marco-ippolito pushed a commit that referenced this pull request Apr 19, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52293
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
targos pushed a commit to targos/node that referenced this pull request Apr 22, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
nodejs-github-bot pushed a commit that referenced this pull request Apr 22, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52465
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
RafaelGSS pushed a commit that referenced this pull request Apr 22, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52465
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
@joyeecheung joyeecheung mentioned this pull request Apr 25, 2024
11 tasks
aduh95 pushed a commit that referenced this pull request Apr 29, 2024
This patch implements automatic on-disk code caching that can be enabled
via an environment variable NODE_COMPILE_CACHE.

When set, whenever Node.js compiles a CommonJS or a ECMAScript Module,
it will use on-disk [V8 code cache][] persisted in the specified
directory to speed up the compilation. This may slow down the first
load of a module graph, but subsequent loads of the same module graph
may get a significant speedup if the contents of the modules do not
change. Locally, this speeds up loading of
test/fixtures/snapshot/typescript.js from ~130ms to ~80ms.

To clean up the generated code cache, simply remove the directory.
It will be recreated the next time the same directory is used for
`NODE_COMPILE_CACHE`.

Compilation cache generated by one version of Node.js may not be used
by a different version of Node.js. Cache generated by different versions
of Node.js will be stored separately if the same directory is used
to persist the cache, so they can co-exist.

Caveat: currently when using this with V8 JavaScript code coverage, the
coverage being collected by V8 may be less precise in functions that are
deserialized from the code cache. It's recommended to turn this off when
running tests to generate precise coverage.

Implementation details:

There is one cache file per module on disk. The directory layout
is:

- Compile cache directory (from NODE_COMPILE_CACHE)
  - 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERESION
  - 2ea3424d:
     - 10860e5a: CRC32 hash of filename + module type
     - 431e9adc: ...
     - ...

Inside the cache file, there is a header followed by the actual
cache content:

```
[uint32_t] code size
[uint32_t] code hash
[uint32_t] cache size
[uint32_t] cache hash
... compile cache content ...
```

When reading the cache file, we'll also check if the code size
and code hash match the code that the module loader is loading
and whether the cache size and cache hash match the file content
read. If they don't match, or if V8 rejects the cache passed,
we'll ignore the mismatch cache, and regenerate the cache after
compilation succeeds and rewrite it to disk.

PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
@aduh95 aduh95 added the backported-to-v22.x PRs backported to the v22.x-staging branch. label Apr 30, 2024
aduh95 added a commit that referenced this pull request Apr 30, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: TODO
aduh95 added a commit that referenced this pull request May 1, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: #52768
aduh95 added a commit that referenced this pull request May 1, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: #52768
aduh95 added a commit that referenced this pull request May 1, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: #52768
aduh95 added a commit that referenced this pull request May 2, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: #52768
targos pushed a commit that referenced this pull request May 2, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: #52768
Ch3nYuY pushed a commit to Ch3nYuY/node that referenced this pull request May 8, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) nodejs#52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) nodejs#52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) nodejs#52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) nodejs#52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) nodejs#52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) nodejs#52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) nodejs#52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) nodejs#52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) nodejs#52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) nodejs#52280

PR-URL: nodejs#52768
@marco-ippolito marco-ippolito added the backport-blocked-v20.x PRs that should land on the v20.x-staging branch but are blocked by another PR's pending backport. label May 23, 2024
EliphazBouye pushed a commit to EliphazBouye/node that referenced this pull request Jun 20, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) nodejs#52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) nodejs#52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) nodejs#52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) nodejs#52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) nodejs#52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) nodejs#52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) nodejs#52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) nodejs#52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) nodejs#52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) nodejs#52280

PR-URL: nodejs#52768
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script id deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callbacks
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: nodejs#52465
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) nodejs#52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) nodejs#52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) nodejs#52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) nodejs#52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) nodejs#52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) nodejs#52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) nodejs#52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) nodejs#52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) nodejs#52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) nodejs#52280

PR-URL: nodejs#52768
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-blocked-v20.x PRs that should land on the v20.x-staging branch but are blocked by another PR's pending backport. backported-to-v22.x PRs backported to the v22.x-staging branch. commit-queue-failed An error occurred while landing this pull request using GitHub Actions. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. notable-change PRs with changes that should be highlighted in changelogs. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.