-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: add synchronous hooks proposal #198
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! This seems close to being ready to implement.
Co-authored-by: Geoffrey Booth <webadmin@geoffreybooth.com>
After some thoughts I think the higher level function exports(url, context) { ... }
function requires(specifier, context, nextRequires) {
// requires hook can choose to run something here, exports can't.
// If hooks want to simply hijack a request and return early, skipping
// resolve and load, they can do it here. They also don't need to care
// about how different specifiers can be mapped to the same URL.
if (specifier === 'pnpapi') {
return { url: 'some://dummy/url', exports: pnpApiObject };
}
const result = nextRequires(specifier, context); // Invokes resolve and load steps
exports(result.url, result);
} Or to re-implement the after-load function linkAfterLoad(url, context) { ... }
const pnpApiModule = new vm.SyntheticModule([...], () => {
pnpApiModule.setExports('default', pnpApiObject);
});
function link(specifier, context, nextLink) {
// link hook can choose to run something here, linkAfterLoad can't.
// If hooks want to simply hijack a request and return early, skipping
// resolve and load, they can do it here. They also don't need to care
// about how different specifiers can be mapped to the same URL.
if (specifier === 'pnpapi') {
return { url: 'some://dummy/url', module: pnpApiModule };
}
const result = nextLink(specifier, context); // Invokes resolve and load steps
linkAfterLoad(result.url, result);
} In addition, if these hooks need to invoke say |
I like this assessment. |
Let’s say I want to write CoffeeScript and have it instrumented. The CoffeeScript hooks might implement just As an author, personally what I find easiest to follow is hooks that correspond to steps in the process. So if there’s a step before resolution, and a step after loading, then those can be additional hooks that still run in the sequence: something before resolution, resolve, load, something after loading. |
I am suggesting to have |
After attempting to prototype the high-level wrappers in the ESM loader I realized that there are some nuances in the current design - the
|
How does that work for loaders that are prerequisite for subsequent loaders to work? For example let's say a TS loader needing to resolve & load files from within zip archives? |
I like this idea. I find the |
The alternative is to provide the |
Can you describe how the module request would look like in this case? Would that just override the resolution/loading completely or still use part of the default logic? For example assuming that the request is function beforeResolve(specifier, context) {
if (specifier === 'foo') {
const url = 'file:///path/to/package.zip?file=foo.ts'; // Custom resolution
return { format: 'zip+typescript', url, shortCircuit: true };
}
return { }; // do nothing, and the default resolution logic gets used.
}
function beforeLoad(url, context) {
if (context.format === 'zip+typescript') {
const source = unzipAndTranspile(url); // Unzip path/to/package.zip, and transpile foo.ts
return { source, format: 'commonjs', shortCircuit: true };
}
return { }; // do nothing, and the default loading logic gets called.
} (Not sure if the search params can be preserved in CJS loader yet, given the compat issue of Or if you are thinking about composing a zip loader + a typescript loader: // Zip loader
function beforeResolve(specifier, context) {
if (specifier === 'foo') {
const url = 'file:///path/to/package.zip/foo.ts'; // Custom resolution
return { format: 'zip', url, shortCircuit: true };
}
return { }; // do nothing, and the default resolution logic gets used.
}
function beforeLoad(url, context) {
if (context.format === 'zip') {
const source = unzip(url); // Unzip path/to/package.zip, and locate foo.ts
return { source, shortCircuit: true };
}
return { }; // do nothing, and the default loading logic gets called.
}
// TypeScript loader
function afterLoad(url, source, context) {
if (url.endsWith('.ts')) {
return { source: transpile(source), format: 'commonjs' };
}
return { }; // do nothing, and the default loading logic gets called.
} (I think we might need to separate |
Yeah one alternative is to provide function load(url, context, nextLoad) {
if (url.startsWith('http')) {
const source = fetchSync(url);
return { source, shortCircuit: true };
}
return nextLoad(url, context);
} ..which needs to be ensured to be at the bottom of the hook chain (next to the default one). One downside of this is the fetching would be sequential and blocking even for ESM which is designed to make parallel loading of module requests in the same module possible. Theoretically with a slightly different design I think it's still possible to make the fetching part almost parallel: function load(url, context, nextLoad) {
if (url.startsWith('http')) {
const syncResponse = fetchSyncPooled(url); // This adds a fetch request to a pool
return {
// This blocks until the result is fetched; by default if this is the only load hook, the ESM loader
// runs the hooks to get the source provider functions first (then the requests are all started)
// and then run over the source provider functions to get the source (and start blocking on the
// first module request).
getSource() { return syncResponse.readAsBuffer(); } ,
shortCircuit: true
};
}
return nextLoad(url, context);
} This would require significant refactoring in the ESM loader to make the creation of ModuleWraps lazy but it allows the loader to start the fetch requests sequentially fairly quickly in a pool (without waiting for the previous module request to actually complete), and then the worker pool can handle the requests in parallel. Any On the other hand I think this might be a more compelling reason to split the hooks - with the // This needs to be the last of the beforeLoad chain:
function beforeLoad(url, context) {
if (url.startsWith('http')) {
const syncResponse = fetchSyncPooled(url); // This adds a fetch request to a pool
responseMap.set(url, syncResponse);
return {
source: 'dummy',
shortCircuit: true
};
}
return {};
}
// Node.js can run the beforeLoad eagerly for the module requests, and
// only start running afterLoad when the all beforeLoads of parallel module
// requests are run
// This needs to be the first of the afterLoad chain:
function afterLoad(url, source, context, nextLoad) {
if (url.startsWith('http')) {
// source was dummy at this point.
const syncResponse = responseMap.get(url);
return {
source: syncResponse.readAsBuffer(),
};
}
return {};
} |
Would it make sense to have a think like the "link" hook that works for both module formats? I think we would generally prefer a single interface to work with, even if it's a bit sub-optimal for certain cases. We want to be able to apply CJS-based patches to ESM rewrites when they happen without needing to rewrite the patches. That's not a huge deal though. |
I don't think so, because this is where one of the key difference of CJS and ESM lies: ESM linking (dependency resolution) is designed to be done before evaluation, for all module requests in the module, which relies on static analysis based on the import syntax, and it's also what allows parallel loading of ESM dependencies; CJS dependency resolution is done during evaluation (in the form of |
I'm aware of the difference in order/timing. I was thinking more if some interface could exist which abstracts that away. Like if you allow the CJS side to see what the export names are without directly accessing them you could do a similar "replace when available" kind of semantic as we have to do with ESM. I feel like a similar shaped API should be possible between the two, it just might not be quite as friendly. 🤔 Anyway, two APIs are probably liveable, we'll just have to make sure our instrumentations can handle both scenarios with (hopefully) minimal breakage. |
That sounds cool but I think it would be more flexible if the built-in loader API provides the low-level hooks, and if a unified hook is desired, some user-land package can be developed on top of them (like a package that just unify import-in-the-middle and require-in-the-middle). Otherwise by only providing high-level unified hooks in core we will deprive the ability to do low-level customization completely from developers who need them, and only cater to the needs of developers who don't care about the differences. |
Fair point. I'd be happy to help with constructing whatever that higher-level thing is later, but probably would want that to live in the Node.js org just to have an unbiased home for it that all the APM vendors can contribute to it. (Similar to what we're trying to do by donating IITM at the moment.) |
I did some preliminary testing to eyeball what performance implications this may have, and it seems to have a non-trivial degradation to performance: JakobJingleheimer/nodejs-loaders#16 Hook execution time (on node v22.4.1) increases ~14%. This is potentially offset by eliminating startup cost from the loader worker thread, but something to consciously consider. The cause behind the increase may perhaps be specific to esbuild's implementation: they do have warnings in the docs that |
I suspect that's already where the cost is coming from, because spawning a worker is not cheap, and this test is spawning that worker for nothing. Also, with this model you are going to do parallelisation differently - you can spawn multiple workers, and do the transform actually in parallel using the BFS iteration order. In the off-thread model you are not getting actual parallelisation, because there is only one loader worker, and the computation there is single threaded. Even if you post transpilation for 3 files to that worker, the single loader worker would have to chew the bytes of them one by one. The right way to do parallelisation is to spawn 3 workers (more or less) and you post transpilation to them using a worker pool abstraction, so that you actually have 3 threads doing transpilation in parallel, not just one loader thread choking on all the computational work. |
For tsx in particular, I think you should compare their CJS hooks v.s. the ESM hooks, because the CJS hooks are built on top of monkey patching, and is what this proposal is effectively replacing (or that's the top priority of this proposal). My local tests show that at least the synchronous CJS hooks are currently ~2.5x faster than the asynchronous ESM hooks to load a very basic file.
|
Co-authored-by: Geoffrey Booth <webadmin@geoffreybooth.com>
} | ||
``` | ||
|
||
## `exports` (require-only): from compiled CJS wrapper to exports object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some experiments with this, and I think this can probably be repurposed as an "execute"/"evaluate" hook.
/**
* @typedef {{
* thisValue: object,
* parameters: { exports: object, require: function, module: object, __filename: string, __dirname: string }
* }} ModuleExecuteContext
*/
/**
* @typedef {{
* exports: object,
* shortCircuit?: boolean
* }} ModuleExecuteResult
*/
function execute(url, context, nextExecute) {
// ...
}
This would still work basically the same for the example mock of require-in-the-middle
below, but it also allows users to override how the CJS wrapper gets invoked. This is still CJS-only, because in the case of ESM, V8 doesn't give us a way to hook into the evaluation of every module in the graph - we only get to control the evaluation of the root module.
In any case this might need a bit more thought, so I'll keep it off the initial PR.
The latest prototype is in https://github.com/joyeecheung/node/tree/sync-hooks-6 - I've got it working for ESM too, still need a bunch of testing to be opened as PR tough. |
This lays the foundation for adding synchronous hooks proposed in nodejs/loaders#198. In addition this corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO.
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. In addition this corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO.
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire(). PR-URL: #54769 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Stephen Belanger <admin@stephenbelanger.com> Reviewed-By: James M Snell <jasnell@gmail.com>
cc @nodejs/loaders
Spinned from nodejs/node#52219 - I ended up writing it down as a proposal here, because there are some high-level questions and re-design ideas after I looked into existing usages of CJS monkey-patching. While
resolve
andload
hooks work well enough for some cases, it also seems common in the wild to just overridespecifier -> module
mapping directly, so I sketched out some higher level hooks as alternatives here as well.