proposal: Staged require() with lifecycle hooks #56
| Title  | Staged require() with lifecycle hooks |
|--------|---------------------------------------|
| Author | @qard                                 |
| Status | DRAFT                                 |
| Date   | 2017-05-05                            |

# Proposal

There are two major purposes to this proposal:

- To provide module loading lifecycle hooks
- To split resolving, loading and compilation concerns

# Motivations

Currently, some very popular userland modules depend heavily on monkey-patching
to provide their functionality. This can be unsafe, inflexible and damaging to
performance.

Some examples include:

- Tracing, debugging and monitoring
- Code coverage
- Mocking
- Transpiling

### Safety

Monkey-patching can be unsafe for many reasons.

Sometimes the behaviour of code changes based on the parameter length of a
function given as input to another. For example, express supports the
`(req, res)`, `(req, res, next)` and `(err, req, res, next)` route handler
signatures, and treats the first argument as an error instance whenever the
handler's arity is 4. Properly monkey-patching express route handlers
therefore requires branching instrumentation which provides a patched function
matching the length of each original route handler. This is easy to miss or
get wrong, and getting it wrong will generally crash the app.
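
To make the arity hazard concrete, here is a hedged sketch (the `wrapHandler`
name is made up, and no real express internals are involved) of the branching a
patcher needs so that a wrapped handler reports the same `length` as the
original:

```js
// Hypothetical sketch: wrap an express-style route handler without
// changing its reported arity, since express dispatches on `fn.length`.
function wrapHandler(fn) {
  switch (fn.length) {
    case 4:
      // Error-handling signature: express only treats the handler as an
      // error handler when the function itself declares four parameters.
      return function (err, req, res, next) {
        return fn.call(this, err, req, res, next)
      }
    case 3:
      return function (req, res, next) {
        return fn.call(this, req, res, next)
      }
    default:
      return function (req, res) {
        return fn.call(this, req, res)
      }
  }
}
```

A single generic `function (...args)` wrapper would report a `length` of 0 for
every handler, silently changing how express dispatches it.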

Patches can also interfere with getter/setter behaviour. An example of this is
`pg.native`, a getter which mutates the rest of the postgres driver module to
expect the native bindings. Dynamic instrumentation cannot know what the app
will do in the future, so it needs to patch this property optimistically, but
a naive `const oldProp = pg.native; pg.native = newProp` triggers the getter,
breaking the JS interface. Again, this is easy to miss and likely to crash the
app.
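
As a hedged illustration (the `fakePg` object below is a stand-in, not the
real pg module), instrumentation can wrap such a property via its descriptor
without ever reading the value, avoiding the getter's side effect:

```js
// Hypothetical module mimicking pg's lazy `native` getter: reading the
// property has side effects, so instrumentation must avoid triggering it.
const fakePg = {}
let getterRan = false
Object.defineProperty(fakePg, 'native', {
  configurable: true,
  get() {
    getterRan = true // stands in for loading the native bindings
    return { client: 'native' }
  }
})

// `const old = fakePg.native` would run the getter. Instead, copy the
// property descriptor itself and wrap it without reading the value.
const desc = Object.getOwnPropertyDescriptor(fakePg, 'native')
Object.defineProperty(fakePg, 'native', {
  configurable: true,
  get() {
    // an instrumentation hook could run here
    return desc.get.call(fakePg)
  }
})
```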

It can be difficult to differentiate between async callbacks, which need code
to maintain continuation linkage, and sync callbacks, which do not. Some
interfaces also accept multiple function arguments, or callbacks stored in an
options object alongside other functions, so dynamically identifying the
callbacks that need to be instrumented can be unreliable or even impossible.

Any form of queueing, even simply putting a callback in an array to be called
elsewhere, breaks continuation linking and requires manual intervention to
maintain that linkage, which is not always possible for dynamic
instrumentation.
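
A minimal sketch of the queueing problem (the `currentContext` variable below
is a stand-in for whatever per-request state an APM tracks, and the queue is
deliberately simplistic):

```js
// By the time a queued callback runs, the context that was active when
// it was enqueued is gone, so an unpatched queue loses the linkage.
let currentContext = null
const queue = []

function enqueue(cb) {
  // A patcher would need to capture `currentContext` here...
  queue.push(cb)
}

function drain() {
  // ...because when the queue is drained later, the callback runs
  // under whatever context happens to be active (here: none).
  for (const cb of queue) cb()
}

const seen = []
currentContext = 'request-A'
enqueue(() => seen.push(currentContext))
currentContext = null // request A has finished
drain()
```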

Most APM providers try/catch constantly because the existing solutions for
tracking continuations lose context regularly. We've done the best we can with
monkey-patching, but we've hit a wall and have been pushing for things like
`async-hooks` to improve the situation. However, `async-hooks` only solves
part of the problem: linking the callsite to the callback. Context on what was
happening still requires monkey-patching, and there are plenty of situations
where context loss is not the hard part.

At the code-text level, it is possible to apply heuristic behaviour analysis
to understand the complex interactions of the code more deeply, enabling a
much wider range of possibilities for dynamic instrumentation.

### Performance

Monkey-patching is prone to performance issues. The vast majority of it
requires wrapping things in closures. More complex instrumentation can often
trigger deoptimization. Sometimes heavy run-time checking is required, with
constant dynamic function generation and reassignment, which can put a lot of
undue stress on the optimizer and garbage collector.

If one could intercept the module loader and apply transformations directly to
the code text before compilation, there's a great deal that could be gained.
Most, if not all, run-time checking would simply disappear. Most closures
would disappear. Context loss would mostly disappear, because the transformer
could take advantage of lexical scope to describe linkage directly at
callsites.

### Flexibility

Many things are simply not possible to any reasonable capacity relying purely
on monkey-patching. Often the things which could provide valuable coverage or
tracing insight are just not exposed.

One example of something difficult to trace is the timing from when a socket
first enters JS in `net.js` to when it is upgraded to a TLS socket. One could
listen for the `secureConnection` event, but there are several failure paths
between the `TLSSocket` constructor and the event handler being triggered,
which could result in context loss. One might also think to patch
`TLSSocket.prototype._init`, however that is called by both servers and
clients, which are difficult to differentiate in this case. Correctly
identifying and correlating a net socket to its corresponding TLS upgrade
takes extensive monkey-patching, with a lot of run-time checks in a major
hot-path. With an AST transform, a single line of code could be inserted
[before the `TLSSocket` construction][1] in the `Server` constructor.

### ES Modules

A big issue on the horizon is that, if ES Modules are adopted, modules become
immutable at build time. This means the current monkey-patching approach will
no longer work at all. Members of TC39 have already suggested AST transforms
as a way to deal with this issue.

Another concern with ES Modules is the difference in loader behaviour. This is
where splitting resolution and loading concerns from compilation is valuable.

# Implementation

There are two possible implementations to discuss, both of which rely on a
`ModuleRequest` object to express the module loading lifecycle. Initially it
would only include the `parent` and `inputPath` fields. The other fields would
be populated in the different lifecycle stages.

### `ModuleRequest`

##### `moduleRequest.parent`

The parent module which initiated the `require()` request.

##### `moduleRequest.inputPath`

The input string to the `require()` call or `import` statement which initiated
this module request.

##### `moduleRequest.resolvedPath`

The fully resolved file path, as determined by the `require.resolvers`
handlers. This would be added to the module request object during the
`resolver` stage.

##### `moduleRequest.contents`

The content string (or buffer?) of the module. This field is populated during
the `loader` stage.

##### `moduleRequest.module`

The final module object, built by the `compiler` stage.
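
As a rough sketch (the factory function below is illustrative, not part of the
proposal), a `ModuleRequest` could be modelled as plain data whose later
fields start empty and are filled in stage by stage:

```js
// Sketch of a ModuleRequest progressing through the proposed stages.
// Only `parent` and `inputPath` exist up front; the other fields are
// filled in by the resolver, loader and compiler stages respectively.
function createModuleRequest(parent, inputPath) {
  return {
    parent,             // module that called require()
    inputPath,          // raw string given to require()/import
    resolvedPath: null, // set by the resolver stage
    contents: null,     // set by the loader stage
    module: null        // set by the compiler stage
  }
}

const request = createModuleRequest(null, './foo.js')
request.resolvedPath = '/app/foo.js' // what a resolver might produce
```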

## Hook-based

The idea here is to go for something similar to `Module._extensions`, but with
three stages: resolving, loading and compiling.

### `require.resolvers`

It is the responsibility of the `require.resolvers` extension handlers to
receive the in-progress `ModuleRequest` object and use the metadata on it to
figure out how to resolve a load. For `require()` relative location loading, a
handler would simply look at the `parent` and resolve the input string of the
`require()` call relative to the `parent` module.

This would roughly correspond to the current `Module._resolveLookupPaths`
implementation, but it should instead receive the module request object, which
it would modify in-place.

The first resolver to return a non-null value decides the `resolvedPath` used
in subsequent stages.

```js
const path = require('path')

const mockedPaths = {
  'foo.js': path.join(__dirname, 'mocked-foo.js')
}

require.resolvers.push(({ inputPath }) => {
  if (inputPath in mockedPaths) {
    return mockedPaths[inputPath]
  }
})
```

### `require.loaders`

The `require.loaders` extension handlers receive the `ModuleRequest` object
after the `resolver` stage and use the resolved path to attempt to load the
contents of the module file.

Returning `null` skips to the next loader in the pipeline. If no loader
returns a value, a `MODULE_NOT_FOUND` error is thrown.

```js
const fs = require('fs')

require.loaders.push(({ resolvedPath }) => {
  if (!/\.js$/.test(resolvedPath)) return
  try {
    return fs.readFileSync(resolvedPath).toString()
  } catch (err) {
    if (err.code === 'ENOENT') return
    throw new Error(`module load error: ${err.message}`)
  }
})
```

NOTE: How native modules would be handled in the three-stage design is a bit
of an unknown. They have no textual intermediate representation; they are
loaded directly from the path of a `.node` file into a JS object in C++ via
`process.dlopen`. It might make sense for native modules to have a `compiler`
but not a `loader`.

### `require.transformers`

The `require.transformers` extension handlers receive the `ModuleRequest`
object after the `loader` stage and may alter the text content of the module.
Returning `undefined` means no changes.

```js
require.transformers.push(({ contents, resolvedPath }) => {
  if (!/\.js$/.test(resolvedPath)) return

  // Don't forget that `exports` object now!
  return contents.replace(
    /module\.exports\s+=/,
    'exports = module.exports ='
  )
})
```

### `require.compilers`

The `require.compilers` extension handlers receive the `ModuleRequest` object
after the `transform` stage and pass the `contents` into a compiler function
to build the final module object.

```js
const Module = require('module')
const yaml = require('yaml')

require.compilers.push(({ contents, resolvedPath }) => {
  if (!/\.yaml$/.test(resolvedPath)) return
  const module = new Module()
  module.filename = resolvedPath
  module.exports = yaml.parse(contents)
  return module
})
```

## Loader objects

Another possible implementation is an object-based design, which would make it
easier to manage dependency order between loaders by iterating through the
loaders array and inspecting each loader's `name`. This would be helpful for
manipulating loader order in situations where, for example, code coverage
instrumentation needs to be applied after a babel transpile.

I prefer this approach, but it's a bit more involved and may be difficult to
achieve without backwards-incompatible changes.

```js
const fs = require('fs')
const path = require('path')
const Module = require('module')

class JSONConfigLoader {
  constructor({ base }) {
    this.name = 'JSONConfigLoader'
    this.base = base
  }

  // Receives a `ModuleRequest` object
  // Return a value to set `resolvedPath`
  // Return null to remove this loader from the remainder of this request pipeline
  resolve({ inputPath }) {
    if (!/^config:/.test(inputPath)) return
    return `${this.base}/${inputPath.slice(7)}.json`
  }

  // Return a string to set `contents` to pass along to subsequent stages
  // Return null to remove this loader from the remainder of this request pipeline
  // Throw to indicate a module loading failure (other than not found)
  load({ resolvedPath }) {
    try {
      return fs.readFileSync(resolvedPath).toString()
    } catch (err) {
      if (err.code === 'ENOENT') return
      throw new Error(`module load error: ${err.message}`)
    }
  }

  // Return the transformed contents
  // Throw to indicate a module transform failure
  transform({ contents, resolvedPath }) {
    if (!contents.includes('testing')) {
      throw new Error('module transform error: "testing" not found')
    }

    return contents.replace(/testing/, 'test')
  }

  // Return the final compiled module object
  // Throw to indicate a parse/compile failure
  compile({ contents, resolvedPath }) {
    try {
      const module = new Module()
      module.filename = resolvedPath
      module.exports = JSON.parse(contents)
      return module
    } catch (err) {
      throw new Error(`module compile error: ${err.message}`)
    }
  }
}

// Add the custom loader to the loaders array
require.loaders.push(new JSONConfigLoader({
  base: path.join(process.cwd(), 'configs')
}))

const myConfig = require('config:some-custom-config')
```

This example is implemented as a class, but it could also just be a simple
object.

Each stage handler receives the `ModuleRequest` object, and each handler
corresponds to an assignment to a particular property of that request: the
`resolve` method sets `resolvedPath`, the `load` method sets `contents`, the
`transform` method alters `contents`, and the `compile` method sets `module`.

Returning `null` from any of `resolve`, `load`, `transform` or `compile`
removes that loader from the remainder of the pipeline for that load request.
However, any of `load`, `transform` and `compile` can be left unimplemented to
skip just that step. Returning a value from `load` skips the remaining loaders
in the `load` stage and proceeds to the `transform` stage. Similarly, the
first `compile` to return a non-null value skips the remainder of the compile
step, and that value becomes the final module.
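
These stage semantics can be sketched as a pipeline driver (a hypothetical
illustration, not the proposed implementation; names like `runPipeline` are
made up):

```js
// Hedged sketch of the loader-object pipeline semantics: returning null
// drops a loader from the rest of this request, a missing method skips
// just that stage, and the first non-null load/compile result wins.
function runPipeline(loaders, request) {
  let active = loaders.slice()

  function runStage(stage, apply, firstWins) {
    for (const loader of active.slice()) {
      if (typeof loader[stage] !== 'function') continue // stage unimplemented
      const result = loader[stage](request)
      if (result === null) {
        active = active.filter((l) => l !== loader) // drop from this request
      } else if (result !== undefined) {
        apply(result)
        if (firstWins) return // first non-null value decides
      }
    }
  }

  runStage('resolve', (v) => { request.resolvedPath = v }, true)
  runStage('load', (v) => { request.contents = v }, true)
  runStage('transform', (v) => { request.contents = v }, false)
  runStage('compile', (v) => { request.module = v }, true)
  return request
}
```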

# Notes

- The `ModuleRequest` object could possibly also just be a `Module` instance,
  but the current resolution stage occurs before attempting to build a module
  instance, so backwards-incompatible changes might be required to do that,
  hence the safer approach of an entirely new object.
- For the object-based style, it might also be a good idea to have `preCompile`
  and `postCompile` transformers rather than a single `transform` method.

[1]: https://github.com/nodejs/node/blob/master/lib/_tls_wrap.js#L829