-
Notifications
You must be signed in to change notification settings - Fork 66
proposal: Staged require() with lifecycle hooks #56
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
| Title | Staged require() with lifecycle hooks | | ||
|--------|---------------------------------------| | ||
| Author | @qard | | ||
| Status | DRAFT | | ||
| Date | 2017-05-05 | | ||
|
||
# Proposal | ||
|
||
There are two major purposes to this proposal: | ||
- To provide module loading lifecycle hooks | ||
- To split resolving, loading and compilation concerns | ||
|
||
# Motivations | ||
|
||
Currently some very popular userland modules depend heavily on monkey-patching | ||
to provide their functionality. This can be unsafe, inflexible and damaging to | ||
performance. | ||
|
||
Some examples include: | ||
|
||
- Tracing, debugging and monitoring | ||
- Code coverage | ||
- Mocking | ||
- Transpiling | ||
|
||
### Safety | ||
|
||
Monkey-patching can be unsafe for many reasons. Sometimes code the behaviour | ||
of code changes based on parameter length of input functions--express is a good | ||
example of that. Sometimes patches can interfere with getter/setter behaviour. | ||
It can be difficult to differentiate between async callbacks that need code to | ||
maintain barrier linkage and sync callbacks which do not. Some interfaces | ||
accept multiple function arguments and dynamically identifying which is a | ||
callback that needs to be instrumented can be unreliable. Any form of queueing, | ||
even simply putting a callback in an array to be called elsewhere, will break | ||
continuation linking and requires manual intervention to maintain that linkage | ||
which is not always possible for dynamic instrumentation. | ||
|
||
Most APM providers try/catch constantly because the existing solutions for | ||
tracking continuation lose context regularly. We've done the best we can with | ||
monkey-patching, but we've hit a wall and have been pushing for things like | ||
`async-hooks` to improve the situation. However, `async-hooks` only solves part | ||
of the problem--linking the callsite to the callback--but context on what was | ||
happening still requires monkey-patching, and there are plenty of situations | ||
where context-loss is not the hard part. | ||
|
||
### Performance | ||
|
||
Monkey-patching is prone to performance issues. The vast majority requires | ||
wrapping things in closures. More complex instrumentation can often trigger | ||
deoptimization. Sometimes heavy runtime checking is required with constant | ||
dynamic function generation and reassignment, which can put a lot of undue | ||
stress on the optimizer and garbage collector. | ||
|
||
If, for example, one could intercept the module loader and apply an AST | ||
transform before compilation, there's a great deal that could be gained. | ||
Most Run-time checking would simply disappear. Most closures would disappear. | ||
Context loss would disappear because the transformer could take advantage of | ||
lexical scope to describe linkage directly at callsites. | ||
|
||
### Flexibility | ||
|
||
Many things are just not possible to any reasonable capacity relying purely on | ||
monkey-patching. Often things which could provide valuable coverage or tracing | ||
insight are simply not exposed. | ||
|
||
An example of a difficult thing to apply tracing hooks to is the timing from | ||
when a socket first enters JS in `net.js` to when it switches to a TLS socket. | ||
One could listen for the `secureConnection` event, but there are several | ||
failure paths between the `TLSSocket` constructor and the event handler being | ||
triggered, which could result in context loss. One might also think to patch | ||
`TLSSocket.prototype._init`, however that will be called by both servers and | ||
clients, which are difficult to differentiate from in this case. To correctly | ||
identify and correlate a net socket to a corresponding TLS upgrade takes | ||
extensive monkey-patching with a lot of run-time checks in a major hot-path. | ||
With an AST transform, a single line of code could be inserted | ||
[before the `TLSSocket` construction][1] in the `Server` constructor. | ||
|
||
### ES Modules | ||
|
||
A big issue on the horizon is that, if ES Modules are adopted, modules become | ||
immutable at build time. This means that the current monkey-patching approach | ||
will not work anymore. TC39 has already suggested using AST transformation | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you have a link to this discussion, or other ref? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It came up in one-on-one discussion with @bmeck at one point. Not sure if that is doc'd somewhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
to deal with this issue. | ||
|
||
Another concern with ES Modules is the difference in loader behaviour. This is | ||
where splitting resolution and loading concerns from compilation is valuable. | ||
|
||
# Implementation | ||
|
||
The idea is to go for something similar to the existing `Module._extensions` | ||
interface, but with three stages: resolving, loading and compiling. | ||
|
||
Each stage would interact with a `ModuleRequest` object which expresses the | ||
lifetime of a module request through the `resolve`, `load` and `compile` stages. | ||
Initially, it would only include the `parent` and `inputPath` fields. Other | ||
fields would be populated in the different lifecycle stages. | ||
|
||
### `ModuleRequest` | ||
|
||
##### `moduleRequest.parent` | ||
|
||
The parent module which initiated the `require()` request. | ||
|
||
##### `moduleRequest.inputPath` | ||
|
||
The input string to the `require()` call or `import` statement which initiated | ||
this module request. | ||
|
||
##### `moduleRequest.resolvedPath` | ||
|
||
The fully resolved file path, as determined by the `require.resolvers` handler. | ||
This would be added to the module request object during the `resolver` stage. | ||
|
||
##### `moduleRequest.contents` | ||
|
||
The content string (or buffer?) of the module. This field is populated during | ||
the `loader` stage. | ||
|
||
##### `moduleRequest.module` | ||
|
||
The final module object, built by the `compiler` stage. | ||
|
||
### `require.resolvers` | ||
|
||
It is the responsibility of `require.resolvers` extension handlers to receive | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
the in-process `Module` object and use the metadata on it to figure out how to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When referencing the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep. I'll make some revisions soon. Pretty busy for the next few days! 😱 |
||
resolve a load. For `require()` relative location loading, it'd simply look at | ||
the `parent` and resolve the input string of the `require()` call relative to | ||
the `parent` module. | ||
|
||
This would roughly correspond to the current `Module._resolveLookupPaths` | ||
implementation, but it should instead receive the module request object which | ||
it will modify in-place. | ||
|
||
```js | ||
const mockedPaths = { | ||
'foo.js': path.join(__dirname, 'mocked-foo.js') | ||
} | ||
|
||
const originalResolver = require.resolvers['require'] | ||
require.resolvers['require'] = function customResolver(request) { | ||
if (Object.keys(mockedPaths).includes(request.inputPath)) { | ||
return mockedPaths[request.inputPath] | ||
} | ||
originalResolver(request) | ||
} | ||
``` | ||
|
||
By defining named behaviour specific to `require()`, we open the door for | ||
supporting the different behaviour of `import` in the future. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe should be an array of resolvers, where if a resolver fails to resolve, the next one is tried? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also, the API only makes sense if in the future there will be a key that is not "require", which is a bit YAGNI, unless you actually demonstrate that the API will work for a future use-case ("import" is what I assume you are leaving the door open for) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea on the failover design. 👍 Import is indeed what I'm thinking of with the named list design. That one is much more of a "maybe" to me than the other bits right now. I feel like there's something in here that ES Modules can benefit from in regards to the loader, but I'm not quite sure what yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd go for simpler and more generic unless we are certain this will be helpful for imports, and actually, as much as es mods are a WIP, its probably not worth doing this unless we are certain it works for them, or its guaranteed to be "legacy" in a couple years. And isn't it 2am in Vancouver? ;-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Almost 3AM. 😅 Also, my thinking was to put the import stuff in there just to spark some discussion around that. I'm definitely not tied to that part being in the final proposal. I'd really love to hear from other people more deeply involved in the ES Modules discussion than I, since both things are deeply tied to the module loading systems and I'd really like to get some coordination going on here. |
||
|
||
### `require.loaders` | ||
|
||
The `require.loaders` extension handlers receive the `Module` object after the | ||
`resolver` stage and use the resolved path to attempt to load the contents of | ||
the module file. | ||
|
||
```js | ||
const originalLoader = require.loaders['.js'] | ||
require.loaders['.js'] = function customLoader(request) { | ||
// This populates the `contents` field | ||
originalLoader(request) | ||
|
||
request.contents = request.contents.replace( | ||
/module\.exports\s+\=/, | ||
// Don't forget that `exports` object now! | ||
'exports = module.exports =' | ||
) | ||
} | ||
``` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will c++ addons go through the loader? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure. My thinking was the single line extensions handler currently used would become a compiler and it'd skip the loader step. Not sure if that makes sense though. |
||
|
||
### `require.compilers` | ||
|
||
The `require.compilers` extension handlers receive the `Module` object after the | ||
`loader` stage and pass the `contents` into a compiler function to build the | ||
final module object. | ||
|
||
```js | ||
require.compilers['.yaml'] = function yamlCompiler(request) { | ||
request.module.exports = yaml.parse(request.contents) | ||
} | ||
``` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also seems like it should be a stack. If the code was .ts, it might need compilation to .js, but after it's compiled to js, an APM that instruments js would want to get it so it can be rewritten for instrumentation There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, my original PR went with a stacked design. I've been going back and forth and if that's much better than just wrapping the existing function repeatedly. It's a trade off of making all requires a bit slower with stack checks/iteration or just making patched environments slightly slower. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. might be possible to optimize inside module. If the array has only one member, the function used to iterate the array can be replaced directly with the only function in the array. For arrays longer than one, we might even be able to pre-compose a function out of the members of the array (basically, cacheing them to avoid repeated iteration). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I definitely feel like the stacked interface is friendlier. Maybe I'll put that in tomorrow and have a couple alternative suggestions to discuss and narrow down on. |
||
|
||
# Notes | ||
|
||
- The `ModuleRequest` object could possibly also just be a `Module` instance, | ||
but the current resolution stage occurs before attempting to build a module | ||
instance, so it's possible backwards-incompatible changes may be required to | ||
do that, thus the safer approach of an entirely new object. | ||
|
||
[1]: https://github.com/nodejs/node/blob/master/lib/_tls_wrap.js#L829 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You probably wanna future proof this link by tying it to a specific git ref (it's already broken) - tip: press There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, I'll fix that soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you are trying to say about n-arity of functions, but the grammar is a bit garbled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on it. Thanks for the input. 😅