Skip to content
This repository has been archived by the owner on Jul 31, 2018. It is now read-only.

proposal: Staged require() with lifecycle hooks #56

Closed
wants to merge 4 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
192 changes: 192 additions & 0 deletions XXX-staged-require-with-lifecycle-hooks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,192 @@
| Title | Staged require() with lifecycle hooks |
|--------|---------------------------------------|
| Author | @qard |
| Status | DRAFT |
| Date | 2017-05-05 |

# Proposal

There are two major purposes to this proposal:
- To provide module loading lifecycle hooks
- To split resolving, loading and compilation concerns

# Motivations

Currently some very popular userland modules depend heavily on monkey-patching
to provide their functionality. This can be unsafe, inflexible and damaging to
performance.

Some examples include:

- Tracing, debugging and monitoring
- Code coverage
- Mocking
- Transpiling

### Safety

Monkey-patching can be unsafe for many reasons. Sometimes code the behaviour
of code changes based on parameter length of input functions--express is a good

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you are trying to say about n-arity of functions, but the grammar is a bit garbled here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on it. Thanks for the input. 😅

example of that. Sometimes patches can interfere with getter/setter behaviour.
It can be difficult to differentiate between async callbacks that need code to
maintain barrier linkage and sync callbacks which do not. Some interfaces
accept multiple function arguments and dynamically identifying which is a
callback that needs to be instrumented can be unreliable. Any form of queueing,
even simply putting a callback in an array to be called elsewhere, will break
continuation linking and requires manual intervention to maintain that linkage
which is not always possible for dynamic instrumentation.

Most APM providers try/catch constantly because the existing solutions for
tracking continuation lose context regularly. We've done the best we can with
monkey-patching, but we've hit a wall and have been pushing for things like
`async-hooks` to improve the situation. However, `async-hooks` only solves part
of the problem--linking the callsite to the callback--but context on what was
happening still requires monkey-patching, and there are plenty of situations
where context-loss is not the hard part.

### Performance

Monkey-patching is prone to performance issues. The vast majority requires
wrapping things in closures. More complex instrumentation can often trigger
deoptimization. Sometimes heavy runtime checking is required with constant
dynamic function generation and reassignment, which can put a lot of undue
stress on the optimizer and garbage collector.

If, for example, one could intercept the module loader and apply an AST
transform before compilation, there's a great deal that could be gained.
Most Run-time checking would simply disappear. Most closures would disappear.
Context loss would disappear because the transformer could take advantage of
lexical scope to describe linkage directly at callsites.

### Flexibility

Many things are just not possible to any reasonable capacity relying purely on
monkey-patching. Often things which could provide valuable coverage or tracing
insight are simply not exposed.

An example of a difficult thing to apply tracing hooks to is the timing from
when a socket first enters JS in `net.js` to when it switches to a TLS socket.
One could listen for the `secureConnection` event, but there are several
failure paths between the `TLSSocket` constructor and the event handler being
triggered, which could result in context loss. One might also think to patch
`TLSSocket.prototype._init`, however that will be called by both servers and
clients, which are difficult to differentiate from in this case. To correctly
identify and correlate a net socket to a corresponding TLS upgrade takes
extensive monkey-patching with a lot of run-time checks in a major hot-path.
With an AST transform, a single line of code could be inserted
[before the `TLSSocket` construction][1] in the `Server` constructor.

### ES Modules

A big issue on the horizon is that, if ES Modules are adopted, modules become
immutable at build time. This means that the current monkey-patching approach
will not work anymore. TC39 has already suggested using AST transformation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a link to this discussion, or other ref?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It came up in one-on-one discussion with @bmeck at one point. Not sure if that is doc'd somewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to deal with this issue.

Another concern with ES Modules is the difference in loader behaviour. This is
where splitting resolution and loading concerns from compilation is valuable.

# Implementation

The idea is to go for something similar to the existing `Module._extensions`
interface, but with three stages: resolving, loading and compiling.

Each stage would interact with a `ModuleRequest` object which expresses the
lifetime of a module request through the `resolve`, `load` and `compile` stages.
Initially, it would only include the `parent` and `inputPath` fields. Other
fields would be populated in the different lifecycle stages.

### `ModuleRequest`

##### `moduleRequest.parent`

The parent module which initiated the `require()` request.

##### `moduleRequest.inputPath`

The input string to the `require()` call or `import` statement which initiated
this module request.

##### `moduleRequest.resolvedPath`

The fully resolved file path, as determined by the `require.resolvers` handler.
This would be added to the module request object during the `resolver` stage.

##### `moduleRequest.contents`

The content string (or buffer?) of the module. This field is populated during
the `loader` stage.

##### `moduleRequest.module`

The final module object, built by the `compiler` stage.

### `require.resolvers`

It is the responsibility of `require.resolvers` extension handlers to receive

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the in-process `Module` object and use the metadata on it to figure out how to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When referencing the Module object, did you mean ModuleRequest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'll make some revisions soon. Pretty busy for the next few days! 😱

resolve a load. For `require()` relative location loading, it'd simply look at
the `parent` and resolve the input string of the `require()` call relative to
the `parent` module.

This would roughly correspond to the current `Module._resolveLookupPaths`
implementation, but it should instead receive the module request object which
it will modify in-place.

```js
const mockedPaths = {
'foo.js': path.join(__dirname, 'mocked-foo.js')
}

const originalResolver = require.resolvers['require']
require.resolvers['require'] = function customResolver(request) {
if (Object.keys(mockedPaths).includes(request.inputPath)) {
return mockedPaths[request.inputPath]
}
originalResolver(request)
}
```

By defining named behaviour specific to `require()`, we open the door for
supporting the different behaviour of `import` in the future.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should be an array of resolvers, where if a resolver fails to resolve, the next one is tried?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, the API only makes sense if in the future there will be a key that is not "require", which is a bit YAGNI, unless you actually demonstrate that the API will work for a future use-case ("import" is what I assume you are leaving the door open for)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea on the failover design. 👍

Import is indeed what I'm thinking of with the named list design. That one is much more of a "maybe" to me than the other bits right now. I feel like there's something in here that ES Modules can benefit from in regards to the loader, but I'm not quite sure what yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go for simpler and more generic unless we are certain this will be helpful for imports, and actually, as much as es mods are a WIP, its probably not worth doing this unless we are certain it works for them, or its guaranteed to be "legacy" in a couple years.

And isn't it 2am in Vancouver? ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost 3AM. 😅

Also, my thinking was to put the import stuff in there just to spark some discussion around that. I'm definitely not tied to that part being in the final proposal. I'd really love to hear from other people more deeply involved in the ES Modules discussion than I, since both things are deeply tied to the module loading systems and I'd really like to get some coordination going on here.


### `require.loaders`

The `require.loaders` extension handlers receive the `Module` object after the
`resolver` stage and use the resolved path to attempt to load the contents of
the module file.

```js
const originalLoader = require.loaders['.js']
require.loaders['.js'] = function customLoader(request) {
// This populates the `contents` field
originalLoader(request)

request.contents = request.contents.replace(
/module\.exports\s+\=/,
// Don't forget that `exports` object now!
'exports = module.exports ='
)
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will c++ addons go through the loader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. My thinking was the single line extensions handler currently used would become a compiler and it'd skip the loader step. Not sure if that makes sense though.


### `require.compilers`

The `require.compilers` extension handlers receive the `Module` object after the
`loader` stage and pass the `contents` into a compiler function to build the
final module object.

```js
require.compilers['.yaml'] = function yamlCompiler(request) {
request.module.exports = yaml.parse(request.contents)
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also seems like it should be a stack. If the code was .ts, it might need compilation to .js, but after it's compiled to js, an APM that instruments js would want to get it so it can be rewritten for instrumentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my original PR went with a stacked design. I've been going back and forth and if that's much better than just wrapping the existing function repeatedly. It's a trade off of making all requires a bit slower with stack checks/iteration or just making patched environments slightly slower.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be possible to optimize inside module. If the array has only one member, the function used to iterate the array can be replaced directly with the only function in the array. For arrays longer than one, we might even be able to pre-compose a function out of the members of the array (basically, cacheing them to avoid repeated iteration).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely feel like the stacked interface is friendlier. Maybe I'll put that in tomorrow and have a couple alternative suggestions to discuss and narrow down on.


# Notes

- The `ModuleRequest` object could possibly also just be a `Module` instance,
but the current resolution stage occurs before attempting to build a module
instance, so it's possible backwards-incompatible changes may be required to
do that, thus the safer approach of an entirely new object.

[1]: https://github.com/nodejs/node/blob/master/lib/_tls_wrap.js#L829
Copy link
Member

@watson watson May 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably wanna future proof this link by tying it to a specific git ref (it's already broken) - tip: press y while on the github page and it will update the URL for you with a ref

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll fix that soon.