Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Optionally only wrap modules hooked in --import #146

Merged
merged 22 commits into from
Jul 29, 2024

Conversation

timfish
Copy link
Contributor

@timfish timfish commented Jul 16, 2024

The Problem

import-in-the-middle has been designed to wrap every module to allow hooking of them after they've been imported.

However import-in-the-middle has a couple of fundamental issues which means it will likely remain incompatible with some libraries:

These issues almost certainly cannot be solved with our current wrapping module strategy and are unlikely to get fixed even with the new loader proposals without changes to the ESM spec.

A Solution?

The obvious workaround is to only wrap modules that we later intend to hook!

Since the --loader cli arg is deprecated, many APM vendors (including Sentry) now recommend that users initialise instrumentation of their code via the --import cli arg. For Sentry at least, we register the hook and also initialise all the OpenTelemetry plugins which in turn call Hook with the required libraries.

This PR adds the ability to only wrap modules that are hooked before they are imported. This is almost always the case when hooking via --import.

When registering the iitm hook, you can pass a MessagePort that is passed like this:

import { register } from 'module'
import { Hook, createAddHookMessageChannel } from 'import-in-the-middle'

const { registerOptions, waitForAllMessagesAcknowledged } = createAddHookMessageChannel()

register('import-in-the-middle/hook.mjs', import.meta.url, registerOptions)

Hook(['fs'], (exports, name) => {
 // instrument fs here
});

// Ensure that the loader has acknowledged all the modules before we allow execution to continue
await waitForAllMessagesAcknowledged()

Or more explicitly, you can pass the message port yourself:

const { addHookMessagePort, waitForAllMessagesAcknowledged } = createAddHookMessageChannel()

register('import-in-the-middle/hook.mjs', import.meta.url, { data: { addHookMessagePort, include: [] }, transferList: [addHookMessagePort] })

Now, if there are any calls to Hook with a specific list of modules, these will be added to the include list and only those modules will be wrapped.

This would mean that if you initialise OpenTelemetry via --import, the OpenTelemetry plugins define which modules to wrap!

@timfish
Copy link
Contributor Author

timfish commented Jul 16, 2024

Can you think of any otel instrumentations where this might be problematic?

My first thought was that maybe opentelemetry-instrumentation-aws-lambda might be an issue but looking at the code, as long as the LAMBDA_TASK_ROOT environment variable is there, it should get the correct filename.

@timfish timfish changed the title feat: Option to only instrument hooked modules feat: Optionally only wrap modules hooked in --import Jul 16, 2024
AbhiPrasad
AbhiPrasad previously approved these changes Jul 16, 2024
Copy link

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the README to add details about createAddHookMessageChannel

AbhiPrasad
AbhiPrasad previously approved these changes Jul 16, 2024
Copy link
Member

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a bit of a race condition here in that there's no way to be sure the messages have been received by the loader thread before the --import file completes and the entrypoint file begins.

Perhaps there could be some sort of acknowledgement message for each and there could be a promise you can await at the end of that --import file to wait until all acknowledgements have been received before proceeding?

@timfish
Copy link
Contributor Author

timfish commented Jul 16, 2024

I think there's a bit of a race condition here

And even if it's not possible to trigger the race condition now, it remains a possibility in the future!

I've added acknowledgment messages and updated the example in the top post.

@timfish
Copy link
Contributor Author

timfish commented Jul 17, 2024

I'm getting exit code 13 from the new test on some versions in CI but it passes locally:

13: Unsettled Top-Level Await: await was used outside of a function in the top-level code, but the passed Promise never settled.

You can't have the end of execution blocked simply waiting on a promise because it assumes it's never going to resolve?

Comment on lines +58 to +61
const timer = setInterval(() => { }, 1000)
const promise = new Promise((resolve) => {
resolveFn = resolve
}).then(() => { clearInterval(timer) })
Copy link
Contributor Author

@timfish timfish Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timer stops the process exiting with error code 13 which is triggered by blocking exit via top-level await if there are no more active handles. This feels... sub-optimal but I don't know how else we should be working around this!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent little bit trying to avoid this but couldn't come up with a better way. I'm gonna take another look tomorrow morning, but if I can't figure anything else out I'll approve.

@trentm
Copy link
Contributor

trentm commented Jul 26, 2024

issue with internals: true

I think there is also an issue with {internals: true} usage and includeModules.

(This example uses @langchain/core, mainly because this had been a module
used for a separate (but slightly related) issue discussed in require-in-the-middle.
You don't need to know that background for this issue, but if interested see
the second part of
elastic/require-in-the-middle#90 (comment)
starting with "I'm guessing a bit here".)

control case

Put these files in the top-dir of a git clone of import-in-the-middle.

use-langchain-tools.mjs

import * as tools from '@langchain/core/tools'
console.log('use-langchain-tools: Imported @langchain/core/tools:', tools.tool)

instr-langchain-core.mjs

import { register } from 'module'
// import { Hook } from 'import-in-the-middle'
import { Hook, createAddHookMessageChannel } from './index.js'
const { registerOptions, waitForAllMessagesAcknowledged } = createAddHookMessageChannel()

const LIMIT_TO_HOOKED = process.env.LIMIT_TO_HOOKED

if (LIMIT_TO_HOOKED) {
  register('./hook.mjs', import.meta.url, registerOptions)
} else {
  register('./hook.mjs', import.meta.url)
}

Hook(['@langchain/core'], { internals: true }, (mod, name, baseDir) => {
  if (name === '@langchain/core/dist/callbacks/manager.js') {
    console.log('Hooked name=%j: mod.CallbackManager:', name, mod.CallbackManager)
  }
  return mod
})

if (LIMIT_TO_HOOKED) {
  await waitForAllMessagesAcknowledged()
}

This Hook uses internals: true to be able to get at internal files in the @langchain/core package.
In the control case (without using includeModules functionality) this works:

% npm install --no-save @langchain/core
...

% node --import ./instr-langchain-core.mjs use-langchain-tools.mjs
Hooked name="@langchain/core/dist/callbacks/manager.js": mod.CallbackManager: [class CallbackManager extends BaseCallbackManager]
use-langchain-tools: Imported @langchain/core/tools: [Function: tool]

failing case

When I use includeModules (via the functionality in this PR), it no longer hooks:

% LIMIT_TO_HOOKED=true node --import ./instr-langchain-core.mjs use-langchain-tools.mjs
XXX loader: includeModules=["@langchain/core"]
XXX loader: skip out, "file:///Users/trentm/tm/import-in-the-middle2/use-langchain-tools.mjs" not in includeModules
XXX loader: skip out, "@langchain/core/tools" not in includeModules
XXX loader: skip out, "./dist/tools/index.js" not in includeModules
XXX loader: skip out, "zod" not in includeModules
XXX loader: skip out, "../callbacks/manager.js" not in includeModules
... elided about 200 lines ...
use-langchain-tools: Imported @langchain/core/tools: [Function: tool]
...
XXX loader: skip out, "uuid" not in includeModules

This one is a bit more tricky.

Options:

  1. Just document that includeModules (and the related feature in this PR) just doesn't work with internals: true and move on.
  2. Consider wrapping every submodule under the hooked module names. So for example, IITM would resolve that the "../callbacks/manager.js" import reference is inside @langchain/core, which is on includeModules, so it would be wrapped.
  3. Consider allowing specific submodule paths as hook args, e.g. Hook(['@langchain/core/dist/callbacks/manager.js'], ...). (RITM supports this without requiring internals: true.) Then the includeModules handling could wrap just that submodule of @langchain/core, without wrapping every other submodule in the package. Considering this would definitely be a separate issue (i.e. not part of this PR).

@timfish
Copy link
Contributor Author

timfish commented Jul 27, 2024

@trentm thanks for the feedback!

I think I found a bug where node:-prefixed imports are not included when the unprefixed name is hooked

Yes, this is a bug in the previously added feature. I'll submit an additional PR to fix and test for this (#149).

Consider wrapping every submodule under the hooked module names

For exclude at least, only excluding a single module feels like a bug from a users perspective. It currently only excludes wrapping the module entry point. If a library is problematic internally, users have to resort to using a regex to actually exclude an entire library with its sub-modules.

On the other hand, include only wrapping a single module feels advantageous/correct if you ignore internals: true.

I'll take another look at the code and how internals: true works and get back to you!

@timfish timfish force-pushed the feat/only-instrument-hooked branch from e3c735c to 2d3fc58 Compare July 29, 2024 09:46
@timfish
Copy link
Contributor Author

timfish commented Jul 29, 2024

Ok, so I've marked this new API as experimental and this and the previous feature addition as incompatible with the {internals: true} Hook option.

Just document... doesn't work with internals: true and move on.

We may be able to fix this in the future!

trentm
trentm previously approved these changes Jul 29, 2024
Copy link
Contributor

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

}

/**
* EXPERIMENTAL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad TypeScript jsdoc tag support doesn't support @experimental. Hopefully at some point (microsoft/TypeScript#56808)

@timfish timfish requested review from trentm and AbhiPrasad and removed request for AbhiPrasad July 29, 2024 17:47
@timfish
Copy link
Contributor Author

timfish commented Jul 29, 2024

Sorry, I had to make some changes after merging in the node: fix PR so I'll need approvals again as they were auto-dismissed!

for (const each of modules) {
if (!each.startsWith('node:') && builtinModules.includes(each)) {
includeModules.push(`node:${each}`)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the addition I need to make!

@timfish timfish merged commit 71c8d7b into nodejs:main Jul 29, 2024
48 checks passed
@timfish timfish deleted the feat/only-instrument-hooked branch July 29, 2024 23:46
lforst pushed a commit to getsentry/sentry-javascript that referenced this pull request Sep 5, 2024
Likely closes many issues but I don't want to auto-close anything
specific here. We should probably confirm the issues are closed
individually.

`import-in-the-middle` by default wraps every ES module with a wrapping
module that later allows it exports to be modified. This has issues
though because the wrapping output of `import-in-the-middle` is not
compatible with all modules.

To help work around this I [added a
feature](nodejs/import-in-the-middle#146) to
`import-in-the-middle` that allows us to only wrap modules that we
specifically need to instrument.

```ts
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: '__DSN__',
  registerEsmLoaderHooks: { onlyHookedModules: true },
});
```

So far I've only changed the express integration test to use this new
mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants