-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Look for node_ceiling #43828
base: main
Are you sure you want to change the base?
Look for node_ceiling #43828
Conversation
Stop searching for package.json or node_modules when a node_ceiling file is found. Refs: nodejs#43192 Refs: nodejs#43368
Review requested:
|
Some tests are failing, even before my changes. |
This seems exceedingly premature prior to a lot more discussion. Changing the CJS module resolution algorithm - especially with "Stability 2" - will impact a ton of tools in the ecosystem, including bundlers, transpilers, packages like |
@ljharb there has been some discussion in the referenced issue, but I hoped having code to play with might facilitate additional discussion. |
and that discussion largely seems to indicate that this change shouldn't be made, except via loaders. |
Even loaders somewhat want something like this outside loaders. I like the
use case and it is possible to do this with policies today to error outside
a directory but policies lack the ability to stop searching at a directory
boundary. It would be good to differentiate error vs short circuit here.
…On Wed, Jul 13, 2022, 3:47 PM Jordan Harband ***@***.***> wrote:
and that discussion largely seems to indicate that this change shouldn't
be made, except via loaders.
—
Reply to this email directly, view it on GitHub
<#43828 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJIZ2VHSAFAAMTWHAABDVT4TNJANCNFSM53P7D6ZQ>
.
You are receiving this because you are on a team that was mentioned.Message
ID: ***@***.***>
|
Did you consider just using an environment variable for this like |
An env would be nicer as it cannot be removed from disk to circumvent
things.
…On Wed, Jul 13, 2022, 5:36 PM Guy Bedford ***@***.***> wrote:
Did you consider just using an environment variable for this like
NODE_PROJECT_ROOT or something?
—
Reply to this email directly, view it on GitHub
<#43828 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJI5A3LMZG6ZWHJR52ALVT5AG7ANCNFSM53P7D6ZQ>
.
You are receiving this because you are on a team that was mentioned.Message
ID: ***@***.***>
|
@guybedford @bmeck
The using something in the file system OTOH
I also considered placing something inside package.json or inside node_modules, but it seemed useful to be able to terminate the search for either without the presence of the other. Also, the search for package.json terminates at the first one located, while the search for node_modules can continue beyond a package.json, so a new file seemed useful. I chose to look specifically for a file rather than a file-or-directory because it seemed useful to have a single pattern. Either would allow future expansion (file contents or new entries in a directory), but having a mix of both sounded needlessly complicated. The name seemed reasonably meaningful, didn't commit to a file-format when it's currently only a present or not-present boolean, and is the same length as node_modules (so it's probably OK on any system node currently searches for node_modules). |
Would it be useful to have an environment variable in addition to looking in the file system? If so, there would be questions of how multiple paths are joined into the variable and how they are normalized (or not) and compared with the search paths, but I think both approaches could coexist. |
Environment variables have the problem that they may not get propagated to child processes. Ex. |
If policies were automatically looked up it would solve the problem as
well. They intentionally do not compose though. If we go for files I would
prefer that approach as it is fewer features to keep track of.
…On Sun, Jul 17, 2022, 6:16 AM Ben Noordhuis ***@***.***> wrote:
Environment variables have the problem that they may not get propagated to
child processes. Ex. child_process.fork().
—
Reply to this email directly, view it on GitHub
<#43828 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJI7PH5H7GZMGAYADUA3VUPTPPANCNFSM53P7D6ZQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Policies offer options like dependency redirection. Any proposal would need to avoid automatically loading a policy under malicious control. Are there any proposals in that direction? The code in this pull request only prunes the search path and only looks in directories that have already been examined for node_modules or package.json. |
Policies have their own integrity flag provided via CLI.
…On Sun, Jul 17, 2022, 8:37 AM Andrew Hart ***@***.***> wrote:
If policies were automatically looked up it would solve the problem as
well. They intentionally do not compose though. If we go for files I would
prefer that approach as it is fewer features to keep track of.
Policies offer options like dependency redirection. Any proposal would
need to be avoid automatically loading a policy under malicious control.
Are there any proposals in that direction?
The code in this pull request only prunes the search path and only looks
in directories that have already been examined.
—
Reply to this email directly, view it on GitHub
<#43828 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJI56MKKT7XAUGICLSX3VUQEB3ANCNFSM53P7D6ZQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Like environment variables, CLI arguments don't necessarily get passed to child processes. |
I don't have any strong opinions here but if we do end up using a file sentinel I'd suggest package.json with |
I like that idea much better - it mirrors |
Using a field in package.json might not be as convenient as it first sounds.
|
This feature comes down to specifying a single path only within which Node.js should execute code - I think rather than a file signifier, we should just have an out-of-band way to achieve that such as a flag or environment variable or configuration file. There are a number of things in Node.js at this point that want to just be applied ambiently through configuration - loaders, hooks, instrumentation, policies, custom flags. Perhaps this is another data point on the journey to having a generic configuration file for Node.js invocations that is looked up on execution. If such a thing were to be designed, having a key for this configuration seems like a good solution to me. So starting from a flag / environment variable I think we can get there. But I think this PR has a better chance of landing if it were to take this approach rather than introduce a new pattern which provides another statting reliance and pattern for people to learn. |
@guybedford could you elaborate on your single path idea? With Node.js looking in so many paths for dependencies, especially with symlink handling, perhaps you have a simplified or recommended pattern in mind? |
@arhart sure, if you're after an explicit suggestion I can certainly provide that. My simplified or recommended pattern for implementation would be:
Possibly alternatively supporting some
In due course then extending support for such a flag through project-level / app-level configuration system. Name can be bikeshed further of course. This is just my preference, it's an open process and you have no explicit blocks, so it is your decision how to navigate further of course. |
@guybedford Thanks. I think I see the direction you are suggesting. While some sort of mechanism for ambient configuration of Node.js seems useful, it also seems like a long route to walk to get there. |
I wouldn't characterize this feature as specifying path(s) within which Node.js should execute code. Instead, I'd say it's about stopping module resolution at particular path(s). If an application dynamically downloads and loads a module, then it seems good for the path(s) [where] module resolution stops to be similarly flexible. |
An install tool (ex: npm) knows where it is placing modules and their dependencies. It might not have visibility to the larger application structure or even how many applications use those modules. |
I’m very hesitant to modify the CommonJS resolution algorithm at this point. There are so many other tools that depend on it and reimplement it, like many of the bundlers out there. I would encourage you to implement this idea as a loader, at least at first; that can prove it its viability, and then you can make the argument that it should be part of Node core. |
@GeoffreyBooth is that a general hesitancy to change something so central or are there also specific reasons to think such a change would not be compatible with existing code? Implementing as a loader could be interesting, at least the parts that loaders currently allow. My understanding is that loaders can only be setup by command line arguments and can't change how CommonJS modules are loaded. What do you think might be learned by having a partial loader implementation rather than the more complete branch in this pull request? |
Changes to the resolution algorithm impose a heavy burden on the community. Many tools, especially build tools, reimplement the resolution algorithm and therefore would need to be updated to accommodate this change. Before we impose that cost on everyone, I’d like some indication that anyone beyond you would use this feature. Shipping it as a loader (using the ESM hooks for modifying ESM resolution, and monkey-patching |
In both forms, it is and always is just a list of paths, just with different sources of truth over which paths constitute stop paths. Roundabout paths to goals are unfortunately usually necessary! |
Hi 👋 thanks for considering a feature like this! Some package managers like yarn LTS (3.x) allow an install config to define the ceiling during installs (which should override the hoisting limits but I'm not sure if it's for ESM loading or just CommonJS?). So, could Node also look at those config options in package.json when it does it's module resolution? I think if someone has a file tree like:
And inside {
...
"nmHoistingLimits": "workspace",
...
} I would want this to mean: when I do For the time being, I have to use the resolve hooks: import { resolve as pResolve } from 'path';
const parentDir = pResolve('..');
const ignoreDir = `file://${parentDir}/node_modules`;
export async function resolve(specifier, context, nextResolve) {
let localPkg
if (specifier.startsWith(ignoreDir)) { // If the dep is tin the parent...
const pkg = specifier.match(/node_modules\/([^\/]+)\//)[1] // pull out the pkg name...
localPkg = await import.meta.resolve(pkg).catch(() => {}) // and only use the local one, not the parent's
}
return nextResolve(localPkg || specifier); // return the local one if found, otherwise carry on...
} |
@richardeschloss Thanks for the resolve hook example. Here is some code I sketched out in August for the CommonJS case, but I never got back to it. Some use cases really want some sort of ambient configuration to load the necessary hooks. Perhaps something like #44975 'use strict';
const fs = require('fs');
const Module = require('module');
const path = require('path');
const ModuleFindPath = Module._findPath;
Module._findPath = function(request, paths, isMain) {
if ( paths && paths.length && !path.isAbsolute( request ) ) {
/* By modifying the paths argument to ModuleFindPath rather than
* completely replacing Module._findPath we:
* 1. (+) are more accepting of changes to Module._findPath, including
* other monkeypatches
* 2. (-) eagerly stat for node_ceiling rather than lazily
* 3. (-) lack access to the statCache in lib/internal/modules/cjs/loader.js
* 4. (-) lack access to internalModuleStat and its optimizations
*
* By monkeypatching Module._findPath rather than editing it directly we:
* 1. (+) avoid requiring a change in core
* 2. (-) require some caller to call this code "soon enough" for each
* script which needs this protection
* 3. (-) introduce a dependency on undocumented implementation
* details of CommonJS module loading in Node.js
*/
// TODO say something about how this addresses a security concern
// TODO say something about --experimental-policy and NODE_OPTIONS
// TODO say something about ESM
// TODO say something about package.json
// TODO copy paths, look for node_ceiling, modify copy
console.log('paths is', paths);
const oldPaths = paths;
paths = [];
let foundRoot = false;
let foundCeiling = false;
for( const curPath of oldPaths ) {
if ( !foundRoot ) {
foundRoot = curPath === path.resolve( curPath, '../../node_modules' );
if ( foundRoot ) {
console.log( 'foundRoot', curPath );
}
if ( foundCeiling ) {
continue;
}
}
if ( !foundCeiling && !foundRoot ) {
const ceilingPath = path.resolve( curPath, '../node_ceiling' );
console.log( 'looking for node_ceiling at', ceilingPath );
const statFoundFile = fs.statSync( ceilingPath, { throwIfNoEntry: false } )?.isFile() ?? false;
if ( statFoundFile ) {
console.log( 'found node_ceiling at', ceilingPath );
foundCeiling = true;
}
}
paths.push( curPath );
}
}
return ModuleFindPath( request, paths, isMain );
};
require('no-such-thing'); |
It is easy to accidentally allow another user to influence what code node loads and executes. Details can be found at HackerOne reports 1564437 (CommonJS module loading), 1564444 (ECMAScript module resolution), and 1564445 (package.json). While these behaviors are documented, the security implications are easy to overlook.
By stopping the search for
node_modules
andpackage.json
after encountering anode_ceiling
file, this change allows an application to be protected with few changes to the file system and a high degree of compatibility.