Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Set --experimental-default-type mode by detecting ESM syntax in entry point #50043

Closed
GeoffreyBooth opened this issue Oct 4, 2023 · 52 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem.

Comments

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Oct 4, 2023

After landing #49869, I opened nodejs/TSC#1445 to discuss when to flip the default value of --experimental-default-type from commonjs to module, which would make Node itself default to ESM whenever the user didn’t make it explicit that they wanted to work in CommonJS by using "type": "commonjs" in package.json, or a .cjs extension, etc. There were concerns raised on that thread around breaking the “loose” files case, where there is no package.json present, and breaking tutorials such as https://nodejs.org/en/docs/guides/getting-started-guide that assume a CommonJS default.

I propose we create --experimental-default-type=detect-module that would function as follows:

  • As with --experimental-default-type=commonjs and --experimental-default-type=module, it would only apply to ambiguous cases: a file with no explicit .mjs or .cjs extension, either no package.json or one that lacks a type field, no --input-type or --experimental-default-type=module flags passed, not under node_modules.
  • The entry point would be parsed (not evaluated) and we would look for import or export statements. (Not import() expressions that are allowed in CommonJS, not the CommonJS wrapper variables which could be set as globals by user code.) If the file cannot be parsed, error.
  • If an import or export statement is found, run the entry point (and the rest of the app) as if --experimental-default-type=module had been passed. Else run as if --experimental-default-type=commonjs had been passed.

This would solve the “loose files” and tutorials cases: ambiguous files would continue to be run as CommonJS. Only files with import or export statements, which currently error if you try to run them, would start to run as ESM. This would solve the goal of flipping, which is to let people start using ESM syntax by default without first opting in somehow.

I implemented big parts of this in 2019 in nodejs/ecmascript-modules#55. It’s very similar to a feature request from 2021, #39353. The main difference between then and now is the existence of --experimental-default-type. I can respond to some of the concerns raised on those earlier issues:

  • People won’t understand the distinction between disambiguating CommonJS and ESM, and what this is doing. This is why I called the value detect-module, not auto, and I think “it runs as ESM if import or export statements are found, or as CommonJS otherwise” is simple enough that users will get it.
  • CommonJS and ES modules cannot be disambiguated. This is true, but I’m not trying to disambiguate them. There are three sets of files, basically: files that can only run as CommonJS modules, files that can only run as ES modules, and files that can run as either (ambiguous syntax, like console.log(3)). I can’t tell apart the “ambiguous syntax” files from either of the other two, but I don’t have to: those run as CommonJS today, and they can continue to run as CommonJS. That would be the status quo, avoiding a breaking change. I’m only changing how the “runs only as ESM” files are interpreted, which turns a guaranteed error (Unexpected token import) into running code.
  • How would files imported by the entry point be interpreted? This is where the existence of --experimental-default-type today helps us out, because now I have a framework to use that defines how all files in various scopes are interpreted. The detection triggers either --experimental-default-type=module or leaves us in the default --experimental-default-type=commonjs, and that’s it.
  • What about future syntax, like whatever comes after import attributes? In my PR in 2019, I used Acorn since it’s already a dependency within Node. Acorn throws on any syntax that it doesn’t know, and I think we would likewise error. The error would be informative, instructing users to try again with an explicit marker (type field, file extension).
  • What about the performance impact? The entry point is parsed twice, yes, but only the entry point; and only when no explicit marker is present.

@nodejs/loaders @nodejs/tsc

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. labels Oct 4, 2023
@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Oct 4, 2023

In the TSC meeting today we agreed that at some point, Node needs to run ESM code by default. So currently the options are:

  1. Flip the default, to --experimental-default-type=module, and users need to take a minute to update their project to add a type field or explicit file extension or flag to continue running their non-explicit CommonJS code as before.
  2. Ship --experimental-default-type=detect-module and set that as the new default, and most (all?) current CommonJS users are fine without needing to make any changes, but ESM syntax works by default without first opting in.

I’m fine with either. In nodejs/TSC#1445 there were people hesitant about the “require some people to make a one-minute change” first option, so I’m curious if they prefer this instead. I’m unaware of other solutions yet.

@JakobJingleheimer
Copy link
Member

This sounds like a great feature to me. And one that would have a lot of community support. As a user, it is very frustrating for node to say "I know exactly what is wrong, but too bad".

There may be some technical challenges to facilitate, but I think we can accomplish it.

The only downside I can think of is a performance cost paid only by those who would otherwise hit a brick wall (those who don't need this feature pay no cost); and that seems a better alternative.

@guybedford
Copy link
Contributor

I agree it could well be time to explore this direction, so far as it can fit within the narrow space we've created for it. When previously discussed this was always in the context of being the primary determiner, but having it restricted to the top-level and as an option sounds very interesting to explore further experimentally. I would be weary unflagging something though until all the edge cases have been worked through including things like workspace symlink consistency and the potential publishing footgun.

@ljharb
Copy link
Member

ljharb commented Oct 5, 2023

Detection isn't reliable, though, because the two parse goals are ambiguous.

@isaacs
Copy link
Contributor

isaacs commented Oct 5, 2023

I think this is a good idea. It's kind of rage inducing that node doesn't do this already, tbh.

However, the issue of ambiguous commonjs deps I think needs a subtle change (unless I'm misreading the proposal, and this is already the intent, in which case, just a clarification).

If I set --experimental-default-module-type=detect-module (and, ideally, eventually that's the default), and it sees import, export, or import.meta in the entry point and thus sets "type": "module" effectively, it should only affect modules:

  1. Within the current working directory
  2. Not within ./node_modules

Ie, it should act as if a file at ./package.json contains "type": "module". But, like in the case where ./package.json contains "type": "module", it should not default the type for any other packages lacking an explicit type that are encountered along the way by virtue of the entry point being detected as type:module. If those dependency packages are detected to be ESM, then yes, treat them as ESM as well. Otherwise, having a file that does import blah from 'old-dep' will try to load old-dep in ESM mode, and blow up when it uses __dirname, require(), and so on.

With this, the ways to trigger ESM are:

  1. a package.json file containing "type": "module"
  2. an entry point containing code that triggers a SyntaxError outside of a Module context
  3. naming the file .mjs
  4. explicitly setting --default-type=module

The issue of parsing being fundamentally ambiguous is not really a big issue, imo, because it doesn't have to be 100% accurate in all cases; all it does is add another way to trigger ESM mode, by just using ESM, which is as it should be.

@joyeecheung
Copy link
Member

joyeecheung commented Oct 5, 2023

What about the performance impact? The entry point is parsed twice, yes, but only the entry point; and only when no explicit marker is present.

I am fairly certain that it is possible to let V8 return this information after parsing & hit the compilation cache when we parse it for the second time (which is essentially no-op and we pay no extra cost)

@GeoffreyBooth
Copy link
Member Author

Not within ./node_modules

Yes like the other default-type modes, this new one only applies outside of node_modules.

@joyeecheung
Copy link
Member

joyeecheung commented Oct 5, 2023

Actually for the use case alone I don't think even Acorn is needed, we can just compile the script via ComepileFunction/ContextifyScript and if there is an syntax error, we check the error message to see if it's caused by import/export (even though that's not quite nice, but we already do something like that for JSON cycle detection anyway). It's likely that a script that uses import uses this very early on and V8 won't parse more than a few lines before it throws, so the overhead should be manageble.

@GeoffreyBooth
Copy link
Member Author

ctually for the use case alone I don’t think even Acorn is needed, we can just compile the script via ComepileFunction/ContextifyScript and if there is an syntax error, we check the error message to see if it’s caused by import/export

That’s clever, and it eliminates the “future syntax” problem. V8 would be able to parse any syntax that V8 can evaluate 😄

@targos
Copy link
Member

targos commented Oct 5, 2023

I really like this idea. Are you already working on an implementation?

@benjamingr
Copy link
Member

+1 for this and +1 for letting V8 do the heavy lifting I think that was the reservation last time.

@benjamingr
Copy link
Member

To be clear - there are some caveats:

  • The goals are ambiguous AND
  • modules run in strict mode and scripts run in loose mode by default

This means that if someone has code that works in loose mode but not strict mode like:

with(someObject) {
  // do stuff
}

and they rely on it running in loose mode - running it as a module (because e.g no require) or doing the opposite (defaulting to CJS again, and someone writes code that only works in strict mode) is going to break their code.

I went over the 20 first Google results for Node.js tutorial right now and none of them are affected. Personally I'm not concerned with this issue and think this is a great solution overall to 'how can we migrate to esm without breaking anything' so I'm still heavily +1 on the issue but I just wanted to raise this point.

@targos
Copy link
Member

targos commented Oct 5, 2023

With this proposal, it would only run unambiguous module code as ESM, so with(someObject) {} wouldn't be affected, would it?

@MoLow
Copy link
Member

MoLow commented Oct 5, 2023

+1 on the approach.
in addition, we should encourage people and ask npm/yarn/etc to use type in package.json to reduce the cases of ambiguous files.

@ljharb
Copy link
Member

ljharb commented Oct 5, 2023

To be clear, there's plenty of sloppy mode code on npm, that doesn't use with, that will break if suddenly parsed in strict mode.

The only code that's safe to "detect" is code that uses with or top-level return, or uses import/export statements (or top-level await)

@JakobJingleheimer
Copy link
Member

I have a loader that does this btw, so I know it's possible 🙂

@aduh95
Copy link
Contributor

aduh95 commented Oct 5, 2023

The current plan is if the entry point is a .js file, node first attempts to parse it with compileFunction (so a sloppy mode script), and if it fails with a error that we recognise as being caused by a static import/export statement, the default-type for the process will be switched to module.

@targos
Copy link
Member

targos commented Oct 5, 2023

The only code that's safe to "detect" is code that uses with, or uses import/export statements.

This proposal is exactly about "detecting" uses of import/export statements. In other words, code that would throw a syntax error (because of import/export, not any syntax error) if we tried to run it as CommonJS (sloppy or strict).

@ljharb
Copy link
Member

ljharb commented Oct 5, 2023

@targos awesome, that sounds great then! as long as the only code it makes a "guess" on is unambigously one or the other, there's zero concern imo :-)

@joyeecheung
Copy link
Member

joyeecheung commented Oct 5, 2023

Actually maybe this can be as simple as just little tweaks to the existing entry point execution

if (useESMLoader) {
runMainESM(resolvedMain || main);
} else {
// Module._load is the monkey-patchable CJS module loader.
const { Module } = require('internal/modules/cjs/loader');
Module._load(main, null, true);
}

We can just inverse the two branches and go to the non-ESM branch first, if detection mode is used, catch any error from Module._load, check if it's a SyntaxError caused by import/loose statements and if it is, fallback to the ESM branch, otherwise rethrow. This means something like https://www.npmjs.com/package/v8-compile-cache would continue to work if they are loaded via --require.

@bmeck
Copy link
Member

bmeck commented Oct 5, 2023 via email

@joyeecheung
Copy link
Member

joyeecheung commented Oct 5, 2023

Wrapper scripts, script runners, etc. Won't be able to use these files
otherwise since they won't have access to this mode of operation.

Can you elaborate on what that "wrapper scripts, script runners" means? Do you mean workers or vm APIs or embedder APIs?

If we just tweak executeUserEntryPoint as suggested in #50043 (comment) this would be visible to user land (now, whether we want to make Module.runMain a public API or address this TODO, is a different question - I don't think I had any more context other than "there is this undocumented thing, which appeared to be used/monkey-patched in the wild, we either document this or deprecate this" when I left the TODO).

// TODO(joyeecheung): deprecate this in favor of a proper hook?
Module.runMain =
require('internal/modules/run_main').executeUserEntryPoint;

Note that workers also use Module.runMain, so changes would also apply automatically in workers.

For VM APIs, there is already distinction between vm.Script/vm.compileFunction and vm.SourceTextModule and users can implement fallback on their own if they want to support something similar.

The embedder API would need an extension, likely for LoadEnvironment(), to support detection (in particular this would be asynchronous, either returning a v8::Promise or taking a callback/v8::Promise::Resolver - although "support for ESM entrypoint for embedders" is in itself already a feature request that has been around for some time).

Additionally this makes the interpretation of files argv dependent so that needs to be easy to pick up on.

The interpretation of the entry point has already been dependent on filenames (.cjs/.mjs). I don't see there is anything else that need to be done here, if we just tweak Module.runMain, which takes the entrypoint file name as argument. Whatever application that wish to let Node.js interpret some file as a normal entry point always needs to nudge things in the right way so that the file that should be interpreted as the entry point gets passed into Module.runMain, they have been doing that and won't need to change anything for Node.js to keep taking care of the rest.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Oct 5, 2023

I really like this idea. Are you already working on an implementation?

I’ve started a branch at https://github.com/nodejs/node/compare/main…GeoffreyBooth:node:detect-module, but all it has is a docs update so far. I also have some old code from nodejs/ecmascript-modules#55. At least all the tests from that old branch should be reusable, though we should rewrite them into the modern style with the test runner and spawnPromisified. I probably wouldn’t use the Acorn stuff, because I think Joyee’s idea of using vm is better. Wanna collaborate? 😄

@bmeck
Copy link
Member

bmeck commented Oct 5, 2023

Can you elaborate on what that "wrapper scripts, script runners" means? Do you mean workers or vm APIs or embedder APIs?

I mean code that refer to bin files via require or import()/import. I've certainly run code using a quick bootstrap file that adds some state somewhere / adds hardening / logging / etc. I've generally not done via --require in a lot of scenarios for stability and convenience. These wrappers can look like:

#!/usr/bin/env node
// bin/foo-debug
addDebugLogging()
// go into normal bin workflow
// just invoke the file normally, this note this file is both valid CJS & ESM
import(which('foo') || './foo') // this will load the bin not as an entrypoint

This also can occur when you are stuck for some reason using an API from a bin (often coming from things using .main style invocation)

import('bin-or-lib').diagnostics.push(recoverableException) // also not an entrypoint

Setting "bin" in the package.json field will result potentially in bin-or-lib being loaded in different modes since invoking via npm wrapper will see entrypoint behavior but loading via import() will load it and potentially get a different result.

Note: I'm not opposing the removal of ability to statically analyze what mode a file is.

I think having the files act the same is important to avoid the WTFs form the mismatch and I doubt many people will be upset if non-entrypoints act the same as entrypoints. Having consistent behavior is in general a boon to avoid questions like:

did you run it via a terminal, a worker entrypoint, or via require/import()

and avoid necessary documentation:

this module must not be loaded via require/import() and must be loaded only as an entrypoint

which kind of goes against some of the point of having a import.meta.main which is being pushed these days.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Oct 5, 2023

I think this proposal would only affect entry points that error in current Node; by definition, this only affects entries with import or export statements in an implicit CommonJS scope, and those entries always throw right now. So the cases in @bmeck’s examples, which all work under current Node, would be unaffected by the new mode?

I guess there’s a hazard that an ESM entry point might work when run directly in this new mode, but then when you run it via a CommonJS or ambiguous wrapper it stops working; but at least that’s not something that works in current Node and then stops working if this new mode becomes the default. It might be annoying to debug, and we should try to guide people to a fix via an error message, but I don’t think this potential scenario should be disqualifying.

@bmeck
Copy link
Member

bmeck commented Oct 5, 2023

@GeoffreyBooth

I guess there’s a hazard that an ESM entry point might work when run directly in this new mode, but then when you run it via a CommonJS or ambiguous wrapper it stops working; but at least that’s not something that works in current Node and then stops working if this new mode becomes the default. It might be annoying to debug, and we should try to guide people to a fix via an error message, but I don’t think this potential scenario should be disqualifying.

Correct, this is the breakage that would not be able to be worked around. Files would change behavior depending on the location doing the loading and be stuck in a state unable to be altered.

at least that’s not something that works in current Node and then stops working if this new mode becomes the default.

It isn't that it would stop working in the new mode, it would potentially execute in CJS in one form and ESM in the other. This is doubly true if you swap the default parse to go for ESM then failing back to CJS. If under this mode we move from an Error to a valid execution of CJS we shouldn't for example move to swap it to ESM later. I think in the short term it could catch errors and prevent alteration of code which would be good but this wouldn't work for cases where code is wrapped. They could not catch the errors and alter the mode of execution.

It might be annoying to debug, and we should try to guide people to a fix via an error message, but I don’t think this potential scenario should be disqualifying.

I think having fewer errors and applying this on cases that aren't the entry point makes more sense than leading users to errors and having them fix. I'm unclear on why we would only want this to apply to the entry point when it causes a new form of breakage to be possible. We could simply not error and use the same mode detection for all files in these ambiguous situations.

I'm fine to a -0 due to amount of complexity of out of band non-static analysis friendly operation here but think there are only negatives by only fixing the entrypoint for this.

@bmeck
Copy link
Member

bmeck commented Oct 5, 2023

To be clear, are there direct objections to allowing this detection on ALL ambiguous cases but a desire to allow it in the entrypoint? If so, is there a clear reason for this if we can share the expensive code cache like above to avoid major per issues?

@bmeck
Copy link
Member

bmeck commented Oct 6, 2023

There’s also this case:

entry.js contains import './imported.js'
imported.js contains just console.log("hello")
If each file is evaluated individually, then imported.js here would be run as CommonJS, since it lacks an import or export statement. But it’s almost certainly intended to be run as ESM, because it’s part of the same app as entry.js and they’re in the same folder and have the same file extension.

The inverse is also true though. It could be expected to be CJS because it lacks import, or it could be expected to be CJS because another file in that directory called entry2.js is CJS. Additionally it may actually be written so it works as either or even detects the mode it is running under since imported.js only knows its mode based upon how the main module of the entire process was loaded and under what default-type.

Theorizing what someone may expect, intend, etc. is not a route of discussion I'd find comforting and would instead find a good reason to not ship a feature. I think trying to frame this as some big implicit configuration feature is missing the real gains to be had.

Users want messages of the form ~Unexpected token "import" to go away. If we only want those messages to go away for entrypoints that seems a bit odd and leads to inconsistencies in how files are treated as shown above. By doing the parse we are ensuring we are in an error case so that as you said:

this only affects entries with import or export statements in an implicit CommonJS scope, and those entries always throw right now.

Occupying spaces that would not run and solving user errors seems a general good. It is exceedingly rare that an ESM module lacks export statements if it isn't a dependency and also exceedingly rare that an ESM entrypoint lacks an import statement. I think we could solve real problems without any potential breakage by only changing what throws right now by just constraining this to a double parse.

If this suddenly changed other file interpretations as well it would be an out of band / implicit configuration mechanism which I'd be very resistant to. As it was framed originally, I thought this was intended to solve user errors and not intended provide new means of process configuration to take into account while writing code.

@bakkot
Copy link
Contributor

bakkot commented Oct 6, 2023

I think there's a lot of value in simplicity/predictability. The case of

  • importing a file purely for its side effects (i.e., it contains no exports)
  • which behaves differently in strict vs sloppy mode

is, I would think, likely to be pretty rare. As such, even aside from @bmeck's concerns, I don't think it would be worth adding complexity just for that case - better to stick to the more easily explained "this only changes the behavior of files with an import or export".

@aduh95
Copy link
Contributor

aduh95 commented Oct 6, 2023

Should we add TLA to the list of unambiguous syntax? so

console.log(this===undefined? 'ESM' : 'CJS'); // prints CJS

and

await console.log(this===undefined? 'ESM' : 'CJS'); // prints ESM

@bnoordhuis
Copy link
Member

In response to #50043 (comment): on the scale of node and the npm registry, rare is common. Edge cases matter.

@benjamingr
Copy link
Member

I still think that doing this only for the entry point is fine and the easier fix since Node is put into "esm" or "cjs" mode once rather than having to detect for every file. This + enforcing "type" in new published packages + none of the old ones breaking sounds like a great compromise for switching the default in a non-breaking way and incurs a marginal performance penalty (vs. doing it for every file).

@GeoffreyBooth GeoffreyBooth changed the title Proposal: --experimental-default-type=detect-module Proposal: Set --experimental-default-type mode by detecting ESM syntax in entry point Oct 6, 2023
@bmeck
Copy link
Member

bmeck commented Oct 6, 2023

@benjamingr dual parse is well understood, I don't know of any tool that does parse entrypoint and swap detection modes based upon that. I wouldn't find comfort or assume ease in inventing a new undiscovered set of edge cases. If such a tool exists we should likely figure out what it is like. Tools like TypeScript don't operate in this way or even have configuration to allow for it, but those tools generally do have a configuration for double parsing.

@GeoffreyBooth
Copy link
Member Author

I opened #50064 to separate the discussion of the “detect the entry point” approach (this proposal) and the “detect every file” approach (that one).

@mcollina
Copy link
Member

mcollina commented Oct 6, 2023

I support this proposal, the cost of double parsing one file is minimal vs the benefit for the end user.

@andrewbranch
Copy link

Still gathering thoughts on how this would affect TypeScript. If the entrypoint to node under this flag is an .mjs or .cjs file, does the default type fall back to CommonJS? I think I would find it surprising if I had:

// @filename: a.js
import "./b.js"
// @filename: b.js
export {};
$ node a.js #

$ mv a.js a.mjs

$ node a.mjs # 💥 
b.js:1
export {};
^^^^^^

SyntaxError: Unexpected token 'export'

I started out with modules, I made one more explicitly a module, and now I don’t have modules anymore?

@GeoffreyBooth
Copy link
Member Author

If the entrypoint to node under this flag is an .mjs or .cjs file, does the default type fall back to CommonJS?

No, per the proposal, detection only happens for ambiguous files: .js or no extension, where there’s no controlling package.json or the package.json lacks a type field. The .mjs or .cjs extensions are each explicit and so Node always interprets them as ESM or CommonJS, respectively.

I think I would find it surprising if I had:

Yes, confusion can occur if you change your entry point separately from the rest of your app. That’s one of the weaknesses of this approach. Fortunately I don’t think that doing so is all that common; someone using the .js extension is more likely to add "type": "module", to change all files in a scope at once, than they are to change individual files’ extensions. But even if they do intermix ambiguous and explicit extensions, we can error informatively like “b.js was interpreted as CommonJS because it uses the .js extension and there is no package.json in this or any parent folder that contains "type": "module". If you intended for b.js to be interpreted as an ES module, either rename it to b.mjs or create a package.json file containing { "type": "module" }.”

@Qard
Copy link
Member

Qard commented Oct 6, 2023

I somewhat prefer the approach of the other proposal to always be trying both for this exact reason. We would always do the fallback so we don't break users in these edge cases.

If we only detect at the entrypoint file then the behaviour could be unexpected. The inverse of that scenario is that it's also relatively common to vendor parts of modules by copy/pasting files directly into your app and I can imagine we would encounter users building ESM entrypoints and then expecting loose CJS files copy/pasted from elsewhere into that project to just work.

@isaacs
Copy link
Contributor

isaacs commented Oct 8, 2023

It should auto detect for all files whose package.json resolves to the same package.json of the entry point module.

Not for anything in other packages or deps, but for everything directly in the entry point's local package.

@GeoffreyBooth
Copy link
Member Author

should auto detect for all files

I think any discussion of detection for all files should happen in #50064. This proposal is focused on using the entry point as a signal to set the default module system.

@Qard
Copy link
Member

Qard commented Oct 8, 2023

I think what @isaacs is getting at is a hybrid of the two proposals where it only does the check once, but it's module-scoped so it doesn't potentially interfere with expectations of dependency modules. I think it makes sense to bring up here given that it's not fully checking all files.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Oct 8, 2023

From an implementation point of view, detecting all files other than dependencies is much closer to #50064 than it is to this. It would mean first implementing detection for potentially any file, not just an entry point, and then adding “isn’t under node_modules“ to the list of factors that would permit detection for each file, just like “package.json has no type field” is one of the factors. So for the sake of keeping discussion organized, I’d prefer that this thread focus on the entry point stuff and that thread focus on detection for any set of files larger than just the entry point.

@jakebailey
Copy link

jakebailey commented Oct 13, 2023

I haven't seen this discussed yet, but one of the major gotchas I see with this proposal so far is the possibility for users to accidentally publish packages that are broken without them realizing.

Take an example where I have:

// package.json
{
    "name": "my-cli",
    "main": "main.js"
}
// main.js
import { parseArgs } from "util:path";
console.log(parseArgs({ args: process.argv.slice(2), strict: false }));

I can test this script locally and it'll work; it's not in node_modules, there's no type marker, there's clearly an import, so it runs.

But, if I publish this package, suddenly this code is appearing in node_modules, and now the lack of a type marker implies commonjs, and my script fails. There's not a good way to test this ahead of time, besides getting this package into node_modules somehow.

So, I'm not personally convinced that a missing type in package.json is a good idea, even outside of node_modules, because it's likely that the code will at some point appear in someone's node_modules. It seems a lot more reasonable to get all of the package managers to default to module in their --init commands, if the point of this proposal is to help new users start using ESM quickly.

(EDIT: npm/cli#6855 (comment) maybe implies that this had been discussed somewhere? I have not found it.)

In a similar vein, @weswigham also noted that if you ever "vendor" a package without putting it in node_modules (like third_party or something; I know some people vendor like this), then now a package without a type marker, that now its files become ambiguous too even if there's a package.json, which is problematic. This is effectively the opposite direction of my above case, where code that was intended to be CJS becomes ESM.

So given those two ideas, I'm much more a fan of getting all of the package managers to generate different --init results and wait it out, if we're trying to handle the case where package.json is present but missing the marker.

@weswigham
Copy link

Indeed, format being entry point dependent is also probably incredibly bad for tooling, which thus far may either not need an entry point, or needs to acknowledge multiple entry points. In either case, a detection flag like this can still leave intended module formats completely ambiguous. Detection for every file might be OK, tools like TS and webpack have had modes for that for forever with known caveats, but only using the entry point sounds like it could lead to significant pain.

@Qard
Copy link
Member

Qard commented Oct 13, 2023

Yep, which is why I think I agree with @isaacs about entrypoint plus a separate check for the first file seen from each module.

@jakebailey
Copy link

jakebailey commented Oct 13, 2023

FWIW I think that the potential cost of running detection on every file wouldn't be that bad; in addition to previous comments about V8 internals (no clue about that, I have no experience), it's pretty easy to tell textually whether or not a file may contain imports; e.g. if you don't match /import|export/, then you definitely know that the file isn't a module, so can skip detection. I expect most cjs code to not ever contain the substring import.

This is a common tactic in editors, e.g. "I want to run find-all-definition for the symbol Blah, but I can really easily just check to see if the file even contains that string before paying any parse costs".

(Obviously this idea can be refined; I'm just throwing an idea out for that if perf actually turns out to be a problem.)

Yep, which is why I think I agree with @isaacs about entrypoint plus a separate check for the first file seen from each module.

I'm a little confused as to this idea, but maybe I'm missing it in the thread above; my reading of that statement is that the heurstic is somehow just one level deep, which I think is more confusing than "just the entrypoint" or "every ambiguous file gets detection". I much prefer the latter for reasons stated above. Every file should individually get to choose whether or not they're CJS or ESM; mixing them is fine.

Entrypoint-dependent behavior gives me horrible flashbacks to Python, where import resolution depends on the directory of the entrypoint (metaphorically in the JS world, it'd be like if running a script in your package meant that some random other import could find files in your node_modules even if they aren't even in the same FS tree, and then everyone complained that their editors couldn't handle this).

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Oct 13, 2023

I haven’t seen this discussed yet, but one of the major gotchas I see with this proposal so far is the possibility for users to accidentally publish packages that are broken without them realizing.

It’s discussed in #50064; see the section “Pros, in comparison with #50043.”

@jakebailey
Copy link

It’s discussed in #50064; see the section “Pros, in comparison with #50043.”

Thanks; I didn't realize it was mentioned there. It is of course explicitly a downside of this issue's method, regardless of any other proposal, so good to note here!

@GeoffreyBooth
Copy link
Member Author

Sure. I think for now I’m treating this one as more or less on hold until we know whether #50064 is viable. I had assumed that running detection on every file would be prohibitively performance impactful, but it seems like my assumption might have been wrong on that, in which case #50064 offers many advantages. So I’m looking into trying to get an implementation of #50064 first before considering alternatives.

@DanielRosenwasser
Copy link
Member

@jakebailey and @weswigham have mentioned it above, but just for the sake of completeness, our recent design meeting notes point out some of the issues TypeScript would have with this proposal: microsoft/TypeScript#56103.

@GeoffreyBooth
Copy link
Member Author

Closing as alternative #50096 has landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem.
Projects
None yet
Development

No branches or pull requests