Bug: --experimental-detect-module doesn’t run as ESM files that have CommonJS parse errors above the first ESM syntax #50917
Comments
I tried to fix Case 2, but there's a problem with top-level With the test case: |
PR with the test if someone knows how to handle this case: #50918 |
Case 1 is similar as it also generates an error that's not specific to ESM syntax. |
@targos, case 1 can be fixed with an additional wrapper. If we assume the normal CJS wrapper is something like:
(function(module, exports, require, __filename, __dirname) {
  CODE
})();
For the check we can do something like:
(function(module, exports, require, __filename, __dirname) {
  (function() {
    CODE
  })();
})();
This allows all of the CJS "globals" to be shadowed and redeclared with That said, I'm not familiar enough with the codebase to know whether the compiled ESM checking function is being cached for later use in the case that the code is indeed a CJS module. If the function is being cached for later use, this might be some kind of breaking change for code that expects to throw on things on |
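To make the wrapper idea concrete, here is a minimal sketch. It assumes `new Function` is an acceptable stand-in for Node's internal CommonJS compile step (it parses the body without running it); `parsesInWrapper` and `CJS_PARAMS` are hypothetical names used only for illustration, not anything from the Node codebase.

```javascript
// Hypothetical stand-in for the CommonJS wrapper parameters.
const CJS_PARAMS = ['module', 'exports', 'require', '__filename', '__dirname'];

// Returns true if `code` parses inside the wrapper, false on a SyntaxError.
// `new Function` only compiles the body here; nothing is executed.
function parsesInWrapper(code, { nested = false } = {}) {
  // The nested form adds an inner function so user code gets its own scope
  // and may redeclare names that the outer wrapper uses as parameters.
  const body = nested ? `(function() {\n${code}\n})();` : code;
  try {
    new Function(...CJS_PARAMS, body);
    return true;
  } catch (err) {
    if (err.name === 'SyntaxError') return false;
    throw err;
  }
}

const source = 'const module = {};';
console.log(parsesInWrapper(source));                   // false: clashes with the `module` parameter
console.log(parsesInWrapper(source, { nested: true })); // true: the inner scope shadows it
```

The single wrapper rejects the declaration because a `const` binding cannot reuse a parameter name of the same function; the inner function gives the declaration a fresh scope.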
Just a thought: I feel like the proper way to do this would be to actually traverse the AST and check for ESM nodes before the compilation step. Does v8 expose the raw AST? |
V8 does not expose the AST (for encapsulation concerns, I gather). And we rely on V8 errors for now because using another parser would likely result in a performance regression in the no-failure case. |
It seems to me both cases can be solved with a different strategy: when parsing as CommonJS produces a syntax error, don't even inspect which error it is and just try parsing it as ESM unconditionally. And if that fails, it really fails. (I thought that was the original plan, but I don't know why it wasn't implemented that way, though I didn't give the original PR any thorough review myself either) |
If you do that, ambiguous files (which are syntactically valid in ESM and CJS) will be executed as ESM. |
Ambiguous files would still be executed as CJS, no? Only if they cause a syntax error (any syntax error) when parsing as CJS would they be parsed again as ESM. If they are truly ambiguous, this should cause the same syntax error. If it succeeds, however, then they weren't ambiguous to begin with, and should just be executed as ESM. I think the proposal is to ignore the specific type of syntax error; it isn't to try ESM first (which would obviously be a much larger change). |
No, it'll still be executed in CJS. We can first try to compile it as CJS, and if that passes we execute it as CJS. If it fails due to a SyntaxError (any SyntaxError, not just selected ones), we try to compile it as ESM. And if the second parse as ESM still doesn't compile, we fail. |
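The order of attempts described in this comment could be sketched roughly as follows. This is an illustration under assumptions, not Node's implementation: `new Function` stands in for the CommonJS compile, and `parsesAsEsm` is a hypothetical callback supplied by the caller, since the sketch does not assume any particular ESM parsing API.

```javascript
// Stand-in for the CommonJS compile step: parse only, never execute.
function parsesAsCjs(code) {
  try {
    new Function('module', 'exports', 'require', '__filename', '__dirname', code);
    return true;
  } catch (err) {
    if (err.name === 'SyntaxError') return false;
    throw err;
  }
}

// `parsesAsEsm` is a hypothetical predicate: (code) => boolean.
function detectFormat(code, parsesAsEsm) {
  // Step 1: anything that compiles as CommonJS runs as CommonJS.
  if (parsesAsCjs(code)) return 'commonjs';
  // Step 2: on any SyntaxError (not only selected ones), retry as ESM.
  if (parsesAsEsm(code)) return 'module';
  // Step 3: neither parse succeeded, so the file really fails.
  throw new SyntaxError('source parses as neither CommonJS nor ESM');
}
```

With this ordering, a file that is valid in both grammars never reaches step 2, which is why ambiguous files would still run as CommonJS.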
What about: const module = {}; This is not unambiguous ESM and throws a SyntaxError in CJS. I know it's not a very realistic example but I'm afraid we can find realistic ones if we think about it. |
We can just say in this case we recognize it as ESM. The rule is set by us, and I think it's fair to just say, if a script attempts to declare a new binding module, it will be recognized as ESM. |
Or we don’t need to say anything about any syntax in particular, just that
|
Please don’t forget to tag @nodejs/loaders |
So unless I’m misremembering, the current algorithm is:
Lines 1406 to 1409 in 2e458d9
So if in step 1 V8 throws on some other parse error, before the first From the beginning detection was never intended to catch every valid ESM file; when we implemented it we knowingly excluded top-level await from the list of syntax that would trigger “this is ESM,” because we can’t distinguish the error that V8 throws for top-level await from the identical error that V8 throws for any I don’t think we want to have multiple parsing passes to catch edge cases such as CommonJS parsing errors before the first |
If you change step 2 to
Then you’re describing the current behavior. Though it’s an interesting idea to attempt “run as ESM” for any parsing error when attempting to parse as CommonJS, rather than restricting the “try as ESM” behavior to only ESM syntax errors.
So the first thing I’m wondering is whether there’s a difference in practice between “we will attempt to run it as ESM” and “we will attempt to parse it as ESM. If that parses we execute it as ESM.” Is the latter an additional parsing pass? If so, is there a noticeable performance cost to doing so? (Perhaps not, if V8 caches.)
Let’s say there’s no meaningful performance cost. In that case, why not try again as ESM for any CommonJS parsing failure? What would be the downside? The only one I can think of is that we might get a different parsing error, that the user might not expect. For example, a module consisting of:
const { readFile } = require('fs/promises');
const contents = await readFile('file.txt');
Would throw
The other possible footgun I can think of is if a user is editing a CommonJS file and makes a mistake that causes it to fail parsing as CommonJS, but it does still parse as ESM. Then the user would have unintentionally opted into the “run ESM by detection” behavior. It’s quite explicit to need to type |
What it means is that a file will be in sloppy mode up until the first import or export is written, at which point it would suddenly be in strict mode. While the programming models in which that's likely are rare, when they do happen, it will be surprising and very very difficult to debug. |
I don’t think we need to extend to reference errors. Just syntax errors are enough. So it can just fail as 1. Or just that we can stash the first error somewhere and if ESM errors again, in step 3 we throw the first one. Or we can just show both. If users can be surprised by what happens, we just tell them what happens. |
Yeah, let’s not extend to |
I think at the very least we can improve the documentation and error messages. I’m unsure what, if anything, we should do beyond that. The only thing I can think of is to remove the restriction for “try again as ESM” from just the ESM-related syntax errors to any parsing/syntax error, but I’m wary that doing so could introduce other issues; but the only potential problem I can think of is the “different errors thrown when compiled as CommonJS versus when compiled as ESM” issue, which is something we can certainly handle one way or another. @joyeecheung @targos @aduh95 can you think of any issues that would be created if we did so? I think in practice what this would look like would be replacing Lines 1487 to 1492 in 2e458d9
SyntaxError: .
|
I can think of one reason not to change "try as ESM after getting an ESM-related parse error" to "try as ESM after getting any parse error": other tools would then have no reliable way to know how Node will run a file. Currently they can look at our docs, where it lists the syntax we check for ( The counterargument is that if there's a CommonJS syntax error before the first appearance of this syntax, Node will still fail to run the file as ESM even though it has the opt-in syntax. So the current behavior is more like "if the file parses as CommonJS except for this syntax, Node will run as ESM" which is much less straightforward to evaluate by tools; though I'd expect many to skip over the edge case presented by this issue and just assume that files can parse correctly. |
I’ve been discussing with @guybedford and we think we have a solution for this. So
This should cause the examples in this issue, which contain code that can run as ESM but not as CommonJS, to successfully evaluate as ESM; while not expanding our algorithm beyond what it currently is, and is easily reproducible by other tools. And the additional parse would only happen for the rare case of a file that:
|
In discussion with @joyeecheung and @guybedford I think there might be an even simpler solution. The insight is twofold:
We could add these additional errors to this list of exceptions that trigger a retry as ESM. So the flow would be (new part in bold):
Would this work? If so, are there any errors to add to our list besides the below?
The other nice thing about this approach is that it’s still replicable by other tools.
Edit: Here’s an example that would be an issue with this approach:
exports.blah = 6;
const module = 'test';
The |
I don't understand why that's an issue. What's wrong with saying "hey looks like this file is neither correct CJS nor ESM. If we try to interpret it as CJS the error is here, if we try to interpret it as ESM the error is here"? |
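One way to surface both failures at once, as suggested here, is to bundle them into a single error. The sketch below uses the standard `AggregateError`; the helper name `neitherFormatError` and the message wording are hypothetical, and the two input errors are placeholders rather than real V8 output.

```javascript
// Hypothetical helper: combines the CommonJS and ESM parse failures so the
// user sees where each interpretation of the file went wrong.
function neitherFormatError(cjsError, esmError) {
  return new AggregateError(
    [cjsError, esmError],
    'File parses as neither CommonJS nor ESM.\n' +
      `  As CommonJS: ${cjsError.message}\n` +
      `  As ESM: ${esmError.message}`
  );
}

// Placeholder errors for illustration only.
const combined = neitherFormatError(
  new SyntaxError('placeholder CommonJS parse error'),
  new SyntaxError('placeholder ESM parse error')
);
console.log(combined.message);
console.log(combined.errors.length); // 2
```

Keeping the original error objects in `errors` preserves their individual stack traces for tools that want more than the summary message.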
Imagine a 1000-line CommonJS file; you add Also I assume it wouldn’t be too hard to show two parse errors if neither mode parses successfully, since both parses happen one after each other in the same function, but to show a CommonJS parse error and an ESM runtime error would mean somehow remembering the CommonJS parse errors for every ES module being parsed and linked before they’re later evaluated. |
I don't think that's that problematic. If they don't want surprises, just make the module type explicit. Otherwise treat it as sloppy mode, which is just...JavaScript. Re. reparsing I think they can be done together. At least for require(esm) that's easily doable. For the ESM loader, some refactoring needs to be done because it has too many abstractions in the way. |
Well great; if such an outcome is acceptable, then we have multiple solutions. Obviously something which has fewer surprises would be better, though. Here’s another potential solution, based on #50917 (comment): If the CommonJS parse throws one of the six potential errors listed above, (
(function(module, exports, require, __filename, __dirname) { (async function() {
  CODE
})(); })();
If the If this second CommonJS parse still errors (on the same error?) then the user really did write code like I think this would cover all the cases in this issue, without needing to involve cjs-module-lexer; and this extra CommonJS parse would only happen in very limited circumstances on an error path, so it would rarely affect performance. Does this seem like it could work? |
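As a rough illustration of the second, wrapped parse described in this comment, the sketch below compares the plain wrapper with the extra async wrapper on a few inputs. It assumes `new Function` as a stand-in for Node's compile step, and the helper names are hypothetical. It only shows which inputs stop failing once wrapped; what Node should conclude from each outcome is left to the discussion.

```javascript
const CJS_PARAMS = ['module', 'exports', 'require', '__filename', '__dirname'];

// Parse-only check: true if `body` compiles inside the CommonJS-style wrapper.
function parses(body) {
  try {
    new Function(...CJS_PARAMS, body);
    return true;
  } catch (err) {
    if (err.name === 'SyntaxError') return false;
    throw err;
  }
}

const parsesPlain = (code) => parses(code);
// Second attempt: user code sits inside an inner async function.
const parsesInAsyncWrapper = (code) =>
  parses(`(async function() {\n${code}\n})();`);

const samples = {
  redeclare: "exports.blah = 6;\nconst module = 'test';",
  topLevelAwait: "const contents = await readFile('file.txt');",
  importStatement: "import fs from 'node:fs';",
};

for (const [name, code] of Object.entries(samples)) {
  console.log(name, parsesPlain(code), parsesInAsyncWrapper(code));
}
// redeclare false true
// topLevelAwait false true
// importStatement false false
```

Redeclarations and `await` stop being syntax errors inside the async wrapper, while `import` statements still fail in both forms.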
I just ran into the "accidentally opted into ESM mode" issue by simply running a file that does
The latter is not a case that we can detect based on syntax - it's a reference error thrown at evaluation time. I think the suggestion that @aduh95 and I made previously about showing both errors still makes sense - the current message isn't very helpful for users accidentally opting into ESM mode. Showing both errors at least helps users understand what's going on. On a side note: the current modified error message isn't super helpful either, a beginner would think it's telling them to do |
Case 1 (a perfectly valid ES module):
Results in:
Case 2 (another valid module)
Results in:
Solutions
The first case can be solved by placing the code in another function wrapper, allowing module, exports, etc. to be shadowed (note that this might require manual removal of the hashbang if there is one).
The second case can be solved by adding the above "await" error message to the list of ESM errors to check for.