Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mappings] Define for each ECMAScript input element where mappings are allowed #124

Merged
merged 3 commits into from
Sep 18, 2024

Conversation

szuend
Copy link
Collaborator

@szuend szuend commented Aug 28, 2024

This PR is a initial take to at least clarify on a very basic level where we allow mappings for JavaScript. The current draft proposes to use "input elements" instead of "ECMAScript tokens" since generators might want to map comments as well (e.g. a TypeScript -> JavaScript with JSDoc transformer that maps the @param back to the TS type annotation).

The main reasoning behind the current draft is that it seldom make sense to have mappings in the middle of tokens. E.g. with let foo = 5;, you don't really need a mapping for the o in foo. The exception (I think) are string literals where minifiers could combine multiline string literals into a single literal. Template literals might also fall under this category so we could move them down to the second bullet point.

I don't see a use case for mapping white space and line terminators, but if there is a valid use-case we should also move them to the second bullet point.

Note: I plan to do a followup where we recommend some AST productions where generators "should" emit mappings to better facilitate stack trace deobfuscation/debugging/etc. But this is kinda tricky since engines don't really agree on where Error.stack frames point to w.r.t. to the ECMAScript AST, so we'd have to cover all or at least the major use cases in the major browsers/runtimes.

@szuend szuend requested a review from nicolo-ribaudo August 28, 2024 09:32
@sjrd
Copy link

sjrd commented Aug 28, 2024

Mappings in the generated source have range positions. Why are we restricting them to only point to the first code point? It seems more natural to point to the entire range that is produced.

For example, for an identifier in source that becomes an identifier in the target, you'd want the mapping to cover the entire identifier in the target.

Likewise, for an operation such as an addition in source a + b, the produced code could look like (a + b) | 0 if it implements arithmetic modulo 0, and is pretty-printed. In that case, arguably, all the characters of that string, except the substrings a and b, should map to the source + operator. That should include whitespace, if only to reduce the number of required mappings.

Overall I'm not sure what the proposed restrictions are meant to achieve? Can the consumers benefit from those restrictions for some reason?

@szuend
Copy link
Collaborator Author

szuend commented Aug 28, 2024

I'm not sure I fully understand. Mappings are not ranges. They only translate a single point in generated code to a single point in authored code. Encoding overlapping generated source ranges is not possible. Generators/consumers might interpret that consecutive mappings form ranges, but that's not actually spec'ed.

For your example, you could encode it in a couple different ways. Either map the generated ( to the authored a and the next non-whitespace token after 0 to after b. Or just map the a + b with empty mappings for the parens and | 0 or add mappings for a , + and b) individually.

The only thing the PR attempting to do, is that it might not make sense if you write (foo + bar) to have any mapping start in the middle of foo.

From a DevTools perspective the mapping for an expression ala (a + b) | 0 wouldn't matter much. The only thing we'd be interested here is where to put breakpoints and that happens 90% on the statement level.

@ehoogeveen-medweb
Copy link

Regarding whitespace or more specifically line terminators, what about minifiers replacing (whitespace including 1 or more) newlines with a ; to avoid/mimic automatic semicolon insertion? The other case I can think of where line terminators are load bearing would be for ending single-line comments, but I don't know if there's something relevant to map there.

@szuend
Copy link
Collaborator Author

szuend commented Aug 28, 2024

If a minifier replaces whitespace with a ; you would add a mapping from the ; to the whitespace which is still allowed. The whitespace would be in authored code, not in the generated bundle.

The other way around is interesting though, but the question is, if there is tooling/use cases for which a mapping from a whitespace to a ; would be relevant?

@ehoogeveen-medweb
Copy link

ehoogeveen-medweb commented Aug 28, 2024

Although it seems somewhat contrived, I guess the reverse use case would be unminifying or deobfuscating with a code formatter that omits semicolons by default. You could use a source map for highlighting the corresponding minified or obfuscated code and vice versa (probably as a serialization format rather than directly), though I suppose you could always use the reverse mapping instead.

Edit: Sorry, I don't know why that became a new comment instead of an edit.

@szuend szuend marked this pull request as draft August 28, 2024 11:32
@sjrd
Copy link

sjrd commented Aug 28, 2024

I'm not sure I fully understand. Mappings are not ranges. They only translate a single point in generated code to a single point in authored code. Encoding overlapping generated source ranges is not possible. Generators/consumers might interpret that consecutive mappings form ranges, but that's not actually spec'ed.

My understanding had always been that a mapping starts at the given offset, and ends where the next mapping starts. Which makes them range-based.

If instead they truly are points and must point to a very specific code point, and there are several consecutive tokens that must all map to the same source position, you have to emit a lot more segments, bloating the source map file.

@szuend
Copy link
Collaborator Author

szuend commented Aug 28, 2024

This is exactly the reason why I want to have the discussion and attempt to nail this down a bit in the spec.

I don't propose emitting a mapping for each token. For your use case where multiple tokens in the generated source map point to the same location in authored code, it would be fine to emit one mapping at the start and one mapping at the end of the range in the generated code. If you then throw or pause somewhere inside that generated source range, tooling would be able to find the authored point by taking the closest mapping (forwards or backwards would both work in this case).

In that sense, yes you are right and we do indeed have ranges. But we can't assume that all mappings close the range that the previous mapping point opened (or is the start of a new range). This would require generators to emit an empty (i.e. unmapped) mapping to close ranges. Which some generators may do and others may not.

I'd love to also harden the rules here, and we might be able to do so as a follow-up.

The PR here is on a more fundamental level. It only clarifies where in the generated source, mapping points are allowed. Generators can still choose to emit a mapping point for every token, just for some or for none at all.

W.r.t. to whitespace, I can see that there are valid use cases, e.g. if you'd like to mark that generated code after some token is unmapped you'd probably want to put an empty mapping there.

@szuend szuend marked this pull request as ready for review September 2, 2024 05:28
@szuend
Copy link
Collaborator Author

szuend commented Sep 2, 2024

I relaxed the restrictions around whitespace and templates as discussed in the meeting.

@nicolo-ribaudo nicolo-ribaudo added this to the October 2024 milestone Sep 11, 2024
nicolo-ribaudo
nicolo-ribaudo previously approved these changes Sep 13, 2024
source-map.bs Outdated Show resolved Hide resolved
Co-authored-by: Nicolò Ribaudo <hello@nicr.dev>
@jkup jkup merged commit 501c85c into tc39:main Sep 18, 2024
2 checks passed
github-actions bot added a commit that referenced this pull request Sep 18, 2024
SHA: 501c85c
Reason: push, by jkup

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-actions bot added a commit to szuend/source-map that referenced this pull request Sep 19, 2024
SHA: 501c85c
Reason: push, by szuend

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@szuend szuend deleted the clarify-mappings branch November 5, 2024 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants