[mappings] Define for each ECMAScript input element where mappings are allowed #124

szuend · 2024-08-28T09:32:26Z

This PR is a initial take to at least clarify on a very basic level where we allow mappings for JavaScript. The current draft proposes to use "input elements" instead of "ECMAScript tokens" since generators might want to map comments as well (e.g. a TypeScript -> JavaScript with JSDoc transformer that maps the @param back to the TS type annotation).

The main reasoning behind the current draft is that it seldom make sense to have mappings in the middle of tokens. E.g. with let foo = 5;, you don't really need a mapping for the o in foo. The exception (I think) are string literals where minifiers could combine multiline string literals into a single literal. Template literals might also fall under this category so we could move them down to the second bullet point.

I don't see a use case for mapping white space and line terminators, but if there is a valid use-case we should also move them to the second bullet point.

Note: I plan to do a followup where we recommend some AST productions where generators "should" emit mappings to better facilitate stack trace deobfuscation/debugging/etc. But this is kinda tricky since engines don't really agree on where Error.stack frames point to w.r.t. to the ECMAScript AST, so we'd have to cover all or at least the major use cases in the major browsers/runtimes.

…e allowed

sjrd · 2024-08-28T09:40:44Z

Mappings in the generated source have range positions. Why are we restricting them to only point to the first code point? It seems more natural to point to the entire range that is produced.

For example, for an identifier in source that becomes an identifier in the target, you'd want the mapping to cover the entire identifier in the target.

Likewise, for an operation such as an addition in source a + b, the produced code could look like (a + b) | 0 if it implements arithmetic modulo 0, and is pretty-printed. In that case, arguably, all the characters of that string, except the substrings a and b, should map to the source + operator. That should include whitespace, if only to reduce the number of required mappings.

Overall I'm not sure what the proposed restrictions are meant to achieve? Can the consumers benefit from those restrictions for some reason?

szuend · 2024-08-28T09:57:11Z

I'm not sure I fully understand. Mappings are not ranges. They only translate a single point in generated code to a single point in authored code. Encoding overlapping generated source ranges is not possible. Generators/consumers might interpret that consecutive mappings form ranges, but that's not actually spec'ed.

For your example, you could encode it in a couple different ways. Either map the generated ( to the authored a and the next non-whitespace token after 0 to after b. Or just map the a + b with empty mappings for the parens and | 0 or add mappings for a , + and b) individually.

The only thing the PR attempting to do, is that it might not make sense if you write (foo + bar) to have any mapping start in the middle of foo.

From a DevTools perspective the mapping for an expression ala (a + b) | 0 wouldn't matter much. The only thing we'd be interested here is where to put breakpoints and that happens 90% on the statement level.

ehoogeveen-medweb · 2024-08-28T10:03:18Z

Regarding whitespace or more specifically line terminators, what about minifiers replacing (whitespace including 1 or more) newlines with a ; to avoid/mimic automatic semicolon insertion? The other case I can think of where line terminators are load bearing would be for ending single-line comments, but I don't know if there's something relevant to map there.

szuend · 2024-08-28T10:27:09Z

If a minifier replaces whitespace with a ; you would add a mapping from the ; to the whitespace which is still allowed. The whitespace would be in authored code, not in the generated bundle.

The other way around is interesting though, but the question is, if there is tooling/use cases for which a mapping from a whitespace to a ; would be relevant?

ehoogeveen-medweb · 2024-08-28T10:47:24Z

Although it seems somewhat contrived, I guess the reverse use case would be unminifying or deobfuscating with a code formatter that omits semicolons by default. You could use a source map for highlighting the corresponding minified or obfuscated code and vice versa (probably as a serialization format rather than directly), though I suppose you could always use the reverse mapping instead.

Edit: Sorry, I don't know why that became a new comment instead of an edit.

sjrd · 2024-08-28T11:50:11Z

I'm not sure I fully understand. Mappings are not ranges. They only translate a single point in generated code to a single point in authored code. Encoding overlapping generated source ranges is not possible. Generators/consumers might interpret that consecutive mappings form ranges, but that's not actually spec'ed.

My understanding had always been that a mapping starts at the given offset, and ends where the next mapping starts. Which makes them range-based.

If instead they truly are points and must point to a very specific code point, and there are several consecutive tokens that must all map to the same source position, you have to emit a lot more segments, bloating the source map file.

szuend · 2024-08-28T12:26:20Z

This is exactly the reason why I want to have the discussion and attempt to nail this down a bit in the spec.

I don't propose emitting a mapping for each token. For your use case where multiple tokens in the generated source map point to the same location in authored code, it would be fine to emit one mapping at the start and one mapping at the end of the range in the generated code. If you then throw or pause somewhere inside that generated source range, tooling would be able to find the authored point by taking the closest mapping (forwards or backwards would both work in this case).

In that sense, yes you are right and we do indeed have ranges. But we can't assume that all mappings close the range that the previous mapping point opened (or is the start of a new range). This would require generators to emit an empty (i.e. unmapped) mapping to close ranges. Which some generators may do and others may not.

I'd love to also harden the rules here, and we might be able to do so as a follow-up.

The PR here is on a more fundamental level. It only clarifies where in the generated source, mapping points are allowed. Generators can still choose to emit a mapping point for every token, just for some or for none at all.

W.r.t. to whitespace, I can see that there are valid use cases, e.g. if you'd like to mark that generated code after some token is unmapped you'd probably want to put an empty mapping there.

szuend · 2024-09-02T05:29:12Z

I relaxed the restrictions around whitespace and templates as discussed in the meeting.

.

source-map.bs

Co-authored-by: Nicolò Ribaudo <hello@nicr.dev>

SHA: 501c85c Reason: push, by jkup Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

SHA: 501c85c Reason: push, by szuend Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

[mappings] Define for each ECMAScript input element where mappings ar…

42fe607

…e allowed

szuend requested a review from nicolo-ribaudo August 28, 2024 09:32

szuend marked this pull request as draft August 28, 2024 11:32

Relax mapping restrictions for templates and whitespace

28a4431

szuend marked this pull request as ready for review September 2, 2024 05:28

nicolo-ribaudo added this to the October 2024 milestone Sep 11, 2024

rbuckton approved these changes Sep 12, 2024

View reviewed changes

nicolo-ribaudo previously approved these changes Sep 13, 2024

View reviewed changes

nicolo-ribaudo reviewed Sep 13, 2024

View reviewed changes

source-map.bs Outdated Show resolved Hide resolved

Update source-map.bs

763ae35

Co-authored-by: Nicolò Ribaudo <hello@nicr.dev>

nicolo-ribaudo approved these changes Sep 18, 2024

View reviewed changes

jkup merged commit 501c85c into tc39:main Sep 18, 2024
2 checks passed

github-actions bot added a commit that referenced this pull request Sep 18, 2024

Merge pull request #124 from szuend/clarify-mappings

91f96ea

SHA: 501c85c Reason: push, by jkup Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

szuend deleted the clarify-mappings branch November 5, 2024 08:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mappings] Define for each ECMAScript input element where mappings are allowed #124

[mappings] Define for each ECMAScript input element where mappings are allowed #124

szuend commented Aug 28, 2024

sjrd commented Aug 28, 2024

szuend commented Aug 28, 2024

ehoogeveen-medweb commented Aug 28, 2024

szuend commented Aug 28, 2024

ehoogeveen-medweb commented Aug 28, 2024 •

edited

Loading

sjrd commented Aug 28, 2024

szuend commented Aug 28, 2024

szuend commented Sep 2, 2024

[mappings] Define for each ECMAScript input element where mappings are allowed #124

[mappings] Define for each ECMAScript input element where mappings are allowed #124

Conversation

szuend commented Aug 28, 2024

sjrd commented Aug 28, 2024

szuend commented Aug 28, 2024

ehoogeveen-medweb commented Aug 28, 2024

szuend commented Aug 28, 2024

ehoogeveen-medweb commented Aug 28, 2024 • edited Loading

sjrd commented Aug 28, 2024

szuend commented Aug 28, 2024

szuend commented Sep 2, 2024

ehoogeveen-medweb commented Aug 28, 2024 •

edited

Loading