Goals for evolving source maps #22

littledan · 2023-04-05T17:40:59Z

In the TC39 tools call in April 2023, a number of stakeholders discussed problems with sourcemaps. I'd summarize the goals from that as:

Evolve the existing sourcemaps v3, rather than creating a new system (e.g., based on DWARF)
Harden the spec
- Allow creation of conformance tests
- Be able to tell tools that they have a bug and should fix it (as opposed to working around everything)
- Definition of a column (defined in the spec as 0-based, Unicode interpretation ambiguous)
- Well-defined points to make associations for JS/TS (while maintaining polyglot nature of sourcemaps)
Associating the original source file with the compiled/minified file
- Harden the spec: Specify semantics of URLs and resolution
- Additional feature: IDs/hashes to associate/name
Scopes / variable / function name decoding
- Improve mappings for names, if possible
- Some workflows, e.g., reconstructing a stack trace, should be possible without parsing the minified/compiled code.

sjrd · 2023-04-06T09:17:11Z

In the "harden the spec" area, would it be possible to also make the semantics of name mapping more precise? In particular,

whether we are supposed to fill it in for every occurrence of a variable, or whether the original var/let/const definition is enough, and
whether it is meaningful on property names (field names and method names), and if yes, again on which occurrences it is necessary.

For speed, the Scala.js compiler only puts them on the definition of variables and properties, but not their usages.

On the speed aspect, would it be possible to take source map generation speed into account, and in particular to the incremental generation thereof? In the Scala.js linking pipeline, which is incremental end-to-end, source map generation is one of the most expensive steps! And that's despite recent optimizations in that area (commits of Feb-Mar 2023 here, if you're interested).

The main difficulty, compared to the rest of our pipeline, is that we cannot make it fully incremental, due to the very nature of the encoding. We can make it incremental up to the generation of so-called Fragments. However, we must linearly recompute name indices (and diffs of indices) into the name table and source URI table, which requires hash map lookups as well as Base64VLQ recomputations. This is so performance sensitive that it was worth micro-optimizing that area of the code (here and here, if you're interested), to an extent we did not do anywhere else in our codebase.

By comparison, incrementally generating the .js files is mostly a matter of sending pre-computed byte blobs directly to the target files, and is basically instantaneous.

I would love if we could have a solution for that. Perhaps a "development-mode" version of source maps, more amenable to incremental re-generation to the detriment of raw size?

Swatinem · 2023-04-06T10:01:27Z

whether we are supposed to fill it in for every occurrence of a variable, or whether the original var/let/const definition is enough, and

whether it is meaningful on property names (field names and method names), and if yes, again on which occurrences it is necessary.

Yes, it is meaningful on property names, and I would suggest to create such a name mapping and reference that for any kind of "identifier" that was renamed/generated and appears in the code.
The desire was expressed to reason about the executed code as best as possible with just the SourceMap itself, without the need to fully parse and understand the underlying (minified) code. Having such reliable mappings would go a long way to achieve that.

Related to hardening of "identifier" tokens with a name reference:

It would be very good to clearly specify where the boundary of such mappings should be in terms of whitespace.

I have seem " identifier" with leading whitespace, "identifier " with trailing whitespace and also "identifier" for which the sourcemap token does not include any whitespace.

As each token does not have an explicit end, but is rather implicitly bounded by the following token or the EOF, I would argue that tools already account for trailing whitespace or unrelated characters at the end of a token.

However, we did run into problems with leading whitespace, as in the example " identifier" here: getsentry/js-source-scopes#15

mitsuhiko · 2023-04-06T13:46:46Z

Can I propose that we open individual issues for hardening the spec? Otherwise all these proposals are likely to die out in large discussions.

jkrems · 2023-04-11T13:47:15Z

I would also add:

Under hardening the spec: Remove and/or mark things that aren't implemented anywhere, e.g. sections[].url. Hardening: Remove or at least warn about sections.url #29
Associating/mapping compiled to original source: Allow 1:n mappings or some other mechanism that allows to expand inlined function calls.

jaro-sevcik · 2023-04-12T08:54:12Z

Some notes on names (perhaps we should continue the discussion about names to Issue 6?):

We have also seen that identifiers get lumped together with subsequent commas, right parens, etc. Perhaps the spec should say that we only associate the JavaScript-identifier prefix with the particular name (I think that's what Arpad is suggesting above).
Chrome DevTools (CDT) generally needs just the variable definition to have the name associated with it.
It would be useful to have notation for hiding temporary names introduced by the compiler.
CDT does not use names for properties because it is hard to see how to apply such a rename consistently. I am really curious to hear what the names entry for a property is useful for. It feels like renamed properties is a lost battle (e.g., vscode.dev has a custom renamer that makes it nearly impossible to interpret private field names, even if the names section was properly populated).
CDT looks at the name associated with the opening paren of the function parameter list, and interprets it as a function name. This is a hack, but in the absence of scope descriptors, this is the best we could come up with to show partly deobfuscated stack traces. Some tools (e.g., terser) emit the name for that token (by coincidence, I think). I am not sure if this hack should be mentioned in the spec.

We should also revisit the "Multi-level Mapping Notes" section of the spec - "multiple levels of mapping" are not supported by any tool I know of and I do not really see how it could be supported (we do not want to parse arbitrary intermediate representations). Perhaps the section should say that code transformers (compilers, bundlers, etc.) are responsible for composing source maps if their inputs come with source maps. The consumers of source maps (debuggers, post-mortem tools) should only deal with one level of source mapping.

jridgewell · 2023-04-26T21:42:56Z

If were stipulating 0-based columns, should we also define lines? Everything I've used has 1-based (it's super confusing). And what counts as a line terminator?

Jack-Works · 2023-06-01T11:54:05Z

I prefer to have a new binary format. It can solve many problems. You can see a similar trend in HTTP to HTTP2.

Source map will be used in the future for many years, if we can make it better (easy to generate/parse/small), it will benefit a lot to the whole ecosystem more than avoiding the temporary compatibility pain.

littledan · 2023-06-02T21:04:09Z

I have trouble seeing significant benefit to a binary format to make it part of our initial goals. I'd prefer that we first focus on evolving source maps as they are. We can consider a binary encoding if it has well-demonstrated performance improvements, but initial goals are around improving accuracy, rather than performance.

connor4312 mentioned this issue Apr 5, 2023

Correlating original sources with sourcemapped scripts #21

Open

littledan mentioned this issue Apr 13, 2023

Exit criteria for a hardened specification #35

Open

littledan added the Standards meta label Apr 25, 2023

littledan mentioned this issue Apr 25, 2023

Follow-up group meeting #39

Open

kamilogorek mentioned this issue May 17, 2023

Inline sourcesContent vs. sources files: which should maintainers use? #42

Open

nicolo-ribaudo pushed a commit to nicolo-ribaudo/source-map that referenced this issue Mar 13, 2024

Merge pull request tc39#22 from JSMonk/wasm-source-map-url

cbbc033

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Goals for evolving source maps #22

Goals for evolving source maps #22

littledan commented Apr 5, 2023 •

edited

Loading

sjrd commented Apr 6, 2023

Swatinem commented Apr 6, 2023

mitsuhiko commented Apr 6, 2023

jkrems commented Apr 11, 2023 •

edited

Loading

jaro-sevcik commented Apr 12, 2023

jridgewell commented Apr 26, 2023

Jack-Works commented Jun 1, 2023

littledan commented Jun 2, 2023

Goals for evolving source maps #22

Goals for evolving source maps #22

Comments

littledan commented Apr 5, 2023 • edited Loading

sjrd commented Apr 6, 2023

Swatinem commented Apr 6, 2023

mitsuhiko commented Apr 6, 2023

jkrems commented Apr 11, 2023 • edited Loading

jaro-sevcik commented Apr 12, 2023

jridgewell commented Apr 26, 2023

Jack-Works commented Jun 1, 2023

littledan commented Jun 2, 2023

littledan commented Apr 5, 2023 •

edited

Loading

jkrems commented Apr 11, 2023 •

edited

Loading