Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Goals for evolving source maps #22

Open
littledan opened this issue Apr 5, 2023 · 8 comments
Open

Goals for evolving source maps #22

littledan opened this issue Apr 5, 2023 · 8 comments

Comments

@littledan
Copy link
Member

littledan commented Apr 5, 2023

In the TC39 tools call in April 2023, a number of stakeholders discussed problems with sourcemaps. I'd summarize the goals from that as:

  • Evolve the existing sourcemaps v3, rather than creating a new system (e.g., based on DWARF)
  • Harden the spec
    • Allow creation of conformance tests
    • Be able to tell tools that they have a bug and should fix it (as opposed to working around everything)
    • Definition of a column (defined in the spec as 0-based, Unicode interpretation ambiguous)
    • Well-defined points to make associations for JS/TS (while maintaining polyglot nature of sourcemaps)
  • Associating the original source file with the compiled/minified file
    • Harden the spec: Specify semantics of URLs and resolution
    • Additional feature: IDs/hashes to associate/name
  • Scopes / variable / function name decoding
    • Improve mappings for names, if possible
    • Some workflows, e.g., reconstructing a stack trace, should be possible without parsing the minified/compiled code.
@sjrd
Copy link

sjrd commented Apr 6, 2023

In the "harden the spec" area, would it be possible to also make the semantics of name mapping more precise? In particular,

  • whether we are supposed to fill it in for every occurrence of a variable, or whether the original var/let/const definition is enough, and
  • whether it is meaningful on property names (field names and method names), and if yes, again on which occurrences it is necessary.

For speed, the Scala.js compiler only puts them on the definition of variables and properties, but not their usages.


On the speed aspect, would it be possible to take source map generation speed into account, and in particular to the incremental generation thereof? In the Scala.js linking pipeline, which is incremental end-to-end, source map generation is one of the most expensive steps! And that's despite recent optimizations in that area (commits of Feb-Mar 2023 here, if you're interested).

The main difficulty, compared to the rest of our pipeline, is that we cannot make it fully incremental, due to the very nature of the encoding. We can make it incremental up to the generation of so-called Fragments. However, we must linearly recompute name indices (and diffs of indices) into the name table and source URI table, which requires hash map lookups as well as Base64VLQ recomputations. This is so performance sensitive that it was worth micro-optimizing that area of the code (here and here, if you're interested), to an extent we did not do anywhere else in our codebase.

By comparison, incrementally generating the .js files is mostly a matter of sending pre-computed byte blobs directly to the target files, and is basically instantaneous.

I would love if we could have a solution for that. Perhaps a "development-mode" version of source maps, more amenable to incremental re-generation to the detriment of raw size?

@Swatinem
Copy link
Member

Swatinem commented Apr 6, 2023

  • whether we are supposed to fill it in for every occurrence of a variable, or whether the original var/let/const definition is enough, and

  • whether it is meaningful on property names (field names and method names), and if yes, again on which occurrences it is necessary.

Yes, it is meaningful on property names, and I would suggest to create such a name mapping and reference that for any kind of "identifier" that was renamed/generated and appears in the code.
The desire was expressed to reason about the executed code as best as possible with just the SourceMap itself, without the need to fully parse and understand the underlying (minified) code. Having such reliable mappings would go a long way to achieve that.


Related to hardening of "identifier" tokens with a name reference:

It would be very good to clearly specify where the boundary of such mappings should be in terms of whitespace.

I have seem " identifier" with leading whitespace, "identifier " with trailing whitespace and also "identifier" for which the sourcemap token does not include any whitespace.

As each token does not have an explicit end, but is rather implicitly bounded by the following token or the EOF, I would argue that tools already account for trailing whitespace or unrelated characters at the end of a token.

However, we did run into problems with leading whitespace, as in the example " identifier" here: getsentry/js-source-scopes#15

@mitsuhiko
Copy link
Contributor

Can I propose that we open individual issues for hardening the spec? Otherwise all these proposals are likely to die out in large discussions.

@jkrems
Copy link
Contributor

jkrems commented Apr 11, 2023

I would also add:

  • Under hardening the spec: Remove and/or mark things that aren't implemented anywhere, e.g. sections[].url. Hardening: Remove or at least warn about sections.url #29
  • Associating/mapping compiled to original source: Allow 1:n mappings or some other mechanism that allows to expand inlined function calls.

@jaro-sevcik
Copy link
Contributor

Some notes on names (perhaps we should continue the discussion about names to Issue 6?):

  • We have also seen that identifiers get lumped together with subsequent commas, right parens, etc. Perhaps the spec should say that we only associate the JavaScript-identifier prefix with the particular name (I think that's what Arpad is suggesting above).
  • Chrome DevTools (CDT) generally needs just the variable definition to have the name associated with it.
  • It would be useful to have notation for hiding temporary names introduced by the compiler.
  • CDT does not use names for properties because it is hard to see how to apply such a rename consistently. I am really curious to hear what the names entry for a property is useful for. It feels like renamed properties is a lost battle (e.g., vscode.dev has a custom renamer that makes it nearly impossible to interpret private field names, even if the names section was properly populated).
  • CDT looks at the name associated with the opening paren of the function parameter list, and interprets it as a function name. This is a hack, but in the absence of scope descriptors, this is the best we could come up with to show partly deobfuscated stack traces. Some tools (e.g., terser) emit the name for that token (by coincidence, I think). I am not sure if this hack should be mentioned in the spec.

We should also revisit the "Multi-level Mapping Notes" section of the spec - "multiple levels of mapping" are not supported by any tool I know of and I do not really see how it could be supported (we do not want to parse arbitrary intermediate representations). Perhaps the section should say that code transformers (compilers, bundlers, etc.) are responsible for composing source maps if their inputs come with source maps. The consumers of source maps (debuggers, post-mortem tools) should only deal with one level of source mapping.

@jridgewell
Copy link
Member

If were stipulating 0-based columns, should we also define lines? Everything I've used has 1-based (it's super confusing). And what counts as a line terminator?

@Jack-Works
Copy link
Member

I prefer to have a new binary format. It can solve many problems. You can see a similar trend in HTTP to HTTP2.

Source map will be used in the future for many years, if we can make it better (easy to generate/parse/small), it will benefit a lot to the whole ecosystem more than avoiding the temporary compatibility pain.

@littledan
Copy link
Member Author

I have trouble seeing significant benefit to a binary format to make it part of our initial goals. I'd prefer that we first focus on evolving source maps as they are. We can consider a binary encoding if it has well-demonstrated performance improvements, but initial goals are around improving accuracy, rather than performance.

nicolo-ribaudo pushed a commit to nicolo-ribaudo/source-map that referenced this issue Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants