Proposal: Source Maps v4 (or v3.1): Improved post-hoc debuggability #12
Comments
Here are my opinions on that.

Versioning

Anything other than the current version 3 would be a breaking change, but that's not necessarily bad; e.g. a hard version change would allow extending existing fields. Personally I would go with …

Storing function names

Option 1: Store in the existing "names" field.

Keying "scopes" by generated index VS source index

Option 2: Key in "scopes" is generated line and column.

Nested scopes VS linear scopes

I'm fine with both. If the encoding of nested scopes is more efficient, we can go for that.

Format of "scopes" field

I want to add another option that is possible when using nested scopes. Option 3: Starting and end points. An end point would leave the current scope and go back to the parent scope. It's encoded with only the location information (here the generated column, alternatively the original line and column).

Relative VS absolute indices

Relative indices. They make repeated indices cheaper, since smaller numbers are shorter in VLQ.

Naming of functions

Out of spec, but maybe the spec can recommend something. It depends on the source language.

I really would like to see the following feature in "scopes":

Defining name bindings per scope

Example:

function setName(person, name) {
  const magicNumber = 42;
  person.number = magicNumber;
  person.name = name;
}

might be minimized to:

function x(p,n){p.m=42,p.n=n}

and it would be great to have the following information attached to the scope, where the left side is a variable binding (in the original code) and the right side is a JavaScript expression (in the generated code). Devtools can show these bindings in the side bar (each expression evaluated to a value) and maybe allow writing them in the console to view them. It could be implemented as pairs of 2 VLQ values indexing into the "names" field. Devtools usually allow access to bindings from parent scopes too, so they do not have to be repeated. Devtools would probably show both mapped bindings and generated bindings, but maybe they could collapse generated bindings by default so only mapped bindings are shown.
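To make the bindings idea concrete, here is a rough sketch of that information for the setName example, with the original binding on the left and the generated expression on the right (this representation is invented purely for illustration and is not a proposed encoding):

// Illustration only: original binding name -> generated expression
// for the scope of setName (minified to x).
const setNameBindings = {
  person: "p",
  name: "n",
  magicNumber: "42",
};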
Storing function names

I would just keep them in the existing "names" field.

Keying "scopes" by generated index VS source index / Format of "scopes" field

(These topics are too similar to disentangle.) Keying based on the generated position would be more consistent, and considerably easier to work with. It can be decoded at the same time as the "mappings" field. Tooling will need to update to support the new field anyway. I would also propose a 3rd option here, as I describe in MicrosoftEdge/MSEdgeExplainers#538 (comment): if we use the segment's index, everything is part of the current scope until we hit the next indicated scope.

Nested scopes VS linear scopes

Nested scopes should generate smaller outputs, so I would vote for that.

Versioning

Is there any compliance tester that would error? Most things just ignore fields that aren't known to the code, so adding a new field should be fine.

Defining name bindings per scope

Isn't this already handled by the existing "names" mappings?
Agree that the "Keying" and "Format of "scopes" field" topics are too similar to disentangle.
I think a risk with keeping v3 is that intermediate libraries may silently drop the additional field (e.g. https://github.com/mozilla/source-map/blob/58819f09018d56ef84dc41ba9c93f554e0645169/lib/source-map-generator.js#L404-L428) and it may be hard to tell why. A clean failure may be preferable here? My only somewhat strong opinion re: versioning would be "either keep 3 or go to 4". 3.1 and 4 are effectively equivalent in terms of ecosystem impact, and 4 is nice and succinct. Because forwarding additional fields is an explicit feature in many flows, there is effectively no such thing as a "minor" bump to the version.

The idea with V3 was that additional fields that don't change the meaning of the existing fields would not increment the version number. There are many things that would happily work with "3" plus additional fields, but would have to reject "4". It would be a minor update, but V4 presumably would not be compatible with older browsers (though perhaps they simply ignore the version; I haven't checked the behavior). Using "4" would force tools to make decisions about what they want to do with the new fields, while "3" would allow backward compatibility. Personally, I favor backward compatibility unless the meaning of the existing fields actually changes. I don't really want tools to have to generate both "3" and "4" versions of the file. If the meaning of the existing fields does change (other than the version), "4" is of course the only real choice.
Right. I think this is the big dilemma in the versioning discussion, and in general I tend to agree with the notion that "3.1 and 4 are effectively equivalent". Keeping 3 and adding fields (whilst being non-breaking) "feels JavaScripty," and I can confirm that Chromium doesn't break if we just choose to update to version 4. But I can't speak for other browsers, or even libraries.

@ldarbi: I'm not sure I understand what this represents. Are we keying into …?

Key by source:

"scopes": [
  ['Global code', 1, 1, 23, 1], ['foo', 2, 1, 19, 1], ['bar', 4, 5, 14, 5]
]

Key by generated:

…
One thing about the "Key by generated" here is that it would be nice to include information that functions have been inlined. I think I'll keep a note at the top about the "it'd also be nice to solve these problems" topics, so feel free to keep adding to that as this goes on.
I'm not sure why that's such an issue. We already do a binary lookup on the mappings.
I maintain … But this was before I read #12 (comment)'s "2. Separation of concerns for tooling" section. If we do scopes by source location, then I don't have to remap any of the lines/columns (only change the source index). That works fine for me.
In Chrome DevTools, we see several areas where source maps v3 do not work well (in order of decreasing importance): …

Unfortunately, the Source Map v4 proposal only addresses point 3. If we are to change the source map spec, it would be nice to solve the other problems too, or at least leave some space for solutions. Note that some of the points (1, and partly 2) would be addressed by Sokra's comment above. I am wondering if we could perhaps design the new source maps so that they are extensible. I am thinking of something along the lines of DWARF, where one defines abbreviations to represent their subset of debug attributes in the .debug_abbrev section and then emits debug info using the abbreviations in the .debug_info section (see a short example here). I started exploring the idea applied to source maps in a slide deck here.
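For readers unfamiliar with that DWARF mechanism: .debug_abbrev declares reusable record shapes (a tag plus a list of typed attributes), and .debug_info then emits compact records that reference those shapes. A very loose, purely illustrative transliteration of the idea into a source-map-like structure (these field names are not proposed anywhere, and this is not the content of the slide deck) could look like:

// Illustration only: an "abbreviation" table describing record shapes, plus
// compact records that reference a shape by index and list attribute values in order.
const illustrativeExtensibleMap = {
  abbreviations: [
    { tag: "functionScope", attributes: ["name", "generatedStartColumn", "generatedEndColumn"] },
  ],
  records: [
    [0, "setName", 0, 29],
  ],
};
// Because attributes are typed, a tool that merges or offsets maps could rewrite
// position-typed values even for record shapes it does not otherwise understand.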
I think a more aggressive design iteration would be super interesting but it may be easier to discuss in a separate issue, given the poor threading support in Github issues.
It looks like the slide deck isn't publicly available?
@robpaveza Yes, these are column numbers. I was trying to represent what @sokra was suggesting here using a human-readable format with absolute column numbers. You have … In my example we have the top-level scope before column 15; from column 15 we start …

For the recommended "Key by Source" approach you are right that we can include the top-level scope too; in fact we do this in pasta-sourcemaps. I omitted it in my write-up here because I wanted to make it easy to compare with the "Key by Generated" approach, and it still works because you can still infer that anything that doesn't belong to a scope is top-level.
Perhaps it is also worth noting that Dart compiled to JavaScript has been using "battle-proven" source map extensions for more than three years. Somewhat outdated documentation is here (the latest version uses VLQ encodings). Compared to pasta-sourcemaps, the Dart extensions also describe inlining; they are keyed by generated positions. The Dart source maps also describe property and global renames, but I am not sure if those extensions are flexible enough: since the renames are global, they cannot describe renames that happen only in some parts of the code (property renames are tricky to describe for other reasons, too).
ICYMI, it looks like there's also some prior art from back in 2015 in #4 (rendered). From a high-level skim, it seems to share some of the ideas from the DWARF-inspired proposal in @jaro-sevcik's slides.
FYI there is a call today at 5pm UTC happening here: https://meet.google.com/iqb-pdpn-evh to continue the discussion of this topic. Meeting notes here (includes notes from the call two weeks ago): https://docs.google.com/document/d/1RlnAnMa4QzQUK_tNOHdWfnske2X9KZ_e90VBG-DWqUo/edit
I labeled this as "v4" because it is the commonly used short-hand for "next iteration". As discussed earlier, we may very well keep the version "3" in the actual JSON format and just amend the 3rd revision. But it's a nice and short label as a starting point.
Sorry for my delay getting back here -- I was just starting to get sick when we last met, and then I was out of it for a fairly long time. @jaro-sevcik I'm not certain that the direction you mentioned with respect to the DWARF-style encoding makes sense. I've given it a lot of thought offline, but my primary concern is that it's a relatively radical departure from our previous style of encoding. While neither is human-readable by default, it introduces another variable. Because any additional "feature" of Source Maps is going to need to be coded in both the producers and consumers regardless, we need to: …
Consider the following sample TypeScript. This sample is deliberately contrived to be weird to make an example about source locations and scopes:

 1 class Example {
 2   constructor() {
 3     const sample: number[] = [];
 4     let evens = 0;
 5     for (let i = 0; i < 1000; i++) {
 6       sample.push(i);
 7       if (i % 2 === 0) {
 8         let i = evens;
 9         evens = i + 1;
10       }
11     }
12     this.sample = sample;
13     this.evens = evens;
14   }
15 }

I would propose that the initial compiler is responsible for identifying a range of possible source locations which are interesting. This is presently identified as … Then, we can provide that source-level information via additional fields, and the additional fields can be optionally included as needed. So if we wanted to introduce the example of …

I have the next week and a half dedicated to focus on this space, and will continue to have time to focus even afterwards. I'm planning to do some work around automated testing, perhaps creating an automatic test tool (although of course subject to the spec discussions we're all working through).
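As one way to picture the "additional fields" for the sample above, here is a deliberately naive sketch keyed by original source positions; every field name and the tuple layout are invented for illustration and are exactly the kind of detail still under discussion:

// Invented illustration for the TypeScript sample above, not a proposed format.
// Each entry: [nameIndex, startLine, startColumn, endLine, endColumn], 1-based,
// in the original source, with nesting implied by range containment.
const illustrativeScopes = {
  scopeNames: ["class Example", "Example constructor"],
  scopes: [
    [0, 1, 1, 15, 1],  // the class body spans lines 1-15
    [1, 2, 3, 14, 3],  // the constructor spans lines 2-14
  ],
};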
Oh, I forgot that I called out this specific point:
My suggestion here is that top-level fields in the source map should just be passed through, unless we're synthesizing multiple files (such as in the case of bundlers), in which case they should be converted to arrays of whatever they were in the original maps. This is probably a break with respect to some set of tools that exist today, and it is logic that will need to be added to the parsers; but it provides the most direct and forward-looking answer we can reasonably provide.
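A hedged sketch of what that pass-through rule could mean in practice for a bundler that merges several input maps (the helper below is hypothetical and only illustrates the "unknown fields become arrays" idea):

// Hypothetical helper: copy unknown top-level fields through, turning them into
// arrays (one entry per input map) when several maps are being combined.
function collectUnknownFields(inputMaps) {
  const KNOWN = new Set(["version", "file", "sourceRoot", "sources", "sourcesContent", "names", "mappings"]);
  const extra = {};
  for (const map of inputMaps) {
    for (const [key, value] of Object.entries(map)) {
      if (!KNOWN.has(key)) {
        (extra[key] ??= []).push(value);
      }
    }
  }
  return extra;
}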
I'd like to propose that we move to #16 to try to come to a consensus about how we make decisions around this. Then we can break out individual conversations about features, implementation, etc.
I recently became aware of Node's stack trace unminification, and how it fails to determine the unminified name of the enclosing function which throws the error. See jestjs/jest#12786 (comment) for full context. When adding scopes and scope names, we should take care that the scope begins at the first keyword for functions. E.g., for …
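To illustrate the concern with a made-up example: if the recorded scope for a function starts at its name rather than at its first keyword, generated positions that map back to the leading keywords can fall outside every recorded scope, and the enclosing function name cannot be recovered.

// Illustration: the scope range for this function should start at `export`
// (its first keyword), not at the identifier `handler`, so that any generated
// position inside the function still resolves to the name "handler".
export default async function handler(request) {
  throw new Error("boom");
}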
I've closed #16. We landed on just the three principles: …
However, we're still discussing the ordered priority here. @jkrems proposes that size footprint should rank higher, because increased size results in increased resource utilization in build tools. My only observation is that, obviously, more features will lead to higher resource utilization. That having been said, when I made my sample draft PR for TypeScript, I don't believe it used substantially more resources when we weren't opting into the v4 behavior. Put differently, the user can opt into more features, and therefore more resource utilization.
@robpaveza As for the DWARF encoding, I agree that it is a departure from the current source map encoding style and perhaps not very consistent with the rest of the source map spec. However, let me note that the typed nature of the DWARF abbreviation encodings would allow describing the meaning of unknown fields, because the fields are typed. We could distinguish generated versus original positions in the type. If the tools understood the types, they could translate those fields accordingly (even though it still gets tricky with things like inlining, where the mapping is not 1:1). That said, if you are planning to introduce the source-level scope descriptions and some info on their names (and convince the key players in the ecosystem to adopt this), we would happily consume that in Chrome DevTools to symbolize stack traces. It is unclear to me how the tools would understand how to update the indexes in the <scope-id, name-id> tuples when joining source files together without knowing the semantics of the tuples, but that is more of a question for bundler developers.
We (@getsentry) have a lot of interest in scope information. Recently we started parsing minified source code to reconstruct scope information, as the previous heuristics to reconstruct function names just don't work well. (We wrote about this here.) The technical implementation for this can be found here: https://github.com/getsentry/js-source-scopes/

Obviously this is a pretty crude way of resolving this issue and it still has a lot of restrictions. In particular we are using the minified sources, which do not have the right source information in all cases. From our perspective, longer term, source maps probably should be replaced. Particularly for us, operating with JSON files at scale has always been quite frustrating, and the standard leaves a lot to be desired. Facebook has actually extended the source map format with the Metro bundler for react-native to include some scope information (…).

I'm super happy to see that something is moving on scope information, but since we probably already need to touch a lot of tooling in the process, I wonder if it's not time to start seriously considering actually using DWARF for this instead.
Specify precedence of header over inline annotation for source map url
I believe this is subsumed by the current scopes proposal. If there is anything missing, we should add it as a follow-up to the main proposal.
Hello! At a recent TC-39 JS Tools meeting we decided to bring the proposal for v4 here as a neutral discussion point.
The proposal document is here: MicrosoftEdge/MSEdgeExplainers#538
(I'd be happy to stage a Pull Request against this or another repo where appropriate. We discussed whether to update the Google Doc which is the "official" standard, but given it's been 8-9 years since that document was last updated, I'm not sure what the best way is to formalize and/or collaborate on that document, vs. treating it as an "artifact").
TL;DR
The proposal is to add (one or two) new field(s) to the existing Source Maps document. This would map text ranges of the source files to a "scope", which would describe the lexical subroutine from the source language. For example:
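A minimal sketch of the kind of source that would yield the two scopes listed below (illustrative only; the concrete example is not reproduced in this summary):

// Illustrative source producing the two scopes listed below.
class Example {
  constructor() {
    document.addEventListener("click", () => {
      console.log("clicked");
    });
  }
}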
This would produce the following list of scopes:
constructor for class Example
anonymous callback for document.addEventListener
(Please note: the names of these scopes are up for discussion).
This would enable a series of improvements wherever tooling exposes minified code artifacts related to the call stack (for example, Error.stack).

Primary Points of Discussion
While the discussion group believes that solving these problems is valuable, there have been the following major points of discussion:
Storing function names

Option 1: Store in the existing "names" field
Option 2: Introduce a new "scopeNames" field

The main benefit of Option 1 is that it reduces sourcemap size and avoids duplication. This is because many tools already populate the "names" field with many of the function names, and they can be directly referenced in the "scopes" field. The main downside is that if any tool modifies the "names" field by removing an element or changing the order, it will need to become "scopes"-aware and update it accordingly.

Recommendation: Option 1 is preferred, unless there is evidence that the "names" field is tampered with by tools in practice.

Keying "scopes" by generated index VS source index

Option 1: Key in "scopes" is source line and column
Option 2: Key in "scopes" is generated line and column

The main benefit of Option 1 is that only the first tool in a pipeline, and any tool that combines sources, needs to support the new "scopes" field; other tools can ignore it. Note that the use case for decoding a function name will always involve decoding back to the original source location anyway, so this option does not add extra complexity. Option 2 has the benefit of using the same approach as the existing "mappings" field, so it can feel more consistent.

Recommendation: Option 1 is preferred

Nested scopes VS linear scopes

Option 1: Scopes are nested
Option 2: Scopes are linear

Option 1 has the benefit of representing the logical structure of the code, as well as requiring only one VLQ per function. With Option 2, a function with many inner functions will need to be repeatedly referenced every time an inner function ends.

Recommendation: Option 1 is preferred

Format of "scopes" field

Option 1: Starting points
Option 2: Ranges

Option 1 has the benefit of being consistent with the existing "mappings" field, and tools can reuse the implementation. It also has the benefit of resulting in smaller sourcemaps, due to only specifying starting locations and a scope name index. Option 2 results in larger sourcemaps; however, it has the benefit of representing the logical structure of the code better, with a 1-1 mapping between a function and its VLQ.

Recommendation: Option 2 is preferred

Relative VS absolute indices

This will very much depend on the format chosen in the decision point above.

Versioning

Option 1: Version 4
Option 2: Version 3.1
Option 3: Retain version 3, but just add new fields

Option 1 has the benefit of indicating a major version update. Option 2 has the benefit of indicating a minor version update, signalling that the update is non-breaking (conforming to semver). Option 3 has the benefit of being very JavaScripty, in that new fields can just be detected and light up, but it might break strict compliance testers.

Recommendation: Option 1 or 2 is preferred; Option 3 is likely to break one or more validators.

Naming of functions

This decision point relates to the naming convention of functions. While a free function f() will of course be named f, there are more choices available for other types of functions. For example, does a class member function get Class.prototype.f or Class.f as its name? And how do you name an anonymous function? These decisions probably don't belong in the spec, but it would be useful to have a common naming convention across tools.

Other things that might be nice to solve in the future

This section was added by an edit.

[ ] Information about functions that were inlined during compilation (surfaced by @ldarbi below)
[ ] Direct information about aliasing of variables (this was mentioned by @rbuckton during the call and by @sokra below)
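To make the "Starting points" versus "Ranges" trade-off above concrete, here is an invented illustration of the same two functions encoded both ways (absolute positions and made-up layouts, purely for comparison; neither is a proposed encoding):

// Invented comparison only.
// Option 1 (starting points): each entry marks where a scope begins, like "mappings".
const startingPoints = [
  [1, 1, 0],   // line 1, column 1: scope with nameIndex 0 begins
  [2, 3, 1],   // line 2, column 3: scope with nameIndex 1 begins
];
// Option 2 (ranges): each entry carries its own start and end, one entry per function.
const ranges = [
  [0, 1, 1, 15, 1],
  [1, 2, 3, 14, 3],
];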