Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Source Maps v4 (or v3.1): Improved post-hoc debuggability #12

Closed
robpaveza opened this issue Jan 19, 2022 · 22 comments
Closed

Comments

@robpaveza
Copy link

robpaveza commented Jan 19, 2022

Hello! At a recent TC-39 JS Tools meeting we decided to bring the proposal for v4 here as a neutral discussion point.

The proposal document is here: MicrosoftEdge/MSEdgeExplainers#538

(I'd be happy to stage a Pull Request against this or another repo where appropriate. We discussed whether to update the Google Doc which is the "official" standard, but given it's been 8-9 years since that document was last updated, I'm not sure what the best way is to formalize and/or collaborate on that document, vs. treating it as an "artifact").

TL;DR

The proposal is to add (one or two) new field(s) to the existing Source Maps document. This would map text ranges of the source files to a "scope", which would describe the lexical subroutine from the source language. For example:

1  class Example {
2    constructor() {
3      document.addEventListener('ready', () => {
4      });
5    }
6  }

This would produce the following list of scopes:

  • 2:2 ... 5:2 named constructor for class Example
  • 3:46 ... 4:4 names anonymous callback for document.addEventListener

(Please note: the names of these scopes is up for discussion).

This would enable a series of improvements wherever tooling exposes minified code artifacts related to the call stack:

  • Call stack window
  • Output call stacks (Error.stack)
  • Performance traces
  • Memory snapshots
  • Exported HARs

Primary Points of Discussion

While the discussion group believes that solving these problems is valuable, there have been the following major points of discussion:

Storing function names

Option 1: Store in the existing “names” field
Option 2: Introduce a new “scopeNames” field

The main benefit of Option 1 is that it reduces sourcemap size and avoids duplication. This is because many tools already populate the “names” field with many of the function names and they can be directly referenced in the “scopes” field.
The main downside is that if any tool modifies the “names” field by removing an element or changing the order, they will need to become “scopes”-aware and update it accordingly.

Recommendation: Option 1 is preferred unless there is evidence that the “names” field is tampered with by tools in practice.

Keying “scopes” by generated index VS source index

Option 1: Key in “scopes” is source line and column
Option 2: Key in “scopes” is generated line and column

The main benefit of Option 1 is that only the first tool in a pipeline, and any tool that combines sources, needs to support the new “scopes” field. Other tools can ignore it. Note that the use case for decoding a function name will always involve decoding back to the original source location anyway, so this option does not add extra complexity.
Option 2 has a benefit of using the same approach as the existing “mappings” field so can feel more consistent.

Recommendation: Option 1 is preferred

Nested scopes VS linear scopes

Option 1: Scopes are nested
Option 2: Scopes are linear

Option 1 has the benefit of representing the logical structure of the code, as well as requiring only one VLQ per function. With Option 2, a function with many inner functions will need to be repeatedly referenced every time an inner function ends.

Recommendation: Option 1 is preferred

Format of “scopes” field

Option 1: Starting points
Option 2: Ranges

Option 1 has the benefit of being consistent with the existing “mappings” field and tools can reuse implementation. It also has the benefit of resulting in smaller sourcemaps due to only specifying starting locations and a scope name index.
Option 2 results in larger sourcemaps. However, it has the benefit of representing the logical structure of the code better with a 1-1 mapping between a function and its VLQ.

Recommendation: Option 2 is preferred

Relative VS absolute indices

This will very much depend on the format chosen in the decision point above.

Versioning

Option 1: Version 4
Option 2: Version 3.1
Option 3: Retain version 3, but just add new fields

Option 1 has the benefit of indicating a major version update. Option 2 has the benefit of indicating a minor version update but that the update is non-breaking (conforming to semver). Option 3 has the benefit of being very JavaScripty, in that new fields can just be detected and light up, but might break strict compliance testers.

Recommendation: Option 1 or 2 is preferred; Option 3 is likely to break one or more validators.

Naming of functions

This decision point relates to the naming convention of functions. While a free function f() will of course be named f, there are more choices available for other types of functions. For example, does a class member function get Class.prototype.f or Class.f as its name, or how do you name an anonymous function? These decisions probably don’t belong in the spec, but it would be useful to have a common naming convention across tools.

Other things that might be nice to solve in the future

This section was added by an edit

[ ] Information about functions that were inlined during compilation (surfaced by @ldarbi below)
[ ] Direct information about aliasing of variables (this was mentioned by @rbuckton during the call and @sokra below)

@sokra
Copy link
Collaborator

sokra commented Jan 19, 2022

Here are my opinions on that.

Versioning

Anything that is not "version": 3 will break some tools. e. g. https://github.com/mozilla/source-map/blob/58819f09018d56ef84dc41ba9c93f554e0645169/lib/source-map-consumer.js#L191-L195

but that's not necessary bad. e. g. a hard version change would allow to extend mappings and makes it easier to find unsupported tools in the pipeline.

Personally I would go with "version": 4 can accept the breaking change. We would have to take less care about breaking stuff. Tooling need to be eventually updated anyway and it isn't that much tooling that need to be updated...

Storing function names

Option 1: Store in the existing “names” field

Keying “scopes” by generated index VS source index

Option 2: Key in “scopes” is generated line and column

  • It's more similar to the mappings field, making the source map format more consistent.
  • It allows to define scopes in generated code. e. g. when during transformations additional helper code is inserted, e. g. polyfills, regenerator
  • The existing more compact encoding with ; (generated line change) and , (next mapping) can be used

Nested scopes VS linear scopes

I'm fine with both. If the encoding of nested scopes is more efficient we can go for that.

Format of “scopes” field

I want to add another option that is possible when using nested scopes:

Option 3: Starting and end points

And end point would leave the current scope and goes back to the parent scope. It's encoded with only the location information (here generated column, alternatively original line and column)

Relative VS absolute indices

Relative indicies

They make repeated indicies cheaper, since smaller numbers are shorter in VLQ.

Naming of functions

Out of spec, but maybe the spec can recommend something. It depends on the source language.


I really would like to see the following feature in "scopes":

Defining name bindings per scope.

Example:

function setName(person, name) {
  const magicNumber = 42;
  person.number = magicNumber;
  person.name = name;
}

might be minimized to:

function x(p,n){p.m=42,p.n=n}

and it would be great to have to following information attached to the scope:

test = x
person = p
name = n
person.name = p.n
person.number = p.m
magicNumber = 42

where the left side is a variable binding (in original code) and the right side is an javascript expression (in generated code). Devtools can show these bindings in the side bar (expression evaluated to a value) and maybe allow writing them in the console to view them.

It could be implemented as pairs of 2 VLQ values into the names array, which are listed at the scope starting point.

Devtools usually allow to access bindings from parent scopes too, so they do not have to be repeated.

Devtools would probably show both, mapped bindings and generated bindings, but maybe they could collapse generated bindings by default so only mapped bindings are shown by default.

@jridgewell
Copy link
Member

Storing function names

I would just keep in names. I'm not aware of any tool will error if there are extra names in the array. Any tool that drops unused names would need to be updated, but then again they would need to be updated not to drop the scopes field.

Keying “scopes” by generated index VS source index / Format of “scopes” field

(These topics are too similar to disentangle)

Keying based on the generated position would be more consistent, and considerably easier to work with. It can be decoded at the same time as mappings field with only an extra index pointer. Doing this by source location will be more difficult, because the source location can jump around arbitrarily. We'd have to perform binary lookups on the scopes field, it's just more of a pain for no benefit.

Tooling will need to update to support the scopes field so it isn't just dropped from the output map. When updating, we can add actual support for translating scopes as needed. It should not be difficult.

I would also propose a 3rd option here, as I describe in MicrosoftEdge/MSEdgeExplainers#538 (comment). If we use the segment's index, and with everything being part of the current scope until we hit the next indicated scope index, we'd need only 2 ints per scope transition (the proposed "Starting points" choice requires 3 ints, the "Ranges" option requires 5 ints). This also supports the Nested scopes option, which would only need 3 ints per scope instead of per scope transition.

Nested scopes VS linear scopes

Nested scopes should generate smaller outputs, so I would vote that.

Versioning

Is there any compliance tester that would error? Most things just ignore fields that aren't known to the code, so adding a scopes field and keeping a v3 should be fine.


Defining name bindings per scope.

Isn't this already handled by the names field? All of the identifiers can be represented, and even a literal 42 number could have a namesIndex as part of it's mappings segment.

@ldarbi
Copy link

ldarbi commented Jan 20, 2022

Agree that "Keying scopes by generated index VS source index" and "Format of scopes field" can converge into one decision point.

Format of scopes field

There are a few variations discussed here, but the top level decision point seems to be whether or not we key by generated index or by source index. I'd like to offer up a few points in support for the source index approach.

Let's look at this simple example where a theoretical minifier has done some function inlining.

Source File

1    // source file
2    function foo() 
3    {
4        function bar(parameter) 
5        {
6            if (parameter) 
7            {
8                log("YES");
9            } 
10           else
11           {
12               log("NO");
13           }
14       }
15       bar(true);
16       log("\n");
17       bar(false);
18       log("\n");
19   }
20   log("before");
21   foo();
22   log("after");

Generated file

1   log("before");log("YES");log("\n");log("NO");log("\n");log("after");

Key by source

The key by source approach would result in this human-readable representation of scopes

"scopes": [
  ['foo', 2, 1, 19, 1], ['bar', 4, 5, 14, 5]
]

We get one entry per function and each function is represented by 5 indices:

<index into 'names', start line, start column, end line, end column>

In the physical representation all indices will be relative and encoded into VLQ to reduce size.

Key by generated

The key by generated approach would result in this human-readable representation of scopes:

"scopes": "['foo', 15], ['bar', 15], [26], ['bar', 36], [46], [56]"

Again, indices would be relative and encoded into VLQ to reduce size in the physical representation. As mentioned in a previous comment, this could be taken even further to use mappings segment indices.

1. Separation of semantic concerns

The existing mappings field already maps generated location to source location. Keying the new scopes field by generated index would implicitly duplicate that information. Keying scopes by source function, in contrast, provides only new information in the way of declaring source function scopes.

2. Separation of concerns for tooling

A tangible benefit of the point above is that only a subset of tooling will need to populate or modify the scopes field. Only a tool dealing with the original source needs to populate the field; tools that simply transform intermediate stages in a pipeline don't need to handle it.

3. Extensibility with extra per-function info

The key by source approach allows us to store function names for now, but it can be easily extended to store other metadata related to a function in the future. Future plugins or extensions to the spec could index into the scopes array.

4. No redundancy

In the code example above, parts of a function get inlined multiple times into the generated code. With a key by generated approach there will be multiple references to the original function. With the key by source approach there is always one entry per source function, avoiding any redundancy of information.

5. Size

A benefit of the key by generated approach is size as it can potentially result in a smaller scopes field due to the more compact format (although as per the point above, if there is a lot of inlining happening, the one entry per source function approach might win).
However, the scopes field is a very small addition compared to the existing mappings field therefore any size win here is negligible.
As an example we can look at some real life data from pasta-sourcemaps running in production.

  • Original JS file: 200KB
  • mappings field: 48KB
  • new scopes field: 1KB which is 2% of mappings

6. Partial parsing

When keying by source, there is a list of mappings entries for each source file. Once a source location is computed using the mappings field, only the relevant source file's mappings need to be decoded, then searched linearly to find the relevant function name.
In comparison, when keying by generated index, the entire scopes field needs to be decoded and searched to find the corresponding function name.

7. Battle proven

As a final point, we have been using the key by source approach in pasta-sourcemaps in production for over two years now and can confidently say we have not hit any size or performance issues.

@jkrems
Copy link
Contributor

jkrems commented Jan 21, 2022

Is there any compliance tester that would error? Most things just ignore fields that aren't known to the code, so adding a scopes field and keeping a v3 should be fine.

I think a risk with keeping v3 is that intermediate libraries may silently drop the additional field (e.g. https://github.com/mozilla/source-map/blob/58819f09018d56ef84dc41ba9c93f554e0645169/lib/source-map-generator.js#L404-L428) and it may be hard to tell why. A clean failure may be preferable here?

My only somewhat strong opinion re: versioning would be "either keep 3 or go to 4". 3.1 and 4 are effectively equivalent in terms of ecosystem impact and 4 is nice and succinct. Because forwarding additional fields is an explicit feature in many flows, there is effectively no such thing as a "minor" bump to the version.

@concavelenz
Copy link
Contributor

The idea with V3 was that additional fields that don't change meaning of the existing fields would be not increment the version number. There are many things that would happily work with "3" with additional fields, that would have to reject "4". It would be a minor update, but V4 presumably would not be compatible with older browsers (but perhaps they simply ignore the version, as I haven't checked the behavior).

Simply "4" would force tools to make decisions about what they want to do with the new fields, "3" would allow backward compatibility. Personally, I favor backward compatibility unless the meaning of the existing fields actually change. I don't really want tools to have to generate both "3" and "4" versions of the file.

If the meaning of the existing fields do change (other than the version) "4" is of course the only real choice.

@robpaveza
Copy link
Author

I think a risk with keeping v3 is that intermediate libraries may silently drop the additional field

Simply "4" would force tools to make decisions about what they want to do with the new fields, "3" would allow backward compatibility.

Right. I think this is the big dilemma in the versioning discussion, and in general, I tend to agree with the notion that "3.1 and 4 are effectively equivalent". Keeping 3 and adding fields (whilst being non-breaking) "feels JavaScripty," and I can confirm that Chromium doesn't break if we just choose to update to version 4. But I can't speak for other browsers, or even libraries.

@ldarbi :

The key by generated approach would result in this human-readable representation of scopes:

"scopes": "['foo', 15], ['bar', 15], [26], ['bar', 36], [46], [56]"

I'm not sure I understand what this represents. Are we keying into mappings indices or column numbers? I'm slightly thrown off -- I think these are column numbers, but I think perhaps I'm thrown off because you're not representing Global code scope. Can I propose that the rendered versions would be:

Key by source

"scopes": [
  ['Global code', 1, 1, 23, 1], ['foo', 2, 1, 19, 1], ['bar', 4, 5, 14, 5]
]

Key by generated

"scopes": "['Global code', 1], ['bar', 15], ['foo', 26], ['bar', 36], ['foo', 46], ['Global code', 56]"

One thing about the "Key by generated" here is that it would be nice to include information that functions have been inlined. I think I'll keep a note here at the top about the "it'd also be nice to solve these problems" topics, so feel free to note that ongoing.

@jridgewell

Keying based on the generated position would be more consistent, and considerably easier to work with. It can be decoded at the same time as mappings field with only an extra index pointer. Doing this by source location will be more difficult, because the source location can jump around arbitrarily. We'd have to perform binary lookups on the scopes field, it's just more of a pain for no benefit.

I'm not sure why that's such an issue. We already do a binary lookup on the mappings field, and binary searches are relatively well-understood in the industry. As an example, in the Chromium DevTools, I was able to implement this in 40 lines of well-commented, non-compact code. I'm not sure that this is particularly expensive or complex, and given the trade-off that @ldarbi mentioned above (the comparative fewer number of tools which will need to be updated to produce these mappings), this seems like a fair trade.

@jridgewell
Copy link
Member

I'm not sure why that's such an issue. We already do a binary lookup on the mappings field, and binary searches are relatively well-understood in the industry.

I maintain @ampproject/remapping, which performs sourcemap merging for any depth of graph. Doing a single binary search wouldn't be a big issue (we're already doing 1 for each segment and at every level of the graph). If I now have to double that, doing another binary search for scope for every segment at every depth, then it's an issue. The difference with generated columns is that it's almost completely free, I can generate it using a pointer and an increment during decoding.

But, this was before I read #12 (comment) 's "2. Separation of concerns for tooling" section. If we do scopes by source location, then I don't have to remap any of the line/columns (only change the source index). That works fine for me.

@jaro-sevcik
Copy link
Contributor

jaro-sevcik commented Jan 24, 2022

In Chrome DevTools, we see several areas where source maps v3 do not work well (in the order of decreasing importance):

  1. Variable name bindings. In theory, name bindings are handled by the mappings.name field, but there are several cases that cannot be described by that field. For example, module imports are often compiled with substitution (at least by webpack and Typescript) , so there is no location that could describe the name. Similarly, renaming of 'this' cannot be captured by mappings.name because there is no definition of 'this' in the code.

  2. Inlining. If a minifier inlines several functions, we would like to be able to get the full inlined call stack, not just the top or bottom inlined/inlinee frame.

  3. Unminified function names in call stacks. The user can already click on the link at the call stack, and they will be pointed to the right place in the source code, but it would be nice to show the original function name. In some cases, we could perhaps figure out some function names from mappings.name, so we maybe we do not even need to change source maps for this.

  4. Property renaming. Some minifiers mangle property names, we would like to have a way to figure out the original names.

  5. Static types. Specifically, TypeScript users would like to see their enum values mapped back to enumerators. Unfortunately, this is fairly hard to capture in source map without introducing some very heavy machinery for representing types.

Unfortunately, the Source Map v4 proposal only addresses point 3. If we are to change the source map spec, it would be nice to solve the other problems, too. Or at least leave some space for solutions.

Note, that some of the points (1 and partly 2) would be addressed by Sokra's comment above.

I am wondering if we could perhaps design the new source maps so that they are extensible. I am thinking something along the lines of DWARF, where one defines abbreviations to represent their subset of debug attributes in the .debug_abbrev section and then emits debug info using the abbreviations in the .debug_info section (see a short example here). I started exploring the idea applied to source maps in a slide deck here.

@jkrems
Copy link
Contributor

jkrems commented Jan 24, 2022

I am wondering if we could perhaps design the new source maps so that they are extensible. I am thinking something along the lines of DWARF [...]

I think a more aggressive design iteration would be super interesting but it may be easier to discuss in a separate issue, given the poor threading support in Github issues.

I started exploring the idea applied to source maps in a slide deck here.

It looks like the slide deck isn't publicly available?

@ldarbi
Copy link

ldarbi commented Jan 25, 2022

The key by generated approach would result in this human-readable representation of scopes:

"scopes": "['foo', 15], ['bar', 15], [26], ['bar', 36], [46], [56]"

I'm not sure I understand what this represents. Are we keying into mappings indices or column numbers? I'm slightly thrown off -- I think these are column numbers, but I think perhaps I'm thrown off because you're not representing Global code scope.

@robpaveza Yes these are column numbers. I was trying to represent what @sokra was suggesting here using a human readable format with absolute column numbers. You have [column, scopeIndex] every time you enter a new scope and [column] every time you leave a scope - like pushing and popping off a stack. The top level scope doesn't have an entry because it is the default scope - so anything before the first scope is pushed and anything after the last scope is popped is considered the top level scope.

In my example

"scopes": "['foo', 15], ['bar', 15], [26], ['bar', 36], [46], [56]"

we have top-level scope before column 15, from column 15 we start foo, and at the same time we also start the inner scope bar. At column 26 we are back to scope foo, column 36 it's bar again, column 46 it's back to foo and column 56 it's back to top-level. Does this make sense?

For the recommended "Key by Source" approach you are right that we can include the top-level scope too, in fact we do do this in pasta-sourcemaps. I omitted it in my write up here because I wanted to make it easy to compare with the "Key by Generated" approach and it still works because you can still infer that anything that doesn't belong to a scope is top-level.

@jaro-sevcik
Copy link
Contributor

Perhaps it is also worth noting that Dart compiled to JavaScript has been using "battle-proven" source map extensions for more than three years. A somewhat outdated documentation is here (the latest version uses VLQ encodings). Compared to pasta-sourcemaps, the Dart extensions also describe inlining - they are keyed by generated positions. The Dart source maps also describe property and global renames, but I am not sure if those extensions are flexible enough. Since the renames are global, they cannot describe renames that happen only in some parts of the code (property renames are tricky to describe for other reasons, too).

@jkrems
Copy link
Contributor

jkrems commented Feb 1, 2022

ICYMI, it looks like there's also some prior art from back in 2015 in #4 (rendered). From a high-level skim, it seems to share some of the ideas from the DWARF-inspired proposal in @jaro-sevcik's slides.

@ldarbi
Copy link

ldarbi commented Feb 2, 2022

FYI there is a call today at 5pm UTC happening here https://meet.google.com/iqb-pdpn-evh to continue the discussion of this topic.

Meeting notes here (includes notes from the call two weeks ago): https://docs.google.com/document/d/1RlnAnMa4QzQUK_tNOHdWfnske2X9KZ_e90VBG-DWqUo/edit

@jkrems jkrems added the v4 label Feb 2, 2022
@jkrems
Copy link
Contributor

jkrems commented Feb 2, 2022

I labeled this as "v4" because it is the commonly used short-hand for "next iteration". As discussed earlier, we may very well keep the version "3" in the actual JSON format and just amend the 3rd revision. But it's a nice and short label as a starting point.

@robpaveza
Copy link
Author

Sorry for my delay getting back here -- I was just starting to get sick when we last met, and then I was out of it for a fairly long time.

@jaro-sevcik I'm not certain that the direction that you mentioned with respect to the DWARF-style encoding makes sense. I've given that a lot of thought offline, but the primary concern that I have is that it's a relatively radical departure from our previous style of encoding. While neither are human-readable by default, it introduces another variable. Because any additional "feature" of Source Maps is going to need to be coded in both the producers and consumers regardless, we need to:

  • Establish how unrecognized fields should be propagated along the tool pipeline, and
  • Determine whether the feature is encoded as source or generated position (so that tools can map appropriately)

Consider the following sample TypeScript. This sample is deliberately contrived to be weird to make an example about source locations and scopes:

1   class Example {
2     constructor() {
3       const sample: number[] = [];
4       let evens = 0;
5       for (let i = 0; i < 1000; i++) {
6         sample.push(i);
7         if (i % 2 === 0) {
8           let i = evens;
9           evens = i + 1;
10        }
11      }
12      this.sample = sample;
13      this.evens = evens;
14    }
15  }

I would propose that the initial compiler is responsible for identifying a range of possible source locations which are interesting. This is presently identified as scopes in the proposal, but perhaps we ought to simplify the proposal such that scopes only maps source ranges which are interesting. Here, the compiler might emit start line/column and end line/column tuples, along with source index (mapping to sourcesContent, sources), and that's it. Fields that need to resolve data about the source code rather than the generated code can be indexed against the scopes field.

Then, we can provide that source-level information via additional fields, and the additional fields can be optionally included as needed. So if we wanted to introduce the example of sourceNames, rather than this only being a list of strings, perhaps it's a tuple of scope index + names index. Or, if we want to provide support for aliasing. If the above code is compiled with --target=es3, TypeScript will emulate block scope for let and const variables. I don't believe the TypeScript compiler presently supports emitting data about these aliases, but I can imagine that we might do something specific here.

I have the next week and a half dedicated to focus in this space, and will continue to have time to focus even afterwards. I'm planning to do some work around automated testing, perhaps creating an automatic test tool (although of course subject to the spec discussions we're all working through).

@robpaveza
Copy link
Author

Oh, I forgot that I called out this specific point:

Establish how unrecognized fields should be propagated along the tool pipeline

My suggestion here is that top-level fields in the source map should just be passed-through, unless we're synthesizing multiple files (such as in the case of bundlers), in which case they should be converted to arrays of whatever they were in the original maps. This is probably a break with respect to some set of tools that exist today, and it is logic that will need to be added to the parsers; but it provides the most direct and forward-looking answer we can reasonably provide.

@robpaveza
Copy link
Author

I'd like to propose that we move to #16 to try to come to a consensus about how we make decisions around this. Then we can break out individual conversations about features, implementation, etc.

@jridgewell
Copy link
Member

I recently became aware of Node's stack trace unminification, and how it fails to determine the unminified name of the enclosing function which throws the error. See jestjs/jest#12786 (comment) for full context.

When adding scopes and scope names, we should take care that the scope begins at the first keyword for functions. Eg, for function foo() {}, the scope begins at the function keyword. For async function* foo() {}, it begins at async. For (x) => x, it begins at (, and for x => x, it begins at the first x. This could then allow node to determine the proper name of the enclosing function without having to inspect the sourcesContent to determine where the naming Identifier is in the original code.

@robpaveza
Copy link
Author

I've closed #16. We landed on just the three principles:

  • Requires the minimum number of tools to update. The JavaScript ecosystem is tremendously huge, and potential customers might encounter friction if they need to do a major-version bump of a build tool. If possible, we should keep the update scoped in such a way that customers could leverage plugins (or something similar) to take advantage of the maximum amount of functionality.
  • Supports future evolution without making huge, industry-moving changes. It would be nice if we didn't have to coordinate everybody in the ecosystem at once to make incremental improvements. The change should be opinionated enough to be able to allow future changes where it's easy enough for, at least, additional fields can be added and progressed with build system plugins
  • Size footprint: Source maps are commonly downloaded over the network, and features should be balanced against bandwidth costs

However, we're still discussing the ordered priority here. @jkrems proposes that size footprint should raise higher, because increased size results in increased resources utilization in build tools.

My only observation is, obviously, more features will lead to higher resource utilization. That having been said, when I made my sample draft PR for TypeScript, I don't believe it used substantially more resources when we weren't opting into the v4 behavior. Put separately, the user can opt into more features, therefore more resource utilization.

@jaro-sevcik
Copy link
Contributor

@robpaveza As for the DWARF encoding, I agree that it is a departure from the current source map encoding style and perhaps not very consistent with the rest of source map spec. However, let me note that the typed nature of the DWARF abbreviation encodings would allow describing the meaning of unknown fields because the fields are typed. We could distinguish the generated versus code positions in the type. If the tools understood the types, they could translate those fields accordingly (even though it still gets tricky with things like inlining, where the mapping is not 1:1).

That said, if you are planning to introduce the source level scope descriptions and some info on their names (and convince the key players in the ecosystem to adopt this), we would happily consume that in Chrome DevTools to symbolize stack traces. It is unclear to me how the tools would understand how to update the indexes in the <scope-id, name-id> tuples when joining source files together without knowing the semantics of the tuples, but that is more of a questions for bundler developers.

@mitsuhiko
Copy link
Contributor

We (@getsentry) have a lot of interest in scope information. As of recently we have started parsing minified source code to reconstruct scope information as the previous heuristics to reconstruct function names just don't work well. (We wrote about this here). The technical implementation for this can be found here: https://github.com/getsentry/js-source-scopes/

Obviously this is a pretty crude way of resolving this issue and it has a lot of restrictions still. In particular we are using the minified sources which do not have the right source information in all cases.

From our perspective longer term source maps probably should be replaced. Particularly for us operating with JSON files at scale has always been quite frustrating and the standard leaves a lot to be desired. Facebook has actually extended the source map format with the Metro bundler for react-native to include some scope information (x_facebook_sources) so there is some prior art about the scope information: https://github.com/getsentry/rust-sourcemap/blob/5187edf627d70ec1198e737bb14eecfe95fab7af/src/hermes.rs#L135

I'm super happy to see that something is moving about scope information but since we probably already need to touch a lot of tooling in the process, I wonder if it's not time to start seriously considering actually using DWARF for this instead.

nicolo-ribaudo pushed a commit to nicolo-ribaudo/source-map that referenced this issue Mar 13, 2024
Specify precedence of header over inline annotation for source map url
@jkup
Copy link
Collaborator

jkup commented Jun 24, 2024

I believe this is subsumed by the current scopes proposal. If there is anything missing, we should add it as a follow up to the main proposal.

@jkup jkup closed this as completed Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants