Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source Maps v4 Proposal #538

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
245 changes: 245 additions & 0 deletions DevTools/SourceMaps/explainer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,245 @@
# Source Maps v3.1 Revision

Authors: [Rob Paveza](https://github.com/robpaveza)

## Status of this Document
This document is a starting point for engaging the community in developing collaborative solutions. Source maps are not generally managed or standardized within the W3C or a similar governing body beyond what is supported within the browser developer tools. This document is not intended to block something in that direction, but rather to engage the community with options and discussion about the nature of developer tools.
* This document status: **Active**
* Expected venue: Ad-hoc discussions, to be expanded if there is interest in the community
* **Current version: this document**

### Revising the standard

We want to recognize the previous standard and the contributions the former authors have made as part of this implementation. We are taking this avenue of publishing in the Microsoft Edge Explainers repository because the former standard has not been updated in 8 years, and the Mozilla source maps discussion list is not even available in archives.

## Introduction

The [previous version of the Source Maps specification](https://sourcemaps.info/spec.html) is reasonably thorough in supporting live debugging. However, one shortcoming that the specification lacks is that it isn't obvious how to decode the names of functions on the call stack. Several implementations (including Node.js and a library [stacktrace-gps](https://github.com/stacktracejs/stacktrace-gps)) use name guessing, and [pasta-sourcemaps](https://github.com/bloomberg/pasta-sourcemaps) uses precompilation and a custom extension to reconstruct a stack trace.

## Goals

* Improve post-hoc debugging capabilities
* Preserve compatibility with existing tooling

## Non-Goals

Nothing specific at this time.

## Use Cases

*The following was taken from [an issue opened on the TypeScript repo](https://github.com/microsoft/TypeScript/issues/46695):*

Suppose I receive the following information from my application:

```
TypeError: Cannot read properties of undefined (reading 'x')
at re.draw (bundle.js:1:16372)
at re.drawLayer (bundle.js:1:14170)
at be.draw (bundle.js:1:74592)
at Te.render (bundle.js:1:114230)
at Te.reflowCanvas (bundle.js:1:113897)
at HTMLDocument.anonymous (bundle.js:1:135849)
```

The source map will resolve the first stack frame to `draw(src, sprite, x, y, frame)`, and will correctly point out that the failure here is that the undefined value is actually `frame`. However, *that is not the name of the function*. The function's name is `draw`, which is a member of the `Render` class.

If I simply apply the value of the `names` array to the function, decoding the stack trace is not particularly useful. It would look something like this:

```
TypeError: Cannot read properties of undefined (reading 'x')
at frame (file1.ts:302:7)
at draw (file1.ts:144:5)
at Render (file2.ts:95:5)
at drawArray (file3.ts:178:5)
at render (file3.ts:155:5)
at game (file4.ts:39:5)
```

Using the source map to navigate the source code by hand, I can reconstruct the original call stack:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Using the source map to navigate the source code by hand, I can reconstruct the original call stack:
The source line information here is improved and allows me to more easily find the source files and lines, but the function names are modified and now incorrect. Using the source map to navigate the source code by hand, I can reconstruct the original call stack including the class full names and anonymous functions:

I had to jump back and forth between these three stacks a few times to fully grock what was shifting (both good and bad) between them, so trying to add a touch more detail explaining the differences.


```
TypeError: Cannot read properties of undefined (reading 'x')
at Render#draw (file1.ts:302:7)
at Render#drawLayer (file1.ts:144:5)
at GameObject#draw (file2.ts:95:5)
at Game#render (file3.ts:178:5)
at Game#reflowCanvas (file3.ts:155:5)
at [anonymous function passed to addEventListener] (file4.ts:39:5)
```

But doing this required that I dump the mappings and the source files from the source map and manually inspect the source files. In general, this requires that I use a library to parse the `mappings` field from Source Maps (because [as of Source Maps v3](https://sourcemaps.info/spec.html), this field is stored in a stateful way).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is what is meant by "stateful", but this wasn't immediately obvious to me when first reading.

Suggested change
But doing this required that I dump the mappings and the source files from the source map and manually inspect the source files. In general, this requires that I use a library to parse the `mappings` field from Source Maps (because [as of Source Maps v3](https://sourcemaps.info/spec.html), this field is stored in a stateful way).
But doing this required that I dump the mappings and the source files from the source map and manually inspect the source files. In general, this requires that I use a library to parse the `mappings` field from Source Maps (because [as of Source Maps v3](https://sourcemaps.info/spec.html), this field is encoded as [VLQ](https://en.wikipedia.org/wiki/Variable-length_quantity)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just that it's VLQ-encoded, as VLQs can be stateless. It's that many of the mappings in V3 are relative to the previous line entry. The comma- and semicolon-separation also are required. The new proposed scopes VLQs are a bit less stateful, but are still stateful from VLQ to VLQ entry.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
But doing this required that I dump the mappings and the source files from the source map and manually inspect the source files. In general, this requires that I use a library to parse the `mappings` field from Source Maps (because [as of Source Maps v3](https://sourcemaps.info/spec.html), this field is stored in a stateful way).
But doing this required that I dump the mappings and the source files from the source map and manually inspect the source files. In general, this requires that I use a library to parse the `mappings` field from Source Maps (because [as of Source Maps v3](https://sourcemaps.info/spec.html), the position offsets of each mapping are relative to the one before).


## Proposed Solution

We want to adopt the general approach taken by the `pasta-sourcemaps` ("Pretty (and) Accurate Stack Trace Analysis") library. However, instead of adding to the `names` field, we will add two fields.

* Add two additional field to the source map: `scopes` and `scopeNames`
* The `scopes` field should be a list of ranges and pointers into `scopeNames`
* The ranges in the field should always point to **Original Source** locations.
Comment on lines +76 to +78

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untangling these from names is probably cleaner. Might be be worth speccing here the expectation of the sourcemap "apply" operation needed to combine sourcemaps? (considering that a sourcemap combine stage may be combining N earlier maps with 1 new map in the case of a transformation like inlining).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was thinking about this, too. Since the original change, I pivoted that there's only one scopes field -- not an array of scopes fields -- but the bundler will need to process these either way. I'm working on that with webpack/webpack#14996.


### Practical example

Suppose we have the following TypeScript source:

```ts
// example.ts
1 function foo(bar: number): never {
2 throw new Error('Intentional.');
3 }
4 foo();
```

Direct TypeScript compilation would emit something fairly similar:

```js
// example.js
1 function foo(bar) {
2 throw new Error('Intentional.');
3 }
4 foo();
```

But running it through a minifier would probably transform it to an IIFE lambda:

```js
// example.min.js
1 (()=>{throw new Error('Intentional.')})();
```

Examining the call stack of this looks something like:

```
Error: Intentional.
at <anonymous>:1:13
at <anonymous>:1:40
```

We can get to the exact *locations* with source maps today, but assuming that the `mapping`'s 5th field ("the zero-based index into the `names` list associated with the segment) is encoded as it typically is, which is meant for live debugging, the decoded stack would actually look like this:

```
Error: Intentional
at throw (example.js:2:5)
at foo (example.js:4:1)
```

What we want to get to from here would be, at minimum:

```
Error: Intentional.
at foo (example.js:2:5)
at Global code (example.js:4:1)
```

### Improving the contents of the map

Let us re-examine the original code in this example:

```ts
1 function foo(bar: number): never {
2 throw new Error('Intentional.');
3 }
4 foo();
```

The scopes list should be as follows:

1. Line 1 Col 1 -> Line 1 Col 34 (captures `function foo(bar: number): never ` including the trailing space): `Global code`
1. Line 1 Col 35 -> Line 3 Col 1 (captures the entire body of `foo`): `foo`
1. Line 3 Col 2 -> Line 4 Col 7 (captures the newline + invocation of `foo`): `Global code`
Comment on lines +146 to +148

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrasting with pasta-sourcemaps:

Pasta uses "nested" scopes. Referring to ^ as "listed" scopes for this thread, equivalent would be:

  1. Line 1 Col 1 -> Line 4 Col 7 (captures until end of invocation of foo): <top-level>
  2. Line 1 Col 35 -> Line 3 Col 1 (captures the entire body of foo): foo

The spec doesn't seem to dwell on this, but some observations:

  • "nested" is smaller
  • Can't recursively bisect the "nested" scopes to find position
    • VLQ doesn't permit this anyway, must be linearly unpacked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what "nested" scopes are here, but regarding the second comment: the approach to bisecting the scopes:

  • Parse all of them
  • Sort by start line, then start column
  • Now you can binary search, you should look for the last entry that starts before and ends after the mapping position

Doesn't that meet your requirements?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-post from webpack/webpack#14996 (comment):

// *nested*                            // *linear*
| "<top-level>"                        | "<top-level>"
|                                      |
|  |  "causeError"         vs            |  "causeError"
|  |                                     |
|                                      | "<top-level>"

"Nested" scopes have one "entry" per scope in the source code.
"Linear" scopes have an entry each time you transition from one scope to another, left to right, top to bottom.

Nested will tend to result in fewer mappings, and a data structure that more semantically corresponds to the original source

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, I understand! Actually, the way I implemented it in TypeScript was the "nested" version, and while I think I was meaning for this to indicate the "logical" view of the scopes list, but you're right -- I did put this into the expected list of these in the spec. I'll get that fixed.


By de-duplicating the names, that yields the expected `scopeNames` value of `['Global code', 'foo']`.

Finally, `scopes` will need to be an array of strings, and this array should have the same number of entries as the `sources` field. Each string will be a comma-delimited list of Base64-encoded VLQs with the following set of entries (please note, all offsets are 0-based, in concurrence with the previous Source Maps spec):

1. A value representing starting line, relative to the previous starting line value. If this is the first value, it is an absolute value.
2. A value representing the absolute starting column number.
3. A value representing the ending line, **relative to the starting line** value.
4. A value representing the ending starting column number.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is a typo, and should be one of:

  1. A value representing the ending column number, **relative to the starting column number** value.
  2. A value representing the absolute ending column number.

Or, the approach currently taken by pasta-sourcemaps:

  1. A value representing the ending column, relative to the previous mapping's ending column number

pasta-sourcemaps encoding spec

Some code samples and how these affect the ending column number values (for each we list the values for each of the above strategies on PackageJson, parseFooBar and reasonablyLongFunctionName, respectively):

Sample Code 1 (Prettier default)

class PackageJson {
  parseFooBar() {
    // foo
  }

  static reasonablyLongFunctionName() {
    // bar
  }
}
  1. -18, -14, -36
  2. 0, 2, 2
  3. 0, 2, 0

Sample Code 2

class PackageJson
{
  parseFooBar()
  {
    // foo
  }

  static reasonablyLongFunctionName()
  {
    // bar
  }
}
  1. 0, 0, 0
  2. 0, 2, 2
  3. 0, 2, 0

With Base64 VLQ mappings, smaller is better. Numbers [-15, 15] encode as a single Base64 character.

  1. Suffers from large starting/ending column numbers for Prettier-style code, bounded only by developer apetite for long function names.
  2. Performs reasonable, will be multiples of "tab-width", deep nesting will dip outside [-15, 15]
  3. Closing braces will tend to jump by positive/negative "tab-width", usually 2, 4 or 8, so typically always falls in [-15, 15] range

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be similar considerations for the other fields too, I just opted to go a bit deeper on this one in particular

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relative ending column may be preferable here to absolute. Now that I have the TypeScript compiler change, I'm trying to get it to compile itself to see how the performance characteristics change, and we can use iteration here to check which produces smaller values. My supposition, however, is that most often, the absolute values of ending column will generally be < the absolute values of the starting column, because that's usually how we write JS:

function doSomething() { // <-- starting column
 // ...
} // <-- ending column

That's less true for transforming already-minified files, e.g., react-dom.production.min.js.

I'll do some experiments with the TypeScript compiler and report back before we call this "completed"

5. An index into the `scopeNames` array.

This enhancement chooses to use a combination of relative and absolute numbers because of the nature of the values in question. For source line offsets, using relative encoding makes sense, because the numbers are always expected to be increasing. For column numbers, however, it is highly likely that, because these values represent locations in the *source* file, offsets from starting- to ending-location would be negative, and further, that offsets from an outer-scoped function to an inner-scoped function would be negative. Given that one might expect most source lines to be less than 120 characters in length, there does not appear to be a major need to encode the column as relative.

Given the above list, we should then encode the above values:

```json
{
"sources": ["example.js"],
"scopeNames": ["Global code", "foo"],
"scopes": [
"[0,0,0,33,0],[0,34,2,0,1],[2,1,1,6,0]"
]
}
```

Or, when encoded with VLQ, as:

```json
{
"sources": ["example.js"],
"scopeNames": ["Global code", "foo"],
"scopes": [
"AAAiCA,AkCEAC,ECCMA"
]
}
```

### Specific naming recommendations

The following recommendations are **recommendations only** and not part of the "standard". They are, rather, a best attempt to identify how common examples can produce high-fidelity diagnostics experiences.

* **Class names included**: Class member functions should indicate that they are members of classes. Although a common JavaScript convention has been to use `#` as a shorthand for `Class.prototype.member`, the inclusion of `#` as valid syntax for `private` field shorthand suggests that this should not be used. This spec recommends that the expansion `ClassName.functionName` should be used for instance functions, and `static ClassName.functionName` for static functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrasting this with pasta-sourcemaps (from here):

method: ClassName.prototype.methodName
getter: ClassName.prototype.get getterName (similar for setter)
static method: ClassName.staticMethodName
static getter: ClassName.get staticGetterName (similar for setter)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the first pass of the TS compiler, I rendered these as:

method: ClassName.methodName
getter: ClassName.get getterName
static method: static ClassName.staticMethodName
static getter: Not sure that I handled this specific one, but it would have been rendered static ClassName.get getterName

I'm open to changing though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not including a prototype in the instance methods will lead to confusion with static methods, because people call the static methods as Foo.bar() matching the ClassName.methodName pattern.

* **Anonymous callback name inference**: Callback functions are often anonymous, but the name of the function being invoked is typically interesting. In the following example, an interesting function name might be something along the lines of `anonymous function passed to map`:

```ts

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style Nit: Indenting the code blocks with 4 spaces will bring them in line with the <li>'s padding.

return myArray.map((item, index) => {
/// ...
});
```

* **Computed names expressions**: Computed names also present an interesting problem of data that are not available at build time. As best as possible, the original source text should be simply preserved:

```ts
class Example {
static [Symbol.species]() { return Object; }
['to' + 'String']() {
return 'Hello, world.';
}
}
```

* In the above example, we would want to see `static Example.[Symbol.species]` and `Example.['to' + 'String']` in the `scopeNames` field of the source map.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrasting with pasta-sourcemaps (from here):

Looks like it yields

  • <computed: identifier>
  • <computed: stringLiteral>
  • <computed: numericLiteral>
  • <computed> (else this)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Related Discussions
This proposal has a handful of discussions across related GitHub repos:
- [TypeScript Issue](https://github.com/microsoft/TypeScript/issues/46695)
- [TypeScript Prototype PR](https://github.com/microsoft/TypeScript/pull/47152)
- [WebPack Proposal](https://github.com/webpack/webpack/issues/14996)

There's a few related discussions that are in-depth enough to ensure readers of this explainer are aware of.

## Privacy and Security Considerations

### Privacy

There should be no privacy considerations with this change.

### Security

There should be no security concerns with this change.

## Alternative Solutions

The original change proposed with the issue was to enhance the `mappings` array:

> Substantively: Only the `mappings` field would be altered, and would be altered by adding the 6th field. The complete section is included below:
>
> > The “mappings” data is broken down as follows:
> >
> > - each group representing a line in the generated file is separated by a ”;”
> > - each segment is separated by a “,”
> > - each segment is made up of 1,4<s> or 5</s>**, 5, or 6** variable length fields.
> >
> > The fields in each segment are:
> >
> > 1. The zero-based starting column of the line in the generated code that the segment represents. If this is the first field of the first segment, or the first segment following a new generated line (“;”), then this field holds the whole base 64 VLQ. Otherwise, this field contains a base 64 VLQ that is relative to the previous occurrence of this field. Note that this is different than the fields below because the previous value is reset after every generated line.
> > 2. If present, an zero-based index into the “sources” list. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.
> > 3. If present, the zero-based starting line in the original source represented. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field.
> > 4. If present, the zero-based starting column of the line in the source represented. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field.
> > 5. If present, the zero-based index into the “names” list associated with this segment. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.
> > 6. **If present, the zero-based index into the "names" list associated with the call stack of this segment. This field is a base 64 VLQ relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented.**
>
> In addition, the `version` field of the spec should be bumped to 4.

However, this is likely to require modifications across a tool chain. Supposing we had a tool chain that required, e.g., `typescript`, `webpack`, and `terser`, all of these tools would need to be modified to be aware of the 6th quantity in the VLQ of each mapping, *even if it was just to pass them through* (which is likely the case for this particular use case).