Encode scopes and variables in source map #2

fitzgen · 2015-02-18T04:08:13Z

Source maps should be able to encode the original source language's environment, scopes, and variables. They should be encoded in such a way that if a debugger is paused at a given generated JS location, it can restore and display the original source language's variables and parent scopes that are in scope at the paused location.

fitzgen · 2015-02-18T22:55:23Z

Here are some cases that I think the source map's environment rematerialization
should support:

- When there are scopes in JS that do not correspond to any scopes in the
original source language. For example, the compiler emitted an
immediately-invoked function expression as an implementation detail that
doesn't reflect any nested function or scope in the original source language.

- When there are scopes in the original source language that do not correspond
to any scope in the generated JS code. For example, one could imagine an
ES6-to-ES3 compiler transforming this ES6 code:
```
  {
    let x = 1;
    console.log(x);

    {
      let x = 2;
      console.log(x);
    }

    console.log(x);
  }
```
Into this ES3 code:
```
  {
    var x1 = 1;
    console.log(x1);
    var x2 = 2;
    console.log(x2);
    console.log(x1);
  }
```
Note the nested block scope in the ES6 source that does not exist in the ES3
output. We should be able to recreate this scope.

- We should support simple variable renaming.

For example, a Scheme-to-JS compiler might emit variables with `!` replaced by
`_bang`: `set-cdr!` becomes `set_cdr_bang`.

Another example: an ES6-to-ES3 compiler might bind the `this` outside of an
arrow function to a variable and close over it:
```
  var myArrow = () => this.x;

          |
          V

  var _this = this;
  var myArrow = function () { return _this.x; };
```
The rematerialized scope inside the `myArrow` function should have a `this`
binding that points to the `_this` variable.

- We should support hiding bindings in the generated JS code that do not
correspond to any bindings in the original source language. These might be
gensyms, or temporary variables, or any implementation detail of the
compiler's emitted JS code.

- We should support rematerializing bindings in the original source language
that do not have any corresponding bindings in the generated JS code.

First example (Python to JS):

  result = [x + 1 for x in list]

          |
          V

  var result = [];
  for (var i = 0; i < list.length; i++) {
    // Note: no `x` binding in this generated JS code.
    result.push(list[i] + 1);
  }

Second example (C++-ish to JS, pseudo-code for brevity):

  class Point {
    int x;
    int y;

    // `Point lhs += Point rhs` == `lhs.x += rhs.x; lhs.y += rhs.y;`
  }

  void moveDiagonal(Point &a) {
    Point offset(1, 1);
    a += offset;
  }

          |
          V

  function moveDiagonal(a) {
    // Note: `offset` not only doesn't exist as a binding in this generated JS
    // code, its members have been exploded and inlined!
    a.x += 1;
    a.y += 1;
  }
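
The cases above suggest that a source-level scope can be described independently of the generated JS scopes: a list of source bindings, each with a way to compute its value from whatever the generated code has at hand. Below is a minimal sketch of that idea, not a proposed format. The names `rematerialize`, `bindings`, `parent`, and the plain-object `frame` standing in for the paused generated-JS frame are all assumptions made for illustration.

```javascript
// Hypothetical descriptor: a source-level scope is a set of bindings, each
// paired with a function that locates its value in the generated JS frame.
// Generated variables that are not listed (temporaries, gensyms) stay hidden.
const outerBlock = {
  parent: null,
  bindings: { x: (frame) => frame.x1 },
};

// The nested ES6 block has no generated scope of its own; it only exists here.
const innerBlock = {
  parent: outerBlock,
  bindings: { x: (frame) => frame.x2 },
};

// The comprehension variable `x` has no generated binding at all, so it is
// computed from the generated `list` and `i` variables.
const comprehension = {
  parent: null,
  bindings: { x: (frame) => frame.list[frame.i] },
};

// Build the source-level environment chain for a paused frame.
// `frame` is assumed to map generated variable names to their current values.
function rematerialize(scope, frame) {
  const env = {};
  for (const [name, locate] of Object.entries(scope.bindings)) {
    env[name] = locate(frame);
  }
  return {
    env,
    parent: scope.parent ? rematerialize(scope.parent, frame) : null,
  };
}
```

Paused inside the inner block with `{ x1: 1, x2: 2 }` as the frame, `rematerialize(innerBlock, frame)` yields an environment where `x` is 2 and whose parent has `x` equal to 1, matching the shadowing in the ES6 source.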

andysterland · 2015-02-19T06:04:07Z

Awesome, that's pretty damn comprehensive and covers the vast majority of the use cases developers will see.

A few questions:

1. Do we need to call out a use case for source languages that make use of types? Kind of a nuanced variable-renaming case. Just want to make sure fields are considered in the design.
2. Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.

fitzgen · 2015-02-19T18:17:39Z

> Do we need to call out a use case for source languages that make use of types? Kind of a nuanced variable-renaming case. Just want to make sure fields are considered in the design.

Types are interesting, especially when the generated JS representation of multiple source-level types is the same. An example where this is true is with emscripten's pointers and integers. It seems to me that the only way a pretty printer could do the right thing in this case is if it had the source-level type information. So yes, I agree it is very important.

There's lots of information we could (and I hope to) encode in source maps about variables:

- Source-level type (should be optional, since not every compile-to-JS language is statically typed)
- Declaration location
- Whether it is a formal parameter vs. constant vs. local definition

However, trying to bite off everything at once is less than ideal. We'd risk either getting stuck trying to over-engineer the perfect format, or we would ship the wrong things and have no good story for fixing it in future iterations.

I'd prefer if we could figure out (a) the bare minimum set of data points needed to recreate the source environment, and (b) how to ensure that we can extend the format in the future to add the bells and whistles.

My hope is that we can initially add pretty printing functions that take only the value, and independently add environments without optional source-level type information. After we've agreed upon those things, we can add optional type information to the environment, and pass that as a second parameter to the pretty printing functions. In this way, we can continually and incrementally ship improvements to the format without trapping ourselves in a dead end by making future improvements impossible.
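
To make the incremental path concrete, here is a rough sketch of a pretty printer that works with only the value and does better when an optional second `type` argument is supplied. The `prettyPrint` name and the `{ kind: ... }` type descriptors are hypothetical; nothing here is an agreed-upon format. It uses the pointer-versus-integer ambiguity mentioned above, where both are plain JS numbers.

```javascript
// Sketch only: `type` is an optional, hypothetical source-level type descriptor.
// Without it, the printer falls back to whatever it can infer from the value.
function prettyPrint(value, type) {
  if (type === undefined) {
    return String(value);
  }
  switch (type.kind) {
    case "pointer":
      // Show pointers as zero-padded 32-bit hex addresses.
      return "0x" + value.toString(16).padStart(8, "0");
    case "int":
      return String(value | 0);
    default:
      // Unknown kinds degrade to the value-only behavior.
      return String(value);
  }
}
```

With this shape, `prettyPrint(4096)` gives `"4096"`, while `prettyPrint(4096, { kind: "pointer" })` gives `"0x00001000"`. Callers that never pass a type keep working unchanged, which is the extensibility property being asked for.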

fitzgen · 2015-02-19T18:24:17Z

> 2. Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.

I have much less of an understanding of how compilers targeting CSS use source maps than I do of compilers targeting JS and JS debuggers consuming source maps.

My understanding is that they work pretty alright, and that there were far fewer deficiencies than in the to-JS case. It would be great if someone who understands this subdomain really well stepped up and took responsibility for ensuring that we provide for the to-CSS needs as well.

fitzgen · 2015-06-19T15:40:12Z

I wrote a little bit about how DWARF solves this problem: http://fitzgeraldnick.com/weblog/62/

swannodette · 2015-06-24T15:49:56Z

How does this proposal address or complicate the issue of transitivity? With source maps it's currently trivial to merge transformations between distinct and unrelated JavaScript compilation technology. Once you start encoding scopes it seems to me transitivity becomes increasingly more difficult to preserve. I could be wrong about that and happy to hear that there is prior art or that this is fundamentally a non-issue.

fitzgen · 2015-06-30T22:31:14Z

> How does this proposal address or complicate the issue of transitivity? With
> source maps it's currently trivial to merge transformations between distinct
> and unrelated JavaScript compilation technology. Once you start encoding
> scopes it seems to me transitivity becomes increasingly more difficult to
> preserve. I could be wrong about that and happy to hear that there is prior
> art or that this is fundamentally a non-issue.

The good news is that adding these scopes and bindings doesn't make it more
difficult to compose a source map's location mappings, which as you point out
many tools do now. A tool could easily ignore the environment information and
nothing would be any different from the situation now. In general, it is a goal
of future extensions to be 100% backwards compatible so that existing tooling
doesn't break; that tooling just won't take advantage of the shiny new features
enabled by such extensions.

If some tool in the pipeline does not modify the environment in any way, then
all it need do is apply the same transformation of locations that it does to
each mapping to the start and end bounds of each scope.
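
As a rough illustration of that pass-through case, the sketch below reuses a tool's existing position transformation for scope bounds. The `remapScopeBounds` name, the `{ line, column }` position shape, and the `start`/`end` fields are assumptions for this example, not part of any source map format.

```javascript
// Apply the tool's existing location transformation to each scope's bounds.
// `mapPosition` is assumed to be the same function the tool already uses
// when it rewrites ordinary location mappings.
function remapScopeBounds(scopes, mapPosition) {
  return scopes.map((scope) => ({
    ...scope, // bindings and anything else are carried over untouched
    start: mapPosition(scope.start),
    end: mapPosition(scope.end),
  }));
}

// Example transformation: a tool that prepends a two-line banner to the file.
function shiftByBanner(position) {
  return { line: position.line + 2, column: position.column };
}
```

For a scope spanning lines 1 to 5, `remapScopeBounds(scopes, shiftByBanner)` returns the same scope spanning lines 3 to 7, with its bindings unchanged.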

As far as prior art goes, unfortunately I'm not aware of anything directly
related, nor are the ex-gdb folks I asked. Traditional compilers don't really
have this issue because there isn't usually any post-processing (such as
minification) of the resulting executables. Either libraries are statically
compiled into the binary, in which case the compiler generates debug info along
with the main program's debug info, or they are dynamically loaded, in which
case they already have their own separate debug info.

The next closest tools are things like Valgrind and DTrace which instrument the
executable with additional probes after the fact. Valgrind instruments the
binary to jump to its own JIT'd code which records its traces and then jumps
back to the normal program. If you want to debug with gdb while using
valgrind, it actually implements its own gdb server and internally translates
whatever shifted offsets happened because of the instrumentation. On the other
hand, DTrace bends over backwards to avoid shifting offsets at all. Neither
approach seems too relevant to our discussion.

Yacc emits #line pragmas, which is fairly similar to composing source map
location information, but punts on scopes/bindings. dwz is a command-line tool
to compress DWARF debugging info, but doesn't actually modify the executable.

I'd be interested if you know of any bytecode instrumenters in JVM-land that
both modify the environment and maintain source-level debugging of the
environment. That certainly seems relevant, but I am ignorant of JVM bytecode
instrumentation.

The minifier, or any other tool that takes JS for further processing and changes
the environment, is the only thing that understands the changes it
makes. Therefore, it would have to propagate that information via the source
map, by doing some translation for each scope and binding. I've sketched out an
algorithm below:

1) For each scope S:
  1.1) For each binding B in S:
    1.1.1) Parse the JS snippet for locating B's value
    1.1.2) Walk the resulting AST and create a map M mapping from old JS
           bindings that snippet relies on to their new, renamed binding
    1.1.3) Generate a new JS snippet that first defines `var <old> = <new>;`
           for each of the entries in M
    1.1.4) Append the original JS snippet for locating B's value to that
           snippet
    1.1.5) Use that new JS snippet for locating B's value in the new
           source map
  1.2) Adjust the start and end bounds of S the same way location mappings
       are adjusted during source map composition now

Note again that if the tool does not modify the environment, then it can skip
step 1.1 and only do 1.2 for each scope. When this is the case, composing source
maps is not harder than without scopes and binding information.
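
The algorithm above could look roughly like the following. This is a sketch under several assumptions: each binding's locator is a JS snippet string, the tool supplies a `renames` object from old generated names to new ones, and `mapPosition` is the tool's existing location transformation. All of these names are hypothetical. A regular expression stands in for the parse-and-walk of steps 1.1.1 and 1.1.2; a real tool would use a proper JS parser, since this scan does not understand strings, comments, or shadowing.

```javascript
// Matches identifiers that are not property names after a dot.
// A stand-in for parsing the snippet and walking its AST.
const IDENTIFIER = /(?<![.\w$])[A-Za-z_$][\w$]*/g;

// Steps 1.1.1 to 1.1.5: prefix the snippet with `var <old> = <new>;` for
// every renamed binding the snippet relies on.
function rewriteSnippet(snippet, renames) {
  const used = new Map();
  for (const [name] of snippet.matchAll(IDENTIFIER)) {
    if (Object.prototype.hasOwnProperty.call(renames, name)) {
      used.set(name, renames[name]);
    }
  }
  const prelude = [...used]
    .map(([oldName, newName]) => `var ${oldName} = ${newName};`)
    .join(" ");
  return prelude ? `${prelude} ${snippet}` : snippet;
}

// Step 1: for each scope, rewrite every binding (1.1) and adjust bounds (1.2).
function composeScopes(scopes, renames, mapPosition) {
  return scopes.map((scope) => ({
    start: mapPosition(scope.start),
    end: mapPosition(scope.end),
    bindings: Object.fromEntries(
      Object.entries(scope.bindings).map(([name, snippet]) => [
        name,
        rewriteSnippet(snippet, renames),
      ])
    ),
  }));
}
```

For instance, if a minifier renames `list` to `b` and `i` to `c`, the locator `list[i]` for the source-level `x` becomes `var list = b; var i = c; list[i]`. When `renames` is empty, every snippet is returned unchanged and only the bounds move, which is the cheap case described above.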

When the tool does modify the environment (eg, a minifier shortening variable
names), while I wouldn't say this process is super straightforward, it is far
from impossible. Furthermore, I don't see a way to encode any environment
information without having some kind of process like this when composing source
maps and maintaining the ability to rematerialize the source-level
environment. And that's regardless of how the environment data is encoded:
whether it be purely data, a 100% JS reflection API, or some custom opcode
language like what DWARF has.

At the end of the day, I think the benefits outweigh the drawbacks, especially
because things can only get better, and not worse, if we maintain backwards
compatibility.

littledan · 2018-03-05T15:33:42Z

It's great to see this investigation. The lack of encoding of variables and scopes was cited in the draft WebAssembly/source-map integration as a downside of source-maps, and it seems like there is frequent discussion of this feature in the mailing list (e.g., @rbuckton's post).

concavelenz · 2018-03-05T16:44:59Z

I've always thought that it might one day be replaced with something with a different structure more in line with a traditional debug format (a binary format), but something would need to be proposed.

> On Mon, Mar 5, 2018 at 7:34 AM, Daniel Ehrenberg wrote:
>
> It's great to see this investigation. The lack of encoding of variables and scopes was cited <https://github.com/WebAssembly/design/pull/1051/files#diff-8e85308ab5cc1e83e91ef59233648be2R338> in the draft WebAssembly/source-map integration as a downside of source-maps, and it seems like there is frequent discussion of this feature in the mailing list (e.g., @rbuckton's post <https://groups.google.com/forum/#!topic/mozilla.dev.js-sourcemap/NVuynvaFQDY>).

littledan · 2018-03-05T21:22:52Z

@concavelenz Now that we have the motivation from both @dschuff's WebAssembly integration and the continued widespread use of minimizers and transpilers, should we start this effort to create this new structure?

jkup · 2024-06-24T08:58:58Z

I believe this is subsumed by the current scopes proposal. If there is anything missing, we should add it as a follow-up to the main proposal.

fitzgen mentioned this issue Feb 19, 2015

Pretty print values #1

Open

fitzgen mentioned this issue Jul 22, 2015

Proposal for encoding source-level environment information #4

Merged

erights mentioned this issue Oct 20, 2017

Evaluating source language expressions #3

Open

hbenl mentioned this issue Apr 24, 2023

Scopes and variable shadowing #37

Open

littledan added the Workstream: Naming label Apr 25, 2023

nicolo-ribaudo pushed a commit to nicolo-ribaudo/source-map that referenced this issue Mar 13, 2024

Configure github actions (tc39#2)

7eb4861

* Configure github actions * Fix missing source * Explicitly configure toolchain * Newer ubuntu runner

jkup added the Proposal: Scopes label Mar 13, 2024

jkup closed this as completed Jun 24, 2024
