Encode scopes and variables in source map #2

Closed
fitzgen opened this issue Feb 18, 2015 · 11 comments

Comments

@fitzgen
Contributor

fitzgen commented Feb 18, 2015

Source maps should be able to encode the original source language's environment, scopes, and variables. They should be encoded in such a way that if a debugger is paused at a given generated JS location, it can restore and display the original source language's variables and parent scopes that are in scope at the paused location.

@fitzgen
Contributor Author

fitzgen commented Feb 18, 2015

Here are some cases that I think the source map's environment rematerialization
should support:

  • When there are scopes in JS that do not correspond to any scopes in the
    original source language. For example, the compiler emitted an
    immediately-invoked-function-expression as an implementation detail that
    doesn't reflect any nested function or scope in the original source language.

  • When there are scopes in the original source language that do not correspond
    to any scope in the generated JS code. For example, one could imagine an
    ES6 to ES3 compiler transforming this ES6 code:

      {
        let x = 1;
        console.log(x);
    
        {
          let x = 2;
          console.log(x);
        }
    
        console.log(x);
      }
    

    Into this ES3 code:

      {
        var x1 = 1;
        console.log(x1);
        var x2 = 2;
        console.log(x2);
        console.log(x1);
      }
    

    Note the nested block scope in the ES6 source that does not exist in the ES3
    source. We should be able to recreate this scope.

  • We should support simple variable renaming.

    For example, a Scheme-to-JS compiler might emit identifiers with ! replaced
    by _bang (and hyphens by underscores): set-cdr! becomes set_cdr_bang.

    Another example: an ES6-to-ES3 compiler might store the this value from
    outside an arrow function in a variable and close over that variable:

      var myArrow = () => this.x;
    
              |
              V
    
      var _this = this;
      var myArrow = function () { return _this.x; };
    

    The rematerialized scope inside the myArrow function should have a this
    binding that points to the _this variable.

  • We should support hiding bindings in the generated JS code that do not
    correspond to any bindings in the original source language. These might be
    gensyms, or temporary variables, or any implementation detail of the
    compiler's emitted JS code.

  • We should support rematerializing bindings in the original source language
    that do not have any corresponding bindings in the generated JS code.

    First example (Python to JS):

      result = [x + 1 for x in list]
    
              |
              V
    
      var result = [];
      for (var i = 0; i < list.length; i++) {
        // Note: no `x` binding in this generated JS code.
        result.push(list[i] + 1);
      }
    

    Second example (C++ to JS):

      struct Point {
        int x;
        int y;
    
        Point(int x, int y) : x(x), y(y) {}
    
        // `lhs += rhs` is shorthand for `lhs.x += rhs.x; lhs.y += rhs.y;`
        Point &operator+=(const Point &rhs) {
          x += rhs.x;
          y += rhs.y;
          return *this;
        }
      };
    
      void moveDiagonal(Point &a) {
        Point offset(1, 1);
        a += offset;
      }
    
              |
              V
    
      function moveDiagonal(a) {
        // Note: `offset` not only doesn't exist as a binding in this generated JS
        // code, its members have been exploded and inlined!
        a.x += 1;
        a.y += 1;
      }
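The cases above can be tied together with a small sketch of how a debugger might rematerialize source-level variables at a paused location. The encoding is hypothetical and not part of the source map format: each original scope records the generated lines it spans (`start`/`end`) and maps each source-level name to a generated-JS expression that computes its value. The paused frame is modeled as a plain object of generated variables, and `new Function` stands in for evaluating an expression in that frame, which a real debugger would do through its own evaluation machinery.

```javascript
// Hypothetical encoding, not part of any source map spec: each original scope
// lists the generated lines it spans and maps source-level names to
// generated-JS expressions that compute their values. `frame` models the
// generated variables that are live at the pause point.
function rematerialize(scopes, line, frame) {
  const exprs = {};
  scopes
    .filter((s) => s.start <= line && line <= s.end)
    // Widest scope first, so bindings of inner scopes shadow outer ones.
    .sort((a, b) => (b.end - b.start) - (a.end - a.start))
    .forEach((s) => Object.assign(exprs, s.bindings));

  const jsNames = Object.keys(frame);
  const jsValues = jsNames.map((n) => frame[n]);
  const env = {};
  for (const [name, expr] of Object.entries(exprs)) {
    // Evaluate with the frame's generated variables in scope. Generated
    // variables that no binding mentions (gensyms, temporaries) stay hidden.
    env[name] = new Function(...jsNames, `return (${expr});`)(...jsValues);
  }
  return env;
}

// ES6 blocks compiled to x1/x2 (generated lines 1-7 of the ES3 output above):
// the inner block, lines 4-5, exists only in the map, yet its x shadows.
const blockScopes = [
  { start: 1, end: 7, bindings: { x: "x1" } },
  { start: 4, end: 5, bindings: { x: "x2" } },
];
console.log(rematerialize(blockScopes, 5, { x1: 1, x2: 2 }).x); // 2
console.log(rematerialize(blockScopes, 6, { x1: 1, x2: 2 }).x); // 1

// Python comprehension: x has no JS variable and is computed from list[i];
// the gensym i is never listed, so it is not shown.
const pyEnv = rematerialize(
  [{ start: 1, end: 5, bindings: { list: "list", result: "result", x: "list[i]" } }],
  4,
  { list: [10, 20, 30], result: [11], i: 1 },
);
console.log(pyEnv.x, "i" in pyEnv); // 20 false

// Renaming and `this` capture: set-cdr! reads set_cdr_bang, this reads _this.
const renamedEnv = rematerialize(
  [{ start: 1, end: 1, bindings: { "set-cdr!": "set_cdr_bang", this: "_this" } }],
  1,
  { set_cdr_bang: () => {}, _this: { x: 42 } },
);
console.log(typeof renamedEnv["set-cdr!"], renamedEnv.this.x); // function 42

// C++ offset: its members were inlined, so its value is a constant expression.
const cppEnv = rematerialize(
  [{ start: 1, end: 6, bindings: { a: "a", offset: "({ x: 1, y: 1 })" } }],
  4,
  { a: { x: 6, y: 6 } },
);
console.log(cppEnv.offset); // { x: 1, y: 1 }
```

Representing each binding as an expression rather than as a variable name is what lets one mechanism cover renaming, hiding, and values that have no JS variable at all.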
    

@andysterland

Awesome, that's pretty damn comprehensive and covers the vast majority of the use cases developers will see.

Few questions:

  1. Do we need to call out a use case for source languages that make use of types? It's kind of a nuanced variable-renaming case. I just want to make sure fields are considered in the design.
  2. Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.

@fitzgen
Contributor Author

fitzgen commented Feb 19, 2015

  1. Do we need to call out a use case for source languages that make use of types? It's kind of a nuanced variable-renaming case. I just want to make sure fields are considered in the design.

Types are interesting, especially when multiple source-level types share the same generated JS representation. An example of this is Emscripten's pointers and integers, which are both plain JS numbers. It seems to me that the only way a pretty printer could do the right thing in this case is if it had the source-level type information. So yes, I agree it is very important.

There's a lot of information we could (and I hope we will) encode in source maps about variables:

  • Source-level type (should be optional, since not every compile-to-JS language is statically typed)
  • Declaration location
  • Whether it is a formal parameter vs. constant vs. local definition

However, trying to bite off everything at once is less than ideal. We'd risk either getting stuck trying to over-engineer the perfect format, or we would ship the wrong things and have no good story for fixing it in future iterations.

I'd prefer that we figure out (a) the bare minimum set of data points needed to recreate the source environment, and (b) how to ensure that we can extend the format in the future to add the bells and whistles.

My hope is that we can initially add pretty printing functions that take only the value, and independently add environments without optional source-level type information. After we've agreed upon those things, we can add optional type information to the environment, and pass that as a second parameter to the pretty printing functions. In this way, we can continually and incrementally ship improvements to the format without trapping ourselves in a dead end by making future improvements impossible.
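The incremental plan above can be sketched as a pretty printer whose second, type argument is optional, alongside a binding record whose extra fields are all optional. Everything here is hypothetical: the `prettyPrint` name, the field names (`expr`, `type`, `declared`, `kind`), and the convention that a trailing `*` marks a pointer type. A printer written for the first version simply never receives the second argument, which is what lets the later extension stay backwards compatible.

```javascript
// Hypothetical binding record: `name` and `expr` are the bare minimum; the
// remaining fields are the optional extras listed above and may be absent.
const binding = {
  name: "buf",
  expr: "buf",                      // generated JS expression for the value
  type: "char*",                    // optional source-level type
  declared: { line: 3, column: 8 }, // optional declaration location
  kind: "param",                    // optional: "param", "const", or "local"
};

// First-version callers pass only the value; later ones may add the type.
function prettyPrint(value, type) {
  if (type === undefined) return String(value);
  // Emscripten pointers and ints are both JS numbers, so only the
  // source-level type can tell the printer to show an address.
  if (type.endsWith("*")) return `(${type}) 0x${value.toString(16)}`;
  return `(${type}) ${value}`;
}

console.log(prettyPrint(1024));               // 1024
console.log(prettyPrint(1024, "int"));        // (int) 1024
console.log(prettyPrint(1024, binding.type)); // (char*) 0x400
```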

@fitzgen
Contributor Author

fitzgen commented Feb 19, 2015

2. Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.

I have much less of an understanding of how compilers targeting CSS use source maps than I do of compilers targeting JS and JS debuggers consuming source maps.

My understanding is that they work reasonably well, with far fewer deficiencies than in the to-JS case. It would be great if someone who understands this subdomain really well stepped up and took responsibility for ensuring that we provide for the to-CSS needs as well.

@fitzgen
Contributor Author

fitzgen commented Jun 19, 2015

I wrote a little bit about how DWARF solves this problem: http://fitzgeraldnick.com/weblog/62/

@swannodette

How does this proposal address or complicate the issue of transitivity? With source maps it's currently trivial to merge transformations between distinct and unrelated JavaScript compilation technology. Once you start encoding scopes it seems to me transitivity becomes increasingly more difficult to preserve. I could be wrong about that and happy to hear that there is prior art or that this is fundamentally a non-issue.

@fitzgen
Contributor Author

fitzgen commented Jun 30, 2015

The good news is that adding these scopes and bindings doesn't make it more
difficult to compose a source map's location mappings, which as you point out
many tools do now. A tool could easily ignore the environment information and
nothing would be any different from the situation now. In general, it is a goal
of future extensions to be 100% backwards compatible so that existing tooling
doesn't break; that tooling just won't take advantage of the shiny new features
enabled by such extensions.

If some tool in the pipeline does not modify the environment in any way, then
all it needs to do is apply to the start and end bounds of each scope the same
location transformation that it already applies to each mapping.
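For a tool that leaves the environment untouched, adjusting scope bounds is a recursive map over the scope tree. The scope shape here (`start`/`end` positions, `bindings`, `children`) is hypothetical, and `prependBanner` is an assumed stand-in for whatever position transformation the tool already applies to its mappings, in this case a tool that prepends a two-line banner.

```javascript
// Stand-in for the tool's existing position transform: prepending a two-line
// banner shifts every generated line down by two and leaves columns alone.
const prependBanner = ({ line, column }) => ({ line: line + 2, column });

// Applies the same transform to each scope's start and end, recursing into
// child scopes. Bindings are copied as-is because this tool renames nothing.
function adjustScope(scope, mapPosition) {
  return {
    ...scope,
    start: mapPosition(scope.start),
    end: mapPosition(scope.end),
    children: scope.children.map((child) => adjustScope(child, mapPosition)),
  };
}

const scope = {
  start: { line: 1, column: 0 },
  end: { line: 7, column: 1 },
  bindings: { x: "x1" },
  children: [
    {
      start: { line: 4, column: 2 },
      end: { line: 5, column: 18 },
      bindings: { x: "x2" },
      children: [],
    },
  ],
};
const adjusted = adjustScope(scope, prependBanner);
console.log(adjusted.start.line, adjusted.end.line); // 3 9
console.log(adjusted.children[0].start.line, adjusted.children[0].end.line); // 6 7
```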


As far as prior art goes, unfortunately I'm not aware of anything directly
related, nor are the ex-gdb folks I asked. Traditional compilers don't really
have this issue because there isn't usually any post-processing (such as
minification) of the resulting executables. Either libraries are statically
compiled into the binary, in which case the compiler generates debug info along
with the main program's debug info, or they are dynamically loaded, in which
case they already have their own separate debug info.

The next closest tools are things like Valgrind and DTrace which instrument the
executable with additional probes after the fact. Valgrind instruments the
binary to jump to its own JIT'd code which records its traces and then jumps
back to the normal program. If you want to debug with gdb while using
Valgrind, it actually implements its own gdb server and internally translates
whatever offsets were shifted by the instrumentation. DTrace, on the other
hand, bends over backwards to avoid shifting offsets at all. Neither approach
seems too relevant to our discussion.

Yacc emits #line directives, which is fairly similar to composing source map
location information, but punts on scopes/bindings. dwz is a command-line tool
to compress DWARF debugging info, but it doesn't actually modify the executable.

I'd be interested if you know of any bytecode instrumenters in JVM-land that
both modify the environment and maintain source-level debugging of the
environment. That certainly seems relevant, but I am ignorant of JVM bytecode
instrumentation.


The minifier, or any other tool that takes JS for further processing and changes
the environment, is the only thing that understands the changes it
makes. Therefore, it would have to propagate that information via the source
map, by doing some translation for each scope and binding. I've sketched out an
algorithm below:

1) For each scope S:
  1.1) For each binding B in S:
    1.1.1) Parse the JS snippet for locating B's value
    1.1.2) Walk the resulting AST and create a map M mapping from old JS
           bindings that snippet relies on to their new, renamed binding
    1.1.3) Generate a new JS snippet that first defines `var <old> = <new>;`
           for each of the entries in M
    1.1.4) Append the original JS snippet for locating B's value to that
           snippet
    1.1.5) Use that new JS snippet for locating B's value in the new
           source map
  1.2) Adjust the start and end bounds of S the same way location mappings
       are adjusted during source map composition now
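Step 1.1 for a single binding can be sketched as follows. Two assumptions are worth flagging: the minifier's renaming is supplied as a plain `renames` object mapping old names to new ones, and a regular expression stands in for the JS parser of steps 1.1.1 and 1.1.2. The regex is only adequate for simple expressions; it also matches property names after a dot and words inside string literals, so it may emit definitions the snippet does not need.

```javascript
// Step 1.1 for one binding. `renames` is the minifier's old-name -> new-name
// map (an assumption about how the tool exposes its renaming). A regex scan
// stands in for parsing the snippet, so it only suits simple expressions.
function rewriteBindingExpr(expr, renames) {
  // 1.1.1-1.1.2: find identifiers the snippet relies on that were renamed (M).
  const identifiers = new Set(expr.match(/[A-Za-z_$][\w$]*/g) || []);
  const prelude = [...identifiers]
    .filter((old) => Object.prototype.hasOwnProperty.call(renames, old))
    // 1.1.3: define each old name in terms of its new, renamed binding.
    .map((old) => `var ${old} = ${renames[old]};`);
  // 1.1.4: append the original snippet after those definitions.
  return [...prelude, expr].join(" ");
}

// Suppose the minifier renamed `list` to `a` and `i` to `b`.
const snippet = rewriteBindingExpr("list[i]", { list: "a", i: "b" });
console.log(snippet); // var list = a; var i = b; list[i]

// 1.1.5: the new snippet, evaluated where `a` and `b` are live, recovers x.
const evalInMinifiedFrame = new Function("a", "b", `return eval(${JSON.stringify(snippet)});`);
console.log(evalInMinifiedFrame([10, 20, 30], 1)); // 20
```

Because the `var` definitions run in order, a minifier that swaps two names (`a` to `b` and `b` to `a`) would overwrite one value before it is read; capturing the new values first, for example as arguments to a wrapper function, avoids that.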

Note again that if the tool does not modify the environment, then it can skip
step 1.1 and only do 1.2 for each scope. When this is the case, composing source
maps is no harder than without scope and binding information.

When the tool does modify the environment (e.g., a minifier shortening variable
names), while I wouldn't say this process is super straightforward, it is far
from impossible. Furthermore, I don't see a way to encode any environment
information without having some kind of process like this when composing source
maps and maintaining the ability to rematerialize the source-level
environment. And that's regardless of how the environment data is encoded:
whether it be purely data, a 100% JS reflection API, or some custom opcode
language like what DWARF has.

At the end of the day, I think the benefits outweigh the drawbacks, especially
because things can only get better, and not worse, if we maintain backwards
compatibility.

@littledan
Member

It's great to see this investigation. The lack of encoding of variables and scopes was cited in the draft WebAssembly/source-map integration as a downside of source-maps, and it seems like there is frequent discussion of this feature in the mailing list (e.g., @rbuckton's post).

@concavelenz
Contributor

concavelenz commented Mar 5, 2018 via email

@littledan
Member

@concavelenz Now that we have the motivation from both @dschuff's WebAssembly integration and the continued widespread use of minifiers and transpilers, should we start the effort to create this new structure?

nicolo-ribaudo pushed a commit to nicolo-ribaudo/source-map that referenced this issue Mar 13, 2024
* Configure github actions

* Fix missing source

* Explicitly configure toolchain

* Newer ubuntu runner
@jkup
Collaborator

jkup commented Jun 24, 2024

I believe this is subsumed by the current scopes proposal. If there is anything missing, we should add it as a follow up to the main proposal.

@jkup jkup closed this as completed Jun 24, 2024