Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't get export info in transform without reloading the module #2829

Closed
mattjohnsonpint opened this issue Mar 23, 2024 · 17 comments · Fixed by #2832
Closed

Can't get export info in transform without reloading the module #2829

mattjohnsonpint opened this issue Mar 23, 2024 · 17 comments · Fixed by #2832
Labels

Comments

@mattjohnsonpint
Copy link
Contributor

mattjohnsonpint commented Mar 23, 2024

Bug description

In a transform's afterCompile stage, I should be able to get information about the module's exports. Indeed, the number of exports seems to be correct, and they have addresses. But for some reason I can't quite figure out - their names are blank.

However, if I generate binary from the module, then reload it - then I get the information I was looking for. But this seems rather inefficient and unnecessary.

Steps to reproduce

Make a new AssemblyScript project, containing at least one exported function.

Add a myTransform.js file:

import { Transform } from "assemblyscript/dist/transform.js";
import binaryen from "binaryen";
import console from "console";

export default class MyTransform extends Transform {
  afterCompile(module) {

    // Uncomment this to workaround the issue
    // module = binaryen.readBinary(module.emitBinary());
    
    const n = module.getNumExports();
    for (let i = 0; i < n; ++i) {
      const ref = module.getExportByIndex(i);
      const info = binaryen.getExportInfo(ref);
      console.log(ref, info);
    }
  }
}

Add the transform in asconfig.json, or compile with --transform ./myTransform.js, as per the docs.

Compile, and note that the output is something like the following:

5532704 { kind: 0, name: '', value: '' }
5533280 { kind: 0, name: '', value: '' }
5746088 { kind: 0, name: '', value: '' }
5747704 { kind: 0, name: '', value: '' }
5749368 { kind: 0, name: '', value: '' }
5751000 { kind: 0, name: '', value: '' }
5751160 { kind: 0, name: '', value: '' }
5754088 { kind: 0, name: '', value: '' }
5753920 { kind: 0, name: '', value: '' }

Then, uncomment the line indicated to workaround the bug, and it compile again. This time, the information is presented correctly:

5551080 { kind: 0, name: 'myFunction1', value: '0' }
5551256 { kind: 0, name: 'myFunction2', value: '1' }
5551416 { kind: 0, name: '__new', value: '58' }
5551592 { kind: 0, name: '__pin', value: '59' }
5551720 { kind: 0, name: '__unpin', value: '60' }
5551568 { kind: 0, name: '__collect', value: '61' }
5551512 { kind: 3, name: '__rtti_base', value: 'global$16' }
5552176 { kind: 2, name: 'memory', value: '0' }
5552256 { kind: 0, name: '_start', value: '66' }

AssemblyScript version

v0.27.25

@CountBleck
Copy link
Member

Hm, could this be a Binaryen bug?

@mattjohnsonpint
Copy link
Contributor Author

I think it's more to do with AssemblyScript's usage of binaryen.js. Internally, before the afterCompile is invoked, there's a call to binaryen.wrapModule. The result of which is passed by parameter as module. But that instance of binaryen is lost. Or rather, it's captured in the closure, but it's not exposed for reuse. When I import binaryen, it's a new instance of the binaryen object, which apparently is stateful.

A solution would be for the original binaryen object to be either set as a property on the transformer, or passed by parameter along with the module. Then I wouldn't import it, I'd just re-use the same instance.

@CountBleck
Copy link
Member

There isn't an old instance of Binaryen. The binaryen.wrapModule(/* ... */ module.ref) call is used to change an instance of AS's Module class to an instance of binaryen.js's Module class. The original Binaryen module isn't discarded; that's represented by module.ref (a pointer into Binaryen's heap).

@mattjohnsonpint
Copy link
Contributor Author

So, how would I use module.ref in the example code?

@CountBleck
Copy link
Member

What do you mean? By the way, I still don't think this is an AS problem, but I could be wrong.

@CountBleck
Copy link
Member

Actually, does anything change if you change from "binaryen" to from "assemblyscript/lib/binaryen.js"? (Apologies if the import path isn't exactly right, but you get the idea)

@mattjohnsonpint
Copy link
Contributor Author

Nope. That was the first thing I tried. Same result.

@mattjohnsonpint
Copy link
Contributor Author

In assemblyscript/cli/index.d.ts:

import { Program, Parser, Module } from "../src";
...
export abstract class Transform {
  ...
  afterCompile?(module: Module): void | Promise<void>;
}

So the Typescript definition of module used on the transform is the AssemblyScript Module type.

But then in assemblyscript/cli/index.js:

// From here on we are going to use Binaryen.js
  binaryenModule = binaryen.wrapModule(
    typeof module === "number" || module instanceof Number
      ? assemblyscript.getBinaryenModuleRef(module)
      : module.ref
  );
 
  ...

  {
    let error = await applyTransform("afterCompile", binaryenModule);
    if (error) return prepareResult(error);
  }

It's passing the wrapped binaryenModule.

So the type definition is incorrect. I've got a Binaryen module, not an AssemblyScript module.

Looking at this old example I think an older version passed the AssemblyScript module, but no longer.

@mattjohnsonpint
Copy link
Contributor Author

mattjohnsonpint commented Mar 24, 2024

There isn't an old instance of Binaryen.

Well, I'm not 100% sure about that. It sure seems like it's behaving like my import of binaryen and AsemblyScript's import of binaryen each have some different state in them. (the heap you mentioned, perhaps?)

Sorry, I'm not super familiar with the internals here.

@mattjohnsonpint
Copy link
Contributor Author

mattjohnsonpint commented Mar 24, 2024

If I compile from with VS Code's JavaScript Debug Terminal, I can set breakpoints and step through to the implementation of the binaryen module's functions. It's the minified version, which is annoying, but I can still see a bit of what's going on there.

For example, module.getNumExports implementation minified is:

I.getNumExports=function(){return A._BinaryenGetNumExports(g)}

Where I is the binaryen Module object, and A is the binaryen object.

From there, I can get the data just fine. For example:

A.getExportInfo(I.getExportByIndex(0))

output:

{kind: 0, name: 'myFunction1', value: 'assembly/index/myFunction1'}

So the information is there, it's just only accessible if using A - which is hidden behind a closure within the binaryen module. I can't get at it in code.

That's why I was saying it's behaving as if it's two different instances. binaryen imported by me, and A appear to both have the same set of APIs, but different state.

@dcodeIO
Copy link
Member

dcodeIO commented Mar 24, 2024

I remember that similar issues in the past were a result of two different Binaryen packages in the module graph being present. For instance, AS used version A pinned in its package.json, whereas the surrounding package imported version B.

Each instance of Binaryen has its own memory, and wrapModule assumes that it's the exact same package, then sharing the same memory, essentially passing it a pointer to the respective module instance.

AS provides a re-export of the exact Binaryen package it uses here

"./binaryen": {
"import": "./lib/binaryen.js",
"types": "./lib/binaryen.d.ts"
},

i.e.

import binaryen from "assemblyscript/binaryen";
...

which should mitigate such issues - that is, unless something broke, say with the switch to Binaryen's Wasm variant.

@CountBleck
Copy link
Member

Hm, I thought assemblyscript/lib/binaryen.js (plus or minus a node_modules) would've resolved to the same file...

@dcodeIO
Copy link
Member

dcodeIO commented Mar 24, 2024

It should, yeah.

@mattjohnsonpint
Copy link
Contributor Author

I tried all variations of importing binaryen mentioned here, and a few others such as dynamic import. All same results.
If indeed each binaryen instance has its own memory, then I would need access to the binaryen instance used by AssemblyScript. Perhaps it can be added as a property on the transform, similar to program?

@CountBleck
Copy link
Member

CountBleck commented Mar 25, 2024

@mattjohnsonpint could you try cloning AssemblyScript and adding binaryen, after line 452 of cli/index.js?

I'd like to hear if that works in your project before making a PR. You should be able to use this.binaryen in the transform afterwards.

@mattjohnsonpint
Copy link
Contributor Author

Will try tomorrow and get back to you. Thanks! 🙂

@mattjohnsonpint
Copy link
Contributor Author

Yes. It works perfectly. After adding binaryen where you indicated, I just changed my transform to use this.binaryen instead of importing it, and I get the expected output I was looking for.

I will send a PR shortly, including the Typescript changes. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants