This repository has been archived by the owner on Jun 27, 2024. It is now read-only.

Support Wasm Text Format (%.wat) #1

Closed
codefromthecrypt opened this issue Nov 25, 2021 · 10 comments

Comments

@codefromthecrypt
Contributor

We currently decode only the Wasm Binary Format, so usage depends on an external tool that targets it (e.g. TinyGo compiling to wasm).

Wasm also defines a Text Format. Supporting it would bring several benefits:

  • unit tests can define code inline and trigger specific edge cases
  • new users can familiarize themselves with Wasm before learning a toolchain, e.g. by following along in a book
  • wasm used for benchmarks will have more predictable speed and allocations, as the translation is direct (no embedded GC implementation, etc.)
  • power users who want better performance or less bloat can write the text format directly instead of compiling from another language
  • text format input could be compiled in a single pass into the wazeroir intermediate format

Non-goals:

  • supporting the unreleased 1.1 specification. This will only be 1.0
  • wast format, this will only be wat format
  • exporting anything except a Parse function and mandatory options

Optional work, which should land in follow-up pull requests rather than one big bang:

  • writing %.wasm from %.wat (or at all) is optional
  • While a go-native wat2wasm is a nice to have, it can be decoupled

There will be some challenges and choices along the way.

  • The text format supports a mix of expression styles: S expressions vs stack
  • The text format supports both indexed parameters and named ones
  • Validation may overlap with what's already done when decoding the binary
  • There are pros and cons about single pass vs parsing into Module first
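
As one illustration of the indexed-versus-named point above, a parser has to accept both `local.get 0` and `local.get $x` and turn either into an index. The following is only a sketch under assumptions: `resolveParam` and its `paramNames` slice are hypothetical names, not part of this project, and only decimal indices are handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveParam (hypothetical helper) maps a parameter reference from the
// text format to its index. paramNames holds the declared names in order,
// with "" for parameters declared without a name.
func resolveParam(paramNames []string, ref string) (uint32, error) {
	if strings.HasPrefix(ref, "$") {
		// Named reference: look it up among the declared names.
		for i, name := range paramNames {
			if name == ref {
				return uint32(i), nil
			}
		}
		return 0, fmt.Errorf("unknown parameter %s", ref)
	}
	// Indexed reference: this sketch only accepts decimal digits.
	i, err := strconv.ParseUint(ref, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("malformed parameter index %q", ref)
	}
	if i >= uint64(len(paramNames)) {
		return 0, fmt.Errorf("parameter index %d out of range", i)
	}
	return uint32(i), nil
}

func main() {
	// Corresponds to a signature like: (param $x i32) (param i32)
	params := []string{"$x", ""}
	for _, ref := range []string{"$x", "1", "$y", "2"} {
		idx, err := resolveParam(params, ref)
		fmt.Println(ref, idx, err)
	}
}
```

Keeping unnamed parameters as empty strings means the slice position is always the index, so both reference styles resolve against the same table.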

Development can and should be done incrementally. For example, by keeping all code internal, we can convert the wasm already used in this project, from simplest to most complicated, into inlined or testdata wat for test cases or benchmarks.

Meanwhile, those working around this can install wat2wasm on their system and call it from Go, similar to the following, then use wasm.DecodeModule() on the result:

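// requireWasm temporarily calls the system `wat2wasm` until we implement it here.
// Needs imports: os, os/exec, path/filepath, testing and testify's require.
func requireWasm(t *testing.T, wat string) []byte {
	t.Helper()
	dir := t.TempDir()
	// filepath.Join (not path.Join) uses the OS-specific separator.
	watFile := filepath.Join(dir, "temp.wat")
	require.NoError(t, os.WriteFile(watFile, []byte(wat), 0o600))

	wasmFile := filepath.Join(dir, "temp.wasm")
	require.NoError(t, exec.Command("wat2wasm", watFile, "-o", wasmFile).Run())
	// Named wasm rather than bytes to avoid shadowing the bytes package.
	wasm, err := os.ReadFile(wasmFile)
	require.NoError(t, err)
	return wasm
}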

Implementing this well requires solid lexing and parsing, including retaining line and column information for errors. For example, goawk and mugo may be helpful background reading.
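
To make the line and column point concrete, here is a minimal sketch of a lexer that stamps every token with its position. It is not the lexer from this project: the `token` type, `lex` function and the reduced token set are assumptions for illustration, strings and comments are left out, and columns count bytes rather than runes.

```go
package main

import "fmt"

// tokenKind is a deliberately reduced, hypothetical set of token types.
type tokenKind int

const (
	tokenLParen tokenKind = iota
	tokenRParen
	tokenAtom // keyword, identifier or number, left unclassified here
)

// token records where it starts so that errors can point at the source.
type token struct {
	kind      tokenKind
	text      string
	line, col int // both 1-based
}

// isDelimiter reports whether c ends an atom.
func isDelimiter(c byte) bool {
	switch c {
	case ' ', '\t', '\r', '\n', '(', ')', '"':
		return true
	}
	return false
}

// lex splits src into tokens with a plain loop, tracking line and column.
func lex(src string) ([]token, error) {
	var tokens []token
	line, col := 1, 1
	for i := 0; i < len(src); {
		switch c := src[i]; {
		case c == '\n':
			line, col = line+1, 1
			i++
		case c == ' ' || c == '\t' || c == '\r':
			col++
			i++
		case c == '(' || c == ')':
			kind := tokenLParen
			if c == ')' {
				kind = tokenRParen
			}
			tokens = append(tokens, token{kind, string(c), line, col})
			col++
			i++
		case c == '"':
			// The position travels with the error message.
			return nil, fmt.Errorf("%d:%d: strings are not handled in this sketch", line, col)
		default:
			start := i
			for i < len(src) && !isDelimiter(src[i]) {
				i++
			}
			tokens = append(tokens, token{tokenAtom, src[start:i], line, col})
			col += i - start
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := lex("(module\n  (func))")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%d:%d %q\n", t.line, t.col, t.text)
	}
}
```

Because the position lives on the token, a parser built on top can copy it into its own errors or nodes without re-scanning the input.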

@benhoyt

benhoyt commented Nov 25, 2021

Thanks for the shout-out to GoAWK. :-) Regarding line and column info: most GoAWK errors are errors during parsing, where I have line and column info in the tokens being parsed, so I can include that in the error. However, I don't store this in the syntax tree (AST nodes), so the few runtime errors that do exist don't include col/line info. If doing it again I'd probably add that info to AST nodes, though.

@codefromthecrypt
Contributor Author

I went with a loop instead of an iterator API, and have only implemented the basic lexer so far. The column and line positions will indeed make it upwards. The next step is a proof-of-concept parser; after that comes finishing the lexer, where floating point will be the most tedious part! tetratelabs/wazero#63

@codefromthecrypt
Contributor Author

In studying next steps, one thing I recognize is that at some point we need to do a full wat2wasm to satisfy FunctionInstance.Body, a field that contains the binary encoding of the function.

Meanwhile, I'm focused on how to surface the token stream to the parser. It appears that routinely peeking at 2 known tokens is needed, e.g. lparen and a field name (keyword). It may be efficient to provide a windowing function of up to N tokens to allow the parser function to do more. At the moment, I think seeing only one token isn't useful at all, and you can see that wabt, for example, routinely needs 2: https://github.com/WebAssembly/wabt/blob/main/src/wast-parser.h#L79-L96
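
A token window of this kind could look roughly like the sketch below. Everything in it is hypothetical (the `window` type, the string-typed token kinds, the `sliceSource` helper); it only shows the idea of buffering upcoming tokens so a parser can test for "lparen followed by a keyword" before consuming anything.

```go
package main

import "fmt"

// token is a simplified, hypothetical token with string kinds for brevity.
type token struct{ kind, text string }

// window buffers upcoming tokens so the parser can look ahead.
type window struct {
	next func() (token, bool) // token source; returns false at end of input
	buf  []token
}

// peek returns the token i positions ahead (0 is the next one) without
// consuming it, pulling from the source only as far as needed.
func (w *window) peek(i int) (token, bool) {
	for len(w.buf) <= i {
		t, ok := w.next()
		if !ok {
			return token{}, false
		}
		w.buf = append(w.buf, t)
	}
	return w.buf[i], true
}

// advance consumes up to n tokens.
func (w *window) advance(n int) {
	for ; n > 0; n-- {
		if _, ok := w.peek(0); !ok {
			return
		}
		w.buf = w.buf[1:]
	}
}

// startsField reports whether the next two tokens are "(" and keyword.
func (w *window) startsField(keyword string) bool {
	t0, ok0 := w.peek(0)
	t1, ok1 := w.peek(1)
	return ok0 && ok1 && t0.kind == "lparen" && t1.kind == "keyword" && t1.text == keyword
}

// sliceSource adapts a fixed slice of tokens to the source function.
func sliceSource(tokens []token) func() (token, bool) {
	i := 0
	return func() (token, bool) {
		if i >= len(tokens) {
			return token{}, false
		}
		t := tokens[i]
		i++
		return t, true
	}
}

func main() {
	// Tokens for: (module (func))
	w := &window{next: sliceSource([]token{
		{"lparen", "("}, {"keyword", "module"},
		{"lparen", "("}, {"keyword", "func"},
		{"rparen", ")"}, {"rparen", ")"},
	})}
	fmt.Println(w.startsField("module"))
	w.advance(2)
	fmt.Println(w.startsField("func"), w.startsField("memory"))
}
```

The buffer grows only as far as the deepest peek, so a parser that never looks past two tokens never holds more than two.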

@codefromthecrypt
Contributor Author

TL;DR: I think we should delete the text compiler to focus energy on the ever expanding responsibilities catering to other core specs. 👍 or if you 👎 please add a comment why and how we can solve the labor issue.

  • WebAssembly 2.0 is dramatically larger, and the features beyond it larger still. We have limited capacity as specs expand.
  • Few "real" integrations actually use the text format. Most distribute the binary format.
  • The text format isn't the main compiler concern: many more people ask about other languages, such as Go (TinyGo).
  • We already need to use wabt routinely for things like spec tests.
  • The existing text compiler needs rework, which costs attention better spent on more requested things like DWARF.
  • The existing text compiler could be spun out into a dependency-free repo and used nearly seamlessly should people become available to work on it, and a separate repo removes weight from this one.

Even though I personally spent months on this, I think the best choice is to delete this code, to better focus on the core responsibilities of a runtime, and to do so before we release 1.0.

@codefromthecrypt
Contributor Author

PS I'm totally game, at least internally, to divert attention to a WasmBuilder which, starting as internal code, would be able to materialize a module in the binary format. This is significantly easier to do than parsing and could produce the simple modules we tend to use in unit tests. In other words, removing the text compiler doesn't imply a huge number of checked-in binaries in this repo. We could spend the energy in a different way to support ad-hoc modules, and an alternate utility could be provided in the same change that removes the text compiler.
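
The thread does not describe what such a builder's API would be, so the following is only a sketch of the underlying idea: emitting the binary format directly for a trivial module. The function names (`constModule`, `section`, `encodeUint32`, `encodeInt32`) are hypothetical; the section ids, opcodes and LEB128 encodings follow the WebAssembly 1.0 binary format.

```go
package main

import "fmt"

// encodeUint32 encodes v as unsigned LEB128, used for sizes and counts.
func encodeUint32(v uint32) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v == 0 {
			return append(out, b)
		}
		out = append(out, b|0x80)
	}
}

// encodeInt32 encodes v as signed LEB128, used for the i32.const immediate.
func encodeInt32(v int32) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7 // arithmetic shift keeps the sign
		if (v == 0 && b&0x40 == 0) || (v == -1 && b&0x40 != 0) {
			return append(out, b)
		}
		out = append(out, b|0x80)
	}
}

// section prefixes content with its section id and byte length.
func section(id byte, content []byte) []byte {
	out := append([]byte{id}, encodeUint32(uint32(len(content)))...)
	return append(out, content...)
}

// constModule builds a module exporting one function, () -> i32, that
// returns value. It is a hypothetical helper, not an existing API.
func constModule(exportName string, value int32) []byte {
	module := []byte{0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00} // magic, version 1

	// Type section (id 1): one function type with no params and an i32 result.
	module = append(module, section(1, []byte{0x01, 0x60, 0x00, 0x01, 0x7f})...)
	// Function section (id 3): one function using type index 0.
	module = append(module, section(3, []byte{0x01, 0x00})...)

	// Export section (id 7): export function index 0 under exportName.
	export := []byte{0x01}
	export = append(export, encodeUint32(uint32(len(exportName)))...)
	export = append(export, exportName...)
	export = append(export, 0x00, 0x00) // kind: func, index: 0
	module = append(module, section(7, export)...)

	// Code section (id 10): no locals, then i32.const value, then end.
	body := []byte{0x00, 0x41}
	body = append(body, encodeInt32(value)...)
	body = append(body, 0x0b)
	code := append([]byte{0x01}, encodeUint32(uint32(len(body)))...)
	code = append(code, body...)
	return append(module, section(10, code)...)
}

func main() {
	fmt.Printf("% x\n", constModule("answer", 42))
}
```

A test could pass the returned bytes straight to the binary decoder instead of reading a checked-in %.wasm file, which is the trade described above.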

@codefromthecrypt
Contributor Author

codefromthecrypt commented May 28, 2022

If folks are wondering why I piped up now, it is about more than just wazero 1.0 or WebAssembly 2.0. What I noticed was that the next WASI is built on the component model, which extends the grammar even further.

https://github.com/WebAssembly/component-model/blob/bcc2002c8b74381004c363f7b04853c1e636d9ca/design/mvp/Explainer.md

I think the best focus of this project is the runtime, particularly laboring towards the best JIT and the best dev/debug story we can do. The text format has very little to do with this, yet has an ever-expanding definition. Choosing our battles wisely means ditching it.

EOF

@codefromthecrypt
Contributor Author

What I'll do is create a new repo called watzero and migrate the code so that it is standalone. I'll temporarily add a dependency from here back to watzero, then replace those parts with a wasm builder API before we cut 1.0.

That's the most decoupled plan I can think of which also gives hope if someone wants to help move forward a dependency free wat2wasm go lib. If watzero doesn't end up doing that, we'll archive it.

@codefromthecrypt
Contributor Author

I started to do the migration in a separate repo, but that didn't work out well. I started over doing it in internal/watzero, which we can then git subtree out to its own repo. Doing it this way helps as it allows less thrash while sorting out the code dependencies.

codefromthecrypt referenced this issue in tetratelabs/wazero Jun 1, 2022
This drops the text format (%.wat) and renames
InstantiateModuleFromCode to InstantiateModuleFromBinary as it is no
longer ambiguous.

We decided to stop supporting the text format as it isn't typically used
in production, yet costs a lot of work to develop. Given the resources
available and the increased work added with WebAssembly 2.0 and soon
WASI 2, we can't afford to spend the time on it.

The old parser is used only internally and will eventually be moved to
its own repository named watzero, possibly towards archival.

See #59

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt codefromthecrypt transferred this issue from tetratelabs/wazero Aug 30, 2022
@codefromthecrypt
Contributor Author

Planning to cancel the text parser altogether in #2. Doing so can help us keep the rest of the code alive, which has more utility and is easier to maintain. If we start having enough help, the prior commit has the last working version.

@codefromthecrypt
Contributor Author

Closing as won't fix so that we can focus on the other parts. It seems a lot more people are interested in compiling Go into wasm than in wat2wasm.
