Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[interpreter] Section payload size is inefficiently encoded #625

Closed
tharvik opened this issue Dec 14, 2017 · 4 comments
Closed

[interpreter] Section payload size is inefficiently encoded #625

tharvik opened this issue Dec 14, 2017 · 4 comments

Comments

@tharvik
Copy link

tharvik commented Dec 14, 2017

In binary, the section payload size is inefficiently encoded.

It is generated by the binary/encode.ml, the output is still correctly loaded
by binary/decode.mlbut defy the purpose of using a compact representation for
the Section payload size.

A payload size is a varuint7, the upper byte set indicate a next block to read
in stream. For the problem at hand, it would be decoded so
[0x84 0x80 0x80 0x80 0x00] fold ((a,b) => a | ((b & 0x7F) << 7))
which equals 4.
The correct (or smallest) varuint7 encoding would be directly 0x04.

Example text format

(module 
  (type (func))
)

Hex stream

0061736D 01000000 01 8480808000 01 60 00 00  implementation
0061736d 01000000 01 04         01 60 00 00  specification

Header

hex stream name: type value
00 61 73 6D magic: uint8 \0asm
01 00 00 00 version: uint8 1

Sections

hex stream name: type value
01 id: varuint7 Type
84 80 80 80 00 / 04 payload_size: varuint7 6

Type

hex stream name: type value
01 count: varuint32 1

Func Type

hex stream name: type value
60 form: varint7 func type constructor
00 param_count: varuint 0
00 return_count: varuint 0
@rossberg
Copy link
Member

This was a conscious implementation choice. The binary format is intentionally defined to allow padding, since this enables backpatching, which is not just used by the interpreter but by most other generators as well, as far as I'm aware. One could optimise this with sufficient work, but both encodings are correct. There are only a few sections in a binary, so a couple of extra bytes for each are usually in the noise.

@RyanLamansky
Copy link

I agree that this is legal according to the spec, but I'd be hesitant to assume that most other generators use padding here. A generator that does enough staging in RAM (like mine) can use efficient encoding. This actually threw me off during development because my sizes came out smaller when round-tripping 3rd party WASMs.

@tharvik
Copy link
Author

tharvik commented Dec 19, 2017

Also, in the specification

varuintN

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

So the implementation should correctly patch and output 80 80 80 80 04.

@rossberg
Copy link
Member

@RyanLamansky, at least the ones I am aware of do (or did).

@tharvik, I agree that the text you cite is somewhat inaccurate, which is why that formulation is avoided in the actual specification.

@rossberg rossberg changed the title Section payload size is inefficiently encoded [interpreter] Section payload size is inefficiently encoded May 3, 2018
@rossberg rossberg closed this as completed May 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants