[interpreter] Section payload size is inefficiently encoded #625

tharvik · 2017-12-14T19:30:25Z

In binary, the section payload size is inefficiently encoded.

It is generated by the binary/encode.ml, the output is still correctly loaded
by binary/decode.mlbut defy the purpose of using a compact representation for
the Section payload size.

A payload size is a varuint7, the upper byte set indicate a next block to read
in stream. For the problem at hand, it would be decoded so
[0x84 0x80 0x80 0x80 0x00] fold ((a,b) => a | ((b & 0x7F) << 7))
which equals 4.
The correct (or smallest) varuint7 encoding would be directly 0x04.

Example text format

(module 
  (type (func))
)

Hex stream

0061736D 01000000 01 8480808000 01 60 00 00  implementation
0061736d 01000000 01 04         01 60 00 00  specification

Header

hex stream	name: `type`	value
`00 61 73 6D`	magic: `uint8`	`\0asm`
`01 00 00 00`	version: `uint8`	1

Sections

hex stream	name: `type`	value
`01`	id: `varuint7`	Type
`84 80 80 80 00` / `04`	payload_size: `varuint7`	6

Type

hex stream	name: `type`	value
`01`	count: `varuint32`	1

Func Type

hex stream	name: `type`	value
`60`	form: `varint7`	func type constructor
`00`	param_count: `varuint`	0
`00`	return_count: `varuint`	0

The text was updated successfully, but these errors were encountered:

rossberg · 2017-12-15T09:46:37Z

This was a conscious implementation choice. The binary format is intentionally defined to allow padding, since this enables backpatching, which is not just used by the interpreter but by most other generators as well, as far as I'm aware. One could optimise this with sufficient work, but both encodings are correct. There are only a few sections in a binary, so a couple of extra bytes for each are usually in the noise.

RyanLamansky · 2017-12-16T13:20:14Z

I agree that this is legal according to the spec, but I'd be hesitant to assume that most other generators use padding here. A generator that does enough staging in RAM (like mine) can use efficient encoding. This actually threw me off during development because my sizes came out smaller when round-tripping 3rd party WASMs.

tharvik · 2017-12-19T15:12:42Z

Also, in the specification

varuintN

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

So the implementation should correctly patch and output 80 80 80 80 04.

rossberg · 2017-12-19T15:58:35Z

@RyanLamansky, at least the ones I am aware of do (or did).

@tharvik, I agree that the text you cite is somewhat inaccurate, which is why that formulation is avoided in the actual specification.

rossberg changed the title ~~Section payload size is inefficiently encoded~~ [interpreter] Section payload size is inefficiently encoded May 3, 2018

rossberg closed this as completed May 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[interpreter] Section payload size is inefficiently encoded #625

[interpreter] Section payload size is inefficiently encoded #625

tharvik commented Dec 14, 2017

rossberg commented Dec 15, 2017

RyanLamansky commented Dec 16, 2017

tharvik commented Dec 19, 2017

rossberg commented Dec 19, 2017

[interpreter] Section payload size is inefficiently encoded #625

[interpreter] Section payload size is inefficiently encoded #625

Comments

tharvik commented Dec 14, 2017

rossberg commented Dec 15, 2017

RyanLamansky commented Dec 16, 2017

tharvik commented Dec 19, 2017

rossberg commented Dec 19, 2017