[spec/interpreter] Specify text format and adapt interpreter (#471)
This change specifies the text format, based on earlier discussion between @binji, @lukewagner, @sunfishcode, and myself. It also adapts the interpreter to the changes listed below.

The changes relative to the .wast format previously implemented in the interpreter and other tools are the following.

Removals:

- some of the more baroque forms of sugar for `if`
- binary module bodies
- anything script-related (assertions, invokes, etc.)
- `infinity` as a secondary spelling for float `inf`

Additions:

- \u{...} escapes in strings (see below)
- multiple inline exports on a single definition, e.g. `(func $f (export "f1") (export "f2") ...)`, which closes a gap in the syntax; inline exports can also be combined with an inline import
- the toplevel (module ...) is optional
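
As a rough illustration of how multiple inline exports could be read as an abbreviation, the sketch below expands them into separate top-level export definitions. It is a minimal sketch and not the interpreter's implementation: `expand_inline_exports` and its parameters are hypothetical names, and the function works on plain strings rather than a parsed AST.

```python
def expand_inline_exports(func_id, export_names, rest):
    """Expand inline exports on a function into separate export definitions.

    func_id      -- the function's symbolic identifier, e.g. "$f" (assumed present)
    export_names -- the names given in the inline (export "...") clauses, in order
    rest         -- the remaining function text (signature and body), kept verbatim
    """
    # One core export definition per inline export clause.
    exports = [f'(export "{name}" (func {func_id}))' for name in export_names]
    # The function itself is emitted without any inline export clauses.
    return exports + [f"(func {func_id} {rest})"]


if __name__ == "__main__":
    for line in expand_inline_exports("$f", ["f1", "f2"], "(result i32) (i32.const 0)"):
        print(line)
```

A real implementation would also have to invent a fresh identifier when the function has none; that step is omitted here.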

Changes:

One breaking change makes the syntax forward compatible with some of the future extensions that have been discussed:

- non-empty block signatures must now be written (result i32), in order to generalise cleanly to function signatures
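
For migrating existing sources, this breaking change might be approximated by a textual rewrite. The sketch below assumes that the old form placed a bare value type directly after the keyword and optional label, as in `(block i32 ...)`. `migrate_block_signatures` is a hypothetical helper; it covers only the folded S-expression form, and a regular expression cannot tell code apart from comments or string literals.

```python
import re

# Keyword, optional $label, then a bare value type that is followed by
# white space or a parenthesis (so that "i32.const" is not mistaken for a type).
_OLD_SIGNATURE = re.compile(
    r"\((block|loop|if)((?:\s+\$[^\s()]+)?)\s+(i32|i64|f32|f64)(?=[\s()])"
)


def migrate_block_signatures(source):
    """Rewrite `(block i32 ...)` style signatures to `(block (result i32) ...)`."""
    return _OLD_SIGNATURE.sub(r"(\1\2 (result \3)", source)


if __name__ == "__main__":
    print(migrate_block_signatures("(block $l f64 (f64.const 0))"))
```

Blocks without a result type are left untouched, since only non-empty signatures changed.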

Unicode:

- the lexical syntax is defined in terms of Unicode characters (i.e., code points)
- comments and strings may contain mostly arbitrary Unicode, the rest stays within ASCII
- in strings, a Unicode character denotes its UTF-8 encoding
- in strings, Unicode characters can be given explicitly with \u{...} notation
- .wat files are assumed to be encoded in UTF-8
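
To make the string rules concrete, here is a minimal sketch of how a string literal's contents could be turned into bytes. It handles only literal Unicode characters and `\u{...}` escapes; any other escape forms are deliberately left out. `string_to_bytes` is a hypothetical name, and rejecting surrogate code points is an assumption based on the fact that they have no UTF-8 encoding.

```python
import re

# Matches \u{...} with one or more hexadecimal digits inside the braces.
_UNICODE_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]+)\}")


def string_to_bytes(contents):
    """Return the bytes denoted by the contents of a string literal.

    Each Unicode character, whether written literally or as \\u{...},
    denotes its UTF-8 encoding.
    """
    def replace(match):
        code_point = int(match.group(1), 16)
        # Assumption: only Unicode scalar values are accepted.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("not a Unicode scalar value: " + match.group(0))
        return chr(code_point)

    return _UNICODE_ESCAPE.sub(replace, contents).encode("utf-8")


if __name__ == "__main__":
    print(string_to_bytes("caf\\u{e9}"))
```

Writing the character literally and writing it as an escape yield the same bytes, which is the point of the "denotes its UTF-8 encoding" rule.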

Misc Remarks:

- formatting characters: currently only the minimum set of formatting characters is allowed as white space (\t, \n, \r); we could include more, e.g. the whole set of ASCII "format effectors" (\b, \v, \f), but you quickly get into a lot of Unicode complexity if you want to go further than that

- Unicode in comments: similarly, in order to avoid getting into Unicode specifics, any legal code point is currently allowed in comments; should we be more restrictive?

- binary module bodies: they would seem pretty unusual for a "text" format, so are not included for now.

- abbreviations: to avoid combinatorial complexity in defining the AST to map onto, most syntactic sugar is specified in the form of "abbreviations", i.e., simple rewritings into the core syntax

- inline function signatures: I tried to come up with a decent way to describe their rewriting into type indices (and the potential insertion of new type definitions) in terms of rules, but ultimately gave up because it is too cumbersome to express succinctly; this is therefore the one part that is left partially informal (though hopefully still unambiguous)

- formatting: many of the rules do not currently fit the page width; I left them as is for now, and plan to clean up layout issues once the spec is complete, probably tweaking some layout parameters as well

- tests: lots of stuff we could write tests for, e.g. regarding the Unicode support...
rossberg committed Jun 1, 2017
1 parent 8319a61 commit c4774b4
Showing 75 changed files with 5,443 additions and 3,132 deletions.
1 change: 1 addition & 0 deletions document/.gitignore
@@ -1,2 +1,3 @@
_build
_static
document/*.pyc
8 changes: 0 additions & 8 deletions document/appendix-textual/index.rst

This file was deleted.

4 changes: 2 additions & 2 deletions document/binary/conventions.rst
@@ -44,10 +44,10 @@ In order to distinguish symbols of the binary syntax from symbols of the abstrac
* :math:`B^n` is a sequence of :math:`n\geq 0` iterations of :math:`B`.

* :math:`B^\ast` is a possibly empty sequence of iterations of :math:`B`.
(This is a shorthand for :math:`A^n` used where :math:`n` is not relevant.)
(This is a shorthand for :math:`B^n` used where :math:`n` is not relevant.)

* :math:`B^?` is an optional occurrence of :math:`B`.
(This is a shorthand for :math:`A^n` where :math:`n \leq 1`.)
(This is a shorthand for :math:`B^n` where :math:`n \leq 1`.)

* :math:`x{:}B` denotes the same language as the nonterminal :math:`B`, but also binds the variable :math:`x` to the attribute synthesized for :math:`B`.

34 changes: 14 additions & 20 deletions document/binary/instructions.rst
@@ -16,24 +16,23 @@ The only exception are :ref:`structured control instructions <binary-instr-contr
.. _binary-instr-control:
.. index:: control instructions, structured control, label, block, branch, result type, label index, function index, type index, vector, polymorphism
pair: binary format; instruction
single: abstract syntax; instruction

Control Instructions
~~~~~~~~~~~~~~~~~~~~

:ref:`Control instructions <syntax-instr-control>` have varying encodings. For structured instructions, the nested instruction sequences are terminated with explicit opcodes for |END| and |ELSE|.

.. _valid-nop:
.. _valid-unreachable:
.. _valid-block:
.. _valid-loop:
.. _valid-if:
.. _valid-br:
.. _valid-br_if:
.. _valid-br_table:
.. _valid-return:
.. _valid-call:
.. _valid-call_indirect:
.. _binary-nop:
.. _binary-unreachable:
.. _binary-block:
.. _binary-loop:
.. _binary-if:
.. _binary-br:
.. _binary-br_if:
.. _binary-br_table:
.. _binary-return:
.. _binary-call:
.. _binary-call_indirect:

.. math::
\begin{array}{llclll}
@@ -51,8 +50,8 @@ Control Instructions
&\Rightarrow& \IF~\X{rt}~\X{in}_1^\ast~\ELSE~\X{in}_2^\ast~\END \\ &&|&
\hex{0C}~~l{:}\Blabelidx &\Rightarrow& \BR~l \\ &&|&
\hex{0D}~~l{:}\Blabelidx &\Rightarrow& \BRIF~l \\ &&|&
\hex{0E}~~l^\ast{:}\Bvec(\Blabelidx)~~l_N{:}\Blabelidx &\Rightarrow&
\BRTABLE~l^\ast~l_N \\ &&|&
\hex{0E}~~l^\ast{:}\Bvec(\Blabelidx)~~l_N{:}\Blabelidx
&\Rightarrow& \BRTABLE~l^\ast~l_N \\ &&|&
\hex{0F} &\Rightarrow& \RETURN \\ &&|&
\hex{10}~~x{:}\Bfuncidx &\Rightarrow& \CALL~x \\ &&|&
\hex{11}~~x{:}\Btypeidx &\Rightarrow& \CALLINDIRECT~x \\
@@ -65,7 +64,6 @@ Control Instructions
.. _binary-instr-parametric:
.. index:: value type, polymorphism
pair: binary format; instruction
single: abstract syntax; instruction

Parametric Instructions
~~~~~~~~~~~~~~~~~~~~~~~
@@ -86,7 +84,6 @@ Parametric Instructions
.. _binary-instr-variable:
.. index:: variable instructions, local index, global index
pair: binary format; instruction
single: abstract syntax; instruction

Variable Instructions
~~~~~~~~~~~~~~~~~~~~~
@@ -113,8 +110,7 @@ Variable Instructions
.. _binary-instr-memory:
.. _binary-memarg:
.. index:: memory instruction, memory index
pair: validation; instruction
single: abstract syntax; instruction
pair: binary format; instruction

Memory Instructions
~~~~~~~~~~~~~~~~~~~
@@ -168,7 +164,6 @@ Each variant of :ref:`memory instruction <syntax-instr-memory>` is encoded with
.. _binary-instr-numeric:
.. index:: numeric instruction
pair: binary format; instruction
single: abstract syntax; instruction

Numeric Instructions
~~~~~~~~~~~~~~~~~~~~
@@ -370,7 +365,6 @@ All other numeric instructions are plain opcodes without any immediates.
.. _binary-expr:
.. index:: expression
pair: binary format; expression
single: abstract syntax; expression
single: expression; constant

Expressions
28 changes: 5 additions & 23 deletions document/binary/modules.rst
@@ -17,21 +17,14 @@ except that :ref:`function definitions <syntax-func>` are split into two section
.. _binary-globalidx:
.. _binary-localidx:
.. _binary-labelidx:
.. index:: index, index space, type index, function index, table index, memory index, global index, local index, label index
.. index:: index, type index, function index, table index, memory index, global index, local index, label index
pair: binary format; type index
pair: binary format; function index
pair: binary format; table index
pair: binary format; memory index
pair: binary format; global index
pair: binary format; local index
pair: binary format; label index
single: abstract syntax; type index
single: abstract syntax; function index
single: abstract syntax; table index
single: abstract syntax; memory index
single: abstract syntax; global index
single: abstract syntax; local index
single: abstract syntax; label index

Indices
~~~~~~~
@@ -112,7 +105,6 @@ Their contents consist of a :ref:`name <syntax-name>` further identifying the cu
.. _binary-type:
.. index:: ! type section, type definition
pair: binary format; type section
single: abstract syntax; type definition
pair: section; type

Type Section
@@ -132,7 +124,6 @@ It decodes into a vector of :ref:`function types <syntax-functype>` that represe
.. _binary-import:
.. index:: ! import section, import, name, function type, table type, memory type, global type
pair: binary format; import
single: abstract syntax; import
pair: section; import

Import Section
@@ -160,7 +151,6 @@ It decodes into a vector of :ref:`imports <syntax-import>` that represent the |I
.. _binary-func:
.. index:: ! function section, function, type index, function type
pair: binary format; function
single: abstract syntax; function
pair: section; function

Function Section
@@ -181,7 +171,6 @@ The |LOCALS| and |BODY| fields of the respective functions are encoded separatel
.. _binary-table:
.. index:: ! table section, table, table type
pair: binary format; table
single: abstract syntax; table
pair: section; table

Table Section
@@ -203,7 +192,6 @@ It decodes into a vector of :ref:`tables <syntax-table>` that represent the |TAB
.. _binary-mem:
.. index:: ! memory section, memory, memory type
pair: binary format; memory
single: abstract syntax; memory
pair: section; memory

Memory Section
@@ -225,7 +213,6 @@ It decodes into a vector of :ref:`memories <syntax-mem>` that represent the |MEM
.. _binary-global:
.. index:: ! global section, global, global type, expression
pair: binary format; global
single: abstract syntax; global
pair: section; global

Global Section
@@ -248,7 +235,6 @@ It decodes into a vector of :ref:`globals <syntax-global>` that represent the |G
.. _binary-export:
.. index:: ! export section, export, name, index, function index, table index, memory index, global index
pair: binary format; export
single: abstract syntax; export
pair: section; export

Export Section
@@ -276,7 +262,6 @@ It decodes into a vector of :ref:`exports <syntax-export>` that represent the |E
.. _binary-start:
.. index:: ! start section, start function, function index
pair: binary format; start function
single: abstract syntax; start function
single: section; start
single: start function; section

@@ -299,7 +284,6 @@ It decodes into an optional :ref:`start function <syntax-start>` that represents
.. _binary-elem:
.. index:: ! element section, element, table index, expression, function index
pair: binary format; element
single: abstract syntax; element
pair: section; element
single: table; element
single: element; segment
@@ -324,7 +308,7 @@ It decodes into a vector of :ref:`element segments <syntax-elem>` that represent
.. _binary-local:
.. index:: ! code section, function, local, type index, function type
pair: binary format; function
single: abstract syntax; function
pair: binary format; local
pair: section; code

Code Section
@@ -360,14 +344,14 @@ denoting *count* locals of the same value type.
&\Rightarrow& \X{code} & (\X{size} = ||\Bfunc||) \\
\production{function} & \Bfunc &::=&
(t^\ast)^\ast{:}\Bvec(\Blocals)~~e{:}\Bexpr
&\Rightarrow& \F{concat}((t^\ast)^\ast), e^\ast
& (|\F{concat}((t^\ast)^\ast)| < 2^{32}) \\
&\Rightarrow& \concat((t^\ast)^\ast), e^\ast
& (|\concat((t^\ast)^\ast)| < 2^{32}) \\
\production{locals} & \Blocals &::=&
n{:}\Bu32~~t{:}\Bvaltype &\Rightarrow& t^n \\
\end{array}
Here, :math:`\X{code}` ranges over pairs :math:`(\valtype^\ast, \expr)`.
The meta function :math:`\F{concat}((t^\ast)^\ast)` denotes the sequence of types formed by concatenating all sequences :math:`t_i^\ast` in :math:`(t^\ast)^\ast`.
The meta function :math:`\F{concat}((t^\ast)^\ast)` concatenates all sequences :math:`t_i^\ast` in :math:`(t^\ast)^\ast`.
Any code for which the length of the resulting sequence is out of bounds of the maximum size of a :ref:`vector <syntax-vec>` is malformed.

.. note::
@@ -379,7 +363,6 @@ Any code for which the length of the resulting sequence is out of bounds of the
.. _binary-data:
.. index:: ! data section, data, memory, memory index, expression, byte
pair: binary format; data
single: abstract syntax; data
pair: section; data
single: memory; data
single: data; segment
@@ -405,7 +388,6 @@ It decodes into a vector of :ref:`data segments <syntax-data>` that represent th
.. _binary-version:
.. index:: module, section, type definition, function type, function, table, memory, global, element, data, start function, import, export, context, version
pair: binary format; module
single: abstract syntax; module

Modules
~~~~~~~
16 changes: 3 additions & 13 deletions document/binary/types.rst
@@ -1,15 +1,13 @@
.. _binary-type:
.. index:: type
pair: binary format; type
single: abstract syntax; type

Types
-----

.. _binary-valtype:
.. index:: value type
pair: binary format; value type
single: abstract syntax; value type

Value Types
~~~~~~~~~~~
@@ -34,7 +32,6 @@ Value Types
.. _binary-blocktype:
.. index:: result type, value type
pair: binary format; result type
single: abstract syntax; result type

Result Types
~~~~~~~~~~~~
@@ -55,7 +52,6 @@ The only :ref:`result types <syntax-resulttype>` occurring in the binary format
.. _binary-functype:
.. index:: function type, value type, result type
pair: binary format; function type
single: abstract syntax; function type

Function Types
~~~~~~~~~~~~~~
@@ -73,7 +69,6 @@ Function Types
.. _binary-limits:
.. index:: limits
pair: binary format; limits
single: abstract syntax; limits

Limits
~~~~~~
@@ -90,8 +85,7 @@ Limits
.. _binary-memtype:
.. index:: memory type, limits, page size
single: binary format; memory type
pair: abstract syntax; memory type
pair: binary format; memory type

Memory Types
~~~~~~~~~~~~
@@ -105,13 +99,11 @@ Memory Types
\end{array}
.. _syntax-tabletype:
.. _syntax-elemtype:
.. _binary-tabletype:
.. _binary-elemtype:
.. index:: table type, element type, limits
pair: binary format; table type
pair: binary format; element type
single: abstract syntax; table type
single: abstract syntax; element type

Table Types
~~~~~~~~~~~
@@ -131,8 +123,6 @@ Table Types
.. index:: global type, mutability, value type
pair: binary format; global type
pair: binary format; mutability
single: abstract syntax; global type
single: abstract syntax; mutability

Global Types
~~~~~~~~~~~~