+-------------+
| Source code | --> Tokenizer ------>---+
+-------------+                         |
                      +------------+    |
  +---<---- Parser <--| Token tree |<---+
  |                   +------------+
+-------+                +---------------------------------+
|  AST  | --> Verify --> | [ Either TypeError ProofError ] |
+-------+                +---------------------------------+
    |                    +--------+
    \-------> Exec ----> | Output |
                         +--------+
Generally, the flow of the program is fairly linear. Given a single file as input, the tokens
submodule produces a token tree (as described in the next section). Assuming that the tokenizing
was successful, the parser constructs the AST. This is in the appropriately-named submodule ast.
From this point, there are two processes that act on the AST. The first of these is general
program verification (housed in the verify submodule), which largely does three things: (1)
ensuring all names are in scope, (2) type-checking (using src/types), and (3) proof validation
(using src/proof). If there are no errors in verification, the AST will be interpreted directly by
the exec submodule.
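Put roughly as code, the driver for this flow might look like the sketch below; every name in it
(tokenize, parse, verify, exec, and the stub types) is an assumption standing in for the real
submodules rather than their actual signatures.
// A rough sketch of the pipeline described above; all of these names are
// stand-ins for the real submodules, not their actual APIs.
struct TokenTree;
struct Ast;
enum VerifyError { Type(String), Proof(String) }

fn tokenize(_source: &str) -> Result<TokenTree, String> { Ok(TokenTree) }
fn parse(_tokens: &TokenTree) -> Result<Ast, String> { Ok(Ast) }
fn verify(_ast: &Ast) -> Result<(), VerifyError> { Ok(()) }
fn exec(_ast: &Ast) { /* interpret the AST directly */ }

fn run(source: &str) {
    // Each stage consumes the previous stage's output...
    let token_tree = tokenize(source).expect("tokenizing failed");
    let ast = parse(&token_tree).expect("parsing failed");
    // ...and only a fully-verified AST reaches the executor.
    match verify(&ast) {
        Ok(()) => exec(&ast),
        Err(_) => eprintln!("verification failed: type or proof error"),
    }
}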
Token trees are a concept that is worth discussing briefly. The term seems to have originated
within Rust. Token trees are defined by requiring certain delimiters to always be matched with
their respective closing delimiter - e.g. ( requires ) and { requires }. The token tree itself is
then defined by treating tokens within parenthesized blocks as children of their containing
parenthesis token, and all other token kinds as having no children.
This creates a structure that is easier for the parser module to work with. The following input, for example:
let x = (a + b) - c;
becomes:
------------------ Root -----------------
    /     /     /    |    \     \     \
"let"   "x"   "="   ( )   "-"   "c"   ";"
                  /  |  \
               "a"  "+"  "b"
Most of the parsing operations are fairly standard, but there are a few things to note. Firstly,
"proof statements" exist outside of function bodies and are given their own submodule within ast
in order to separate those two implementations. Secondly, instead of using a shift-reduce parser,
most of the functions that produce AST types are given a set of tokens with which to parse the
entirety of that structure. For example, an expression (represented by the Expr type) might be
expected to be constructed from exactly the tokens ["x", "+", "y"], instead of consuming as much
of the token tree as is necessary. This is also reflected in the implementation of binary operator
parsing; it is done with the same strategy as mentioned above, which gives it O(n^2) time
complexity, instead of a more advanced method.
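A minimal sketch of that strategy, with assumed names (the real Expr and token types carry far
more information): the function is handed the whole slice, scans it for an operator to split on,
and recurses on both halves, which is where the quadratic behavior comes from.
#[derive(Debug)]
enum Expr {
    Name(String),
    BinOp(Box<Expr>, String, Box<Expr>),
}

// Parses *all* of `tokens` into one expression, instead of consuming a prefix.
// Precedence and parenthesized groups are omitted here for brevity.
fn parse_expr(tokens: &[String]) -> Option<Expr> {
    match tokens {
        [] => None,
        [name] => Some(Expr::Name(name.clone())),
        _ => {
            // Scan the whole slice for the last binary operator, then split there;
            // rescanning on every recursive call is what makes this O(n^2).
            let (i, op) = tokens
                .iter()
                .enumerate()
                .rev()
                .find(|(_, t)| ["+", "-"].contains(&t.as_str()))?;
            let lhs = parse_expr(&tokens[..i])?;
            let rhs = parse_expr(&tokens[i + 1..])?;
            Some(Expr::BinOp(Box::new(lhs), op.clone(), Box::new(rhs)))
        }
    }
}
Handed the tokens for x + y, this recurses on the single-token slices on either side of the
operator and produces BinOp(Name("x"), "+", Name("y")).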
As mentioned above, there are broadly three distinct properties that are checked during
verification. These all occur in a single pass, but will be described separately. The first is
that all variables and functions referenced are in scope (note that functions are special; they
cannot be considered first-class variables). Scoping is represented by a series of linked scope
items - single variables and their types. All top-level definitions are functions (and vice
versa), so a separate "top-level scope" is provided for accessing information about functions.
Whenever a new variable is defined (via the let keyword), the succeeding statements are verified
with a new scope, created with the addition of the variable definition.
struct Scope<'a> {
    // The single variable (and its type) introduced by this scope, if any
    item: Option<ScopeItem>,
    // The enclosing scope; `None` at the outermost level
    parent: Option<&'a Scope<'a>>,
    // The top-level scope: all function definitions, shared by every scope
    top_level: &'a HashMap<&'a str, Func>,
}
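Name lookup then walks the chain of parents, and each new variable produces a child scope. The
sketch below assumes get/child methods, a Type enum, and name/ty fields on ScopeItem, none of
which are spelled out above:
impl<'a> Scope<'a> {
    // Walks outward through the parent scopes until the variable is found.
    fn get(&self, name: &str) -> Option<&Type> {
        match &self.item {
            Some(item) if item.name == name => Some(&item.ty),
            _ => self.parent.and_then(|parent| parent.get(name)),
        }
    }

    // A `let` statement verifies the statements after it with a scope like this.
    fn child(&'a self, item: ScopeItem) -> Scope<'a> {
        Scope { item: Some(item), parent: Some(self), top_level: self.top_level }
    }
}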
The second process is type-checking. Because there are no parameterized types, type checking (and
with it, type inference) is fairly simple; it is only operators (logical, comparison, and
arithmetic) and function calls that determine types. This is done directly. There is a pattern
among many of the functions in the verify module to give - as one of the returned values - the
type that an expression produces, and this is used recursively to determine the type of each
expression (which includes variables).
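A minimal sketch of that pattern, using assumed names and a much smaller expression type than the
real one; each call reports the type its sub-expression produces, so types bubble up recursively:
#[derive(Clone, Copy, PartialEq, Debug)]
enum Type { Int, UInt, Bool }

enum TypedExpr {
    // A variable whose type has already been looked up in the scope
    Var(String, Type),
    BinOp(Box<TypedExpr>, &'static str, Box<TypedExpr>),
}

// Returns the type the expression produces, or a (string) error when the
// operand types disagree with the operator.
fn check_expr(expr: &TypedExpr) -> Result<Type, String> {
    match expr {
        TypedExpr::Var(_, ty) => Ok(*ty),
        TypedExpr::BinOp(lhs, op, rhs) => {
            let (l, r) = (check_expr(lhs)?, check_expr(rhs)?);
            match *op {
                "+" | "-" | "*" if l == r && l != Type::Bool => Ok(l),
                "<" | "<=" | "==" if l == r => Ok(Type::Bool),
                "&&" | "||" if l == Type::Bool && r == Type::Bool => Ok(Type::Bool),
                _ => Err(format!("type mismatch around `{}`", op)),
            }
        }
    }
}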
For only those expressions that evaluate to integer types (either Int or UInt), the prover tracks
their relations, assigning temporary variable names where necessary, of the form <n>, where n is a
unique identifier. This is used in the prover via a process of variable definitions to transform
requirements. For example, given the statement let x = y + 3, the two definitions given to the
prover might be <0> as y + 3, followed by x as <0>, because the right-hand side of the assignment
produces a single identifier to use to replace x. These definitions are then expanded in reverse
whenever something must be proven about a variable, so that the prover interface need not account
for definitions and re-definitions. The basic prover interface is wrapped in two levels: once by
ScopedSimpleProver in src/proof/mod.rs, and once again by WrappedProver in src/verify/prover.rs.
These combine to provide the more complex usage needed for verification, while allowing the proof
algorithms (of which there are two) to stay simple.
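As a rough illustration of the definition-and-expansion step only (the names are assumptions, and
the real WrappedProver works on expression trees rather than plain strings):
// A minimal sketch of definition tracking with plain strings.
struct Definitions {
    // Definitions in the order they were given, e.g. ("<0>", "y + 3"), ("x", "<0>").
    defs: Vec<(String, String)>,
}

impl Definitions {
    // Expands definitions in reverse, so that a requirement written in terms of
    // current variable names becomes one the underlying prover understands.
    fn expand(&self, requirement: &str) -> String {
        let mut req = requirement.to_string();
        for (name, expansion) in self.defs.iter().rev() {
            req = req.replace(name, &format!("({})", expansion));
        }
        req
    }
}

fn main() {
    let defs = Definitions {
        defs: vec![
            ("<0>".to_string(), "y + 3".to_string()),
            ("x".to_string(), "<0>".to_string()),
        ],
    };
    // "x <= 10" is first rewritten to "(<0>) <= 10", then to "((y + 3)) <= 10".
    println!("{}", defs.expand("x <= 10"));
}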
The two proof algorithms currently available are the "bounds" method and the "graph" method, and
they can be switched between via feature flags of the same names. Both are described in more depth
in their respective documents.
Program execution is exceptionally simple; it operates directly on the AST. The executor expects
there to be a single main function to run, and simply keeps the current scope as a linked-list-like
structure of hashmaps from variable names to values. Some checks are performed at runtime, but
these all lead to a panic (and therefore immediate termination) rather than the useful error
message that would have been generated during verification - those that are performed are
typically to assert that operands have the correct type (which is required to unwrap the value).
The Value type (see below) as used by the executor could be used in a way similar to many
dynamically-typed languages, but its flexibility is narrowed by the static verification provided
earlier in the pipeline.
enum Value {
    Unit,
    Bool(bool),
    Int(isize),
    Struct { fields: HashMap<String, Value> },
}
Note that the set of builtin types is minimal.
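As a rough illustration of the runtime structure and checks described above (the names here are
assumptions, not the exec submodule's actual API): variable lookup walks the chain of scopes, and
unwrapping an operand panics instead of producing a friendly error.
struct ExecScope<'a> {
    // The variables defined in the innermost scope...
    vars: HashMap<String, Value>,
    // ...and the linked-list-like chain of enclosing scopes.
    parent: Option<&'a ExecScope<'a>>,
}

impl<'a> ExecScope<'a> {
    // Variable lookup walks outward through the chain of hashmaps.
    fn get(&self, name: &str) -> &Value {
        match self.vars.get(name) {
            Some(value) => value,
            // Verification has already guaranteed the name is in scope, so a
            // miss here is a bug in the interpreter itself - hence the panic.
            None => self.parent.expect("variable not in scope").get(name),
        }
    }
}

impl Value {
    // The typical runtime check: the type is already known to be correct, so a
    // mismatch terminates immediately rather than producing an error message.
    fn unwrap_int(&self) -> isize {
        match self {
            Value::Int(i) => *i,
            _ => panic!("expected an integer value"),
        }
    }
}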