This is a simple compiler written for the University of Helsinki Compilers course. The compiler is written in Java 23.
A simple program in the source language looks as follows:
    var n: Int = read_int();
    print_int(n);
    while n > 1 do {
        if n % 2 == 0 then {
            n = n / 2;
        } else {
            n = 3*n + 1;
        }
        print_int(n);
    }
Note: Copied from the course page.
An expression is defined recursively as follows, where `E`, `E1`, `E2`, …, `En` each represent an arbitrary expression.
- Integer literal: a non-negative whole number.
  - Negative numbers should be composed of the token `-` followed by an integer literal token.
- Boolean literal: either `true` or `false`.
- Identifier: a word consisting of letters, underscores or digits, but the first character must not be a digit.
- Unary operator: either `-E` or `not E`.
- Binary operator: `E1 op E2` where `op` is one of the following: `+`, `-`, `*`, `/`, `%`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `and`, `or`, `=`.
  - Operator `=` is right-associative.
  - All other operators are left-associative.
  - Precedences are defined below.
- Parentheses: `(E)`, used to override precedence.
- Block: `{ E1; E2; ...; En }` or `{ E1; E2; ...; En; }` (may be empty, last semicolon optional).
  - Semicolons after subexpressions that end in `}` are optional.
- Untyped variable declaration: `var ID = E` where `ID` is an identifier.
- Typed variable declaration: `var ID: T = E` where `ID` is an identifier and `T` is `Int`, `Bool` or `Unit`.
- If-then conditional: `if E1 then E2`
- If-then-else conditional: `if E1 then E2 else E3`
- While-loop: `while E1 do E2`
- Function call: `ID(E1, E2, ..., En)` where `ID` is an identifier.
Variable declarations (`var ...`) are allowed only directly inside blocks (`{ ... }`) and in top-level expressions.
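The expression forms above map naturally onto an algebraic data type. The following is a minimal sketch of how they could be modelled in Java with a sealed interface and records. All type and field names (`Expr`, `IntLiteral`, `AstSketch`, and so on) are hypothetical and not taken from this project's source. Parentheses are assumed to affect only the tree shape, so they get no node of their own.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical AST node types, one per expression form in the grammar.
sealed interface Expr permits IntLiteral, BoolLiteral, Identifier, UnaryOp, BinaryOp,
        Block, VarDecl, IfExpr, WhileExpr, Call {}

record IntLiteral(long value) implements Expr {}
record BoolLiteral(boolean value) implements Expr {}
record Identifier(String name) implements Expr {}
record UnaryOp(String op, Expr operand) implements Expr {}
record BinaryOp(Expr left, String op, Expr right) implements Expr {}
record Block(List<Expr> body) implements Expr {}
// type is empty for an untyped declaration (var ID = E).
record VarDecl(String name, Optional<String> type, Expr init) implements Expr {}
// elseBranch is empty for the if-then form.
record IfExpr(Expr cond, Expr thenBranch, Optional<Expr> elseBranch) implements Expr {}
record WhileExpr(Expr cond, Expr body) implements Expr {}
record Call(String name, List<Expr> args) implements Expr {}

final class AstSketch {
    // Renders a tree back to source-like text. The switch is exhaustive because
    // Expr is sealed, so the compiler rejects it if a node type is forgotten.
    static String show(Expr e) {
        return switch (e) {
            case IntLiteral lit -> Long.toString(lit.value());
            case BoolLiteral lit -> Boolean.toString(lit.value());
            case Identifier id -> id.name();
            case UnaryOp u -> "(" + u.op() + " " + show(u.operand()) + ")";
            case BinaryOp b -> "(" + show(b.left()) + " " + b.op() + " " + show(b.right()) + ")";
            case Block blk -> blk.body().stream().map(AstSketch::show)
                    .collect(Collectors.joining("; ", "{ ", " }"));
            case VarDecl v -> "var " + v.name() + v.type().map(t -> ": " + t).orElse("")
                    + " = " + show(v.init());
            case IfExpr i -> "if " + show(i.cond()) + " then " + show(i.thenBranch())
                    + i.elseBranch().map(x -> " else " + show(x)).orElse("");
            case WhileExpr w -> "while " + show(w.cond()) + " do " + show(w.body());
            case Call c -> c.name() + "(" + c.args().stream().map(AstSketch::show)
                    .collect(Collectors.joining(", ")) + ")";
        };
    }
}
```

A sealed hierarchy is one reasonable choice here because later stages (type checker, IR generator) can switch over the node types and have the compiler verify that every form is handled.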
Precedence levels, from lowest to highest:
- `=`
- `or`
- `and`
- `==`, `!=`
- `<`, `<=`, `>`, `>=`
- `+`, `-`
- `*`, `/`, `%`
- Unary `-` and `not`
- All other constructs: literals, identifiers, `if`, `while`, `var`, blocks, parentheses, function calls.
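One common way to implement these precedence and associativity rules is precedence climbing. The sketch below is an illustration under simplifying assumptions, not this project's parser: it takes an already tokenized list of strings, handles only literals, identifiers, parentheses, and unary and binary operators, and returns a fully parenthesized string instead of an AST. The class name `PrecedenceSketch` and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class PrecedenceSketch {
    // Binary operator levels, lowest (1) to highest (7), following the list above.
    private static final Map<String, Integer> LEVEL = Map.ofEntries(
        Map.entry("=", 1), Map.entry("or", 2), Map.entry("and", 3),
        Map.entry("==", 4), Map.entry("!=", 4),
        Map.entry("<", 5), Map.entry("<=", 5), Map.entry(">", 5), Map.entry(">=", 5),
        Map.entry("+", 6), Map.entry("-", 6),
        Map.entry("*", 7), Map.entry("/", 7), Map.entry("%", 7));

    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceSketch(List<String> tokens) { this.tokens = tokens; }

    static String parse(List<String> tokens) {
        PrecedenceSketch p = new PrecedenceSketch(tokens);
        String result = p.parseBinary(1);
        if (p.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        }
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // Parses a chain of binary operators whose level is at least minLevel.
    private String parseBinary(int minLevel) {
        String left = parseUnary();
        while (true) {
            String op = peek();
            Integer level = (op == null) ? null : LEVEL.get(op);
            if (level == null || level < minLevel) return left;
            next();
            // '=' is right-associative, so its right side may contain another '='.
            // All other operators are left-associative: the right side must bind tighter.
            String right = op.equals("=") ? parseBinary(level) : parseBinary(level + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
    }

    // Unary '-' and 'not' bind tighter than every binary operator.
    private String parseUnary() {
        String tok = peek();
        if ("-".equals(tok) || "not".equals(tok)) {
            next();
            return "(" + tok + " " + parseUnary() + ")";
        }
        return parsePrimary();
    }

    private String parsePrimary() {
        String tok = next();
        if (tok.equals("(")) {
            String inner = parseBinary(1);
            if (!")".equals(peek())) throw new IllegalArgumentException("expected ')'");
            next();
            return inner;
        }
        if (LEVEL.containsKey(tok) || tok.equals(")")) {
            throw new IllegalArgumentException("unexpected token: " + tok);
        }
        return tok; // integer literal, boolean literal or identifier
    }
}
```

For example, `PrecedenceSketch.parse(List.of("a", "=", "b", "=", "1", "+", "2", "*", "3"))` returns `(a = (b = (1 + (2 * 3))))`, and `List.of("1", "-", "2", "-", "3")` yields `((1 - 2) - 3)`.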
The program consists of a single top-level expression. If the program text has multiple expressions separated by semicolons, they are treated like the contents of a block, and that block becomes the top-level expression. The last expression may be optionally followed by a semicolon.
Arbitrary amounts of whitespace are allowed between tokens. One-line comments starting with `#` or `//` are supported.
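The whitespace and comment rules can be handled entirely in the tokenizer. The following is a minimal regex-based sketch, assuming ASCII letters in identifiers and returning plain strings rather than typed tokens with source locations. The class name `TokenizerSketch` and the regex group names are hypothetical and not taken from this project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerSketch {
    // Alternatives are tried in order: comments come before operators so that "//"
    // is not read as two divisions, and two-character operators come before
    // one-character ones so that ">=" is not split into ">" and "=".
    private static final Pattern TOKEN = Pattern.compile(
        "(?<skip>\\s+|#[^\\n]*|//[^\\n]*)"
        + "|(?<int>[0-9]+)"
        + "|(?<ident>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<op>==|!=|<=|>=|[-+*/%<>=])"
        + "|(?<punct>[(){},;:])");

    // Splits source text into tokens, dropping whitespace and one-line comments.
    // Keywords and the word operators (and, or, not) come out as identifier-shaped
    // tokens; a later step is assumed to classify them.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            if (m.group("skip") == null) {
                tokens.add(m.group());
            }
            pos = m.end();
        }
        return tokens;
    }
}
```

Note that `-5` comes out as the two tokens `-` and `5`, which matches the rule that negative numbers are composed of a `-` token followed by an integer literal token.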
The main implementation stages of the compiler, listed here to track progress:
- Tokenizer
  - Basic tokenization
  - Basic test cases
  - Edge test cases
  - Negative test cases
- Parser
  - Integer literal
  - Identifiers
  - Boolean literal
  - If-then-else blocks
  - Assignment, comparison, and logical operators (`=`, `==`, `!=`, `<=`, `>=`, `>`, `<`, `and`, `or`)
  - While block
  - Function call
  - Type declaration
- Interpreter (optional; implemented as a learning exercise)
  - Basic recursion
  - Symbol table
  - All operators
  - Function call
  - Conditional block
  - While block
- Type Checker
  - Positive test cases
  - Negative test cases
- IR Generator
  - Integer literal
  - Identifiers
  - Boolean literal
  - If-then-else blocks
  - Assignment, comparison, and logical operators (`=`, `==`, `!=`, `<=`, `>=`, `>`, `<`, `and`, `or`)
  - While block
  - Function call
  - Type declaration
- Assembly Generator
- Analysis & Optimization