Yo-ho-ho and the Compilation of Rum!
ITMO compilers course project: Implementation of Compiler for Rum language.
- Compiler
To run on your local machine follow the next steps:
To build the Compiler using stack try next command:
stack build
Then run the .rum
file with the option you need
stack exec Compiler -- -flag file_name.rum
where next flags are available
-i
— to interpret the program fromfile_name.rum
-s
— to compile the program fromfile_name.rum
into the presentation of the stack machine with the further interpretation of the stack machine-c
— to compile the program fromfile_name.rum
into executablefile_name
Consider next requirements
Rum is an artificial dynamically typed programming language, which was developed in learning purposes.
Simple program example
fun fact(n : int) : int
begin
if n < 2
then return 1
else return n * fact(n-1)
fi
end
n := read();
write(fact(n))
For the further examples let's assume
S := "Yo-ho-ho, and the Bottle of Rum!"
Next features were implemented for the Rum language.
int
— integerchar
— symbolstr
— string (reference type)arr
— array (reference type)void
— void typebool
— boolean (false
andtrue
are represented as0
and1
respectively, the integer not equal to0
is considered astrue
)
All variable names must begin with a letter of the alphabet or an underscore _
or dollar sign $
. After the first initial letter, variable names can also contain letters, numbers and one of these symbols: _
, $
.
Keywords can not be used as variable names. In Rum language these words are reserved: skip
, if
, then
, else
, fi
, repeat
, until
, do
, od
, while
, for
, fun
, begin
, end
, return
, true
, false
.
Note: All references are considered to start with capital letter when all other variables from with lowercase.
- Arithmetic :
+
,-
,*
,/
,%
,^
- Logical :
==
,!=
,<=
,<
,>=
,>
- Comparison :
&&
,||
- Assignment :
:=
- Unary :
-
- Void :
skip
Rum language supports different kind of expressions with consideration of precedence and associativity
- Variables — local and global variables support.
- Operations
- Functions call
can be expression and require to evaluate the value to use it in statements
S := strmake(5, 'm')
Precedence | Operators | Operator Name | Associativity |
---|---|---|---|
0 | () , [] |
Parentheses | left |
1 | - |
Unary minus | prefix |
2 | ^ |
Power | right |
3 | * , \ , % |
Arithmetic | left |
4 | + , - |
Plus, minus | left |
5 | == , != , <= , < , >= , > |
Logical | none |
6 | && |
And | left |
7 | || , !! |
Or | left |
8 | := |
Assignment | right |
Each statement is separated by ;
except the last one in the program, where the program is finished list of statements.
Rum implements next types of statements:
- Assignment
x := 911; y := read(); z := x && (y * 2 - 5 ^ 1); ar := {1, 2, 3}
- Function call
— can be separated statement also, e.g.
write(42)
- Void statement
skip
- Conditional statement
— several kinds of
if
statements are supportedalso alternative ending supportedif strcmp(Bottle, "Rum") == 0 then yell("Yo-ho-ho!") fi
and unlimited nesting level of alternative withif strcmp(Bottle, "Rum") == 0 then yell("Yo-ho-ho!") else Mood := "sad" fi
elif
if strcmp(Bottle, "Rum") == 0 then yell("Yo-ho-ho!") elif strcmp(Language, "Rum") == 0 then yell("Yo-ho-wait.."); Mood := "exited" else Heart := "broken" fi
- Loops
- Repeat Until
repeat yell("Yo-ho-ho!"); drink(); rumQuantity := rumQuantity - tumbler until rumQuantity == 0;
- While
while (n != 0) do sing(strcat(n, " men on the dead man's chest")); n := n - 1; sing("Yo-ho-ho, and the Bottle of Rum") od
- For
sum := 0; for i := 1, i <= 100, i := i + 1 do sum := sum + 1 od
- Function — all function declarations should be done in the beginning of the file. Important not to separate functions with any special symbols.
fun lang(L : str) : str begin if (strcmp(L, "Rum") == 0) then return "☠️" else return "😞" end
- Return
Rumlude
is the library of standard functions of the language. RumludeFuns
is the special data type for representing them during parsing to AST.
In the Compiler.Rum.Internal.Rumlude
module there is a map from string to RumludeFuns
type, so it can be used at parsing stage. The main function of the module is runRumlude
which takes as arguments RumludeFuns
representative and the list of the parameters, that are going to be arguments of the given function, and returns the result of the work of the standard function.
Here is the list of functions that are based in module with usage examples:
-
read()
— reads integer from standard input
-
write(n : int)
— prints integer
n
into standard output. -
strlen(S : str) : int
— returns length of the given string
s
write(strlen(S)) > 32
-
strget(S : str, i : int) : int
— returns the symbol of the string on the given positioni
converted intoint
write(strget(S, 1)) > 111 -- letter 'o'
-
strsub(S : str, start : int, len: int) : str
— returns string, which is build by takinglen
symbols starting fromstart
positionD := strsub(S, 15, 18) > the Bottle of Rum!
-
strdup(S : str) : str
— returns the duplicate of the stringD := strdup(S) > Yo-ho-ho, and the Bottle of Rum!
-
strset(S: str, i : int, ch : char) : str
— replacesi
-th symbol in the stringS
with symbolch
strset(S, 1, 'u') > Yu-ho-ho, and the Bottle of Rum!
-
strcat(S1 : str, S2 : str) : str
— returns the result of concatenation of two stringsstrcat("Yo-ho-ho, ", "and the Bottle of Rum!") > Yo-ho-ho, and the Bottle of Rum!
-
strcmp(S : str, D : str) : int
— returns the result of lexical comparison of two strings. The result is equals to0
if stringsS
andD
are equal,-1
ifS
is less thanD
and1
in other casewrite(strcmp(S, D)); write(strcmp(S, "Yo-ho-ho")) write(strcmp(S, "Yu-ho-ho")) write(strcmp(S, "Ya-ho-ho")) > 0 > -1 > -1 > 1
-
strmake(n : int, ch : char) : str
— builds the string of lengthn
fulfilled withch
symbolstrmake(3, 'm') > mmm
-
arrlen(: arr) : int
— returns length of the given array -
arrmake(n : int, e : int) : arr
— builds unboxed array of the lengthn
fulfilled with the integere
-
Arrmake(n : int, e : int) : arr
— builds boxed array of the lengthn
fulfilled with the integere
For parsing part of the project monadic parser combinator library MegaParsec.
Source files are in the Compiler.Rum.Internal.Parser
module of the project.
The input is the file and the result of the work of the parser is AST.
In case of parsing error the message with additional information (error row and column coordinates) will be shown.
(TODO: make beautiful output of errors)
This module gives an idea about the model of the language. Data types and structure of the program can be found in Compiler.Rum.Internal.AST
module. Almost all of the types were described before, so here will be the brief enumeration
- Expression
- Statement
- Program — the list of statements
Different types of environment are used in interpreter and stack machine.
Interpret a
newtype was described in Compiler.Rum.Internal.AST
which is monad transformer StateT
.
newtype Interpret a =
Interpret { runInterpret :: StateT Environment (MaybeT IO) a}
data Environment =
Env { varEnv :: VarEnv -- to hold vars
, refVarEnv :: RefVarEnv -- to hold reference vars
, funEnv :: FunEnv -- to hold custom functions
, isReturn :: Bool -- necessary for funs declarations
}
For Stack Machine's Environment there is special type in Compiler.Rum.StackMachine.Structure
type InterpretStack =
ReaderT ([Instruction], Labels) (StateT StackEnvironment IO) ()
data StackEnvironment =
SEnv { vars :: RefSVarEnv -- store by reference
, stack :: Stack
, pos :: Int -- current position at Stack
}
Interpreter receives Program
data type and returns evaluated result.
All source files are in the Compiler.Rum.Interpreter
directory. In Rummer
module all evaluation stuff for both expressions and statements take place. interpret
function works under Interpret
environment and this is main working function for interpreter.
Work of this part can be separated into two steps.
Translation of Program
into readable by Stack Machine list of Instruction
.
Here are permitted instructions
Nop
— No operationSBinOp
— Binary operation signSLogicOp
— Binary operation signSCompOp
— Binary operation signPush
— Push a value onto the stackPop
— Pop the most recent value from the stackJump
— Jump unconditionally to a locationJumpIfTrue
— Jump if conditions are metJumpIfFalse
— Jump if conditions are not metLoad
— Load ordinary variableLoadRef
— Load reference variableStore
— Put VariableLabel
— Location coordinatesSFunCall
— Call a functionSRumludeCall
— Call function from standardSReturn
— Return from a functionPushNArr
— Put array on stackLoadArr
— Load arrayStoreArr
— Store array
Compiler.Rum.StackMachine.Stacker
module receives set of Instruction
from work of the Translator
and transforms data into result.
Main function of the module is stacker
. It finds the label of starting position in the instructions and start execute them one by one.
And finally compilation. Implementation of the Rum language was made with LLVM. Special thanks for Stephen Diehl's work which explains the basics of how to build a compiler in Haskell. For the implementation these two packages are required:
llvm-hs-pure
— pure Haskell representation of the LLVM Intermediate Representation (IR).llvm-hs
— FFI bindings to LLVM required for constructing the C representation of the LLVM IR and performing optimization and compilation.
Few consistent steps should be done, and the first one is code generation of LLVM IR from the AST. All necessary functions for code generation are located in Compiler.Rum.Compiler.CodeGen
module and the transformation process is realized in the Compiler.Rum.Compiler.Emmiter
module of the project. Compiler.Rum.Compiler.JIT
module contains of optimizations for the language.
There is a bunch of different kind of tests in compiler-tests
submodule. Each folder consist of different type of tests:
core
— number of tests to cover all basic features of the languageexpressions
&deep-expressions
— huge amount of test basically for arithmetic expressions and IOperformance
— tests to compare Compiler efficiency withgcc
.
cd
to the directory of the test which have to be run and use the next command:
make
This will run tests in the -i
, -s
and -c
mode.
For example,
cd compiler-tests/core && make
This project is licensed under the MIT License - see the LICENSE.md file for details
Stephen Diehl article "Implementing a JIT Compiled Language with Haskell and LLVM"