Simple Scheme compiler. Trying to mostly mimic R5RS.
This repository contains the code for my bachelor thesis about a simple compiler for Scheme. The compiler is written in Haskell and makes a lot use of Algebraic Data Types for representing the AST in memory.
It depends on the scheme runtime I have written: https://github.com/teichholz/Schemer-Runtime
The compiler uses multiple phases to break the code down into a core language which is easy to compile into LLVM IR:
- Parse S-Expressions: Parses the scheme code into generic symbolic expressions. Symbolic expressions are just expressions in polish notation, so I represent them like that in form of arbitrary deep nested lists. Can be found under: src/Sexp/Parser.hs.
- Expand Macros: Next I expand Macros. Note that macros as defined in R5RS are not as sophisticated as they are today in the most used LISPs.
- Parse into AST: Parses the macro expanded symbolic expressions into an AST which allows to identify all the relevant expressions like function application or function definition. Can be found under: src/Parser/ScSyn.hs
- Toplevel Transformation: Removes all toplevel declarations of functions and variables. Uses a recursive let-binding to bind them into a single expressions which is further simplified in the next steps. Can be found under src/Phases/Toplevel.hs
- Simplication: Simplifes certain scheme expressions. Can be found under src/Phases/Simplify.hs
- Mutation Removal: Removes all the places where variables are mutated by instead wrapping these variables in a one element vector. Can be found under src/Phases/Assignment.hs
- Administrative Normal Form: Converts code into ANF. Can be found under src/Phases/ANF.hs
- Continuation Passing Style: Converts code into CPS. Can be found under src/Phases/CPS.hs
- Unification of functions and function application: Removes all uses of apply by enforcing all functions to expect a single list as an argument. Can be found src/Phases/Unify.hs
- Closure Conversion: Use Closure Conversion to eliminate all free variables. Can be found under src/Phases/Closure.hs
- Code Generation: Emits LLVM IR for the current AST. At this phase the core language consists of: literals, let-forms and variable access, if-expressions and function application. Function application differs between calling primitive functions and calling self-declared functions. Can be found under: src/Phases/Codegen.hs
To run the compiler, make sure that nix is installed. Then first clone this repo:
git clone https://github.com/teichholz/Schemer && cd Schemer
After that you can run:
nix-shell && stack run -- -v
Which will throw you into a repl with verbose output enabled.