Computers speak only 1s and 0s
Yeah... We've all heard that sentence at least once, but have you ever wondered how computers actually read and execute your code?
In this article we are going to take a look at how compilers work.
But first of all, what is a compiler?
A compiler is just a program, that takes the source code as input, and return an executable as output.
The first phase of the compilation is lexing.
The lexer will take all the source code, and convert it into a stream of abstract unities that our computer can understand, we generally call these tokens.
A token generally holds a type and a value.
But let's make an example:
var a = 4 + 5
will be converted to:
[VAR, IDENTIFIER:"a", EQUAL, INT:"4", PLUS, INT:"5"]
Perfect! Now that we have our tokens, we can pass to the next phase.
The job of the parser or the syntax analyzer is to analyze the tokens and generate an Abstract Syntax Tree (AST).
A data structure that describes the structure of our program and, for example, can be used to evaluate expressions.
For example, the AST generated from this expression:
4 + 5 / 2 - 7
will be this:
+
/ \
/ \
4 -
/ \
/ \
รท 7
/ \
/ \
5 2
The parses tries match some predefined patterns, that are usually described in the grammar of our language.
We won't dive into this topic, but if you're interested this is a great article.
But what if the tokens don't match the grammar?
Well... We get a SyntaxError
.
Parsers follow production rules defined by means of context-free grammar. The way the production rules are implemented (derivation) divides parsing into two types : top-down parsing and bottom-up parsing.
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input, it is called top-down parsing.
It is a common form of top-down parsing. It is called recursive as it uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.
It means, if one derivation of a production fails, the parser restarts the process using different rules of same production. This technique may process the input string more than once to determine the right production.
As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.
Probably the most important part.
Now that we have our AST we can walk through it and generate equivalent Assembly code.
An interpreter works pretty much the same, but instead, it directly executes the code.
We got to the end... Now all that remains is to execute the code generated by our compiler and that's it!
Happy debugging,
Kappa, Sk9 | Engrafa team