Skip to content

Latest commit

 

History

History
90 lines (66 loc) · 3.04 KB

compilers.md

File metadata and controls

90 lines (66 loc) · 3.04 KB

How do compilers work?

Computers speak only 1s and 0s

Yeah... We've all heard that sentence at least once, but have you ever wondered how computers actually read and execute your code?
In this article we are going to take a look at how compilers work.

What is a compiler?

But first of all, what is a compiler?
A compiler is just a program, that takes the source code as input, and return an executable as output.

Lexing

The first phase of the compilation is lexing.
The lexer will take all the source code, and convert it into a stream of abstract unities that our computer can understand, we generally call these tokens.
A token generally holds a type and a value.

But let's make an example:

var a = 4 + 5

will be converted to:

[VAR, IDENTIFIER:"a", EQUAL, INT:"4", PLUS, INT:"5"]

Perfect! Now that we have our tokens, we can pass to the next phase.

Parsing

The job of the parser or the syntax analyzer is to analyze the tokens and generate an Abstract Syntax Tree (AST).
A data structure that describes the structure of our program and, for example, can be used to evaluate expressions.

For example, the AST generated from this expression:

4 + 5 / 2 - 7

will be this:

    +
   / \
  /   \
 4     -
      / \
     /   \
    รท     7
   / \   
  /   \
 5     2

The parses tries match some predefined patterns, that are usually described in the grammar of our language.
We won't dive into this topic, but if you're interested this is a great article.

But what if the tokens don't match the grammar?
Well... We get a SyntaxError.

Types of Parsing

Parsers follow production rules defined by means of context-free grammar. The way the production rules are implemented (derivation) divides parsing into two types : top-down parsing and bottom-up parsing.

Top-Down Parsing

When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input, it is called top-down parsing.

Recursive descent pairing

It is a common form of top-down parsing. It is called recursive as it uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.

Backtracking

It means, if one derivation of a production fails, the parser restarts the process using different rules of same production. This technique may process the input string more than once to determine the right production.

Bottom-Up Parsing

As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.

Compiling

Probably the most important part.
Now that we have our AST we can walk through it and generate equivalent Assembly code.
An interpreter works pretty much the same, but instead, it directly executes the code.

Thanks for the attention

We got to the end... Now all that remains is to execute the code generated by our compiler and that's it!

Happy debugging,
Kappa, Sk9 | Engrafa team