FCC is a minimal and pedagogical Forth compiler written in a single C file that generates assembly code for the FASM assembler. It supports a Forth-like syntax and serves as both a learning tool for compiler construction and a functional compiler for programs.
- Author: Chris Curl
- License: MIT License (c) 2025
- Language: C
- Target: 32-bit x86 Assembly (FASM format)
- fcc.c: The Forth compiler source code
- binaries/fcl: The Forth compiler for Linux
- binaries/fcw.exe: The Forth compiler for Windows
The compiler follows a streamlined three-phase approach:
- IRL Generation - Parse source and generate Intermediate Representation Language (IRL)
- Iterative Optimization - Repeatedly perform peephole optimizations until no changes
- Code Generation - Output platform-specific assembly code (Linux/Windows)
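The "repeat until no changes" step is a fixed-point loop around a single peephole pass. The sketch below assumes a hypothetical IRL_T record (opcode plus argument) and two illustrative no-op rules: PUSHA immediately followed by POPA, and SWAP followed by SWAP. fcc's actual rule set and data layout may differ.

```c
/* Hypothetical IRL opcodes; names come from the instruction list,
   numeric values are arbitrary. */
enum { OP_LIT, OP_PUSHA, OP_POPA, OP_SWAP, OP_ADD };

typedef struct { int op; int arg; } IRL_T;

/* True if the adjacent pair (a, b) has no net effect and can be deleted. */
static int is_noop_pair(const IRL_T *a, const IRL_T *b) {
    if (a->op == OP_PUSHA && b->op == OP_POPA) return 1; /* push then pop EAX */
    if (a->op == OP_SWAP && b->op == OP_SWAP) return 1;  /* double swap */
    return 0;
}

/* One peephole pass: compacts code in place, returns the number of changes. */
static int peephole_pass(IRL_T *code, int *len) {
    int out = 0, changes = 0;
    for (int i = 0; i < *len; i++) {
        if (i + 1 < *len && is_noop_pair(&code[i], &code[i + 1])) {
            i++;          /* skip both instructions of the pair */
            changes++;
            continue;
        }
        code[out++] = code[i];
    }
    *len = out;
    return changes;
}

/* Repeat passes until one makes no changes; returns the productive passes. */
static int optimize(IRL_T *code, int *len) {
    int passes = 0;
    while (peephole_pass(code, len) > 0) passes++;
    return passes;
}
```

A single pass is not enough because deleting one pair can expose another: for PUSHA SWAP SWAP POPA, the first pass removes the SWAP pair and only the second pass sees the PUSHA/POPA pair.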
Building on Windows:
- There is a solution file, fcc.sln.
- This is a 32-bit system, so only the 32-bit configuration is supported.
- It builds a program named fcw.exe, the Forth Compiler for Windows.
Building on Linux:
- There is a Makefile.
- This is a 32-bit system, so only the 32-bit configuration is supported.
- Running make creates a program named fcl, the Forth Compiler for Linux.
Running the compiler:
- Run fcw or fcl, depending on whether you are on Windows or Linux.
- The programs take a single parameter, the name of a source file.
- The programs write the generated assembly source to stdout.
- Any errors detected are written to stderr.
- Redirect the output into a file (e.g. fcl pgm.fh > pgm.asm).
- Run fasm with that file as input (e.g. fasm pgm.asm).
- Linux: see the Makefile for the 'test' target.
- Windows: see the 'make.bat' file.
Configuration limits in fcc.c:
#define VARS_SZ 500 // Maximum number of variables/symbols
#define STRS_SZ 500 // Maximum number of string literals
#define LOCS_SZ 500 // Size of local storage array
#define CODE_SZ 5000 // Maximum number of IRL instructions
#define HEAP_SZ 5000 // Maximum number of characters in the HEAP
Symbol table entry:
typedef struct {
char type; // 'I'=Integer, 'F'=Function, 'T'=Target
char name[23]; // Symbol name
char asmName[8]; // Generated assembly name
int sz; // Size in bytes
char *str; // String pointer
} SYM_T;
Key functions in fcc.c:
- next_ch() - Advances to the next character; handles line reading and EOF
- next_line() - Reads the next line from the input file
- next_token() - Extracts the next token; handles comments (//) and numbers
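fcc reads its input character by character through next_ch() and next_line(). As a rough illustration of the comment handling only, here is a tokenizer over an in-memory string; next_token_str and its signature are hypothetical. It assumes comment markers are separate whitespace-delimited tokens, as in standard Forth, so that a name such as (counter) in the examples below is an ordinary word.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical in-memory tokenizer. Copies the next token into tok
   (truncated to max-1 chars; max must be > 1) and returns its length,
   or 0 at end of input. A "//" token skips to the end of the line;
   a standalone "(" token skips past the next ")". */
static int next_token_str(const char **p, char *tok, int max) {
    for (;;) {
        const char *s = *p;
        while (*s && isspace((unsigned char)*s)) s++;       /* skip blanks */
        int n = 0;
        while (s[n] && !isspace((unsigned char)s[n])) n++;  /* token length */
        if (n == 0) { *p = s; return 0; }                   /* end of input */
        if (n == 2 && s[0] == '/' && s[1] == '/') {         /* // comment */
            const char *eol = strchr(s, '\n');
            *p = eol ? eol : s + strlen(s);
            continue;
        }
        if (n == 1 && s[0] == '(') {                        /* ( comment ) */
            const char *close = strchr(s, ')');
            *p = close ? close + 1 : s + strlen(s);
            continue;
        }
        int len = n < max ? n : max - 1;
        memcpy(tok, s, (size_t)len);
        tok[len] = '\0';
        *p = s + n;                                         /* skip whole token */
        return len;
    }
}
```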
- checkNumber(char *w, int base) - Parses numbers in multiple bases:
  - Binary: %1010 (prefix %)
  - Decimal: #123 or 123 (prefix # or none)
  - Hexadecimal: $FF (prefix $)
  - Character literals: 'Y' (single quotes)
  - Negative numbers are supported with a - prefix
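A minimal sketch of the prefix rules above, written as a hypothetical parse_number() rather than fcc's actual checkNumber(). It assumes the minus sign comes before any base prefix (for example -$FF) and that a token which fails to parse, such as 1+ or -, is treated as a word instead.

```c
/* Value of digit c in the given base, or -1 if it is not a valid digit. */
static int digit_value(char c, int base) {
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'z') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z') d = c - 'A' + 10;
    else return -1;
    return d < base ? d : -1;
}

/* Hypothetical stand-in for checkNumber(): returns 1 and stores the value
   in *out if w is a number literal, 0 otherwise. */
static int parse_number(const char *w, long *out) {
    if (w[0] == '\'' && w[1] && w[2] == '\'' && w[3] == '\0') { /* 'Y' */
        *out = (unsigned char)w[1];
        return 1;
    }
    int neg = 0, base = 10;
    if (w[0] == '-' && w[1]) { neg = 1; w++; }   /* assumed: sign first */
    if (*w == '%') { base = 2; w++; }
    else if (*w == '#') { base = 10; w++; }
    else if (*w == '$') { base = 16; w++; }
    if (*w == '\0') return 0;                    /* prefix with no digits */
    long v = 0;
    for (; *w; w++) {
        int d = digit_value(*w, base);
        if (d < 0) return 0;                     /* not a number: a word */
        v = v * base + d;
    }
    *out = neg ? -v : v;
    return 1;
}
```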
- findSymbol(char *name, char type) - Locates a symbol by name and type
- addSymbol(char *name, char type) - Adds a new symbol to the table
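A sketch of a linear-search symbol table built on the SYM_T layout above. The find_symbol/add_symbol names, the default size of 4 bytes, and the asmName scheme (type letter plus index, e.g. I0) are assumptions for illustration. Presumably a generated asmName exists because Forth names like (counter) or c@a+ are not valid FASM labels.

```c
#include <stdio.h>
#include <string.h>

#define VARS_SZ 500

typedef struct {
    char type;       /* 'I'=Integer, 'F'=Function, 'T'=Target */
    char name[23];   /* Forth-level name */
    char asmName[8]; /* generated, assembler-safe name */
    int sz;          /* size in bytes */
    char *str;       /* string pointer */
} SYM_T;

static SYM_T symbols[VARS_SZ];
static int numSymbols = 0;

/* Linear search by name and type; returns the index or -1. */
static int find_symbol(const char *name, char type) {
    for (int i = 0; i < numSymbols; i++)
        if (symbols[i].type == type && strcmp(symbols[i].name, name) == 0)
            return i;
    return -1;
}

/* Appends a symbol and gives it a short name such as "I3" (hypothetical
   scheme). Returns the index, or -1 if the table is full or the name
   does not fit. */
static int add_symbol(const char *name, char type) {
    if (numSymbols >= VARS_SZ || strlen(name) >= sizeof symbols[0].name)
        return -1;
    SYM_T *s = &symbols[numSymbols];
    s->type = type;
    strcpy(s->name, name);
    snprintf(s->asmName, sizeof s->asmName, "%c%d", type, numSymbols);
    s->sz = 4;        /* one DWORD by default */
    s->str = NULL;
    return numSymbols++;
}
```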
The compiler uses an internal instruction set:
Stack Operations:
- PUSHA, POPA - Push/pop accumulator
- SWAP, SP4 - Stack manipulation
- POPB - Pop to second register
Memory Operations:
- STORE, FETCH - 32-bit memory store/load
- CSTORE, CFETCH - 8-bit (byte) memory store/load
- LOADSTR - Load string address
Arithmetic:
- ADD, SUB, MULT, DIVIDE - Basic arithmetic
- DIVMOD - Division with both quotient and remainder
- LT, GT, EQ, NEQ - Comparisons
- AND, OR, XOR - Bitwise operations
Control Flow:
- TESTA - Test accumulator against zero
- JMP, JMPZ, JMPNZ - Unconditional/conditional jumps
- TARGET - Jump target labels
- DEF, CALL, RETURN - Function definition and calls
Register and Pointer Operations:
- MOVAB, MOVAC, MOVAD - Copy accumulator to EBX, ECX, EDX
- ADDEDI, SUBEDI - Add/subtract a constant to/from EDI (pointer arithmetic)
- EDIOFF - Load EDI+offset into EAX
- SYS - System call interrupt
A-Register Operations:
- AFET, ASTO - Fetch from/store to A register variable
- AINC, ADEC - Increment/decrement A register variable
Special:
- LIT - Literal values
- PLEQ - Plus-store operation (+!)
- INCTOS, DECTOS - Increment/decrement top of stack
- CODE - Embed raw FASM code into the assembly file
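To illustrate how an accumulator-style IRL can map onto FASM, the sketch below emits text for a few of the opcodes. It assumes EAX holds TOS, the rest of the data stack lives on the hardware stack, and TARGET labels are named T1, T2, and so on; these are illustrative choices, not necessarily what fcc generates.

```c
#include <stdio.h>
#include <string.h>

enum { OP_LIT, OP_PUSHA, OP_POPA, OP_POPB, OP_ADD,
       OP_TESTA, OP_JMP, OP_JMPZ, OP_JMPNZ, OP_TARGET };

typedef struct { int op; int arg; } IRL_T;

/* Appends the FASM text for one IRL instruction to out (capacity cap). */
static void emit_irl(const IRL_T *in, char *out, size_t cap) {
    size_t n = strlen(out);
    char *p = out + n;
    size_t left = cap - n;
    switch (in->op) {
    case OP_LIT:    snprintf(p, left, "\tmov eax, %d\n", in->arg); break;
    case OP_PUSHA:  snprintf(p, left, "\tpush eax\n"); break;      /* TOS -> NOS */
    case OP_POPA:   snprintf(p, left, "\tpop eax\n"); break;       /* NOS -> TOS */
    case OP_POPB:   snprintf(p, left, "\tpop ebx\n"); break;       /* NOS -> EBX */
    case OP_ADD:    snprintf(p, left, "\tadd eax, ebx\n"); break;  /* TOS += EBX */
    case OP_TESTA:  snprintf(p, left, "\ttest eax, eax\n"); break;
    case OP_JMP:    snprintf(p, left, "\tjmp T%d\n", in->arg); break;
    case OP_JMPZ:   snprintf(p, left, "\tjz T%d\n", in->arg); break;
    case OP_JMPNZ:  snprintf(p, left, "\tjnz T%d\n", in->arg); break;
    case OP_TARGET: snprintf(p, left, "T%d:\n", in->arg); break;
    default: break;
    }
}
```

Under these assumptions, the source 2 3 + would become LIT 2, PUSHA, LIT 3, POPB, ADD.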
Variables:
var myVar // Declare variable (default size 1 DWORD)
var buf 100 allot // Declare variable with size 100 DWORDs
Functions:
: myFunc // Function definition
42 myVar ! // Store 42 in myVar
; // End function
Control Structures:
condition if // Conditional execution
// code
then
begin // Loops
// code
condition
while // While loop
again // Infinite loop
until // Until loop
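One common way to compile these words is a compile-time stack of label numbers. The sketch below follows the semantics listed above (while loops back while the flag is non-zero, until loops back while it is zero) and writes IRL as text. The function names, the T1-style labels, and dropping the flag with POPA after TESTA are assumptions; if POPA is emitted as an x86 pop, it can sit between the test and the jump because POP does not modify the flags. The sketch does not check that, say, then actually closes an if.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NEST 32

/* Hypothetical compile-time state: a stack of pending label numbers. */
static int labelStack[MAX_NEST], depth = 0, nextLabel = 1;
static char irl[1024];   /* generated IRL, one instruction per line */

static void gen(const char *op, int label) {
    char line[32];
    if (label) snprintf(line, sizeof line, "%s T%d\n", op, label);
    else       snprintf(line, sizeof line, "%s\n", op);
    strncat(irl, line, sizeof irl - strlen(irl) - 1);
}

/* Compiles one control word; returns 0 for an unknown word or a stack error. */
static int compile_control(const char *w) {
    if (strcmp(w, "if") == 0) {
        if (depth >= MAX_NEST) return 0;
        int t = nextLabel++;
        gen("TESTA", 0); gen("POPA", 0); gen("JMPZ", t);  /* skip body if 0 */
        labelStack[depth++] = t;
    } else if (strcmp(w, "then") == 0) {
        if (depth == 0) return 0;
        gen("TARGET", labelStack[--depth]);
    } else if (strcmp(w, "begin") == 0) {
        if (depth >= MAX_NEST) return 0;
        int t = nextLabel++;
        gen("TARGET", t);                                 /* loop head */
        labelStack[depth++] = t;
    } else if (strcmp(w, "while") == 0 || strcmp(w, "until") == 0) {
        if (depth == 0) return 0;
        gen("TESTA", 0); gen("POPA", 0);
        gen(w[0] == 'w' ? "JMPNZ" : "JMPZ", labelStack[--depth]);
    } else if (strcmp(w, "again") == 0) {
        if (depth == 0) return 0;
        gen("JMP", labelStack[--depth]);
    } else {
        return 0;
    }
    return 1;
}
```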
Stack Operations:
42 // Push literal
dup // Duplicate TOS
drop // Remove TOS
swap // Swap TOS and NOS
over // Copy second to top
Memory Operations:
@ // Fetch 32-bit value from address
! // Store 32-bit value to address
c@ // Fetch 8-bit (byte) value from address
c! // Store 8-bit (byte) value to address
+! // Add to memory location
1+ 1- // Increment/decrement TOS
Register, Locals, and System Operations:
->reg1 // Copy TOS to EAX (no-op, EAX is TOS)
->reg2 // Copy TOS to EBX
->reg3 // Copy TOS to ECX
->reg4 // Copy TOS to EDX
sys // Execute system call (INT 0x80)
+locs // Add 24 to EDI (allocate 6 locals)
-locs // Subtract 24 from EDI (free the last 6 locals)
l0..l5 // Push the address of local n (0-5) onto the stack
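The locals scheme can be modeled in C with an array and a pointer standing in for EDI. This sketch assumes l0..l5 address EDI+0 through EDI+20 after +locs, which is a guess; the text above only says that +locs and -locs move EDI by 24 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define LOCS_SZ 500
#define FRAME 6                  /* +locs reserves 6 DWORDs = 24 bytes */

static int32_t locs[LOCS_SZ];
static int32_t *edi = locs;      /* models the EDI register */

/* +locs: move EDI past the current frame (the new frame must fit). */
static void plus_locs(void)  { assert(edi + 2 * FRAME <= locs + LOCS_SZ); edi += FRAME; }
/* -locs: return to the previous frame. */
static void minus_locs(void) { assert(edi - FRAME >= locs); edi -= FRAME; }
/* Address of local n (l0..l5), assumed to be EDI + 4*n bytes. */
static int32_t *local(int n) { assert(n >= 0 && n < FRAME); return edi + n; }
```

Because each +locs moves EDI past the caller's six slots, a nested word's l0 does not overwrite its caller's l0, as long as every +locs is paired with a -locs.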
A-Register Operations:
a@ // Fetch value from A register variable
a! // Store value to A register variable
a+ // Increment A register variable
a- // Decrement A register variable
String Literals:
s" Hello" // Push string address to stack
Arithmetic and Logic:
+ - * / // Basic arithmetic
/mod // Division with quotient and remainder
< = <> > // Comparisons
AND OR XOR // Bitwise operations
Source Code Comments:
// // Comment until the end of the line
( ... ) // In-line comment
Inline Assembly Code:
: bye code
xor ebx, ebx
mov eax, 1
int 0x80
end-code
;
Linux (32-bit):
- ELF executable format
- No external library dependencies
- Direct system calls via the sys word or code blocks
- Custom function call convention using EBP stack
Windows (32-bit):
- PE executable format
- Windows API integration
- Built-in console output support
Common Features:
- Enhanced optimization with iterative peephole passes
- Uses EDI to point to a locs array for local storage
- A-register variable for quick access operations
- Syntax errors show line number, column, and source context
- Fatal errors terminate compilation
- Warnings are displayed as comments in output
: c@a+ a@ c@ a+ ;
// strlen: n = length of string at address a
: strlen ( a--n )
+locs a@ l0 ! a!
0 begin 1+ c@a+ while
1- l0 @ a! -locs ;
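Reading while as "loop back while the flag is non-zero", strlen corresponds roughly to the C below. a_reg is a hypothetical stand-in for the A-register variable. The count starts at 0 and is bumped before each byte is fetched, so the terminating NUL is counted once and the final 1- removes it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the A-register variable. */
static const unsigned char *a_reg;

static size_t forth_strlen(const char *s) {
    const unsigned char *saved = a_reg;   /* +locs a@ l0 ! */
    a_reg = (const unsigned char *)s;     /* a! */
    size_t n = 0;                         /* 0 */
    unsigned char c;
    do {                                  /* begin */
        n++;                              /* 1+ */
        c = *a_reg++;                     /* c@a+ */
    } while (c != 0);                     /* while: repeat while non-zero */
    n--;                                  /* 1- : the NUL was counted */
    a_reg = saved;                        /* l0 @ a! -locs */
    return n;
}
```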
var (counter)
: counter ( --n ) (counter) @ ;
: counter! ( n-- ) (counter) ! ;
: increment counter 1+ counter! ;
var (limit)
: limit ( --n ) (limit) @ ;
: limit! ( n-- ) (limit) ! ;
: mil ( n--m ) 1000 dup * * ;
: main
0 counter!
1 mil limit!
begin
increment counter limit >
until
// Program complete
;
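In C terms the example does roughly the following (the C names are hypothetical). Because until leaves the loop once its flag is true, the loop stops when counter first exceeds limit, so counter ends at 1000001.

```c
static int counter_var, limit_var;              /* var (counter), var (limit) */

static int mil(int n) { return n * 1000 * 1000; } /* 1000 dup * * */

static int run_main(void) {
    counter_var = 0;                            /* 0 counter! */
    limit_var = mil(1);                         /* 1 mil limit! */
    do {                                        /* begin */
        counter_var = counter_var + 1;          /* increment */
    } while (!(counter_var > limit_var));       /* counter limit > until */
    return counter_var;
}
```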
Compilation process:
- Input Processing - Read source file (error if no argument provided)
- IRL Generation - Parse declarations and generate intermediate representation
- Iterative Optimization - Repeatedly perform peephole optimizations until no changes
- Code Generation - Output assembly with startup code and runtime support
- Symbol Output - Generate variable declarations with proper sizing in data section
Limitations:
- No built-in I/O functions (must use system calls via sys)
- Limited error checking and recovery (errors are written to stderr)
- No floating-point support
- Fixed-size tables and heap
- The else clause is not yet implemented
Features:
- Byte and 32-bit memory access (c@, c!, @, !)
- Direct system call support via register operations
- Pointer arithmetic and local array access via EDI and locs
- Variables of any size, declared with allot
- A-register variable for optimized frequent access
- Multi-base number literals (binary, decimal, hex, character)
- Iterative optimization passes for better code generation
- Compact, single-file, self-contained compiler
- Clean separation of IRL generation and code emission
- Stack-based execution model with register and pointer access
- Enhanced error reporting with stderr output
This compiler serves as an example of a minimal but functional compiler implementation, demonstrating core compiler concepts in a clear and understandable way.