This repository contains code that implements static analysis techniques for programs in an intermediate language (LIR) which look similar to C. The input to each analysis is usually an LIR program (plus few other inputs based on the analysis) and the output is printed to stdout. The following static analyses techniques are implemented:
- Constant analysis
- Reaching definitions
- Control Flow Analysis
- Andersen-style pointer analysis constraint generator
- Andersen-style pointer analysis constraint solver
- Program Slicing
- Taint Analysis - Includes context-insensitive taint analysis, k-limited callstring taint analysis and functional context sensitive taint analysis
- cmake
In the root directory of the repository, run the following commands
cmake .
make
-
a LIR program is:
- a set of struct types
- a set of global variables
- a set of extern function declarations
- a set of function definitions
-
a type is one of:
int
(an integer)- a struct type (identified by its name)
- a function type
- a pointer to a type
-
a struct type is a name and a non-empty list of field names and types.
- these are just like C-style structs.
-
a global variable is a name and a type.
-
an extern function declaration is a name and a function type.
- an extern function is one that is defined externally to the current program, i.e., we know its name and type signature and can call it, but we don't have access to its source code.
-
a function definition is:
- a name
- an ordered list of parameters (with names and types)
- an optional return type
- a set of local variables (with names and types)
- a set of basic blocks
-
a basic block is a label and an ordered list of instructions, ending in a single terminal.
-
an instruction is one of:
-
address-of:
x = $addrof y
assigns the address ofy
tox
. -
memory allocation:
x = $alloc 10 [_a1]
allocates 10 items (what type of items depends on the type of pointer variablex
) and assigns the newly allocated address tox
. the bracketed identifier (_a1
in this example) is a convenient label for the allocation that we will use during some of our analyses. -
integer arithmetic:
x = $arith add y 2
adds 2 to the value ofy
and assigns the result tox
; similarly for the other arithmetic operatorssub
,mul
,div
. -
integer and pointer comparison
x = $cmp eq y 2
comparesy
to 2 and assigns 0 (false) or 1 (true) tox
; similarly for the other relational operatorsneq
,lt
,lte
,gt
,gte
. -
copy:
x = $copy y
copies the value ofy
tox
. -
get-element-pointer:
x = $gep y 10
takesy
(which is a pointer to an array of elements) and assigns tox
the address of the 10th element of the array. this is the only way to do pointer arithmetic. -
get-field-pointer:
x = $gfp y foo
takesy
(which is a pointer to a struct) and assigns tox
the address of thefoo
field of the struct. the only way to access fields of a struct is via a pointer. -
load from memory:
x = $load y
loads the value pointed to byy
and assigns it tox
. -
store to memory:
$store x y
stores the value ofy
to the location pointed to byx
. -
call external function:
x = $call_ext foo(10)
calls the extern functionfoo
with argument 10 and assigns the returned value tox
. if the extern function does not return a value then there is no assignment, e.g.,$call_ext foo(10)
.
-
-
a terminal signals the end of a basic block and is one of:
-
conditional branch:
$branch x bb1 bb2
will either jump to basic blockbb1
(if the value ofx
is non-zero) orbb2
(if the value ofx
is 0). -
unconditional jump:
$jump bb1
will always jump to basic blockbb1
. -
return from function:
$ret x
will return from the current function with the value ofx
. if the current function does not return a value then it would just be$ret
. -
direct function call:
x = $call_dir foo(10) then bb1
calls internally-defined functionfoo
passing the argument 10, assigns the returned value tox
, then jumps to basic blockbb1
. iffoo
does not return a value then there is no assignment, e.g.,$call_dir foo(10) then bb1
. -
indirect function call:
x = $call_idr fp(10) then bb1
calls an internally-defined function (which one depends on the value of the function pointer variablefp
) passing the argument 10, assigns the returned value tox
, then jumps to basic blockbb1
. if the called function does not return a value then there is no assignment, e.g.,$call_idr fp(10) then bb1
.
-
-
validity requirements: in order to be a valid LIR program, the following requirements must be met:
-
the program is well-typed.
-
the program contains a function
main : () -> int
, which is the entry point to the program. -
every struct type has at least one field.
-
an extern function's name cannot be the same as an internally-defined function's name.
-
for each function there must be exactly one basic block labeled
entry
, which is the entry point to the function. -
for each function there must be exactly one basic block ending in a
$ret
(return) terminal. -
all basic blocks in a function must be reachable from
entry
and all basic blocks must reach the basic block containing the$ret
terminal. -
every
$alloc
instruction has a unique bracketed identifier. -
the
main
function cannot be called. -
if a global variable has the same name as an internally-defined function then that variable is a function pointer to the associated function and has the appropriate type (but there cannot be a global variable named
main
).
-
struct bar {
f1: int
f2: &int
}
// global pointer to function foo.
foo: &(&int) -> int
// externally-defined function func.
extern func: (int) -> int
fn main() -> int {
entry:
$call_dir baz(42) then exit
exit:
$ret 0
}
fn baz(p:int) -> _ {
let a:int, b:&int, c:&bar, d:&&int, fp:&(&int) -> int
entry:
a = $copy p
b = $addrof a
$store b 42
c = $alloc 42 [_1]
c = $gep c 12
d = $gfp c f2
a = $cmp lte a 3
$branch a bb2 bb3
bb2:
b = $load d
a = $call_dir foo(b) then bb4
bb3:
fp = $copy foo
a = $call_idr fp(b) then bb4
bb4:
p = $arith add p 1
a = $arith mul a p
a = $call_ext bar(a)
$jump exit
exit:
$ret
}
fn foo(p:&int) -> int {
let r:int
entry:
r = $load p
$ret r
}
The file at headers/datatypes.h contains code for converting an LIR program in JSON format into an iterable data structure. This allows us to easily iterate over basic blocks, functions, instructions, structs, etc required during program analysis.