-
Notifications
You must be signed in to change notification settings - Fork 438
Analyze a Simple C Program
We use an example to go through each component of SVF: (1) Memory Model including PAG and Constraint Graph, (2) Pointer Analysis including both flow-insensitive and flow-sensitive analyses, and (3) Value-Flow Construction.
void swap(char **p, char **q){
char* t = *p;
*p = *q;
*q = t;
}
int main(){
char a1, b1;
char *a = &a1;
char *b = &b1;
swap(&a,&b);
}
2. LLVM IR after the mem2Reg option is turned on (A project or a C file can also be compiled to generate bc using wllvm)
clang -S -c -Xclang -disable-O0-optnone -fno-discard-value-names -emit-llvm swap.c -o swap.ll
opt -S -mem2reg swap.ll -o swap.ll
define void @swap(i8** %p, i8** %q) #0 {
entry:
%0 = load i8** %p, align 8
%1 = load i8** %q, align 8
store i8* %1, i8** %p, align 8
store i8* %0, i8** %q, align 8
ret void
}
define i32 @main() #0 {
entry:
%a1 = alloca i8, align 1
%b1 = alloca i8, align 1
%a = alloca i8*, align 8
%b = alloca i8*, align 8
store i8* %a1, i8** %a, align 8
store i8* %b1, i8** %b, align 8
call void @swap(i8** %a, i8** %b)
ret i32 0
}
wpa -ander -dump-callgraph swap.ll
wpa -type -dump-icfg swap.ll
wpa -nander -dump-pag swap.ll
The different kinds of PAG nodes are denoted by Graphviz shapes. Here is the PAG node type hierarchy and the corresponding shapes:
- PAGNode, the abstract base class
-
ValPN, pointer value base class, shape=box
- GepValPN, getelementptr node, represents an offset from a base pointer, shape=hexagon
- DummyValPN, used for special nodes not associated with an LLVM value, shape=diamond
-
ObjPN, base class of memory objects, shape=component
- GepObjPN, represents memory at an offset (a field or array element), shape=doubleoctagon
- FIObjPN, field-insensitive object, shape=box3d
- DummyObjPN, special memory, shape=tab
- RetPN, for unified function return, shape=Mrecord (box with rounded corners)
- VarArgPN, represents the variable parameters in a call, shape=octagon
-
ValPN, pointer value base class, shape=box
An edge between two nodes represents a constraint:
- a green edge for a memory allocation (AddrPE),
- a blue edge for a store (StorePE),
- a red edge for a load (LoadPE),
- a black edge for a copy (CopyPE),
- a purple edge for a getelementptr (GepPE),
- a grey edge for a comparison, binary, or unary operation (CmpPE, BinaryOPPE, UnaryOPPE),
- a turquoise edge for a thread fork or join (TDForkPE, TDJoinPE),
- a dashed black edge for parameter passing (CallPE).
- a dotted black edge for a function return value (CallPE).
wpa -nander -dump-pag -dump-constraint-graph swap.ll
A Constraint Graph is used when Andersen's flow-insensitive analysis is performed. The following rules are used in resolving the constraints in the program during such an inclusion-based pointer analysis:
During the analysis, new copy edges (solid black arrows) will be often added incrementally to the constraint graph until a fixed point has been reached. The final constraint graph is:
The nodes of the constraint graph are in one-to-one correspondence with the PAG nodes and use the same shape. The graph shown here is the brief version. [Coming soon: If -brief-constraint-graph=false is specified then the nodes will also show the instructions.]
wpa -nander -print-pts swap.ll
NodeID 4 PointsTo: { 5 }
NodeID 7 PointsTo: { 22 }
NodeID 8 PointsTo: { 24 }
NodeID 9 PointsTo: { 18 20 }
NodeID 10 PointsTo: { 18 20 }
NodeID 14 PointsTo: { 15 }
NodeID 17 PointsTo: { 18 }
NodeID 19 PointsTo: { 20 }
NodeID 21 PointsTo: { 22 }
NodeID 23 PointsTo: { 24 }
wpa -ander -stat swap.ll
****Andersen Pointer Analysis Statistics****
################ (program : )###############
TotalPointers 18
TotalObjects 8
TotalFieldObjects 6
MaxStructSize 0
TotalEdges 30
FunctionObjs 2
GlobalObjs 0
HeapObjs 0
StackObjs 4
FIObjNum 0
FSObjNum 6
VarStructObj 0
VarArrayObj 0
ConstStructObj 0
ConstArrayObj 0
NonPtrObj 4
AddrsNum 6
LoadsNum 2
StoresNum 4
CopysNum 1
GepsNum 0
CallsNum 2
ReturnsNum 0
IndCallSites 0
LocalVarInRecur 0
BitCastNumber 0
BBWith2Succ 0
BBWith3Succ 0
-------------------------------------------------------
CollapseTime 0
TotalTime 0.001
SCCDetectTime 0.001
SCCMergeTime 0
LoadStoreTime 0
CopyGepTime 0
UpdateCGTime 0
AvgPtsSetSize 0.533333
AvgTopLvlPtsSize 1.2
CGNodeNum 27
PointsToConstPtr 0
PointsToBlkPtr 0
TotalPointers 18
TotalObjects 14
TotalEdges 11
AddrsNum 4
LoadsNum 2
StoresNum 4
CopysNum 5
GepsNum 0
AddrProcessed 6
LoadProcessed 2
StoreProcessed 4
CopyProcessed 8
GepProcessed 0
Pointers 18
DYFieldPtrs 0
MemObjects 8
DYFieldObjs 6
MaxPtsSetSize 2
Iterations 2
IndCallSites 0
IndEdgeSolved 0
NumOfSCCDetect 2
TotalCycleNum 1
TotalPWCCycleNum 0
NodesInCycles 4
MaxNodesInSCC 4
NullPointer 0
#######################################################
wpa -fspta -print-pts swap.ll
NodeID 4 PointsTo: { 5 }
NodeID 7 PointsTo: { 22 }
NodeID 8 PointsTo: { 24 }
NodeID 9 PointsTo: { 18 }
NodeID 10 PointsTo: { 20 }
NodeID 14 PointsTo: { 15 }
NodeID 17 PointsTo: { 18 }
NodeID 19 PointsTo: { 20 }
NodeID 21 PointsTo: { 22 }
NodeID 23 PointsTo: { 24 }
wpa -fspta -stat swap.ll
****Flow-Sensitive Pointer Analysis Statistics****
################ (program : )###############
TotalPointers 18
TotalObjects 8
TotalFieldObjects 6
MaxStructSize 0
TotalEdges 30
FunctionObjs 2
GlobalObjs 0
HeapObjs 0
StackObjs 4
FIObjNum 0
FSObjNum 6
VarStructObj 0
VarArrayObj 0
ConstStructObj 0
ConstArrayObj 0
NonPtrObj 4
AddrsNum 6
LoadsNum 2
StoresNum 4
CopysNum 1
GepsNum 0
CallsNum 2
ReturnsNum 0
IndCallSites 0
LocalVarInRecur 0
BitCastNumber 0
BBWith2Succ 0
BBWith3Succ 0
-------------------------------------------------------
SolveTime 0
SCCTime 0
ProcessTime 0
PropagationTime 0
DirectPropaTime 0
IndirectPropaTime 0
UpdateTime 0
AddrTime 0
CopyGepTime 0
LoadTime 0
StoreTime 0
UpdateCGTime 0
AvgPtsSize 1
AvgTopLvlPtsSize 1
AvgAddrTakenVarPts 1
AvgINPtsSize 1
AvgOUTPtsSize 1
AverageSCCSize 0
TotalTime 0
PointsToConstPtr 0
PointsToBlkPtr 0
StrongUpdates 4
SNodesHaveIN 6
SNodesHaveOUT 4
FI_SNodesHaveIN 0
FI_SNodesHaveOUT 0
FO_SNodesHaveIN 0
FO_SNodesHaveOUT 0
AI_SNodesHaveIN 0
AI_SNodesHaveOUT 0
AO_SNodesHaveIN 2
AO_SNodesHaveOUT 0
LD_SNodesHaveIN 2
LD_SNodesHaveOUT 0
ST_SNodesHaveIN 2
ST_SNodesHaveOUT 4
PHI_SNodesHaveIN 0
PHI_SNodesHaveOUT 0
VarHaveIN 6
VarHaveOUT 4
VarHaveIN_FI 0
VarHaveOUT_FI 0
VarHaveIN_FO 0
VarHaveOUT_FO 0
VarHaveIN_AI 0
VarHaveOUT_AI 0
VarHaveIN_AO 2
VarHaveOUT_AO 0
VarHaveIN_LD 2
VarHaveOUT_LD 0
VarHaveIN_ST 2
VarHaveOUT_ST 4
VarHaveIN_PHI 0
VarHaveOUT_PHI 0
MaxPtsSize 1
MaxTopLvlPtsSize 1
MaxINPtsSize 1
MaxOUTPtsSize 1
NumOfAddrTakenVar 4
MaxAddrTakenVarPts 1
ProcessedAddr 6
ProcessedCopy 2
ProcessedGep 0
ProcessedLoad 4
ProcessedStore 8
ProcessedPhi 4
ProcessedAParam 0
ProcessedFRet 0
ProcessedMSSANode 4
NumOfNodesInSCC 0
MaxSCCSize 1
NumOfSCC 0
TotalPointers 18
TotalObjects 14
StoresNum 4
CopysNum 1
Pointers 18
DYFieldPtrs 0
MemObjects 8
DYFieldObjs 6
Iterations 1
IndEdgeSolved 0
NullPointer 0
#######################################################
The memory SSA representation of a program is built by adding MU and CHI functions to the LLVM IR of the program.
wpa -ander -svfg -dump-mssa swap.ll
==========FUNCTION: swap==========
2V_1 = ENCHI(MR_2V_0) pts{22 }
4V_1 = ENCHI(MR_4V_0) pts{24 }
entry
LDMU(MR_2V_1) pts{22 }
%0 = load i8** %p, align 8
LDMU(MR_4V_1) pts{24 }
%1 = load i8** %q, align 8
store i8* %1, i8** %p, align 8
2V_2 = STCHI(MR_2V_1) pts{22 }
store i8* %0, i8** %q, align 8
4V_2 = STCHI(MR_4V_1) pts{24 }
ret void
RETMU(MR_2V_2) pts{22 }
RETMU(MR_4V_2) pts{24 }
==========FUNCTION: main==========
2V_1 = ENCHI(MR_2V_0) pts{22 }
4V_1 = ENCHI(MR_4V_0) pts{24 }
entry
%a1 = alloca i8, align 1
%b1 = alloca i8, align 1
%a = alloca i8*, align 8
%b = alloca i8*, align 8
store i8* %a1, i8** %a, align 8
2V_2 = STCHI(MR_2V_1) pts{22 }
store i8* %b1, i8** %b, align 8
4V_2 = STCHI(MR_4V_1) pts{24 }
CALMU(MR_2V_2) pts{22 }
CALMU(MR_4V_2) pts{24 }
call void @swap(i8** %a, i8** %b) CallSite: call void @swap(i8** %a, i8** %b)
2V_3 = CALCHI(MR_2V_2) pts{22 }
4V_3 = CALCHI(MR_4V_2) pts{24 }
ret i32 0
RETMU(MR_2V_3) pts{22 }
RETMU(MR_4V_3) pts{24 }
An inter-procedural sparse value-flow graph (SVFG) for a program is a directed graph that captures the def-use chains of both top-level pointers and address-taken objects.
A SVFG Node can be
(1) a statement (PAGEdge),
(2) a parameter or
(3) a memory region (a set of abstract objects)
A green rectangle stands for an address PAG edge (AddrPE), a red rectangle stands for a load PAG edge (LoadPE), and a blue rectangle stands for a store PAG edge (StorePE).
A yellow rectangle represents a parameter (e.g., SVFG NodeID 14, which is an actual parameter corresponding to "a" PAG NodeID 21) or a memory region at the entry/exit of a function, a callsite or a load/store. For example, SVFG NodeID 27 represents the memory object PAG NodeID 24 that is indirectly read inside the callee function "swap" via the callsite ID 1.
wpa -ander -svfg -dump-svfg swap.ll
A SVFG can be made more compact by eliminating unnecessary nodes such as ActualParm, AcutalIn and FormalRet, FormalOut using SVFG Optimizer.