Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SimplifyCFG] Avoid increasing too many phi entries when removing empty blocks #104887

Merged
merged 8 commits into from
Sep 25, 2024

Conversation

Chengjunp
Copy link
Contributor

@Chengjunp Chengjunp commented Aug 20, 2024

Now in the simplifycfg and jumpthreading passes, we will remove the empty blocks (blocks only have phis and an unconditional branch). However, in some cases, this will increase size of the IR and slow down the compile of other passes dramatically. For example, we have the following CFG:

  1. BB1 has 100 predecessors, and unconditionally branches to BB2 (does not have any other instructions).
  2. BB2 has 100 phis.

Then in this case, if we remove BB1, for every phi in BB2, we need to increase 99 entries (replace the incoming edge from BB1 with 100 edges from its predecessors). Then in total, we will increase 9900 phi entries, which can slow down the compile time for many other passes.

Therefore, in this change, we add a check to see whether removing the empty blocks will increase lots of phi entries. Now, the threshold is 1000 (can be controlled by the command line option max-phi-entries-increase-after-removing-empty-block), which means that we will not remove an empty block if it will increase the total number of phi entries by 1000. This threshold is conservative and for most of the cases, we will not have such a large phi. So, this will only be triggered in some unusual IRs.

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 20, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (Chengjunp)

Changes

Now in the simplifycfg and jumpthreading passes, we will remove the empty blocks (blocks only have phis and an unconditional branch). However, in some cases, this will increase size of the IR and slow down the compile of other passes dramatically. For example, we have the following CFG:

  1. BB1 has 100 predecessors, and unconditionally branches to BB2 (does not have any other instructions).
  2. BB2 has 100 phis.

Then in this case, if we remove BB1, for every phi in BB2, we need to increase 99 entries (replace the incoming edge from BB1 with 100 edges from its predecessors). Then in total, we will increase 9900 phi entries, which can slow down the compile time for many other passes.

Therefore, in this change, we add a check to see whether removing the empty blocks will introduce complex phis. Now, the threshold is 100 (can be controlled by the command line option max-phi-entries-after-removing-empty-block), which means that we will not remove an empty block if it will make a phi to have more than 100 entries. This threshold is conservative and for most of the cases, we will not have such a large phi. So, this will only be triggered in some unusual IRs.


Patch is 32.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/104887.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+42-1)
  • (added) llvm/test/Transforms/SimplifyCFG/avoid-complex-phi.ll (+737)
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index efb02fdec56d7e..71c464417a2d88 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -112,6 +112,11 @@ static cl::opt<unsigned> PHICSENumPHISmallSize(
         "When the basic block contains not more than this number of PHI nodes, "
         "perform a (faster!) exhaustive search instead of set-driven one."));
 
+static cl::opt<unsigned> MaxPhiEntriesAfterRemovingEmptyBlock(
+    "max-phi-entries-after-removing-empty-block", cl::init(100), cl::Hidden,
+    cl::desc("Stop removing an empty block if removing it will make a PHI have "
+             "more than this number of incoming entries."));
+
 // Max recursion depth for collectBitParts used when detecting bswap and
 // bitreverse idioms.
 static const unsigned BitPartRecursionMaxDepth = 48;
@@ -1040,6 +1045,42 @@ CanRedirectPredsOfEmptyBBToSucc(BasicBlock *BB, BasicBlock *Succ,
   return true;
 }
 
+// Check whether removing BB will make the phis in its Succ will have too
+// many incoming entries. This function does not check whether BB is foldable
+// or not. 
+static bool introduceTooComplexPhi(BasicBlock *BB) {
+  // Check BB only has phi and an unconditional branch
+  BranchInst *Branch = dyn_cast<BranchInst>(BB->getFirstNonPHIOrDbg(true));
+  assert(Branch && Branch->isUnconditional() && "BB is not an empty block");
+
+  // If BB only has one predecessor, then removing it will not introduce more
+  // incoming edges for phis.
+  if (BB->hasNPredecessors(1))
+    return false;
+  int NumPreds = pred_size(BB);
+  auto *Succ = BB->getTerminator()->getSuccessor(0);
+  for (auto &Phi : Succ->phis()) {
+    auto BlockIdx = Phi.getBasicBlockIndex(BB);
+    if (BlockIdx >= 0) {
+      // If the incoming value is a phi and the phi is defined in BB,
+      // then removing BB will not increase the total phi entries of the ir.
+      if (PHINode *IncomingPhi =
+              dyn_cast<PHINode>(Phi.getIncomingValue(BlockIdx)))
+        if (IncomingPhi->getParent() == BB)
+          continue;
+      // Otherwise, we need to add (NumPreds - 1) entries to the phi node.
+      // If removing BB makes the phi have more than
+      // MaxPhiEntriesAfterRemovingEmptyBlock incoming values, then it will be
+      // considered as introducing too complex phi to the ir.
+      // The default threshold is 100.
+      if ((NumPreds - 1) + Phi.getNumIncomingValues() >
+          MaxPhiEntriesAfterRemovingEmptyBlock)
+        return true;
+    }
+  }
+  return false;
+}
+
 /// Replace a value flowing from a block to a phi with
 /// potentially multiple instances of that value flowing from the
 /// block's predecessors to the phi.
@@ -1139,7 +1180,7 @@ bool llvm::TryToSimplifyUncondBranchFromEmptyBlock(BasicBlock *BB,
       BBKillable ||
       CanRedirectPredsOfEmptyBBToSucc(BB, Succ, BBPreds, SuccPreds, CommonPred);
 
-  if (!BBKillable && !BBPhisMergeable)
+  if ((!BBKillable && !BBPhisMergeable) || IntroduceTooComplexPhi(BB))
     return false;
 
   // Check to see if merging these blocks/phis would cause conflicts for any of
diff --git a/llvm/test/Transforms/SimplifyCFG/avoid-complex-phi.ll b/llvm/test/Transforms/SimplifyCFG/avoid-complex-phi.ll
new file mode 100644
index 00000000000000..4e5aab00186e42
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/avoid-complex-phi.ll
@@ -0,0 +1,737 @@
+; RUN: opt < %s -passes=simplifycfg -S | FileCheck --check-prefixes=CHECK-100 %s
+; RUN: opt < %s -max-phi-entries-after-removing-empty-block=101 -passes=simplifycfg -S | FileCheck --check-prefixes=CHECK-101 %s
+; CHECK-100: %x = phi i16 {{((\[ [0-2], %BB[0-9]+ \], ){51})\[ [0-2], %BB[0-9]+ \]}}
+; CHECK-101: %x = phi i16 {{((\[ [0-2], %BB[0-9]+ \], ){100})\[ [0-2], %BB[0-9]+ \]}}
+
+; This test has the following CFG:
+;   1. entry has a switch to 100 blocks: BB1 - BB100
+;   2. For BB1 to BB50, it branch to BB101 and BB103
+;   3. For BB51 to BB100, it branch to BB102 and BB103
+;   4. BB101, BB102, BB103 branch to Merge unconditionally
+;   5. Merge returns value %x which is a phi having incoming blocks BB101, BB102, BB103
+; 
+; In the first test, we are going to check that the simplifycfg will not introduce a phi with more than
+; 100 incoming values. So the simplifycfg can remove only one of BB101 and BB102. Otherwise, the
+; phi of %x will have at least 50 + 50 + 1 = 101 incoming blocks (BB1 - BB100 and BB103).
+; 
+; In the second test, the threshold is changed to 101. Then both BB101 and BB102 can be removed, and 
+; %x will have 101 incoming entries.
+;
+define i16 @example(i32 %a, ptr %array) {
+entry:
+  switch i32 %a, label %BB1 [
+    i32 1, label %BB1
+    i32 2, label %BB2
+    i32 3, label %BB3
+    i32 4, label %BB4
+    i32 5, label %BB5
+    i32 6, label %BB6
+    i32 7, label %BB7
+    i32 8, label %BB8
+    i32 9, label %BB9
+    i32 10, label %BB10
+    i32 11, label %BB11
+    i32 12, label %BB12
+    i32 13, label %BB13
+    i32 14, label %BB14
+    i32 15, label %BB15
+    i32 16, label %BB16
+    i32 17, label %BB17
+    i32 18, label %BB18
+    i32 19, label %BB19
+    i32 20, label %BB20
+    i32 21, label %BB21
+    i32 22, label %BB22
+    i32 23, label %BB23
+    i32 24, label %BB24
+    i32 25, label %BB25
+    i32 26, label %BB26
+    i32 27, label %BB27
+    i32 28, label %BB28
+    i32 29, label %BB29
+    i32 30, label %BB30
+    i32 31, label %BB31
+    i32 32, label %BB32
+    i32 33, label %BB33
+    i32 34, label %BB34
+    i32 35, label %BB35
+    i32 36, label %BB36
+    i32 37, label %BB37
+    i32 38, label %BB38
+    i32 39, label %BB39
+    i32 40, label %BB40
+    i32 41, label %BB41
+    i32 42, label %BB42
+    i32 43, label %BB43
+    i32 44, label %BB44
+    i32 45, label %BB45
+    i32 46, label %BB46
+    i32 47, label %BB47
+    i32 48, label %BB48
+    i32 49, label %BB49
+    i32 50, label %BB50
+    i32 51, label %BB51
+    i32 52, label %BB52
+    i32 53, label %BB53
+    i32 54, label %BB54
+    i32 55, label %BB55
+    i32 56, label %BB56
+    i32 57, label %BB57
+    i32 58, label %BB58
+    i32 59, label %BB59
+    i32 60, label %BB60
+    i32 61, label %BB61
+    i32 62, label %BB62
+    i32 63, label %BB63
+    i32 64, label %BB64
+    i32 65, label %BB65
+    i32 66, label %BB66
+    i32 67, label %BB67
+    i32 68, label %BB68
+    i32 69, label %BB69
+    i32 70, label %BB70
+    i32 71, label %BB71
+    i32 72, label %BB72
+    i32 73, label %BB73
+    i32 74, label %BB74
+    i32 75, label %BB75
+    i32 76, label %BB76
+    i32 77, label %BB77
+    i32 78, label %BB78
+    i32 79, label %BB79
+    i32 80, label %BB80
+    i32 81, label %BB81
+    i32 82, label %BB82
+    i32 83, label %BB83
+    i32 84, label %BB84
+    i32 85, label %BB85
+    i32 86, label %BB86
+    i32 87, label %BB87
+    i32 88, label %BB88
+    i32 89, label %BB89
+    i32 90, label %BB90
+    i32 91, label %BB91
+    i32 92, label %BB92
+    i32 93, label %BB93
+    i32 94, label %BB94
+    i32 95, label %BB95
+    i32 96, label %BB96
+    i32 97, label %BB97
+    i32 98, label %BB98
+    i32 99, label %BB99
+    i32 100, label %BB100
+  ]
+
+BB1:                                              ; preds = %default, %entry
+  %elem1 = getelementptr i32, ptr %array, i32 1
+  %val1 = load i32, ptr %elem1, align 4
+  %cmp1 = icmp eq i32 %val1, 1
+  br i1 %cmp1, label %BB101, label %BB103
+
+BB2:                                              ; preds = %entry
+  %elem2 = getelementptr i32, ptr %array, i32 2
+  %val2 = load i32, ptr %elem2, align 4
+  %cmp2 = icmp eq i32 %val2, 2
+  br i1 %cmp2, label %BB101, label %BB103
+
+BB3:                                              ; preds = %entry
+  %elem3 = getelementptr i32, ptr %array, i32 3
+  %val3 = load i32, ptr %elem3, align 4
+  %cmp3 = icmp eq i32 %val3, 3
+  br i1 %cmp3, label %BB101, label %BB103
+
+BB4:                                              ; preds = %entry
+  %elem4 = getelementptr i32, ptr %array, i32 4
+  %val4 = load i32, ptr %elem4, align 4
+  %cmp4 = icmp eq i32 %val4, 4
+  br i1 %cmp4, label %BB101, label %BB103
+
+BB5:                                              ; preds = %entry
+  %elem5 = getelementptr i32, ptr %array, i32 5
+  %val5 = load i32, ptr %elem5, align 4
+  %cmp5 = icmp eq i32 %val5, 5
+  br i1 %cmp5, label %BB101, label %BB103
+
+BB6:                                              ; preds = %entry
+  %elem6 = getelementptr i32, ptr %array, i32 6
+  %val6 = load i32, ptr %elem6, align 4
+  %cmp6 = icmp eq i32 %val6, 6
+  br i1 %cmp6, label %BB101, label %BB103
+
+BB7:                                              ; preds = %entry
+  %elem7 = getelementptr i32, ptr %array, i32 7
+  %val7 = load i32, ptr %elem7, align 4
+  %cmp7 = icmp eq i32 %val7, 7
+  br i1 %cmp7, label %BB101, label %BB103
+
+BB8:                                              ; preds = %entry
+  %elem8 = getelementptr i32, ptr %array, i32 8
+  %val8 = load i32, ptr %elem8, align 4
+  %cmp8 = icmp eq i32 %val8, 8
+  br i1 %cmp8, label %BB101, label %BB103
+
+BB9:                                              ; preds = %entry
+  %elem9 = getelementptr i32, ptr %array, i32 9
+  %val9 = load i32, ptr %elem9, align 4
+  %cmp9 = icmp eq i32 %val9, 9
+  br i1 %cmp9, label %BB101, label %BB103
+
+BB10:                                             ; preds = %entry
+  %elem10 = getelementptr i32, ptr %array, i32 10
+  %val10 = load i32, ptr %elem10, align 4
+  %cmp10 = icmp eq i32 %val10, 10
+  br i1 %cmp10, label %BB101, label %BB103
+
+BB11:                                             ; preds = %entry
+  %elem11 = getelementptr i32, ptr %array, i32 11
+  %val11 = load i32, ptr %elem11, align 4
+  %cmp11 = icmp eq i32 %val11, 11
+  br i1 %cmp11, label %BB101, label %BB103
+
+BB12:                                             ; preds = %entry
+  %elem12 = getelementptr i32, ptr %array, i32 12
+  %val12 = load i32, ptr %elem12, align 4
+  %cmp12 = icmp eq i32 %val12, 12
+  br i1 %cmp12, label %BB101, label %BB103
+
+BB13:                                             ; preds = %entry
+  %elem13 = getelementptr i32, ptr %array, i32 13
+  %val13 = load i32, ptr %elem13, align 4
+  %cmp13 = icmp eq i32 %val13, 13
+  br i1 %cmp13, label %BB101, label %BB103
+
+BB14:                                             ; preds = %entry
+  %elem14 = getelementptr i32, ptr %array, i32 14
+  %val14 = load i32, ptr %elem14, align 4
+  %cmp14 = icmp eq i32 %val14, 14
+  br i1 %cmp14, label %BB101, label %BB103
+
+BB15:                                             ; preds = %entry
+  %elem15 = getelementptr i32, ptr %array, i32 15
+  %val15 = load i32, ptr %elem15, align 4
+  %cmp15 = icmp eq i32 %val15, 15
+  br i1 %cmp15, label %BB101, label %BB103
+
+BB16:                                             ; preds = %entry
+  %elem16 = getelementptr i32, ptr %array, i32 16
+  %val16 = load i32, ptr %elem16, align 4
+  %cmp16 = icmp eq i32 %val16, 16
+  br i1 %cmp16, label %BB101, label %BB103
+
+BB17:                                             ; preds = %entry
+  %elem17 = getelementptr i32, ptr %array, i32 17
+  %val17 = load i32, ptr %elem17, align 4
+  %cmp17 = icmp eq i32 %val17, 17
+  br i1 %cmp17, label %BB101, label %BB103
+
+BB18:                                             ; preds = %entry
+  %elem18 = getelementptr i32, ptr %array, i32 18
+  %val18 = load i32, ptr %elem18, align 4
+  %cmp18 = icmp eq i32 %val18, 18
+  br i1 %cmp18, label %BB101, label %BB103
+
+BB19:                                             ; preds = %entry
+  %elem19 = getelementptr i32, ptr %array, i32 19
+  %val19 = load i32, ptr %elem19, align 4
+  %cmp19 = icmp eq i32 %val19, 19
+  br i1 %cmp19, label %BB101, label %BB103
+
+BB20:                                             ; preds = %entry
+  %elem20 = getelementptr i32, ptr %array, i32 20
+  %val20 = load i32, ptr %elem20, align 4
+  %cmp20 = icmp eq i32 %val20, 20
+  br i1 %cmp20, label %BB101, label %BB103
+
+BB21:                                             ; preds = %entry
+  %elem21 = getelementptr i32, ptr %array, i32 21
+  %val21 = load i32, ptr %elem21, align 4
+  %cmp21 = icmp eq i32 %val21, 21
+  br i1 %cmp21, label %BB101, label %BB103
+
+BB22:                                             ; preds = %entry
+  %elem22 = getelementptr i32, ptr %array, i32 22
+  %val22 = load i32, ptr %elem22, align 4
+  %cmp22 = icmp eq i32 %val22, 22
+  br i1 %cmp22, label %BB101, label %BB103
+
+BB23:                                             ; preds = %entry
+  %elem23 = getelementptr i32, ptr %array, i32 23
+  %val23 = load i32, ptr %elem23, align 4
+  %cmp23 = icmp eq i32 %val23, 23
+  br i1 %cmp23, label %BB101, label %BB103
+
+BB24:                                             ; preds = %entry
+  %elem24 = getelementptr i32, ptr %array, i32 24
+  %val24 = load i32, ptr %elem24, align 4
+  %cmp24 = icmp eq i32 %val24, 24
+  br i1 %cmp24, label %BB101, label %BB103
+
+BB25:                                             ; preds = %entry
+  %elem25 = getelementptr i32, ptr %array, i32 25
+  %val25 = load i32, ptr %elem25, align 4
+  %cmp25 = icmp eq i32 %val25, 25
+  br i1 %cmp25, label %BB101, label %BB103
+
+BB26:                                             ; preds = %entry
+  %elem26 = getelementptr i32, ptr %array, i32 26
+  %val26 = load i32, ptr %elem26, align 4
+  %cmp26 = icmp eq i32 %val26, 26
+  br i1 %cmp26, label %BB101, label %BB103
+
+BB27:                                             ; preds = %entry
+  %elem27 = getelementptr i32, ptr %array, i32 27
+  %val27 = load i32, ptr %elem27, align 4
+  %cmp27 = icmp eq i32 %val27, 27
+  br i1 %cmp27, label %BB101, label %BB103
+
+BB28:                                             ; preds = %entry
+  %elem28 = getelementptr i32, ptr %array, i32 28
+  %val28 = load i32, ptr %elem28, align 4
+  %cmp28 = icmp eq i32 %val28, 28
+  br i1 %cmp28, label %BB101, label %BB103
+
+BB29:                                             ; preds = %entry
+  %elem29 = getelementptr i32, ptr %array, i32 29
+  %val29 = load i32, ptr %elem29, align 4
+  %cmp29 = icmp eq i32 %val29, 29
+  br i1 %cmp29, label %BB101, label %BB103
+
+BB30:                                             ; preds = %entry
+  %elem30 = getelementptr i32, ptr %array, i32 30
+  %val30 = load i32, ptr %elem30, align 4
+  %cmp30 = icmp eq i32 %val30, 30
+  br i1 %cmp30, label %BB101, label %BB103
+
+BB31:                                             ; preds = %entry
+  %elem31 = getelementptr i32, ptr %array, i32 31
+  %val31 = load i32, ptr %elem31, align 4
+  %cmp31 = icmp eq i32 %val31, 31
+  br i1 %cmp31, label %BB101, label %BB103
+
+BB32:                                             ; preds = %entry
+  %elem32 = getelementptr i32, ptr %array, i32 32
+  %val32 = load i32, ptr %elem32, align 4
+  %cmp32 = icmp eq i32 %val32, 32
+  br i1 %cmp32, label %BB101, label %BB103
+
+BB33:                                             ; preds = %entry
+  %elem33 = getelementptr i32, ptr %array, i32 33
+  %val33 = load i32, ptr %elem33, align 4
+  %cmp33 = icmp eq i32 %val33, 33
+  br i1 %cmp33, label %BB101, label %BB103
+
+BB34:                                             ; preds = %entry
+  %elem34 = getelementptr i32, ptr %array, i32 34
+  %val34 = load i32, ptr %elem34, align 4
+  %cmp34 = icmp eq i32 %val34, 34
+  br i1 %cmp34, label %BB101, label %BB103
+
+BB35:                                             ; preds = %entry
+  %elem35 = getelementptr i32, ptr %array, i32 35
+  %val35 = load i32, ptr %elem35, align 4
+  %cmp35 = icmp eq i32 %val35, 35
+  br i1 %cmp35, label %BB101, label %BB103
+
+BB36:                                             ; preds = %entry
+  %elem36 = getelementptr i32, ptr %array, i32 36
+  %val36 = load i32, ptr %elem36, align 4
+  %cmp36 = icmp eq i32 %val36, 36
+  br i1 %cmp36, label %BB101, label %BB103
+
+BB37:                                             ; preds = %entry
+  %elem37 = getelementptr i32, ptr %array, i32 37
+  %val37 = load i32, ptr %elem37, align 4
+  %cmp37 = icmp eq i32 %val37, 37
+  br i1 %cmp37, label %BB101, label %BB103
+
+BB38:                                             ; preds = %entry
+  %elem38 = getelementptr i32, ptr %array, i32 38
+  %val38 = load i32, ptr %elem38, align 4
+  %cmp38 = icmp eq i32 %val38, 38
+  br i1 %cmp38, label %BB101, label %BB103
+
+BB39:                                             ; preds = %entry
+  %elem39 = getelementptr i32, ptr %array, i32 39
+  %val39 = load i32, ptr %elem39, align 4
+  %cmp39 = icmp eq i32 %val39, 39
+  br i1 %cmp39, label %BB101, label %BB103
+
+BB40:                                             ; preds = %entry
+  %elem40 = getelementptr i32, ptr %array, i32 40
+  %val40 = load i32, ptr %elem40, align 4
+  %cmp40 = icmp eq i32 %val40, 40
+  br i1 %cmp40, label %BB101, label %BB103
+
+BB41:                                             ; preds = %entry
+  %elem41 = getelementptr i32, ptr %array, i32 41
+  %val41 = load i32, ptr %elem41, align 4
+  %cmp41 = icmp eq i32 %val41, 41
+  br i1 %cmp41, label %BB101, label %BB103
+
+BB42:                                             ; preds = %entry
+  %elem42 = getelementptr i32, ptr %array, i32 42
+  %val42 = load i32, ptr %elem42, align 4
+  %cmp42 = icmp eq i32 %val42, 42
+  br i1 %cmp42, label %BB101, label %BB103
+
+BB43:                                             ; preds = %entry
+  %elem43 = getelementptr i32, ptr %array, i32 43
+  %val43 = load i32, ptr %elem43, align 4
+  %cmp43 = icmp eq i32 %val43, 43
+  br i1 %cmp43, label %BB101, label %BB103
+
+BB44:                                             ; preds = %entry
+  %elem44 = getelementptr i32, ptr %array, i32 44
+  %val44 = load i32, ptr %elem44, align 4
+  %cmp44 = icmp eq i32 %val44, 44
+  br i1 %cmp44, label %BB101, label %BB103
+
+BB45:                                             ; preds = %entry
+  %elem45 = getelementptr i32, ptr %array, i32 45
+  %val45 = load i32, ptr %elem45, align 4
+  %cmp45 = icmp eq i32 %val45, 45
+  br i1 %cmp45, label %BB101, label %BB103
+
+BB46:                                             ; preds = %entry
+  %elem46 = getelementptr i32, ptr %array, i32 46
+  %val46 = load i32, ptr %elem46, align 4
+  %cmp46 = icmp eq i32 %val46, 46
+  br i1 %cmp46, label %BB101, label %BB103
+
+BB47:                                             ; preds = %entry
+  %elem47 = getelementptr i32, ptr %array, i32 47
+  %val47 = load i32, ptr %elem47, align 4
+  %cmp47 = icmp eq i32 %val47, 47
+  br i1 %cmp47, label %BB101, label %BB103
+
+BB48:                                             ; preds = %entry
+  %elem48 = getelementptr i32, ptr %array, i32 48
+  %val48 = load i32, ptr %elem48, align 4
+  %cmp48 = icmp eq i32 %val48, 48
+  br i1 %cmp48, label %BB101, label %BB103
+
+BB49:                                             ; preds = %entry
+  %elem49 = getelementptr i32, ptr %array, i32 49
+  %val49 = load i32, ptr %elem49, align 4
+  %cmp49 = icmp eq i32 %val49, 49
+  br i1 %cmp49, label %BB101, label %BB103
+
+BB50:                                             ; preds = %entry
+  %elem50 = getelementptr i32, ptr %array, i32 50
+  %val50 = load i32, ptr %elem50, align 4
+  %cmp50 = icmp eq i32 %val50, 50
+  br i1 %cmp50, label %BB101, label %BB103
+
+BB51:                                             ; preds = %entry
+  %elem51 = getelementptr i32, ptr %array, i32 51
+  %val51 = load i32, ptr %elem51, align 4
+  %cmp51 = icmp eq i32 %val51, 51
+  br i1 %cmp51, label %BB102, label %BB103
+
+BB52:                                             ; preds = %entry
+  %elem52 = getelementptr i32, ptr %array, i32 52
+  %val52 = load i32, ptr %elem52, align 4
+  %cmp52 = icmp eq i32 %val52, 52
+  br i1 %cmp52, label %BB102, label %BB103
+
+BB53:                                             ; preds = %entry
+  %elem53 = getelementptr i32, ptr %array, i32 53
+  %val53 = load i32, ptr %elem53, align 4
+  %cmp53 = icmp eq i32 %val53, 53
+  br i1 %cmp53, label %BB102, label %BB103
+
+BB54:                                             ; preds = %entry
+  %elem54 = getelementptr i32, ptr %array, i32 54
+  %val54 = load i32, ptr %elem54, align 4
+  %cmp54 = icmp eq i32 %val54, 54
+  br i1 %cmp54, label %BB102, label %BB103
+
+BB55:                                             ; preds = %entry
+  %elem55 = getelementptr i32, ptr %array, i32 55
+  %val55 = load i32, ptr %elem55, align 4
+  %cmp55 = icmp eq i32 %val55, 55
+  br i1 %cmp55, label %BB102, label %BB103
+
+...
[truncated]

Copy link

github-actions bot commented Aug 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general direction of the change makes sense to me.

I'm slightly worried that this may cause regressions for switch transforms, where it may be normal to have very large phis.

A possible alternative heuristic would not look at the number of entries in one phi, but the total number of entries across all phis in the successor, so that we would limit less for one phi and more for a large number of phis. But hard to say in abstract whether that's better...

llvm/lib/Transforms/Utils/Local.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Utils/Local.cpp Outdated Show resolved Hide resolved
static bool introduceTooComplexPhi(BasicBlock *BB) {
// Check BB only has phi and an unconditional branch
BranchInst *Branch = dyn_cast<BranchInst>(BB->getFirstNonPHIOrDbg(true));
assert(Branch && Branch->isUnconditional() && "BB is not an empty block");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Branch is unused outside the assertion, and getFirstNonPHIOrDbg() may be expensive.

I think it would be better to pass in BasicBlock *Succ to this function, like many other parts of this transform. The fact that it's an empty block with single successor is verified elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thank you for pointing this out. I will update this.

@nikic nikic requested a review from dtcxzyw August 20, 2024 07:59
@nikic nikic changed the title [CFG] Avoid introducing complex phi when removing empty blocks [SimplifyCFG] Avoid introducing complex phi when removing empty blocks Aug 20, 2024
@nikic
Copy link
Contributor

nikic commented Aug 20, 2024

@Chengjunp
Copy link
Contributor Author

Hi, thank you for the review. I think the minor improvement can be expected because this kind of pattern is unusual. I did encounter a very extreme case that by adding this check, it can save more than 20% compile time.

In term of the heuristic, looking at the total number of entries across all Phis in the successor also makes sense to me. But I'm considering how to determine the threshold in that case. Should the threshold be the same regardless of the number of Phis in the successor? Or the threshold should be linear to the number of Phis in the Successor?

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Aug 21, 2024
@Chengjunp
Copy link
Contributor Author

Hi @nikic, @dtcxzyw, I have updated the code. Could you help to have another review? Thanks!

@nikic
Copy link
Contributor

nikic commented Aug 29, 2024

In term of the heuristic, looking at the total number of entries across all Phis in the successor also makes sense to me. But I'm considering how to determine the threshold in that case. Should the threshold be the same regardless of the number of Phis in the successor? Or the threshold should be linear to the number of Phis in the Successor?

If I'm understanding correctly, what we really want to prevent is a quadratic increase in phi entries, as we may be adding up to #phis * #preds new entries. So I think that is the quantity we should be limiting with a fixed limit (though it should be higher than 100 in that case -- would a limit of 1000 work?)

@XChy
Copy link
Member

XChy commented Aug 29, 2024

Good catch. Seem to be related to slowing down in #86312.

@Chengjunp Chengjunp changed the title [SimplifyCFG] Avoid introducing complex phi when removing empty blocks [SimplifyCFG] Avoid increasing too many phi entries when removing empty blocks Sep 5, 2024
@Chengjunp
Copy link
Contributor Author

Hi @nikic, @dtcxzyw, @XChy, I have updated the heuristic so that it will look at the increase of the number phi entries to determine whether removing a block or not.
But this change has one issue that if there are many empty blocks in the IR and removing any of them will not increase the number of phi entries greatly but removing all of them will have a large increase in the total phi entries. For this case, currently heuristic will remove all of them.
Therefore, I'm wondering whether we should have something like a global map in the simplifycfg to track how many new phi entries we have already introduced to each block and stop further removing the blocks if the total new phi entries in one block reach a threshold (for example, 1000).

@Chengjunp
Copy link
Contributor Author

Hi @nikic, I took any look at the code, and I think it could be a little bit complex to track the total number of new phi entries of different blocks in a pass since the function TryToSimplifyUncondBranchFromEmptyBlock could be used in different passes. Could you help to review the current change and do a compile time test on it? Thank you very much!

@Chengjunp
Copy link
Contributor Author

Ping @nikic, @dtcxzyw

Comment on lines 1056 to 1058
// Check whether removing BB will make the phis in its Succ have too
// many incoming entries. This function does not check whether BB is foldable
// or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check whether removing BB will make the phis in its Succ have too
// many incoming entries. This function does not check whether BB is foldable
// or not.
/// Check whether removing \p BB will make the phis in its \p Succ have too
/// many incoming entries. This function does not check whether \p BB is foldable
/// or not.

for (auto &Phi : Succ->phis()) {
// If the incoming value is a phi and the phi is defined in BB,
// then removing BB will not increase the total phi entries of the ir.
if (PHINode *IncomingPhi =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test for this case.

@@ -0,0 +1,782 @@
; RUN: opt < %s -passes=simplifycfg -S | FileCheck --check-prefixes=CHECK-1000 %s
; RUN: opt < %s -max-phi-entries-increase-after-removing-empty-block=989 -passes=simplifycfg -S | FileCheck --check-prefixes=CHECK-989 %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pick a smaller threshold to reduce size of this file.

@nikic
Copy link
Contributor

nikic commented Sep 20, 2024

Hi @nikic, @dtcxzyw, @XChy, I have updated the heuristic so that it will look at the increase of the number phi entries to determine whether removing a block or not. But this change has one issue that if there are many empty blocks in the IR and removing any of them will not increase the number of phi entries greatly but removing all of them will have a large increase in the total phi entries. For this case, currently heuristic will remove all of them.

Does this patch still cover your original motivating case, or is the new version not sufficient due to this problem?

@Chengjunp
Copy link
Contributor Author

Does this patch still cover your original motivating case, or is the new version not sufficient due to this problem?

Sorry for the late reply. I have done the tests during the last few days and the current version can help improve the compile time in my original case. If everything looks fine, could you help review and merge this change? Thanks!

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@nikic nikic merged commit e4688b9 into llvm:main Sep 25, 2024
8 checks passed
@Chengjunp Chengjunp deleted the chengjunp/AvoidComplexPhi branch September 26, 2024 05:53
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
…ty blocks (llvm#104887)

Now in the simplifycfg and jumpthreading passes, we will remove the
empty blocks (blocks only have phis and an unconditional branch).
However, in some cases, this will increase size of the IR and slow down
the compile of other passes dramatically. For example, we have the
following CFG:

1. BB1 has 100 predecessors, and unconditionally branches to BB2 (does
not have any other instructions).
2. BB2 has 100 phis.

Then in this case, if we remove BB1, for every phi in BB2, we need to
increase 99 entries (replace the incoming edge from BB1 with 100 edges
from its predecessors). Then in total, we will increase 9900 phi
entries, which can slow down the compile time for many other passes.

Therefore, in this change, we add a check to see whether removing the
empty blocks will increase lots of phi entries. Now, the threshold is
1000 (can be controlled by the command line option
`max-phi-entries-increase-after-removing-empty-block`), which means that
we will not remove an empty block if it will increase the total number
of phi entries by 1000. This threshold is conservative and for most of
the cases, we will not have such a large phi. So, this will only be
triggered in some unusual IRs.
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…ty blocks (llvm#104887)

Now in the simplifycfg and jumpthreading passes, we will remove the
empty blocks (blocks only have phis and an unconditional branch).
However, in some cases, this will increase size of the IR and slow down
the compile of other passes dramatically. For example, we have the
following CFG:

1. BB1 has 100 predecessors, and unconditionally branches to BB2 (does
not have any other instructions).
2. BB2 has 100 phis.

Then in this case, if we remove BB1, for every phi in BB2, we need to
increase 99 entries (replace the incoming edge from BB1 with 100 edges
from its predecessors). Then in total, we will increase 9900 phi
entries, which can slow down the compile time for many other passes.

Therefore, in this change, we add a check to see whether removing the
empty blocks will increase lots of phi entries. Now, the threshold is
1000 (can be controlled by the command line option
`max-phi-entries-increase-after-removing-empty-block`), which means that
we will not remove an empty block if it will increase the total number
of phi entries by 1000. This threshold is conservative and for most of
the cases, we will not have such a large phi. So, this will only be
triggered in some unusual IRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants