[LoongArch] Optimize conditional branches #147885

Open
wants to merge 2 commits into base: main

heiher (Member) commented Jul 10, 2025

This patch attempts to optimize conditional branches by combining logical operations within the conditions. This enables the selection of more efficient branch instructions. For example, for integers, `blez x` can be used instead of `blt x, (ori, t, 1)`; for floating-point comparisons, dedicated floating-point branch instructions can be used to avoid moving the result to an integer register.
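
The integer folds rest on a few arithmetic identities, which can be checked outside the compiler. The following is a minimal sketch, not code from the patch: all helper names are hypothetical, GRLen is assumed to be 64, and the `x < 1` case is an assumption about what `translateSetCCForBranch` does to reach `blez`, since that function's body is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// Arithmetic right shift written so that it does not depend on how the
// compiler shifts negative values (implementation-defined before C++20).
constexpr int64_t sra(int64_t x, unsigned n) {
  return x < 0 ? ~(~x >> n) : (x >> n);
}

// Identity behind "setlt (sra X, N), 0 -> setlt X, 0":
// an arithmetic shift never changes the sign.
constexpr bool sraFoldHolds(int64_t x, unsigned n) {
  return (sra(x, n) < 0) == (x < 0);
}

// Identity behind
// "((srl (and X, 1<<C), C), 0, ne) -> ((shl X, GRLen-1-C), 0, lt)":
// moving bit C into the sign position turns a bit test into a sign test.
constexpr bool bitTestFoldHolds(uint64_t x, unsigned c) {
  return (((x & (uint64_t{1} << c)) >> c) != 0) ==
         (((x << (63 - c)) >> 63) != 0); // sign bit set <=> signed value < 0
}

// Assumed identity behind using blez: x < 1 is the same test as 0 >= x,
// so the constant 1 never has to be materialized in a register.
constexpr bool lessThanOneHolds(int64_t x) {
  return (x < 1) == (0 >= x);
}

// Checks every identity over a handful of edge-case values and all shifts.
bool allFoldsHold() {
  const int64_t samples[] = {INT64_MIN, -2, -1, 0, 1, 2, 42, INT64_MAX};
  for (int64_t x : samples) {
    if (!lessThanOneHolds(x))
      return false;
    for (unsigned n = 0; n < 64; ++n) {
      if (!sraFoldHolds(x, n))
        return false;
      if (!bitTestFoldHolds(static_cast<uint64_t>(x), n))
        return false;
    }
  }
  return true;
}
```

Calling `allFoldsHold()` exercises each identity on boundary values; it is a sanity check of the arithmetic, not a substitute for the CodeGen tests in the patch.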

llvmbot (Member) commented Jul 10, 2025

@llvm/pr-subscribers-backend-loongarch

Author: hev (heiher)

Changes

This patch attempts to optimize conditional branches by combining logical operations within the conditions. This enables the selection of more efficient branch instructions. For example, for integers, `blez x` can be used instead of `blt x, (ori, t, 1)`; for floating-point comparisons, dedicated floating-point branch instructions can be used to avoid moving the result to an integer register.


Patch is 24.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147885.diff

10 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td (+9-3)
  • (modified) llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td (+2)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+175)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+5)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+25-37)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll (+4-10)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll (+4-10)
  • (modified) llvm/test/CodeGen/LoongArch/merge-base-offset-tlsle.ll (+2-4)
  • (modified) llvm/test/CodeGen/LoongArch/merge-base-offset.ll (+3-6)
  • (modified) llvm/test/CodeGen/LoongArch/preferred-alignments.ll (+3-8)
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
index d5a5f17348e4b..99205fbe023c8 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
@@ -10,6 +10,9 @@
 //
 //===----------------------------------------------------------------------===//
 
+def NotBoolXor : PatFrags<(ops node:$val),
+                          [(xor node:$val, -1), (xor node:$val, 1)]>;
+
 //===----------------------------------------------------------------------===//
 // LoongArch specific DAG Nodes.
 //===----------------------------------------------------------------------===//
@@ -22,6 +25,7 @@ def SDT_LoongArchFTINT : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
 def SDT_LoongArchFRECIPE : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
 def SDT_LoongArchFRSQRTE : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
 
+def loongarch_brcond : SDNode<"LoongArchISD::BRCOND", SDTBrcond, [SDNPHasChain]>;
 def loongarch_movgr2fr_w_la64
     : SDNode<"LoongArchISD::MOVGR2FR_W_LA64", SDT_LoongArchMOVGR2FR_W_LA64>;
 def loongarch_movfr2gr_s_la64
@@ -208,16 +212,18 @@ def : PatFPSetcc<SETUO,  FCMP_CUN_S,  FPR32>;
 def : PatFPSetcc<SETLT,  FCMP_CLT_S,  FPR32>;
 
 multiclass PatFPBrcond<CondCode cc, LAInst CmpInst, RegisterClass RegTy> {
-  def : Pat<(brcond (xor (GRLenVT (setcc RegTy:$fj, RegTy:$fk, cc)), -1),
-                     bb:$imm21),
+  def : Pat<(loongarch_brcond (NotBoolXor (GRLenVT (setcc RegTy:$fj, RegTy:$fk, cc))),
+                              bb:$imm21),
             (BCEQZ (CmpInst RegTy:$fj, RegTy:$fk), bb:$imm21)>;
-  def : Pat<(brcond (GRLenVT (setcc RegTy:$fj, RegTy:$fk, cc)), bb:$imm21),
+  def : Pat<(loongarch_brcond (GRLenVT (setcc RegTy:$fj, RegTy:$fk, cc)), bb:$imm21),
             (BCNEZ (CmpInst RegTy:$fj, RegTy:$fk), bb:$imm21)>;
 }
 
 defm : PatFPBrcond<SETOEQ, FCMP_CEQ_S, FPR32>;
+defm : PatFPBrcond<SETEQ , FCMP_CEQ_S, FPR32>;
 defm : PatFPBrcond<SETOLT, FCMP_CLT_S, FPR32>;
 defm : PatFPBrcond<SETOLE, FCMP_CLE_S, FPR32>;
+defm : PatFPBrcond<SETLE,  FCMP_CLE_S, FPR32>;
 defm : PatFPBrcond<SETONE, FCMP_CNE_S, FPR32>;
 defm : PatFPBrcond<SETO,   FCMP_COR_S, FPR32>;
 defm : PatFPBrcond<SETUEQ, FCMP_CUEQ_S, FPR32>;
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
index 616640152c8d3..965ad8a0a35c6 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
@@ -184,8 +184,10 @@ def : PatFPSetcc<SETUO,  FCMP_CUN_D,  FPR64>;
 def : PatFPSetcc<SETLT,  FCMP_CLT_D,  FPR64>;
 
 defm : PatFPBrcond<SETOEQ, FCMP_CEQ_D, FPR64>;
+defm : PatFPBrcond<SETEQ,  FCMP_CEQ_D, FPR64>;
 defm : PatFPBrcond<SETOLT, FCMP_CLT_D, FPR64>;
 defm : PatFPBrcond<SETOLE, FCMP_CLE_D, FPR64>;
+defm : PatFPBrcond<SETLE,  FCMP_CLE_D, FPR64>;
 defm : PatFPBrcond<SETONE, FCMP_CNE_D, FPR64>;
 defm : PatFPBrcond<SETO,   FCMP_COR_D, FPR64>;
 defm : PatFPBrcond<SETUEQ, FCMP_CUEQ_D, FPR64>;
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index 72dbb44815657..23ec33213c2cf 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -126,6 +126,7 @@ LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,
 
   setOperationAction(ISD::BR_JT, MVT::Other, Expand);
   setOperationAction(ISD::BR_CC, GRLenVT, Expand);
+  setOperationAction(ISD::BRCOND, MVT::Other, Custom);
   setOperationAction(ISD::SELECT_CC, GRLenVT, Expand);
   setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
   setOperationAction({ISD::SMUL_LOHI, ISD::UMUL_LOHI}, GRLenVT, Expand);
@@ -509,6 +510,8 @@ SDValue LoongArchTargetLowering::LowerOperation(SDValue Op,
     return lowerPREFETCH(Op, DAG);
   case ISD::SELECT:
     return lowerSELECT(Op, DAG);
+  case ISD::BRCOND:
+    return lowerBRCOND(Op, DAG);
   case ISD::FP_TO_FP16:
     return lowerFP_TO_FP16(Op, DAG);
   case ISD::FP16_TO_FP:
@@ -854,6 +857,35 @@ SDValue LoongArchTargetLowering::lowerSELECT(SDValue Op,
   return DAG.getNode(LoongArchISD::SELECT_CC, DL, VT, Ops);
 }
 
+SDValue LoongArchTargetLowering::lowerBRCOND(SDValue Op,
+                                             SelectionDAG &DAG) const {
+  SDValue CondV = Op.getOperand(1);
+  SDLoc DL(Op);
+  MVT GRLenVT = Subtarget.getGRLenVT();
+
+  if (CondV.getOpcode() == ISD::SETCC) {
+    if (CondV.getOperand(0).getValueType() == GRLenVT) {
+      SDValue LHS = CondV.getOperand(0);
+      SDValue RHS = CondV.getOperand(1);
+      ISD::CondCode CCVal = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+      translateSetCCForBranch(DL, LHS, RHS, CCVal, DAG);
+
+      SDValue TargetCC = DAG.getCondCode(CCVal);
+      return DAG.getNode(LoongArchISD::BR_CC, DL, Op.getValueType(),
+                         Op.getOperand(0), LHS, RHS, TargetCC,
+                         Op.getOperand(2));
+    } else if (CondV.getOperand(0).getValueType().isFloatingPoint()) {
+      return DAG.getNode(LoongArchISD::BRCOND, DL, Op.getValueType(),
+                         Op.getOperand(0), CondV, Op.getOperand(2));
+    }
+  }
+
+  return DAG.getNode(LoongArchISD::BR_CC, DL, Op.getValueType(),
+                     Op.getOperand(0), CondV, DAG.getConstant(0, DL, GRLenVT),
+                     DAG.getCondCode(ISD::SETNE), Op.getOperand(2));
+}
+
 SDValue
 LoongArchTargetLowering::lowerSCALAR_TO_VECTOR(SDValue Op,
                                                SelectionDAG &DAG) const {
@@ -5020,6 +5052,145 @@ static SDValue performBITREV_WCombine(SDNode *N, SelectionDAG &DAG,
                      Src.getOperand(0));
 }
 
+// Perform common combines for BR_CC and SELECT_CC conditions.
+static bool combine_CC(SDValue &LHS, SDValue &RHS, SDValue &CC, const SDLoc &DL,
+                       SelectionDAG &DAG, const LoongArchSubtarget &Subtarget) {
+  ISD::CondCode CCVal = cast<CondCodeSDNode>(CC)->get();
+
+  // Since an arithmetic right shift always preserves the sign,
+  // the shift can be omitted.
+  // Fold setlt (sra X, N), 0 -> setlt X, 0 and
+  // setge (sra X, N), 0 -> setge X, 0
+  if (isNullConstant(RHS) && (CCVal == ISD::SETGE || CCVal == ISD::SETLT) &&
+      LHS.getOpcode() == ISD::SRA) {
+    LHS = LHS.getOperand(0);
+    return true;
+  }
+
+  if (!ISD::isIntEqualitySetCC(CCVal))
+    return false;
+
+  // Fold ((setlt X, Y), 0, ne) -> (X, Y, lt)
+  // Sometimes the setcc is introduced after br_cc/select_cc has been formed.
+  if (LHS.getOpcode() == ISD::SETCC && isNullConstant(RHS) &&
+      LHS.getOperand(0).getValueType() == Subtarget.getGRLenVT()) {
+    // If we're looking for eq 0 instead of ne 0, we need to invert the
+    // condition.
+    bool Invert = CCVal == ISD::SETEQ;
+    CCVal = cast<CondCodeSDNode>(LHS.getOperand(2))->get();
+    if (Invert)
+      CCVal = ISD::getSetCCInverse(CCVal, LHS.getValueType());
+
+    RHS = LHS.getOperand(1);
+    LHS = LHS.getOperand(0);
+    translateSetCCForBranch(DL, LHS, RHS, CCVal, DAG);
+
+    CC = DAG.getCondCode(CCVal);
+    return true;
+  }
+
+  // If XOR is reused and has an immediate that will fit in XORI,
+  // do not fold.
+  auto isXorImmediate = [](const SDValue &Op) -> bool {
+    if (const auto *XorCnst = dyn_cast<ConstantSDNode>(Op))
+      return isInt<12>(XorCnst->getSExtValue());
+    return false;
+  };
+  // Fold (X(i1) ^ 1) == 0 -> X != 0
+  auto singleBitOp = [&DAG](const SDValue &VarOp,
+                            const SDValue &ConstOp) -> bool {
+    if (const auto *XorCnst = dyn_cast<ConstantSDNode>(ConstOp)) {
+      const APInt Mask = APInt::getBitsSetFrom(VarOp.getValueSizeInBits(), 1);
+      return (XorCnst->getSExtValue() == 1) &&
+             DAG.MaskedValueIsZero(VarOp, Mask);
+    }
+    return false;
+  };
+  auto onlyUsedBySelectOrBR = [](const SDValue &Op) -> bool {
+    for (const SDNode *UserNode : Op->users()) {
+      const unsigned Opcode = UserNode->getOpcode();
+      if (Opcode != LoongArchISD::SELECT_CC && Opcode != LoongArchISD::BR_CC)
+        return false;
+    }
+    return true;
+  };
+  auto isFoldableXorEq = [isXorImmediate, singleBitOp, onlyUsedBySelectOrBR](
+                             const SDValue &LHS, const SDValue &RHS) -> bool {
+    return LHS.getOpcode() == ISD::XOR && isNullConstant(RHS) &&
+           (!isXorImmediate(LHS.getOperand(1)) ||
+            singleBitOp(LHS.getOperand(0), LHS.getOperand(1)) ||
+            onlyUsedBySelectOrBR(LHS));
+  };
+  // Fold ((xor X, Y), 0, eq/ne) -> (X, Y, eq/ne)
+  if (isFoldableXorEq(LHS, RHS)) {
+    RHS = LHS.getOperand(1);
+    LHS = LHS.getOperand(0);
+    return true;
+  }
+  // Fold ((sext (xor X, C)), 0, eq/ne) -> ((sext X), C, eq/ne)
+  if (LHS.getOpcode() == ISD::SIGN_EXTEND_INREG) {
+    const SDValue LHS0 = LHS.getOperand(0);
+    if (isFoldableXorEq(LHS0, RHS) && isa<ConstantSDNode>(LHS0.getOperand(1))) {
+      // SEXT(XOR(X, Y)) -> XOR(SEXT(X), SEXT(Y))
+      RHS = DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, LHS.getValueType(),
+                        LHS0.getOperand(1), LHS.getOperand(1));
+      LHS = DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, LHS.getValueType(),
+                        LHS0.getOperand(0), LHS.getOperand(1));
+      return true;
+    }
+  }
+
+  // Fold ((srl (and X, 1<<C), C), 0, eq/ne) -> ((shl X, GRLen-1-C), 0, ge/lt)
+  if (isNullConstant(RHS) && LHS.getOpcode() == ISD::SRL && LHS.hasOneUse() &&
+      LHS.getOperand(1).getOpcode() == ISD::Constant) {
+    SDValue LHS0 = LHS.getOperand(0);
+    if (LHS0.getOpcode() == ISD::AND &&
+        LHS0.getOperand(1).getOpcode() == ISD::Constant) {
+      uint64_t Mask = LHS0.getConstantOperandVal(1);
+      uint64_t ShAmt = LHS.getConstantOperandVal(1);
+      if (isPowerOf2_64(Mask) && Log2_64(Mask) == ShAmt) {
+        CCVal = CCVal == ISD::SETEQ ? ISD::SETGE : ISD::SETLT;
+        CC = DAG.getCondCode(CCVal);
+
+        ShAmt = LHS.getValueSizeInBits() - 1 - ShAmt;
+        LHS = LHS0.getOperand(0);
+        if (ShAmt != 0)
+          LHS =
+              DAG.getNode(ISD::SHL, DL, LHS.getValueType(), LHS0.getOperand(0),
+                          DAG.getConstant(ShAmt, DL, LHS.getValueType()));
+        return true;
+      }
+    }
+  }
+
+  // (X, 1, setne) -> (X, 0, seteq) if we can prove X is 0/1.
+  // This can occur when legalizing some floating point comparisons.
+  APInt Mask = APInt::getBitsSetFrom(LHS.getValueSizeInBits(), 1);
+  if (isOneConstant(RHS) && DAG.MaskedValueIsZero(LHS, Mask)) {
+    CCVal = ISD::getSetCCInverse(CCVal, LHS.getValueType());
+    CC = DAG.getCondCode(CCVal);
+    RHS = DAG.getConstant(0, DL, LHS.getValueType());
+    return true;
+  }
+
+  return false;
+}
+
+static SDValue performBR_CCCombine(SDNode *N, SelectionDAG &DAG,
+                                   TargetLowering::DAGCombinerInfo &DCI,
+                                   const LoongArchSubtarget &Subtarget) {
+  SDValue LHS = N->getOperand(1);
+  SDValue RHS = N->getOperand(2);
+  SDValue CC = N->getOperand(3);
+  SDLoc DL(N);
+
+  if (combine_CC(LHS, RHS, CC, DL, DAG, Subtarget))
+    return DAG.getNode(LoongArchISD::BR_CC, DL, N->getValueType(0),
+                       N->getOperand(0), LHS, RHS, CC, N->getOperand(4));
+
+  return SDValue();
+}
+
 template <unsigned N>
 static SDValue legalizeIntrinsicImmArg(SDNode *Node, unsigned ImmOp,
                                        SelectionDAG &DAG,
@@ -5712,6 +5883,8 @@ SDValue LoongArchTargetLowering::PerformDAGCombine(SDNode *N,
     return performBITCASTCombine(N, DAG, DCI, Subtarget);
   case LoongArchISD::BITREV_W:
     return performBITREV_WCombine(N, DAG, DCI, Subtarget);
+  case LoongArchISD::BR_CC:
+    return performBR_CCCombine(N, DAG, DCI, Subtarget);
   case ISD::INTRINSIC_WO_CHAIN:
     return performINTRINSIC_WO_CHAINCombine(N, DAG, DCI, Subtarget);
   case LoongArchISD::MOVGR2FR_W_LA64:
@@ -6435,6 +6608,8 @@ const char *LoongArchTargetLowering::getTargetNodeName(unsigned Opcode) const {
     NODE_NAME_CASE(TAIL_MEDIUM)
     NODE_NAME_CASE(TAIL_LARGE)
     NODE_NAME_CASE(SELECT_CC)
+    NODE_NAME_CASE(BR_CC)
+    NODE_NAME_CASE(BRCOND)
     NODE_NAME_CASE(SLL_W)
     NODE_NAME_CASE(SRA_W)
     NODE_NAME_CASE(SRL_W)
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
index 60dc2b385a75c..3f849ef05845b 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
@@ -37,6 +37,10 @@ enum NodeType : unsigned {
   // Select
   SELECT_CC,
 
+  // Branch
+  BR_CC,
+  BRCOND,
+
   // 32-bit shifts, directly matching the semantics of the named LoongArch
   // instructions.
   SLL_W,
@@ -381,6 +385,7 @@ class LoongArchTargetLowering : public TargetLowering {
   SDValue lowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerPREFETCH(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSELECT(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFP16_TO_FP(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFP_TO_BF16(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
index 2b94e65cac0e5..20ccc622f58dc 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
@@ -31,6 +31,10 @@ def SDT_LoongArchSelectCC : SDTypeProfile<1, 5, [SDTCisSameAs<1, 2>,
                                                  SDTCisSameAs<0, 4>,
                                                  SDTCisSameAs<4, 5>]>;
 
+def SDT_LoongArchBrCC : SDTypeProfile<0, 4, [SDTCisSameAs<0, 1>,
+                                             SDTCisVT<2, OtherVT>,
+                                             SDTCisVT<3, OtherVT>]>;
+
 def SDT_LoongArchBStrIns: SDTypeProfile<1, 4, [
   SDTCisInt<0>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<3>,
   SDTCisSameAs<3, 4>
@@ -94,6 +98,8 @@ def loongarch_tail_large : SDNode<"LoongArchISD::TAIL_LARGE", SDT_LoongArchCall,
                                   [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue,
                                    SDNPVariadic]>;
 def loongarch_selectcc : SDNode<"LoongArchISD::SELECT_CC", SDT_LoongArchSelectCC>;
+def loongarch_brcc : SDNode<"LoongArchISD::BR_CC", SDT_LoongArchBrCC,
+                            [SDNPHasChain]>;
 def loongarch_sll_w : SDNode<"LoongArchISD::SLL_W", SDT_LoongArchIntBinOpW>;
 def loongarch_sra_w : SDNode<"LoongArchISD::SRA_W", SDT_LoongArchIntBinOpW>;
 def loongarch_srl_w : SDNode<"LoongArchISD::SRL_W", SDT_LoongArchIntBinOpW>;
@@ -1537,47 +1543,29 @@ def : Pat<(select GPR:$cond, GPR:$t, GPR:$f),
 
 /// Branches and jumps
 
-class BccPat<PatFrag CondOp, LAInst Inst>
-    : Pat<(brcond (GRLenVT (CondOp GPR:$rj, GPR:$rd)), bb:$imm16),
-          (Inst GPR:$rj, GPR:$rd, bb:$imm16)>;
-
-def : BccPat<seteq, BEQ>;
-def : BccPat<setne, BNE>;
-def : BccPat<setlt, BLT>;
-def : BccPat<setge, BGE>;
-def : BccPat<setult, BLTU>;
-def : BccPat<setuge, BGEU>;
-
-class BccSwapPat<PatFrag CondOp, LAInst InstBcc>
-    : Pat<(brcond (GRLenVT (CondOp GPR:$rd, GPR:$rj)), bb:$imm16),
-          (InstBcc GPR:$rj, GPR:$rd, bb:$imm16)>;
-
-// Condition codes that don't have matching LoongArch branch instructions, but
-// are trivially supported by swapping the two input operands.
-def : BccSwapPat<setgt, BLT>;
-def : BccSwapPat<setle, BGE>;
-def : BccSwapPat<setugt, BLTU>;
-def : BccSwapPat<setule, BGEU>;
-
 let Predicates = [Has32S] in {
-// An extra pattern is needed for a brcond without a setcc (i.e. where the
-// condition was calculated elsewhere).
-def : Pat<(brcond GPR:$rj, bb:$imm21), (BNEZ GPR:$rj, bb:$imm21)>;
-
-def : Pat<(brcond (GRLenVT (seteq GPR:$rj, 0)), bb:$imm21),
-          (BEQZ GPR:$rj, bb:$imm21)>;
-def : Pat<(brcond (GRLenVT (setne GPR:$rj, 0)), bb:$imm21),
-          (BNEZ GPR:$rj, bb:$imm21)>;
+class BccZeroPat<CondCode Cond, LAInst Inst>
+    : Pat<(loongarch_brcc (GRLenVT GPR:$rj), 0, Cond, bb:$imm21),
+           (Inst GPR:$rj, bb:$imm21)>;
+
+def : BccZeroPat<SETEQ, BEQZ>;
+def : BccZeroPat<SETNE, BNEZ>;
 } // Predicates = [Has32S]
 
-// An extra pattern is needed for a brcond without a setcc (i.e. where the
-// condition was calculated elsewhere).
-def : Pat<(brcond GPR:$rj, bb:$imm16), (BNE GPR:$rj, R0, bb:$imm16)>;
+multiclass BccPat<CondCode Cond, LAInst Inst> {
+  def : Pat<(loongarch_brcc (GRLenVT GPR:$rj), GPR:$rd, Cond, bb:$imm16),
+            (Inst GPR:$rj, GPR:$rd, bb:$imm16)>;
+  // Explicitly select 0 to R0. The register coalescer doesn't always do it.
+  def : Pat<(loongarch_brcc (GRLenVT GPR:$rj), 0, Cond, bb:$imm16),
+            (Inst GPR:$rj, (GRLenVT R0), bb:$imm16)>;
+}
 
-def : Pat<(brcond (GRLenVT (seteq GPR:$rj, 0)), bb:$imm16),
-          (BEQ GPR:$rj, R0, bb:$imm16)>;
-def : Pat<(brcond (GRLenVT (setne GPR:$rj, 0)), bb:$imm16),
-          (BNE GPR:$rj, R0, bb:$imm16)>;
+defm : BccPat<SETEQ, BEQ>;
+defm : BccPat<SETNE, BNE>;
+defm : BccPat<SETLT, BLT>;
+defm : BccPat<SETGE, BGE>;
+defm : BccPat<SETULT, BLTU>;
+defm : BccPat<SETUGE, BGEU>;
 
 let isBarrier = 1, isBranch = 1, isTerminator = 1 in
 def PseudoBR : Pseudo<(outs), (ins simm26_b:$imm26), [(br bb:$imm26)]>,
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll
index cff3484934214..713af3fd9c84d 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll
@@ -263,8 +263,7 @@ define i1 @fcmp_fast_olt(double %a, double %b, i1 %c) nounwind {
 ; LA32-NEXT:    movgr2fr.w $fa1, $zero
 ; LA32-NEXT:    movgr2frh.w $fa1, $zero
 ; LA32-NEXT:    fcmp.cle.d $fcc0, $fa1, $fa0
-; LA32-NEXT:    movcf2gr $a1, $fcc0
-; LA32-NEXT:    bnez $a1, .LBB16_2
+; LA32-NEXT:    bcnez $fcc0, .LBB16_2
 ; LA32-NEXT:  # %bb.1: # %if.then
 ; LA32-NEXT:    ret
 ; LA32-NEXT:  .LBB16_2: # %if.else
@@ -276,8 +275,7 @@ define i1 @fcmp_fast_olt(double %a, double %b, i1 %c) nounwind {
 ; LA64:       # %bb.0:
 ; LA64-NEXT:    movgr2fr.d $fa1, $zero
 ; LA64-NEXT:    fcmp.cle.d $fcc0, $fa1, $fa0
-; LA64-NEXT:    movcf2gr $a1, $fcc0
-; LA64-NEXT:    bnez $a1, .LBB16_2
+; LA64-NEXT:    bcnez $fcc0, .LBB16_2
 ; LA64-NEXT:  # %bb.1: # %if.then
 ; LA64-NEXT:    ret
 ; LA64-NEXT:  .LBB16_2: # %if.else
@@ -300,9 +298,7 @@ define i1 @fcmp_fast_oeq(double %a, double %b, i1 %c) nounwind {
 ; LA32-NEXT:    movgr2fr.w $fa1, $zero
 ; LA32-NEXT:    movgr2frh.w $fa1, $zero
 ; LA32-NEXT:    fcmp.ceq.d $fcc0, $fa0, $fa1
-; LA32-NEXT:    movcf2gr $a1, $fcc0
-; LA32-NEXT:    xori $a1, $a1, 1
-; LA32-NEXT:    bnez $a1, .LBB17_2
+; LA32-NEXT:    bceqz $fcc0, .LBB17_2
 ; LA32-NEXT:  # %bb.1: # %if.then
 ; LA32-NEXT:    ret
 ; LA32-NEXT:  .LBB17_2: # %if.else
@@ -313,9 +309,7 @@ define i1 @fcmp_fast_oeq(double %a, double %b, i1 %c) nounwind {
 ; LA64:       # %bb.0:
 ; LA64-NEXT:    movgr2fr.d $fa1, $zero
 ; LA64-NEXT:    fcmp.ceq.d $fcc0, $fa0, $fa1
-; LA64-NEXT:    movcf2gr $a1, $fcc0
-; LA64-NEXT:    xori $a1, $a1, 1
-; LA64-NEXT:    bnez $a1, .LBB17_2
+; LA64-NEXT:    bceqz $fcc0, .LBB17_2
 ; LA64-NEXT:  # %bb.1: # %if.then
 ; LA64-NEXT:    ret
 ; LA64-NEXT:  .LBB17_2: # %if.else
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll
index 8b682ecac50f5..4a97f693fafd7 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll
@@ -262,8 +262,7 @@ define i1 @fcmp_fast_olt(float %a, float %b, i1 %c) nounwind {
 ; LA32:       # %bb.0:
 ; LA32-NEXT:    movgr2fr.w $fa1, $zero
 ; LA32-NEXT:    fcmp.cle.s $fcc0, $fa1, $fa0
-; LA32-NEXT:    movcf2gr $a1, $fcc0
-; LA32-NEXT:    bnez $a1, .LBB16_2
+; LA32-NEXT:    bcnez $fcc0, .LBB16_2
 ; LA32-NEXT:  # %bb.1: # %if.then
 ; LA32-NEXT:    ret
 ; LA32-NEXT:  .LBB16_2: # %if.else
@@ -275,8 +274,7 @@ define i1 @fcmp_fast_olt(float %a, float %b, i1 %c) nounwind {
 ; LA64:       # %bb.0:
 ; LA64-NEXT:    movgr2fr.w $fa1, $zero
 ; LA64-NEXT:    fcmp.cle.s $fcc0, $fa1, $fa0
-; LA64-NEXT:    movcf2gr $a1, $fcc0
-; LA64-NEXT:    bnez $a1, .LBB16_2
+; LA64-NEXT:    bcnez $fcc0, .LBB16_2
 ; LA64-NEXT:  # %bb.1: # %if.then
 ; LA64-NEXT:    ret
 ...
[truncated]

(BCNEZ (CmpInst RegTy:$fj, RegTy:$fk), bb:$imm21)>;
}

defm : PatFPBrcond<SETOEQ, FCMP_CEQ_S, FPR32>;
defm : PatFPBrcond<SETEQ , FCMP_CEQ_S, FPR32>;
Contributor

Member Author

I remember this was added after encountering a "cannot select instruction" error.

Member Author

def : PatFPSetcc<SETEQ, FCMP_CEQ_S, FPR32>;

def : PatFPSetcc<SETLE, FCMP_CLE_S, FPR32>;
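
For background on why plain `SETEQ` and `SETLE` can share an instruction with `SETOEQ` and `SETOLE`: when applied to floating-point operands, the unprefixed condition codes are LLVM's "don't care" forms, whose result is undefined if an operand is NaN. Any comparison that is correct for non-NaN inputs is therefore a valid lowering. The sketch below is an illustration only, with hypothetical helper names modelling the ordered and unordered predicates; it shows that the two variants differ only when a NaN is involved.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helpers modelling comparison predicates; not LLVM APIs.
bool orderedEq(double a, double b)   { return !std::isunordered(a, b) && a == b; }
bool unorderedEq(double a, double b) { return std::isunordered(a, b) || a == b; }
bool orderedLe(double a, double b)   { return !std::isunordered(a, b) && a <= b; }
bool unorderedLe(double a, double b) { return std::isunordered(a, b) || a <= b; }

// With no NaN operands, ordered and unordered predicates give the same
// answer, so either is an acceptable lowering for a "don't care" code.
bool variantsAgreeWithoutNaN() {
  const double samples[] = {-1.5, -0.0, 0.0, 1.0, 2.0,
                            std::numeric_limits<double>::infinity()};
  for (double a : samples)
    for (double b : samples)
      if (orderedEq(a, b) != unorderedEq(a, b) ||
          orderedLe(a, b) != unorderedLe(a, b))
        return false;
  return true;
}

// With a NaN operand the variants disagree: ordered is false, unordered true.
bool variantsDifferWithNaN() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return !orderedEq(nan, 1.0) && unorderedEq(nan, 1.0) &&
         !orderedLe(nan, 1.0) && unorderedLe(nan, 1.0);
}
```

In the patch, the "don't care" codes are mapped to the same `FCMP_CEQ_*` and `FCMP_CLE_*` instructions as the ordered codes, which is consistent with this reasoning.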

3 participants