[arm64] Add tan intrinsic lowering #94545

farzonl · 2024-06-05T23:05:26Z

This change is an implementation of #87367 investigation on supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan.

Now that x86 tan backend landed: #90503 we can add other backends since the shared pieces are in tree now.

Changes:

llvm/include/llvm/Analysis/VecFuncs.def - vectorization of tan for arm64 backends.
llvm/lib/Target/AArch64/AArch64FastISel.cpp - Add tan to the libcall table
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp - Add tan expansion for f128, f16, and vector\neon operations
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp define G_FTAN as a legal arm64 instruction

resolves #94755

llvmbot · 2024-06-05T23:05:53Z

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Farzon Lotfi (farzonl)

Changes

This change is an implementation of #87367 investigation on supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan.

Now that x86 tan backend landed: #90503 we are one step closer to supporting constraint intrinsics for Tan that will be durable.

Changes:

llvm/include/llvm/Analysis/VecFuncs.def - vectorization of tan for arm64 backends.
llvm/lib/Target/AArch64/AArch64FastISel.cpp - Add tan to the libcall table
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp - Add tan expansion for f128, f16, and vector\neon operations
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp define G_FTAN as a legal arm64 instruction

Patch is 39.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94545.diff

14 Files Affected:

(modified) llvm/include/llvm/Analysis/VecFuncs.def (+14)
(modified) llvm/lib/Target/AArch64/AArch64FastISel.cpp (+10-6)
(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+15-10)
(modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+2-3)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll (+8)
(added) llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir (+227)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir (+3-2)
(modified) llvm/test/CodeGen/AArch64/f16-instructions.ll (+26)
(modified) llvm/test/CodeGen/AArch64/fast-isel-runtime-libcall.ll (+24)
(modified) llvm/test/CodeGen/AArch64/illegal-float-ops.ll (+22)
(modified) llvm/test/CodeGen/AArch64/replace-with-veclib-armpl.ll (+45-1)
(modified) llvm/test/CodeGen/AArch64/vec-libcalls.ll (+32)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/veclib-calls-libsystem-darwin.ll (+144)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/veclib-intrinsic-calls.ll (+74-1)

diff --git a/llvm/include/llvm/Analysis/VecFuncs.def b/llvm/include/llvm/Analysis/VecFuncs.def
index 402189769c20c..ffdf8b8c3bc79 100644
--- a/llvm/include/llvm/Analysis/VecFuncs.def
+++ b/llvm/include/llvm/Analysis/VecFuncs.def
@@ -92,6 +92,11 @@ TLI_DEFINE_VECFUNC("llvm.sin.f64", "_simd_sin_d2", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("sinf", "_simd_sin_f4", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("llvm.sin.f32", "_simd_sin_f4", FIXED(4), "_ZGV_LLVM_N4v")
 
+TLI_DEFINE_VECFUNC("tan", "_simd_tan_d2", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_simd_tan_d2", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("tanf", "_simd_tan_f4", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_simd_tan_f4", FIXED(4), "_ZGV_LLVM_N4v")
+
 // Floating-Point Arithmetic and Auxiliary Functions
 TLI_DEFINE_VECFUNC("cbrt", "_simd_cbrt_d2", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("cbrtf", "_simd_cbrt_f4", FIXED(4), "_ZGV_LLVM_N4v")
@@ -584,6 +589,7 @@ TLI_DEFINE_VECFUNC("sinpi", "_ZGVnN2v_sinpi", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("sqrt", "_ZGVnN2v_sqrt", FIXED(2), "_ZGV_LLVM_N2v")
 
 TLI_DEFINE_VECFUNC("tan", "_ZGVnN2v_tan", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_ZGVnN2v_tan", FIXED(2), "_ZGV_LLVM_N2v")
 
 TLI_DEFINE_VECFUNC("tanh", "_ZGVnN2v_tanh", FIXED(2), "_ZGV_LLVM_N2v")
 
@@ -681,6 +687,7 @@ TLI_DEFINE_VECFUNC("sinpif", "_ZGVnN4v_sinpif", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("sqrtf", "_ZGVnN4v_sqrtf", FIXED(4), "_ZGV_LLVM_N4v")
 
 TLI_DEFINE_VECFUNC("tanf", "_ZGVnN4v_tanf", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_ZGVnN4v_tanf", FIXED(4), "_ZGV_LLVM_N4v")
 
 TLI_DEFINE_VECFUNC("tanhf", "_ZGVnN4v_tanhf", FIXED(4), "_ZGV_LLVM_N4v")
 
@@ -828,6 +835,8 @@ TLI_DEFINE_VECFUNC("sqrtf", "_ZGVsMxv_sqrtf", SCALABLE(4), MASKED, "_ZGVsMxv")
 
 TLI_DEFINE_VECFUNC("tan", "_ZGVsMxv_tan",  SCALABLE(2), MASKED, "_ZGVsMxv")
 TLI_DEFINE_VECFUNC("tanf", "_ZGVsMxv_tanf", SCALABLE(4), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_ZGVsMxv_tan", SCALABLE(2), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_ZGVsMxv_tanf", SCALABLE(4), MASKED, "_ZGVsMxv")
 
 TLI_DEFINE_VECFUNC("tanh", "_ZGVsMxv_tanh",  SCALABLE(2), MASKED, "_ZGVsMxv")
 TLI_DEFINE_VECFUNC("tanhf", "_ZGVsMxv_tanhf", SCALABLE(4), MASKED, "_ZGVsMxv")
@@ -1087,6 +1096,11 @@ TLI_DEFINE_VECFUNC("tanf", "armpl_vtanq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("tan", "armpl_svtan_f64_x",  SCALABLE(2), MASKED, "_ZGVsMxv")
 TLI_DEFINE_VECFUNC("tanf", "armpl_svtan_f32_x", SCALABLE(4), MASKED, "_ZGVsMxv")
 
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "armpl_vtanq_f64", FIXED(2), NOMASK, "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "armpl_vtanq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "armpl_svtan_f64_x", SCALABLE(2), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "armpl_svtan_f32_x", SCALABLE(4), MASKED, "_ZGVsMxv")
+
 TLI_DEFINE_VECFUNC("tanh", "armpl_vtanhq_f64", FIXED(2), NOMASK, "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("tanhf", "armpl_vtanhq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("tanh", "armpl_svtanh_f64_x",  SCALABLE(2), MASKED, "_ZGVsMxv")
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index 62cf6a2c47acc..56fd15f23363d 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -3534,6 +3534,7 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
   }
   case Intrinsic::sin:
   case Intrinsic::cos:
+  case Intrinsic::tan:
   case Intrinsic::pow: {
     MVT RetVT;
     if (!isTypeLegal(II->getType(), RetVT))
@@ -3542,11 +3543,11 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
     if (RetVT != MVT::f32 && RetVT != MVT::f64)
       return false;
 
-    static const RTLIB::Libcall LibCallTable[3][2] = {
-      { RTLIB::SIN_F32, RTLIB::SIN_F64 },
-      { RTLIB::COS_F32, RTLIB::COS_F64 },
-      { RTLIB::POW_F32, RTLIB::POW_F64 }
-    };
+    static const RTLIB::Libcall LibCallTable[4][2] = {
+        {RTLIB::SIN_F32, RTLIB::SIN_F64},
+        {RTLIB::COS_F32, RTLIB::COS_F64},
+        {RTLIB::TAN_F32, RTLIB::TAN_F64},
+        {RTLIB::POW_F32, RTLIB::POW_F64}};
     RTLIB::Libcall LC;
     bool Is64Bit = RetVT == MVT::f64;
     switch (II->getIntrinsicID()) {
@@ -3558,9 +3559,12 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
     case Intrinsic::cos:
       LC = LibCallTable[1][Is64Bit];
       break;
-    case Intrinsic::pow:
+    case Intrinsic::tan:
       LC = LibCallTable[2][Is64Bit];
       break;
+    case Intrinsic::pow:
+      LC = LibCallTable[3][Is64Bit];
+      break;
     }
 
     ArgListTy Args;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 360a841bdade4..a8cf8f65029d3 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -543,6 +543,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::FSINCOS, MVT::f128, Expand);
   setOperationAction(ISD::FSQRT, MVT::f128, Expand);
   setOperationAction(ISD::FSUB, MVT::f128, LibCall);
+  setOperationAction(ISD::FTAN, MVT::f128, Expand);
   setOperationAction(ISD::FTRUNC, MVT::f128, Expand);
   setOperationAction(ISD::SETCC, MVT::f128, Custom);
   setOperationAction(ISD::STRICT_FSETCC, MVT::f128, Custom);
@@ -727,14 +728,14 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::FCOPYSIGN, MVT::bf16, Promote);
   }
 
-  for (auto Op : {ISD::FREM,        ISD::FPOW,         ISD::FPOWI,
-                  ISD::FCOS,        ISD::FSIN,         ISD::FSINCOS,
-                  ISD::FEXP,        ISD::FEXP2,        ISD::FEXP10,
-                  ISD::FLOG,        ISD::FLOG2,        ISD::FLOG10,
-                  ISD::STRICT_FREM,
-                  ISD::STRICT_FPOW, ISD::STRICT_FPOWI, ISD::STRICT_FCOS,
-                  ISD::STRICT_FSIN, ISD::STRICT_FEXP,  ISD::STRICT_FEXP2,
-                  ISD::STRICT_FLOG, ISD::STRICT_FLOG2, ISD::STRICT_FLOG10}) {
+  for (auto Op : {ISD::FREM,         ISD::FPOW,         ISD::FPOWI,
+                  ISD::FCOS,         ISD::FSIN,         ISD::FSINCOS,
+                  ISD::FTAN,         ISD::FEXP,         ISD::FEXP2,
+                  ISD::FEXP10,       ISD::FLOG,         ISD::FLOG2,
+                  ISD::FLOG10,       ISD::STRICT_FREM,  ISD::STRICT_FPOW,
+                  ISD::STRICT_FPOWI, ISD::STRICT_FCOS,  ISD::STRICT_FSIN,
+                  ISD::STRICT_FEXP,  ISD::STRICT_FEXP2, ISD::STRICT_FLOG,
+                  ISD::STRICT_FLOG2, ISD::STRICT_FLOG10}) {
     setOperationAction(Op, MVT::f16, Promote);
     setOperationAction(Op, MVT::v4f16, Expand);
     setOperationAction(Op, MVT::v8f16, Expand);
@@ -1171,13 +1172,15 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   if (Subtarget->isNeonAvailable()) {
     // FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
     // silliness like this:
+    // clang-format off
     for (auto Op :
          {ISD::SELECT,         ISD::SELECT_CC,
           ISD::BR_CC,          ISD::FADD,           ISD::FSUB,
           ISD::FMUL,           ISD::FDIV,           ISD::FMA,
           ISD::FNEG,           ISD::FABS,           ISD::FCEIL,
           ISD::FSQRT,          ISD::FFLOOR,         ISD::FNEARBYINT,
-          ISD::FSIN,           ISD::FCOS,           ISD::FPOW,
+          ISD::FSIN,           ISD::FCOS,           ISD::FTAN,
+          ISD::FPOW,
           ISD::FLOG,           ISD::FLOG2,          ISD::FLOG10,
           ISD::FEXP,           ISD::FEXP2,          ISD::FEXP10,
           ISD::FRINT,          ISD::FROUND,         ISD::FROUNDEVEN,
@@ -1190,7 +1193,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
           ISD::STRICT_FMINNUM, ISD::STRICT_FMAXNUM, ISD::STRICT_FMINIMUM,
           ISD::STRICT_FMAXIMUM})
       setOperationAction(Op, MVT::v1f64, Expand);
-
+    // clang-format on
     for (auto Op :
          {ISD::FP_TO_SINT, ISD::FP_TO_UINT, ISD::SINT_TO_FP, ISD::UINT_TO_FP,
           ISD::FP_ROUND, ISD::FP_TO_SINT_SAT, ISD::FP_TO_UINT_SAT, ISD::MUL,
@@ -1622,6 +1625,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::FCOS, VT, Expand);
       setOperationAction(ISD::FSIN, VT, Expand);
       setOperationAction(ISD::FSINCOS, VT, Expand);
+      setOperationAction(ISD::FTAN, VT, Expand);
       setOperationAction(ISD::FEXP, VT, Expand);
       setOperationAction(ISD::FEXP2, VT, Expand);
       setOperationAction(ISD::FEXP10, VT, Expand);
@@ -1803,6 +1807,7 @@ void AArch64TargetLowering::addTypeForNEON(MVT VT) {
   if (VT == MVT::v2f32 || VT == MVT::v4f32 || VT == MVT::v2f64) {
     setOperationAction(ISD::FSIN, VT, Expand);
     setOperationAction(ISD::FCOS, VT, Expand);
+    setOperationAction(ISD::FTAN, VT, Expand);
     setOperationAction(ISD::FPOW, VT, Expand);
     setOperationAction(ISD::FLOG, VT, Expand);
     setOperationAction(ISD::FLOG2, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 07a0473888ee5..42cd43c3afa37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -267,9 +267,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .libcallFor({{s64, s128}})
       .minScalarOrElt(1, MinFPScalar);
 
-  getActionDefinitionsBuilder(
-      {G_FCOS, G_FSIN, G_FPOW, G_FLOG, G_FLOG2, G_FLOG10,
-       G_FEXP, G_FEXP2, G_FEXP10})
+  getActionDefinitionsBuilder({G_FCOS, G_FSIN, G_FPOW, G_FLOG, G_FLOG2,
+                               G_FLOG10, G_FTAN, G_FEXP, G_FEXP2, G_FEXP10})
       // We need a call for these, so we always need to scalarize.
       .scalarize(0)
       // Regardless of FP16 support, widen 16-bit elements to 32-bits.
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
index a61931b898aea..eb94cc5d0fb61 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -2313,6 +2313,14 @@ define float @test_sin_f32(float %x) {
   ret float %y
 }
 
+declare float @llvm.tan.f32(float)
+define float @test_tan_f32(float %x) {
+  ; CHECK-LABEL: name:            test_tan_f32
+  ; CHECK: %{{[0-9]+}}:_(s32) = G_FTAN %{{[0-9]+}}
+  %y = call float @llvm.tan.f32(float %x)
+  ret float %y
+}
+
 declare float @llvm.sqrt.f32(float)
 define float @test_sqrt_f32(float %x) {
   ; CHECK-LABEL: name:            test_sqrt_f32
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir
new file mode 100644
index 0000000000000..455aae0128364
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir
@@ -0,0 +1,227 @@
+# RUN: llc -verify-machineinstrs -mtriple aarch64--- \
+# RUN: -run-pass=legalizer -mattr=+fullfp16 -global-isel %s -o - \
+# RUN: | FileCheck %s
+...
+---
+name:            test_v4f16.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $d0
+    ; CHECK-LABEL: name:            test_v4f16.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s16), [[V2:%[0-9]+]]:_(s16), [[V3:%[0-9]+]]:_(s16), [[V4:%[0-9]+]]:_(s16) = G_UNMERGE_VALUES %{{[0-9]+}}(<4 x s16>)
+
+    ; CHECK-DAG: [[V1_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V1]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V1_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT1_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT1:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT1_S32]](s32)
+
+    ; CHECK-DAG: [[V2_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V2]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V2_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT2_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT2:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT2_S32]](s32)
+
+    ; CHECK-DAG: [[V3_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V3]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V3_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT3_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT3:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT3_S32]](s32)
+
+    ; CHECK-DAG: [[V4_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V4]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V4_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT4_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT4:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT4_S32]](s32)
+
+    ; CHECK-DAG: %{{[0-9]+}}:_(<4 x s16>) = G_BUILD_VECTOR [[ELT1]](s16), [[ELT2]](s16), [[ELT3]](s16), [[ELT4]](s16)
+
+    %0:_(<4 x s16>) = COPY $d0
+    %1:_(<4 x s16>) = G_FTAN %0
+    $d0 = COPY %1(<4 x s16>)
+    RET_ReallyLR implicit $d0
+
+...
+---
+name:            test_v8f16.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $q0
+
+    ; CHECK-LABEL: name:            test_v8f16.tan
+
+    ; This is big, so let's just check for the 8 calls to tanf, the the
+    ; G_UNMERGE_VALUES, and the G_BUILD_VECTOR. The other instructions ought
+    ; to be covered by the other tests.
+
+    ; CHECK: G_UNMERGE_VALUES
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: G_BUILD_VECTOR
+
+    %0:_(<8 x s16>) = COPY $q0
+    %1:_(<8 x s16>) = G_FTAN %0
+    $q0 = COPY %1(<8 x s16>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            test_v2f32.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $d0
+
+    ; CHECK-LABEL: name:            test_v2f32.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s32), [[V2:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES %{{[0-9]+}}(<2 x s32>)
+
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V1]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V2]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: %1:_(<2 x s32>) = G_BUILD_VECTOR [[ELT1]](s32), [[ELT2]](s32)
+
+    %0:_(<2 x s32>) = COPY $d0
+    %1:_(<2 x s32>) = G_FTAN %0
+    $d0 = COPY %1(<2 x s32>)
+    RET_ReallyLR implicit $d0
+
+...
+---
+name:            test_v4f32.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $q0
+    ; CHECK-LABEL: name:            test_v4f32.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s32), [[V2:%[0-9]+]]:_(s32), [[V3:%[0-9]+]]:_(s32), [[V4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES %{{[0-9]+}}(<4 x s32>)
+
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V1]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V2]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V3]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT3:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V4]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT4:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: %1:_(<4 x s32>) = G_BUILD_VECTOR [[ELT1]](s32), [[ELT2]](s32), [[ELT3]](s32), [[ELT4]](s32)
+
+    %0:_(<4 x s32>) = COPY $q0
+    %1:_(<4 x s32>) = G_FTAN %0
+    $q0 = COPY %1(<4 x s32>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            test_v2f64.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $q0
+
+    ; CHECK-LABEL: name:            test_v2f64.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s64), [[V2:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %{{[0-9]+}}(<2 x s64>)
+
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $d0 = COPY [[V1]](s64)
+    ; CHECK-DAG: BL &tan
+    ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s64) = COPY $d0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $d0 = COPY [[V2]](s64)
+    ; CHECK-DAG: BL &tan
+    ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s64) = COPY $d0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: %1:_(<2 x s64>) = G_BUILD_VECTOR [[ELT1]](s64), [[ELT2]](s64)
+
+    %0:_(<2 x s64>) = COPY $q0
+    %1:_(<2 x s64>) = G_FTAN %0
+    $q0 = COPY %1(<2 x s64>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            test_tan_half
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $h0
+    ; CHECK-LABEL: name:            test_tan_half
+    ; CHECK: [[REG1:%[0-9]+]]:_(s32) = G_FPEXT %0(s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[REG1]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[REG2:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s16) = G_FPTRUNC [[REG2]](s32)
+
+    %0:_(s16) = COPY $h0
+    %1:_(s16) = G_FTAN %0
+    $h0 = COPY %1(s16)
+    RET_ReallyLR implicit $h0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index d71111b57efe5..04c9a08c50038 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -675,8 +675,9 @@
 # DEBUG-NEXT: .. the first uncovered type index: 1, OK
 # DEBUG-NEXT: .. the first uncovered imm index: 0, OK
 # DEBUG-NEXT: G_FTAN (opcode {{[0-9]+}}): 1 type index, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
+# DEBUG-NEXT: .. the first uncovered type index: 1, OK
+# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
 # DEBUG-NEXT: G_FSQRT (opcode {{[0-9]+}}): 1 type index, 0 imm indices
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
 # DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
diff --git a/llvm/test/CodeGen/AArch64/f16-instructions.ll b/llvm/test/CodeGen/AArch64/f16-instructions.ll
index fa362ae798aa8..998f7dd38c3f5 100644
--- a/llvm/test/CodeGen/AArch64/f16-instructions.ll
+++ b/llvm/test/CodeGen/AArch64/f16-instructions.ll
@@ -759,6 +759,7 @@ declare half @llvm.sqrt.f16(half %a) #0
 declare half @llvm.powi.f16.i32(half %a, i32 %b) #0
 declare half @llvm.sin.f16(half %a) #0
 declare half @llvm.cos.f16(half %a) #0
+declare half @llvm.tan.f16(half %a) #0
 declare half @llvm.pow.f16(half %a, half %b) #0
 declare half @llvm.exp.f16(half %a) #0
 declare half @llvm.exp2.f16(half %a) #0
@@ -870,6 +871,31 @@ define half @test_cos(half %a) #0 {
   ret half %r
 }
 
+; FALLBACK-NOT: remark:{{.*}}test_tan
+; FALLBACK-FP16-NOT: remark:{{.*}}test_tan
+
+; CHECK-COMMON-LABEL: test_tan:
+; CHECK-COMMON-NEXT: stp x29, x30, [sp, #-16]!
+; CHECK-COMMON-NEXT: mov  x29, sp
+; CHECK-COMMON-NEXT: fcvt s0, h0
+; CHECK-COMMON-NEXT: bl {{_?}}tanf
+; CHECK-COMMON-NEXT: fcvt h0, s0
+; CHECK-COMMON-NEXT: ldp x29, x30, [sp], #16
+; CHECK-COMMON-NEXT: ret
+
+; GISEL-LABEL: test_tan:
+; GISEL-NEXT: stp x29, x30, [sp, #-16]!
+; GISEL-NEXT: mov  x29, sp
+; GISEL-NEXT: fcvt s0, h0
+; GISEL-NEXT: bl {{_?}}tanf
+; GISEL-NEXT: fcvt h0, s0
+; GISEL-NEXT: ldp x29, ...
[truncated]

llvmbot · 2024-06-05T23:05:53Z

@llvm/pr-subscribers-llvm-globalisel

Author: Farzon Lotfi (farzonl)

Changes

This change is an implementation of #87367 investigation on supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan.

Now that x86 tan backend landed: #90503 we are one step closer to supporting constraint intrinsics for Tan that will be durable.

Changes:

llvm/include/llvm/Analysis/VecFuncs.def - vectorization of tan for arm64 backends.
llvm/lib/Target/AArch64/AArch64FastISel.cpp - Add tan to the libcall table
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp - Add tan expansion for f128, f16, and vector\neon operations
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp define G_FTAN as a legal arm64 instruction

Patch is 39.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94545.diff

14 Files Affected:

(modified) llvm/include/llvm/Analysis/VecFuncs.def (+14)
(modified) llvm/lib/Target/AArch64/AArch64FastISel.cpp (+10-6)
(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+15-10)
(modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+2-3)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll (+8)
(added) llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir (+227)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir (+3-2)
(modified) llvm/test/CodeGen/AArch64/f16-instructions.ll (+26)
(modified) llvm/test/CodeGen/AArch64/fast-isel-runtime-libcall.ll (+24)
(modified) llvm/test/CodeGen/AArch64/illegal-float-ops.ll (+22)
(modified) llvm/test/CodeGen/AArch64/replace-with-veclib-armpl.ll (+45-1)
(modified) llvm/test/CodeGen/AArch64/vec-libcalls.ll (+32)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/veclib-calls-libsystem-darwin.ll (+144)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/veclib-intrinsic-calls.ll (+74-1)

diff --git a/llvm/include/llvm/Analysis/VecFuncs.def b/llvm/include/llvm/Analysis/VecFuncs.def
index 402189769c20c..ffdf8b8c3bc79 100644
--- a/llvm/include/llvm/Analysis/VecFuncs.def
+++ b/llvm/include/llvm/Analysis/VecFuncs.def
@@ -92,6 +92,11 @@ TLI_DEFINE_VECFUNC("llvm.sin.f64", "_simd_sin_d2", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("sinf", "_simd_sin_f4", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("llvm.sin.f32", "_simd_sin_f4", FIXED(4), "_ZGV_LLVM_N4v")
 
+TLI_DEFINE_VECFUNC("tan", "_simd_tan_d2", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_simd_tan_d2", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("tanf", "_simd_tan_f4", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_simd_tan_f4", FIXED(4), "_ZGV_LLVM_N4v")
+
 // Floating-Point Arithmetic and Auxiliary Functions
 TLI_DEFINE_VECFUNC("cbrt", "_simd_cbrt_d2", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("cbrtf", "_simd_cbrt_f4", FIXED(4), "_ZGV_LLVM_N4v")
@@ -584,6 +589,7 @@ TLI_DEFINE_VECFUNC("sinpi", "_ZGVnN2v_sinpi", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("sqrt", "_ZGVnN2v_sqrt", FIXED(2), "_ZGV_LLVM_N2v")
 
 TLI_DEFINE_VECFUNC("tan", "_ZGVnN2v_tan", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_ZGVnN2v_tan", FIXED(2), "_ZGV_LLVM_N2v")
 
 TLI_DEFINE_VECFUNC("tanh", "_ZGVnN2v_tanh", FIXED(2), "_ZGV_LLVM_N2v")
 
@@ -681,6 +687,7 @@ TLI_DEFINE_VECFUNC("sinpif", "_ZGVnN4v_sinpif", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("sqrtf", "_ZGVnN4v_sqrtf", FIXED(4), "_ZGV_LLVM_N4v")
 
 TLI_DEFINE_VECFUNC("tanf", "_ZGVnN4v_tanf", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_ZGVnN4v_tanf", FIXED(4), "_ZGV_LLVM_N4v")
 
 TLI_DEFINE_VECFUNC("tanhf", "_ZGVnN4v_tanhf", FIXED(4), "_ZGV_LLVM_N4v")
 
@@ -828,6 +835,8 @@ TLI_DEFINE_VECFUNC("sqrtf", "_ZGVsMxv_sqrtf", SCALABLE(4), MASKED, "_ZGVsMxv")
 
 TLI_DEFINE_VECFUNC("tan", "_ZGVsMxv_tan",  SCALABLE(2), MASKED, "_ZGVsMxv")
 TLI_DEFINE_VECFUNC("tanf", "_ZGVsMxv_tanf", SCALABLE(4), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_ZGVsMxv_tan", SCALABLE(2), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_ZGVsMxv_tanf", SCALABLE(4), MASKED, "_ZGVsMxv")
 
 TLI_DEFINE_VECFUNC("tanh", "_ZGVsMxv_tanh",  SCALABLE(2), MASKED, "_ZGVsMxv")
 TLI_DEFINE_VECFUNC("tanhf", "_ZGVsMxv_tanhf", SCALABLE(4), MASKED, "_ZGVsMxv")
@@ -1087,6 +1096,11 @@ TLI_DEFINE_VECFUNC("tanf", "armpl_vtanq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("tan", "armpl_svtan_f64_x",  SCALABLE(2), MASKED, "_ZGVsMxv")
 TLI_DEFINE_VECFUNC("tanf", "armpl_svtan_f32_x", SCALABLE(4), MASKED, "_ZGVsMxv")
 
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "armpl_vtanq_f64", FIXED(2), NOMASK, "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "armpl_vtanq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "armpl_svtan_f64_x", SCALABLE(2), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "armpl_svtan_f32_x", SCALABLE(4), MASKED, "_ZGVsMxv")
+
 TLI_DEFINE_VECFUNC("tanh", "armpl_vtanhq_f64", FIXED(2), NOMASK, "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("tanhf", "armpl_vtanhq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("tanh", "armpl_svtanh_f64_x",  SCALABLE(2), MASKED, "_ZGVsMxv")
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index 62cf6a2c47acc..56fd15f23363d 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -3534,6 +3534,7 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
   }
   case Intrinsic::sin:
   case Intrinsic::cos:
+  case Intrinsic::tan:
   case Intrinsic::pow: {
     MVT RetVT;
     if (!isTypeLegal(II->getType(), RetVT))
@@ -3542,11 +3543,11 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
     if (RetVT != MVT::f32 && RetVT != MVT::f64)
       return false;
 
-    static const RTLIB::Libcall LibCallTable[3][2] = {
-      { RTLIB::SIN_F32, RTLIB::SIN_F64 },
-      { RTLIB::COS_F32, RTLIB::COS_F64 },
-      { RTLIB::POW_F32, RTLIB::POW_F64 }
-    };
+    static const RTLIB::Libcall LibCallTable[4][2] = {
+        {RTLIB::SIN_F32, RTLIB::SIN_F64},
+        {RTLIB::COS_F32, RTLIB::COS_F64},
+        {RTLIB::TAN_F32, RTLIB::TAN_F64},
+        {RTLIB::POW_F32, RTLIB::POW_F64}};
     RTLIB::Libcall LC;
     bool Is64Bit = RetVT == MVT::f64;
     switch (II->getIntrinsicID()) {
@@ -3558,9 +3559,12 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
     case Intrinsic::cos:
       LC = LibCallTable[1][Is64Bit];
       break;
-    case Intrinsic::pow:
+    case Intrinsic::tan:
       LC = LibCallTable[2][Is64Bit];
       break;
+    case Intrinsic::pow:
+      LC = LibCallTable[3][Is64Bit];
+      break;
     }
 
     ArgListTy Args;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 360a841bdade4..a8cf8f65029d3 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -543,6 +543,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::FSINCOS, MVT::f128, Expand);
   setOperationAction(ISD::FSQRT, MVT::f128, Expand);
   setOperationAction(ISD::FSUB, MVT::f128, LibCall);
+  setOperationAction(ISD::FTAN, MVT::f128, Expand);
   setOperationAction(ISD::FTRUNC, MVT::f128, Expand);
   setOperationAction(ISD::SETCC, MVT::f128, Custom);
   setOperationAction(ISD::STRICT_FSETCC, MVT::f128, Custom);
@@ -727,14 +728,14 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::FCOPYSIGN, MVT::bf16, Promote);
   }
 
-  for (auto Op : {ISD::FREM,        ISD::FPOW,         ISD::FPOWI,
-                  ISD::FCOS,        ISD::FSIN,         ISD::FSINCOS,
-                  ISD::FEXP,        ISD::FEXP2,        ISD::FEXP10,
-                  ISD::FLOG,        ISD::FLOG2,        ISD::FLOG10,
-                  ISD::STRICT_FREM,
-                  ISD::STRICT_FPOW, ISD::STRICT_FPOWI, ISD::STRICT_FCOS,
-                  ISD::STRICT_FSIN, ISD::STRICT_FEXP,  ISD::STRICT_FEXP2,
-                  ISD::STRICT_FLOG, ISD::STRICT_FLOG2, ISD::STRICT_FLOG10}) {
+  for (auto Op : {ISD::FREM,         ISD::FPOW,         ISD::FPOWI,
+                  ISD::FCOS,         ISD::FSIN,         ISD::FSINCOS,
+                  ISD::FTAN,         ISD::FEXP,         ISD::FEXP2,
+                  ISD::FEXP10,       ISD::FLOG,         ISD::FLOG2,
+                  ISD::FLOG10,       ISD::STRICT_FREM,  ISD::STRICT_FPOW,
+                  ISD::STRICT_FPOWI, ISD::STRICT_FCOS,  ISD::STRICT_FSIN,
+                  ISD::STRICT_FEXP,  ISD::STRICT_FEXP2, ISD::STRICT_FLOG,
+                  ISD::STRICT_FLOG2, ISD::STRICT_FLOG10}) {
     setOperationAction(Op, MVT::f16, Promote);
     setOperationAction(Op, MVT::v4f16, Expand);
     setOperationAction(Op, MVT::v8f16, Expand);
@@ -1171,13 +1172,15 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   if (Subtarget->isNeonAvailable()) {
     // FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
     // silliness like this:
+    // clang-format off
     for (auto Op :
          {ISD::SELECT,         ISD::SELECT_CC,
           ISD::BR_CC,          ISD::FADD,           ISD::FSUB,
           ISD::FMUL,           ISD::FDIV,           ISD::FMA,
           ISD::FNEG,           ISD::FABS,           ISD::FCEIL,
           ISD::FSQRT,          ISD::FFLOOR,         ISD::FNEARBYINT,
-          ISD::FSIN,           ISD::FCOS,           ISD::FPOW,
+          ISD::FSIN,           ISD::FCOS,           ISD::FTAN,
+          ISD::FPOW,
           ISD::FLOG,           ISD::FLOG2,          ISD::FLOG10,
           ISD::FEXP,           ISD::FEXP2,          ISD::FEXP10,
           ISD::FRINT,          ISD::FROUND,         ISD::FROUNDEVEN,
@@ -1190,7 +1193,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
           ISD::STRICT_FMINNUM, ISD::STRICT_FMAXNUM, ISD::STRICT_FMINIMUM,
           ISD::STRICT_FMAXIMUM})
       setOperationAction(Op, MVT::v1f64, Expand);
-
+    // clang-format on
     for (auto Op :
          {ISD::FP_TO_SINT, ISD::FP_TO_UINT, ISD::SINT_TO_FP, ISD::UINT_TO_FP,
           ISD::FP_ROUND, ISD::FP_TO_SINT_SAT, ISD::FP_TO_UINT_SAT, ISD::MUL,
@@ -1622,6 +1625,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::FCOS, VT, Expand);
       setOperationAction(ISD::FSIN, VT, Expand);
       setOperationAction(ISD::FSINCOS, VT, Expand);
+      setOperationAction(ISD::FTAN, VT, Expand);
       setOperationAction(ISD::FEXP, VT, Expand);
       setOperationAction(ISD::FEXP2, VT, Expand);
       setOperationAction(ISD::FEXP10, VT, Expand);
@@ -1803,6 +1807,7 @@ void AArch64TargetLowering::addTypeForNEON(MVT VT) {
   if (VT == MVT::v2f32 || VT == MVT::v4f32 || VT == MVT::v2f64) {
     setOperationAction(ISD::FSIN, VT, Expand);
     setOperationAction(ISD::FCOS, VT, Expand);
+    setOperationAction(ISD::FTAN, VT, Expand);
     setOperationAction(ISD::FPOW, VT, Expand);
     setOperationAction(ISD::FLOG, VT, Expand);
     setOperationAction(ISD::FLOG2, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 07a0473888ee5..42cd43c3afa37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -267,9 +267,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .libcallFor({{s64, s128}})
       .minScalarOrElt(1, MinFPScalar);
 
-  getActionDefinitionsBuilder(
-      {G_FCOS, G_FSIN, G_FPOW, G_FLOG, G_FLOG2, G_FLOG10,
-       G_FEXP, G_FEXP2, G_FEXP10})
+  getActionDefinitionsBuilder({G_FCOS, G_FSIN, G_FPOW, G_FLOG, G_FLOG2,
+                               G_FLOG10, G_FTAN, G_FEXP, G_FEXP2, G_FEXP10})
       // We need a call for these, so we always need to scalarize.
       .scalarize(0)
       // Regardless of FP16 support, widen 16-bit elements to 32-bits.
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
index a61931b898aea..eb94cc5d0fb61 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -2313,6 +2313,14 @@ define float @test_sin_f32(float %x) {
   ret float %y
 }
 
+declare float @llvm.tan.f32(float)
+define float @test_tan_f32(float %x) {
+  ; CHECK-LABEL: name:            test_tan_f32
+  ; CHECK: %{{[0-9]+}}:_(s32) = G_FTAN %{{[0-9]+}}
+  %y = call float @llvm.tan.f32(float %x)
+  ret float %y
+}
+
 declare float @llvm.sqrt.f32(float)
 define float @test_sqrt_f32(float %x) {
   ; CHECK-LABEL: name:            test_sqrt_f32
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir
new file mode 100644
index 0000000000000..455aae0128364
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir
@@ -0,0 +1,227 @@
+# RUN: llc -verify-machineinstrs -mtriple aarch64--- \
+# RUN: -run-pass=legalizer -mattr=+fullfp16 -global-isel %s -o - \
+# RUN: | FileCheck %s
+...
+---
+name:            test_v4f16.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $d0
+    ; CHECK-LABEL: name:            test_v4f16.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s16), [[V2:%[0-9]+]]:_(s16), [[V3:%[0-9]+]]:_(s16), [[V4:%[0-9]+]]:_(s16) = G_UNMERGE_VALUES %{{[0-9]+}}(<4 x s16>)
+
+    ; CHECK-DAG: [[V1_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V1]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V1_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT1_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT1:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT1_S32]](s32)
+
+    ; CHECK-DAG: [[V2_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V2]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V2_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT2_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT2:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT2_S32]](s32)
+
+    ; CHECK-DAG: [[V3_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V3]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V3_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT3_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT3:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT3_S32]](s32)
+
+    ; CHECK-DAG: [[V4_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V4]](s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[V4_S32]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[ELT4_S32:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[ELT4:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT4_S32]](s32)
+
+    ; CHECK-DAG: %{{[0-9]+}}:_(<4 x s16>) = G_BUILD_VECTOR [[ELT1]](s16), [[ELT2]](s16), [[ELT3]](s16), [[ELT4]](s16)
+
+    %0:_(<4 x s16>) = COPY $d0
+    %1:_(<4 x s16>) = G_FTAN %0
+    $d0 = COPY %1(<4 x s16>)
+    RET_ReallyLR implicit $d0
+
+...
+---
+name:            test_v8f16.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $q0
+
+    ; CHECK-LABEL: name:            test_v8f16.tan
+
+    ; This is big, so let's just check for the 8 calls to tanf, the the
+    ; G_UNMERGE_VALUES, and the G_BUILD_VECTOR. The other instructions ought
+    ; to be covered by the other tests.
+
+    ; CHECK: G_UNMERGE_VALUES
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: BL &tanf
+    ; CHECK: G_BUILD_VECTOR
+
+    %0:_(<8 x s16>) = COPY $q0
+    %1:_(<8 x s16>) = G_FTAN %0
+    $q0 = COPY %1(<8 x s16>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            test_v2f32.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $d0
+
+    ; CHECK-LABEL: name:            test_v2f32.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s32), [[V2:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES %{{[0-9]+}}(<2 x s32>)
+
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V1]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V2]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: %1:_(<2 x s32>) = G_BUILD_VECTOR [[ELT1]](s32), [[ELT2]](s32)
+
+    %0:_(<2 x s32>) = COPY $d0
+    %1:_(<2 x s32>) = G_FTAN %0
+    $d0 = COPY %1(<2 x s32>)
+    RET_ReallyLR implicit $d0
+
+...
+---
+name:            test_v4f32.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $q0
+    ; CHECK-LABEL: name:            test_v4f32.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s32), [[V2:%[0-9]+]]:_(s32), [[V3:%[0-9]+]]:_(s32), [[V4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES %{{[0-9]+}}(<4 x s32>)
+
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V1]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V2]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V3]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT3:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $s0 = COPY [[V4]](s32)
+    ; CHECK-DAG: BL &tanf
+    ; CHECK-DAG: [[ELT4:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: %1:_(<4 x s32>) = G_BUILD_VECTOR [[ELT1]](s32), [[ELT2]](s32), [[ELT3]](s32), [[ELT4]](s32)
+
+    %0:_(<4 x s32>) = COPY $q0
+    %1:_(<4 x s32>) = G_FTAN %0
+    $q0 = COPY %1(<4 x s32>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            test_v2f64.tan
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $q0
+
+    ; CHECK-LABEL: name:            test_v2f64.tan
+    ; CHECK: [[V1:%[0-9]+]]:_(s64), [[V2:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %{{[0-9]+}}(<2 x s64>)
+
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $d0 = COPY [[V1]](s64)
+    ; CHECK-DAG: BL &tan
+    ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s64) = COPY $d0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: ADJCALLSTACKDOWN
+    ; CHECK-DAG: $d0 = COPY [[V2]](s64)
+    ; CHECK-DAG: BL &tan
+    ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s64) = COPY $d0
+    ; CHECK-DAG: ADJCALLSTACKUP
+
+    ; CHECK-DAG: %1:_(<2 x s64>) = G_BUILD_VECTOR [[ELT1]](s64), [[ELT2]](s64)
+
+    %0:_(<2 x s64>) = COPY $q0
+    %1:_(<2 x s64>) = G_FTAN %0
+    $q0 = COPY %1(<2 x s64>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            test_tan_half
+alignment:       4
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: _ }
+  - { id: 1, class: _ }
+body:             |
+  bb.0:
+    liveins: $h0
+    ; CHECK-LABEL: name:            test_tan_half
+    ; CHECK: [[REG1:%[0-9]+]]:_(s32) = G_FPEXT %0(s16)
+    ; CHECK-NEXT: ADJCALLSTACKDOWN
+    ; CHECK-NEXT: $s0 = COPY [[REG1]](s32)
+    ; CHECK-NEXT: BL &tanf
+    ; CHECK-NEXT: ADJCALLSTACKUP
+    ; CHECK-NEXT: [[REG2:%[0-9]+]]:_(s32) = COPY $s0
+    ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s16) = G_FPTRUNC [[REG2]](s32)
+
+    %0:_(s16) = COPY $h0
+    %1:_(s16) = G_FTAN %0
+    $h0 = COPY %1(s16)
+    RET_ReallyLR implicit $h0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index d71111b57efe5..04c9a08c50038 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -675,8 +675,9 @@
 # DEBUG-NEXT: .. the first uncovered type index: 1, OK
 # DEBUG-NEXT: .. the first uncovered imm index: 0, OK
 # DEBUG-NEXT: G_FTAN (opcode {{[0-9]+}}): 1 type index, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
+# DEBUG-NEXT: .. the first uncovered type index: 1, OK
+# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
 # DEBUG-NEXT: G_FSQRT (opcode {{[0-9]+}}): 1 type index, 0 imm indices
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
 # DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
diff --git a/llvm/test/CodeGen/AArch64/f16-instructions.ll b/llvm/test/CodeGen/AArch64/f16-instructions.ll
index fa362ae798aa8..998f7dd38c3f5 100644
--- a/llvm/test/CodeGen/AArch64/f16-instructions.ll
+++ b/llvm/test/CodeGen/AArch64/f16-instructions.ll
@@ -759,6 +759,7 @@ declare half @llvm.sqrt.f16(half %a) #0
 declare half @llvm.powi.f16.i32(half %a, i32 %b) #0
 declare half @llvm.sin.f16(half %a) #0
 declare half @llvm.cos.f16(half %a) #0
+declare half @llvm.tan.f16(half %a) #0
 declare half @llvm.pow.f16(half %a, half %b) #0
 declare half @llvm.exp.f16(half %a) #0
 declare half @llvm.exp2.f16(half %a) #0
@@ -870,6 +871,31 @@ define half @test_cos(half %a) #0 {
   ret half %r
 }
 
+; FALLBACK-NOT: remark:{{.*}}test_tan
+; FALLBACK-FP16-NOT: remark:{{.*}}test_tan
+
+; CHECK-COMMON-LABEL: test_tan:
+; CHECK-COMMON-NEXT: stp x29, x30, [sp, #-16]!
+; CHECK-COMMON-NEXT: mov  x29, sp
+; CHECK-COMMON-NEXT: fcvt s0, h0
+; CHECK-COMMON-NEXT: bl {{_?}}tanf
+; CHECK-COMMON-NEXT: fcvt h0, s0
+; CHECK-COMMON-NEXT: ldp x29, x30, [sp], #16
+; CHECK-COMMON-NEXT: ret
+
+; GISEL-LABEL: test_tan:
+; GISEL-NEXT: stp x29, x30, [sp, #-16]!
+; GISEL-NEXT: mov  x29, sp
+; GISEL-NEXT: fcvt s0, h0
+; GISEL-NEXT: bl {{_?}}tanf
+; GISEL-NEXT: fcvt h0, s0
+; GISEL-NEXT: ldp x29, ...
[truncated]

davemgreen

I assume these are very similar to cos? It all looks sensible to me.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

davemgreen · 2024-06-06T06:50:03Z

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

+                  ISD::FCOS,         ISD::FSIN,         ISD::FSINCOS,
+                  ISD::FTAN,         ISD::FEXP,         ISD::FEXP2,
+                  ISD::FEXP10,       ISD::FLOG,         ISD::FLOG2,
+                  ISD::FLOG10,       ISD::STRICT_FREM,  ISD::STRICT_FPOW,


Is there a llvm.experimental.constrained.tan.f64 yet?

Thats in this PR: #94559

OK I see, great. And will that eventually involve adding testing for AArch64? Or is it best to add a quick STRICT_FTAN in here now? I know we have some tests for constrained.cos, and it doesn't look like much is needed to support them.

Looks likere there are two places to add tests:

llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll

llvm/test/CodeGen/AArch64/fp-intrinsics.ll

fp-intrinsics.ll I added now (ce4e2a0) to PR #94559 and everything works.

fp-intrinsics-fp16.ll will need to wait for this pr to merge before I add the tests because f16 is handled on a per target basis.

tan-aarch64-f16.patch.txt

paschalis-mpeis

Can you please add the sleef tests as well, in the files below?

replace-with-veclib-sleef-scalable.ll
replace-with-veclib-sleef.ll

This should be the equivalent you did for ArmPL (replace-with-veclib-armpl.ll).

Regarding the TLI mappings, the symbols do exist in the libraries SLEEF and ArmPL (libsleefgnuabi.so and libamath.so respectively).

The code changes look reasonable to me as well, but no strong opinion on areas I haven't touched (fast/global isels, darwin changes).

davemgreen

Thanks. LGTM if there are no other comments from @paschalis-mpeis on the tests.

paschalis-mpeis

@farzonl thanks a lot for adding the replace-with-veclib SLEEF tests. LGTM.

farzonl · 2024-06-07T13:41:54Z

@davemgreen @paschalis-mpeis thank you both for the review!

farzonl added 3 commits June 5, 2024 15:46

add aarch64 support

1274a08

add arm loop vectorization tests

5eb62da

run clang format

5f39da0

farzonl requested review from bogner and llvm-beanz June 5, 2024 23:05

farzonl self-assigned this Jun 5, 2024

llvmbot added backend:AArch64 llvm:globalisel llvm:analysis llvm:transforms labels Jun 5, 2024

farzonl changed the title ~~[arm64] Add tan intrinsic arm64 part 5~~ [arm64] Add tan intrinsic Jun 5, 2024

farzonl changed the title ~~[arm64] Add tan intrinsic~~ [arm64] Add tan intrinsic lowering Jun 5, 2024

farzonl requested a review from TNorthover June 6, 2024 02:10

davemgreen reviewed Jun 6, 2024

View reviewed changes

davemgreen requested a review from paschalis-mpeis June 6, 2024 06:54

paschalis-mpeis requested changes Jun 6, 2024

View reviewed changes

add sleef tests requested in PR.

0719e0f

farzonl requested a review from paschalis-mpeis June 6, 2024 20:47

davemgreen approved these changes Jun 7, 2024

View reviewed changes

paschalis-mpeis approved these changes Jun 7, 2024

View reviewed changes

farzonl merged commit 2f0308e into llvm:main Jun 7, 2024
7 checks passed

farzonl deleted the add-tan-intrinsic-arm64 branch June 7, 2024 15:15

paschalis-mpeis mentioned this pull request Jun 7, 2024

[TLI] ReplaceWithVecLib: drop Instruction support #94365

Merged

HerrCai0907 mentioned this pull request Jun 13, 2024

tidy #95384

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[arm64] Add tan intrinsic lowering #94545

[arm64] Add tan intrinsic lowering #94545

farzonl commented Jun 5, 2024 •

edited

Loading

llvmbot commented Jun 5, 2024 •

edited

Loading

llvmbot commented Jun 5, 2024

davemgreen left a comment

davemgreen Jun 6, 2024

farzonl Jun 6, 2024

davemgreen Jun 6, 2024

farzonl Jun 6, 2024 •

edited

Loading

paschalis-mpeis left a comment •

edited

Loading

davemgreen left a comment

paschalis-mpeis left a comment

farzonl commented Jun 7, 2024

[arm64] Add tan intrinsic lowering #94545

[arm64] Add tan intrinsic lowering #94545

Conversation

farzonl commented Jun 5, 2024 • edited Loading

llvmbot commented Jun 5, 2024 • edited Loading

llvmbot commented Jun 5, 2024

davemgreen left a comment

Choose a reason for hiding this comment

davemgreen Jun 6, 2024

Choose a reason for hiding this comment

farzonl Jun 6, 2024

Choose a reason for hiding this comment

davemgreen Jun 6, 2024

Choose a reason for hiding this comment

farzonl Jun 6, 2024 • edited Loading

Choose a reason for hiding this comment

paschalis-mpeis left a comment • edited Loading

Choose a reason for hiding this comment

davemgreen left a comment

Choose a reason for hiding this comment

paschalis-mpeis left a comment

Choose a reason for hiding this comment

farzonl commented Jun 7, 2024

farzonl commented Jun 5, 2024 •

edited

Loading

llvmbot commented Jun 5, 2024 •

edited

Loading

farzonl Jun 6, 2024 •

edited

Loading

paschalis-mpeis left a comment •

edited

Loading