Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GlobalISel] Fall back for bf16 conversions. #71470

Merged
merged 1 commit into from
Nov 7, 2023

Conversation

aemerson
Copy link
Contributor

@aemerson aemerson commented Nov 7, 2023

We don't support these correctly since we don't yet have FP types.
AMDGPU tests were silently miscompiling bf16 as if they were fp16.

We don't support these correctly since we don't yet have FP types.
AMDGPU tests were silently miscompiling bf16 as if they were fp16.
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 7, 2023

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Amara Emerson (aemerson)

Changes

We don't support these correctly since we don't yet have FP types.
AMDGPU tests were silently miscompiling bf16 as if they were fp16.


Patch is 68.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71470.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp (+3)
  • (modified) llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll (+120-266)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.exp.ll (+72-147)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.exp2.ll (+16-23)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log.ll (+95-164)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log10.ll (+95-164)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log2.ll (+29-44)
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index d8f9e30b2599779..3098c8ea468a9d1 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1484,6 +1484,9 @@ bool IRTranslator::translateBitCast(const User &U,
 
 bool IRTranslator::translateCast(unsigned Opcode, const User &U,
                                  MachineIRBuilder &MIRBuilder) {
+  if (U.getType()->getScalarType()->isBFloatTy() ||
+      U.getOperand(0)->getType()->getScalarType()->isBFloatTy())
+    return false;
   Register Op = getOrCreateVReg(*U.getOperand(0));
   Register Res = getOrCreateVReg(U);
   MIRBuilder.buildInstr(Opcode, {Res}, {Op});
diff --git a/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll b/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
index e3457421a49036e..e9bf515daabca9f 100644
--- a/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
+++ b/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
@@ -1,15 +1,15 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
 ; Test no legal f16. Should just keep the cast to f32 and
 ; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri < %s | FileCheck -check-prefixes=GCN,GFX7,GFX7-SDAG %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri < %s | FileCheck -check-prefixes=GCN,GFX7,GFX7-GISEL %s
+; RUN: llc -global-isel=1 -global-isel-abort=2 -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri < %s | FileCheck -check-prefixes=GCN,GFX7,GFX7-GISEL %s
 
 ; Test legal f16, no f16 fmed3. Should expand to min/max sequence
 ; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX8,GFX8-SDAG %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX8,GFX8-GISEL %s
+; RUN: llc -global-isel=1 -global-isel-abort=2 -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX8,GFX8-GISEL %s
 
 ; Legal f16 med3. InstCombine ought to shrink the f32 op to f16 so the codegen doesn't really matter for this.
 ; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9,GFX9-SDAG %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9,GFX9-GISEL %s
+; RUN: llc -global-isel=1 -global-isel-abort=2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9,GFX9-GISEL %s
 
 
 declare float @llvm.amdgcn.fmed3.f32(float, float, float) #0
@@ -773,61 +773,32 @@ define half @fmed3_fneg_fabs_f32_fpext_f16(half %arg0, half %arg1, half %arg2) #
 ; --------------------------------------------------------------------------------
 
 define bfloat @fmed3_f32_fpext_f16_fptrunc_bf16(half %arg0, half %arg1, half %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX7-SDAG:       ; %bb.0:
-; GFX7-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT:    v_and_b32_e32 v0, 0xffff0000, v0
-; GFX7-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX7-GISEL:       ; %bb.0:
-; GFX7-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX8-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT:    v_and_b32_e32 v0, 0xffff0000, v0
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT:    v_and_b32_e32 v0, 0xffff0000, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT:    v_and_b32_e32 v0, 0xffff0000, v0
+; GFX7-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v1, v1
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; GFX8-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT:    v_and_b32_e32 v0, 0xffff0000, v0
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v1, v1
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; GFX9-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT:    v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %arg0.ext = fpext half %arg0 to float
   %arg1.ext = fpext half %arg1 to float
   %arg2.ext = fpext half %arg2 to float
@@ -1039,56 +1010,27 @@ define half @fmed3_f32_fpext_f16_multi_use_2(half %arg0, half %arg1, half %arg2,
 }
 
 define half @fmed3_f32_fpext_bf16(bfloat %arg0, bfloat %arg1, bfloat %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_bf16:
-; GFX7-SDAG:       ; %bb.0:
-; GFX7-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_bf16:
-; GFX7-GISEL:       ; %bb.0:
-; GFX7-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_bf16:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_bf16:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_bf16:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_bf16:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_bf16:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_bf16:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_bf16:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %arg0.ext = fpext bfloat %arg0 to float
   %arg1.ext = fpext bfloat %arg1 to float
   %arg2.ext = fpext bfloat %arg2 to float
@@ -1098,60 +1040,31 @@ define half @fmed3_f32_fpext_bf16(bfloat %arg0, bfloat %arg1, bfloat %arg2) #1 {
 }
 
 define half @fmed3_f32_fpext_f16_bf16_0(bfloat %arg0, half %arg1, half %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX7-SDAG:       ; %bb.0:
-; GFX7-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX7-GISEL:       ; %bb.0:
-; GFX7-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX8-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_bf16_0:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_bf16_0:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v1, v1
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; GFX8-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_bf16_0:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v1, v1
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; GFX9-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %arg0.ext = fpext bfloat %arg0 to float
   %arg1.ext = fpext half %arg1 to float
   %arg2.ext = fpext half %arg2 to float
@@ -1161,60 +1074,31 @@ define half @fmed3_f32_fpext_f16_bf16_0(bfloat %arg0, half %arg1, half %arg2) #1
 }
 
 define half @fmed3_f32_fpext_f16_bf16_1(half %arg0, bfloat %arg1, half %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX7-SDAG:       ; %bb.0:
-; GFX7-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX7-GISEL:       ; %bb.0:
-; GFX7-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX8-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_bf16_1:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_bf16_1:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; GFX8-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_bf16_1:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; GFX9-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %arg0.ext = fpext half %arg0 to float
   %arg1.ext = fpext bfloat %arg1 to float
   %arg2.ext = fpext half %arg2 to float
@@ -1224,60 +1108,31 @@ define half @fmed3_f32_fpext_f16_bf16_1(half %arg0, bfloat %arg1, half %arg2) #1
 }
 
 define half @fmed3_f32_fpext_f16_bf16_2(half %arg0, half %arg1, bfloat %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX7-SDAG:       ; %bb.0:
-; GFX7-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX7-GISEL:       ; %bb.0:
-; GFX7-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX8-SDAG-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX8-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT:    v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-SDAG-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-SDAG-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_bf16_2:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_bf16_2:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX8-NEXT:    v_cvt_f32_f16_e32 v1, v1
+; GFX8-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT:    v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT:    v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_bf16_2:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; GFX9-NEXT:    v_cvt_f32_f16_e32 v1, v1
+; GFX9-NEXT:    v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT:    v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
   %arg0.ext = fpext half %arg0 to float
   %arg1.ext = fpext half %arg1 to float
   %arg2.ext = fpext bfloat %arg2 to float
@@ -1488,4 +1343,3 @@ attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn me...
[truncated]

@aemerson aemerson requested a review from arsenm November 7, 2023 01:09
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also hidden conversions in all the ABI contexts ideally would skip as well

@aemerson aemerson merged commit 6b69584 into llvm:main Nov 7, 2023
5 checks passed
@aemerson aemerson deleted the amdgpu-bfloat-fallback branch November 7, 2023 05:19
@davemgreen
Copy link
Collaborator

Nice. I was thinking of adding this check for aarch64 at least, it makes sense to do it generically.

How do we think bf16 should be supported? As far as I understand the options would be

  • to have it as part of the type information (bf16 as opposed to s16)
  • to have it as a different operation (G_FADDBF16 as opposed to G_FADD)
  • or to have it as an flag on the existing operations (G_FADD bf16 as opposed to G_FADD [ieee] maybe).

Maybe there are other options too? There are other (multiple) fp8 types being proposed currently. Any thought so far on what the best way to support them would be?

@arsenm
Copy link
Contributor

arsenm commented Nov 7, 2023

Nice. I was thinking of adding this check for aarch64 at least, it makes sense to do it generically.

  • to have it as part of the type information (bf16 as opposed to s16)

We were talking about adding it to LLT. I think we should steal some bits from the size or element count and turn it into an enum for the class, such that any FP type would still be isScalar but identifiable

@davemgreen
Copy link
Collaborator

OK. Sounds like that should work, so long as it was still simple to keep the types agnostic where its beneficial (load and stores and shuffles and whatnot). There are multiple types of fp8 already, two that get used in Arm, maybe more that 4 in total but I'm not sure how many will need to be supported. There is still a few bits in vector LLTs that could be used currently.

@erickq
Copy link
Member

erickq commented Jan 15, 2024

Now an error is reported for a very simple bf16 data addition. Is this modification too violent?

#include <arm_neon.h>
bfloat16_t test(bfloat16_t gpara1, bfloat16_t gpara2)
{
    return gpara1+gpara2;
}

@arsenm
Copy link
Contributor

arsenm commented Jan 15, 2024

Now an error is reported for a very simple bf16 data addition. Is this modification too violent?

#include <arm_neon.h>
bfloat16_t test(bfloat16_t gpara1, bfloat16_t gpara2)
{
    return gpara1+gpara2;
}

No, this is exactly the kind of case that is supposed to be rejected

@erickq
Copy link
Member

erickq commented Jan 15, 2024

Now an error is reported for a very simple bf16 data addition. Is this modification too violent?

#include <arm_neon.h>
bfloat16_t test(bfloat16_t gpara1, bfloat16_t gpara2)
{
    return gpara1+gpara2;
}

No, this is exactly the kind of case that is supposed to be rejected

clang -O0 -march=armv8.2-a+bf16 test.c

I don't really understand that at this point our hardware already supports the bf16 data type.

@aemerson
Copy link
Contributor Author

Now an error is reported for a very simple bf16 data addition. Is this modification too violent?

#include <arm_neon.h>
bfloat16_t test(bfloat16_t gpara1, bfloat16_t gpara2)
{
    return gpara1+gpara2;
}

No, this is exactly the kind of case that is supposed to be rejected

clang -O0 -march=armv8.2-a+bf16 test.c

I don't really understand that at this point our hardware already supports the bf16 data type.

GlobalISel doesn't currently support bf16 properly until we add FP types. This change is supposed to trigger an abort or fallback to SDAG so that we don't silently miscompile bf16 code as if it were FP16.

@aemerson
Copy link
Contributor Author

Now an error is reported for a very simple bf16 data addition. Is this modification too violent?

#include <arm_neon.h>
bfloat16_t test(bfloat16_t gpara1, bfloat16_t gpara2)
{
    return gpara1+gpara2;
}

No, this is exactly the kind of case that is supposed to be rejected

clang -O0 -march=armv8.2-a+bf16 test.c

I don't really understand that at this point our hardware already supports the bf16 data type.

I just tried that test case and I see the crash. The crash is in SelectionDAG because there's incomplete support for bf16 in the backend. Previously clang was silently miscompiling this to FP16 because GlobalISel didn't know it was bf16.

Also, bf16 in ARM doesn't provide you native support for general purpose FP arithmetic with bf16. It only adds some MLA/dotprod and conversions IIRC, so trying to add two bf16s would have to be done by promoting to float32 and then truncating back down.

@davemgreen
Copy link
Collaborator

It was previously giving an error in the frontend that you are using an operation that is not supported, but that was broken at some point without making the backend work correctly.

@erickq
Copy link
Member

erickq commented Jan 16, 2024

It was previously giving an error in the frontend that you are using an operation that is not supported, but that was broken at some point without making the backend work correctly.

Is this the kind of error? https://godbolt.org/z/v8d4foaef
image

@arsenm
Copy link
Contributor

arsenm commented Jan 16, 2024

It was previously giving an error in the frontend that you are using an operation that is not supported, but that was broken at some point without making the backend work correctly.

Is this the kind of error?

No, that error doesn't make much sense

@davemgreen
Copy link
Collaborator

It was previously giving an error in the frontend that you are using an operation that is not supported, but that was broken at some point without making the backend work correctly.

Is this the kind of error?

Yep, that's the error. Operations like Add were previously not defined as they were not supported by the instruction set and there were no promotions for them. Someone changed how the frontend worked whilst only making the backend work for X86.

@erickq
Copy link
Member

erickq commented Jan 16, 2024

It was previously giving an error in the frontend that you are using an operation that is not supported, but that was broken at some point without making the backend work correctly.

Is this the kind of error?

Yep, that's the error. Operations like Add were previously not defined as they were not supported by the instruction set and there were no promotions for them. Someone changed how the frontend worked whilst only making the backend work for X86.

Okay. thank you. I think I see what you mean. So, are there any plans to support the follow-up?@davemgreen @arsenm

@davemgreen
Copy link
Collaborator

I don't know of anyone working on it, but it would be good to get it fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants