Wrong floating point promotion during type legalisation #73805

hvdijk · 2023-11-29T15:31:32Z

#include <stdio.h>

__attribute__((noinline))
float f(_Float16 a, _Float16 b) {
  a += b;
  return a;
}

int main(void) {
  printf("%f\n", f(32, 65504));
}

float is capable of representing 65536, but _Float16 is not, and the assignment to a means that holding the value in a larger type is not permitted: casts and assignments are supposed to discard excess precision and range. This is supposed to print inf on all targets where _Float16 is supported.

However, on ARM (testing on QEmu), I am seeing:

$ bin/clang -O0 --target=armv7a-linux-gnueabihf test.c && qemu-arm -L /usr/arm-linux-gnueabihf a.out
inf
$ bin/clang -O1 --target=armv7a-linux-gnueabihf test.c && qemu-arm -L /usr/arm-linux-gnueabihf a.out
65536.000000

Same as -O1 for other optimisation levels too.

Clang initially generates:

; Function Attrs: noinline nounwind
define dso_local float @f(half noundef %a, half noundef %b) #0 {
entry:
  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  store half %a, ptr %a.addr, align 2, !tbaa !5
  store half %b, ptr %b.addr, align 2, !tbaa !5
  %0 = load half, ptr %b.addr, align 2, !tbaa !5
  %ext = fpext half %0 to float
  %1 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv = fpext half %1 to float
  %add = fadd float %conv, %ext
  %conv1 = fptrunc float %add to half
  store half %conv1, ptr %a.addr, align 2, !tbaa !5
  %2 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv2 = fpext half %2 to float
  ret float %conv2
}

That is, it performs the addition in float, and converts the result to half. This is a perfectly valid way of doing this.

InstCombine then transforms that as

-*** IR Dump Before InstCombinePass on f ***
+*** IR Dump After InstCombinePass on f ***
 ; Function Attrs: noinline nounwind
 define dso_local float @f(half noundef %a, half noundef %b) local_unnamed_addr #0 {
 entry:
-  %ext = fpext half %b to float
-  %conv = fpext half %a to float
-  %add = fadd float %conv, %ext
-  %conv1 = fptrunc float %add to half
+  %conv1 = fadd half %a, %b
   %conv2 = fpext half %conv1 to float
   ret float %conv2
 }

Turning it into an fadd on type half directly. This is not necessarily a useful transformation when fadd half is not a legal operation, but it is a valid transformation that preserves the behaviour.

During type legalisation, though, because of it not being a legal operation, it is again converted back to an fadd on type float:

@@ -1,13 +1,12 @@
   t0: ch,glue = EntryToken
             t2: f32,ch = CopyFromReg t0, Register:f32 %0
           t5: i32 = bitcast t2
-          t6: i16 = truncate t5
-        t7: f16 = bitcast t6
+        t18: i32 = and t5, Constant:i32<65535>
+      t16: f32 = fp16_to_fp t18
             t4: f32,ch = CopyFromReg t0, Register:f32 %1
           t8: i32 = bitcast t4
-          t9: i16 = truncate t8
-        t10: f16 = bitcast t9
-      t11: f16 = fadd t7, t10
-    t12: f32 = fp_extend t11
-  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t12
+        t21: i32 = and t8, Constant:i32<65535>
+      t19: f32 = fp16_to_fp t21
+    t20: f32 = fadd t16, t19
+  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t20
   t15: ch = ARMISD::RET_GLUE t14, Register:f32 $s0, t14:1

However, in this promotion back to float, the truncation is lost, hence the changed result. There seem to be no relevant flags (beyond general "don't optimise") to control this.

I initially thought this was another instance of #44218 but this seems like a completely different part of the compiler where things go wrong. But attempts to fix #44218 may, depending on how they are done, result in this issue.

The text was updated successfully, but these errors were encountered:

topperc · 2023-11-29T16:58:20Z

Does ARM use the Promote action or the SoftPromoteHalf action in the type legalizer? I know X86 and RISC-V use SoftPromoteHalf.

I added SoftPromoteHalf a few years ago to fix issues with Promote.

hvdijk · 2023-11-29T17:03:36Z

$ git grep softPromoteHalfType
llvm/include/llvm/CodeGen/TargetLowering.h:  virtual bool softPromoteHalfType() const { return false; }
llvm/lib/CodeGen/TargetLoweringBase.cpp:    if (softPromoteHalfType()) {
llvm/lib/Target/RISCV/RISCVISelLowering.h:  bool softPromoteHalfType() const override { return true; }
llvm/lib/Target/X86/X86ISelLowering.h:    bool softPromoteHalfType() const override { return true; }

I think that means ARM doesn't use it.

Thanks for that. I'll look through the history there, it seems a bit weird at first glance to leave something broken, and then for specific targets, implement it differently in a correct way, rather than just fixing what's broken, but presumably there's a good reason for that that will be clearer after seeing that history.

topperc · 2023-11-29T17:22:17Z

$ git grep softPromoteHalfType
llvm/include/llvm/CodeGen/TargetLowering.h:  virtual bool softPromoteHalfType() const { return false; }
llvm/lib/CodeGen/TargetLoweringBase.cpp:    if (softPromoteHalfType()) {
llvm/lib/Target/RISCV/RISCVISelLowering.h:  bool softPromoteHalfType() const override { return true; }
llvm/lib/Target/X86/X86ISelLowering.h:    bool softPromoteHalfType() const override { return true; }
I think that means ARM doesn't use it.

Thanks for that. I'll look through the history there, it seems a bit weird at first glance to leave something broken, and then for specific targets, implement it differently in a correct way, rather than just fixing what's broken, but presumably there's a good reason for that that will be clearer after seeing that history.

It required a lot of test updates that I didn't have the bandwidth to be responsible for at the time.

hvdijk · 2023-11-30T00:12:07Z

It required a lot of test updates that I didn't have the bandwidth to be responsible for at the time.

Thanks. If that is the reason, I think this means, in theory, we want

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 77ee6b89ed8a..3cccf99f5c19 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -509,7 +509,7 @@ public:
   // to float around arithmetic. The default behavior is to pass around as
   // float and convert around loads/stores/bitcasts and other places where
   // the size matters.
-  virtual bool softPromoteHalfType() const { return false; }
+  virtual bool softPromoteHalfType() const { return true; }
 
   // There are two general methods for expanding a BUILD_VECTOR node:
   //  1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 45200b54595a..a8eeba8d579d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -474,8 +474,6 @@ public:
 
   bool preferScalarizeSplat(SDNode *N) const override;
 
-  bool softPromoteHalfType() const override { return true; }
-
   /// Return the register type for a given MVT, ensuring vectors are treated
   /// as a series of gpr sized integers.
   MVT getRegisterTypeForCallingConv(LLVMContext &Context, CallingConv::ID CC,
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 3b1b2603fd8f..e05ca92195d9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1518,8 +1518,6 @@ namespace llvm {
     /// Customize the preferred legalization strategy for certain types.
     LegalizeTypeAction getPreferredVectorAction(MVT VT) const override;
 
-    bool softPromoteHalfType() const override { return true; }
-
     MVT getRegisterTypeForCallingConv(LLVMContext &Context, CallingConv::ID CC,
                                       EVT VT) const override;

and later on remove softPromoteHalfType entirely. In practice, it's never that simple and this results in a lot of errors for different targets that need target-specific knowledge to fix. Plus it would alter the ABI. I'll have a look at what's required to get it working.

hvdijk · 2023-11-30T03:03:45Z

For ARM, things almost work with

diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 2648c16bcd8d..490c5d64b76d 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1426,13 +1426,12 @@ void TargetLoweringBase::computeRegisterProperties(
       NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::i16];
       RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::i16];
       TransformToType[MVT::f16] = MVT::f32;
-      ValueTypeActions.setTypeAction(MVT::f16, TypeSoftPromoteHalf);
     } else {
       NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::f32];
       RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::f32];
       TransformToType[MVT::f16] = MVT::f32;
-      ValueTypeActions.setTypeAction(MVT::f16, TypePromoteFloat);
     }
+    ValueTypeActions.setTypeAction(MVT::f16, TypeSoftPromoteHalf);
   }
 
   // Decide how to handle bf16. If the target does not have native bf16 support,

A whole lot of tests need updating, but overall, the generated code looks good.

What doesn't work is frexp. frexp was added knowing full well that it would break things: D149715 was landed with a bunch of FIXME disabled tests, where those tests result in internal compiler errors. Making other targets use TypeSoftPromoteHalf means they encounter the same internal compiler errors that X86 already encounters.

D149715 should, in my opinion, never have been even submitted for review in that state, let alone pushed. Aside from the issues already known at the time of the commit, it also likely resulted in #64426 and #67735. If this had been caught and investigated at the time, the commit would have been reverted promptly. I think it is too late for that now.

Since the decision of D149715 was to not bother testing, the obvious "solution" here is to do the same and disable the affected tests for ARM and any other platform that sees the same errors. (Edit: To be clear, I am not seriously proposing that.)

hvdijk · 2023-12-01T21:45:15Z

The D149715 problems are sufficiently improved by #74076. After that, #74147 and hvdijk@softpromotehalf-fpreg-arm fix this for ARM.

llvmbot · 2023-12-02T01:32:48Z

@llvm/issue-subscribers-backend-arm

Author: Harald van Dijk (hvdijk)

```c++ #include <stdio.h>

attribute((noinline))
float f(_Float16 a, _Float16 b) {
a += b;
return a;
}

int main(void) {
printf("%f\n", f(32, 65504));
}

`float` is capable of representing 65536, but `_Float16` is not, and the assignment to `a` means that holding the value in a larger type is not permitted: casts and assignments are supposed to discard excess precision and range. This is supposed to print `inf` on all targets where `_Float16` is supported.

However, on ARM (testing on QEmu), I am seeing:

```sh
$ bin/clang -O0 --target=armv7a-linux-gnueabihf test.c &amp;&amp; qemu-arm -L /usr/arm-linux-gnueabihf a.out
inf
$ bin/clang -O1 --target=armv7a-linux-gnueabihf test.c &amp;&amp; qemu-arm -L /usr/arm-linux-gnueabihf a.out
65536.000000

Same as -O1 for other optimisation levels too.

Clang initially generates:

; Function Attrs: noinline nounwind
define dso_local float @<!-- -->f(half noundef %a, half noundef %b) #<!-- -->0 {
entry:
  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  store half %a, ptr %a.addr, align 2, !tbaa !5
  store half %b, ptr %b.addr, align 2, !tbaa !5
  %0 = load half, ptr %b.addr, align 2, !tbaa !5
  %ext = fpext half %0 to float
  %1 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv = fpext half %1 to float
  %add = fadd float %conv, %ext
  %conv1 = fptrunc float %add to half
  store half %conv1, ptr %a.addr, align 2, !tbaa !5
  %2 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv2 = fpext half %2 to float
  ret float %conv2
}

That is, it performs the addition in float, and converts the result to half. This is a perfectly valid way of doing this.

InstCombine then transforms that as

-*** IR Dump Before InstCombinePass on f ***
+*** IR Dump After InstCombinePass on f ***
 ; Function Attrs: noinline nounwind
 define dso_local float @<!-- -->f(half noundef %a, half noundef %b) local_unnamed_addr #<!-- -->0 {
 entry:
-  %ext = fpext half %b to float
-  %conv = fpext half %a to float
-  %add = fadd float %conv, %ext
-  %conv1 = fptrunc float %add to half
+  %conv1 = fadd half %a, %b
   %conv2 = fpext half %conv1 to float
   ret float %conv2
 }

Turning it into an fadd on type half directly. This is not necessarily a useful transformation when fadd half is not a legal operation, but it is a valid transformation that preserves the behaviour.

During type legalisation, though, because of it not being a legal operation, it is again converted back to an fadd on type float:

@@ -1,13 +1,12 @@
   t0: ch,glue = EntryToken
             t2: f32,ch = CopyFromReg t0, Register:f32 %0
           t5: i32 = bitcast t2
-          t6: i16 = truncate t5
-        t7: f16 = bitcast t6
+        t18: i32 = and t5, Constant:i32&lt;65535&gt;
+      t16: f32 = fp16_to_fp t18
             t4: f32,ch = CopyFromReg t0, Register:f32 %1
           t8: i32 = bitcast t4
-          t9: i16 = truncate t8
-        t10: f16 = bitcast t9
-      t11: f16 = fadd t7, t10
-    t12: f32 = fp_extend t11
-  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t12
+        t21: i32 = and t8, Constant:i32&lt;65535&gt;
+      t19: f32 = fp16_to_fp t21
+    t20: f32 = fadd t16, t19
+  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t20
   t15: ch = ARMISD::RET_GLUE t14, Register:f32 $s0, t14:1

However, in this promotion back to float, the truncation is lost, hence the changed result. There seem to be no relevant flags (beyond general "don't optimise") to control this.

I initially thought this was another instance of #44218 but this seems like a completely different part of the compiler where things go wrong. But attempts to fix #44218 may, depending on how they are done, result in this issue.

momchil-velikov · 2023-12-18T11:26:58Z

IMHO we have UB in the original test case (notwithstanding issues down the line)
There are two options:

the addition in a += b is executed in _Float16 precision: then if the result is not representable in the type, we have UB (C11 6.5 p5)
the addition in a += b is executed in higher precision (say, float or double): then the assignment does a conversion (C11 6.5.16.1 p3) . If the value being converted is outside the representable range we have UB (C11 6.3.1.5 p1)

hvdijk · 2023-12-18T11:30:14Z

IMHO we have UB in the original test case (notwithstanding issues down the line) There are two options:

* the addition in `a += b` is executed in `_Float16` precision: then if the result is not representable in the type, we have UB (C11 6.5 p5)

* the addition in `a += b` is executed in higher precision (say, `float` or `double`): then the assignment does a conversion (C11 6.5.16.1 p3) . If the value being converted is outside the representable range we have UB (C11 6.3.1.5 p1)

The rule about being outside the representable range applies when the result (after rounding) is less than the minimum value representable in _Float16, or more than the maximum value. Since _Float16 supports infinities, neither is ever the case, regardless of whether the addition was performed in higher precision.

momchil-velikov · 2023-12-18T11:58:50Z

Fair enough, C11 5.2.4.2.2 p5.

The traditional promotion is known to generate wrong code. Fixes llvm#73805.

The traditional promotion is known to generate wrong code. Fixes #73805.

The traditional promotion is known to generate wrong code. Fixes llvm#73805.

github-actions bot added the new issue label Nov 29, 2023

EugeneZelenko added llvm:instcombine and removed new issue labels Nov 29, 2023

EugeneZelenko added the floating-point Floating-point math label Dec 1, 2023

dtcxzyw added backend:ARM and removed llvm:instcombine labels Dec 2, 2023

hvdijk mentioned this issue Feb 2, 2024

[ARM] Switch to soft promoting half types. #80440

Merged

hvdijk added a commit to hvdijk/llvm-project that referenced this issue Feb 2, 2024

[ARM] Switch to soft promoting half types.

b545687

The traditional promotion is known to generate wrong code. Fixes llvm#73805.

hvdijk closed this as completed in #80440 Feb 2, 2024

hvdijk added a commit that referenced this issue Feb 2, 2024

[ARM] Switch to soft promoting half types. (#80440)

52864d9

The traditional promotion is known to generate wrong code. Fixes #73805.

agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024

[ARM] Switch to soft promoting half types. (llvm#80440)

1c0b3d5

The traditional promotion is known to generate wrong code. Fixes llvm#73805.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong floating point promotion during type legalisation #73805

Wrong floating point promotion during type legalisation #73805

hvdijk commented Nov 29, 2023

topperc commented Nov 29, 2023

hvdijk commented Nov 29, 2023

topperc commented Nov 29, 2023

hvdijk commented Nov 30, 2023 •

edited

Loading

hvdijk commented Nov 30, 2023 •

edited

Loading

hvdijk commented Dec 1, 2023

llvmbot commented Dec 2, 2023

momchil-velikov commented Dec 18, 2023

hvdijk commented Dec 18, 2023

momchil-velikov commented Dec 18, 2023

Wrong floating point promotion during type legalisation #73805

Wrong floating point promotion during type legalisation #73805

Comments

hvdijk commented Nov 29, 2023

topperc commented Nov 29, 2023

hvdijk commented Nov 29, 2023

topperc commented Nov 29, 2023

hvdijk commented Nov 30, 2023 • edited Loading

hvdijk commented Nov 30, 2023 • edited Loading

hvdijk commented Dec 1, 2023

llvmbot commented Dec 2, 2023

momchil-velikov commented Dec 18, 2023

hvdijk commented Dec 18, 2023

momchil-velikov commented Dec 18, 2023

hvdijk commented Nov 30, 2023 •

edited

Loading

hvdijk commented Nov 30, 2023 •

edited

Loading