Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong floating point promotion during type legalisation #73805

Closed
hvdijk opened this issue Nov 29, 2023 · 10 comments · Fixed by #80440
Closed

Wrong floating point promotion during type legalisation #73805

hvdijk opened this issue Nov 29, 2023 · 10 comments · Fixed by #80440
Labels

Comments

@hvdijk
Copy link
Contributor

hvdijk commented Nov 29, 2023

#include <stdio.h>

__attribute__((noinline))
float f(_Float16 a, _Float16 b) {
  a += b;
  return a;
}

int main(void) {
  printf("%f\n", f(32, 65504));
}

float is capable of representing 65536, but _Float16 is not, and the assignment to a means that holding the value in a larger type is not permitted: casts and assignments are supposed to discard excess precision and range. This is supposed to print inf on all targets where _Float16 is supported.

However, on ARM (testing on QEmu), I am seeing:

$ bin/clang -O0 --target=armv7a-linux-gnueabihf test.c && qemu-arm -L /usr/arm-linux-gnueabihf a.out
inf
$ bin/clang -O1 --target=armv7a-linux-gnueabihf test.c && qemu-arm -L /usr/arm-linux-gnueabihf a.out
65536.000000

Same as -O1 for other optimisation levels too.

Clang initially generates:

; Function Attrs: noinline nounwind
define dso_local float @f(half noundef %a, half noundef %b) #0 {
entry:
  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  store half %a, ptr %a.addr, align 2, !tbaa !5
  store half %b, ptr %b.addr, align 2, !tbaa !5
  %0 = load half, ptr %b.addr, align 2, !tbaa !5
  %ext = fpext half %0 to float
  %1 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv = fpext half %1 to float
  %add = fadd float %conv, %ext
  %conv1 = fptrunc float %add to half
  store half %conv1, ptr %a.addr, align 2, !tbaa !5
  %2 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv2 = fpext half %2 to float
  ret float %conv2
}

That is, it performs the addition in float, and converts the result to half. This is a perfectly valid way of doing this.

InstCombine then transforms that as

-*** IR Dump Before InstCombinePass on f ***
+*** IR Dump After InstCombinePass on f ***
 ; Function Attrs: noinline nounwind
 define dso_local float @f(half noundef %a, half noundef %b) local_unnamed_addr #0 {
 entry:
-  %ext = fpext half %b to float
-  %conv = fpext half %a to float
-  %add = fadd float %conv, %ext
-  %conv1 = fptrunc float %add to half
+  %conv1 = fadd half %a, %b
   %conv2 = fpext half %conv1 to float
   ret float %conv2
 }

Turning it into an fadd on type half directly. This is not necessarily a useful transformation when fadd half is not a legal operation, but it is a valid transformation that preserves the behaviour.

During type legalisation, though, because of it not being a legal operation, it is again converted back to an fadd on type float:

@@ -1,13 +1,12 @@
   t0: ch,glue = EntryToken
             t2: f32,ch = CopyFromReg t0, Register:f32 %0
           t5: i32 = bitcast t2
-          t6: i16 = truncate t5
-        t7: f16 = bitcast t6
+        t18: i32 = and t5, Constant:i32<65535>
+      t16: f32 = fp16_to_fp t18
             t4: f32,ch = CopyFromReg t0, Register:f32 %1
           t8: i32 = bitcast t4
-          t9: i16 = truncate t8
-        t10: f16 = bitcast t9
-      t11: f16 = fadd t7, t10
-    t12: f32 = fp_extend t11
-  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t12
+        t21: i32 = and t8, Constant:i32<65535>
+      t19: f32 = fp16_to_fp t21
+    t20: f32 = fadd t16, t19
+  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t20
   t15: ch = ARMISD::RET_GLUE t14, Register:f32 $s0, t14:1

However, in this promotion back to float, the truncation is lost, hence the changed result. There seem to be no relevant flags (beyond general "don't optimise") to control this.

I initially thought this was another instance of #44218 but this seems like a completely different part of the compiler where things go wrong. But attempts to fix #44218 may, depending on how they are done, result in this issue.

@topperc
Copy link
Collaborator

topperc commented Nov 29, 2023

Does ARM use the Promote action or the SoftPromoteHalf action in the type legalizer? I know X86 and RISC-V use SoftPromoteHalf.

I added SoftPromoteHalf a few years ago to fix issues with Promote.

@hvdijk
Copy link
Contributor Author

hvdijk commented Nov 29, 2023

$ git grep softPromoteHalfType
llvm/include/llvm/CodeGen/TargetLowering.h:  virtual bool softPromoteHalfType() const { return false; }
llvm/lib/CodeGen/TargetLoweringBase.cpp:    if (softPromoteHalfType()) {
llvm/lib/Target/RISCV/RISCVISelLowering.h:  bool softPromoteHalfType() const override { return true; }
llvm/lib/Target/X86/X86ISelLowering.h:    bool softPromoteHalfType() const override { return true; }

I think that means ARM doesn't use it.

Thanks for that. I'll look through the history there, it seems a bit weird at first glance to leave something broken, and then for specific targets, implement it differently in a correct way, rather than just fixing what's broken, but presumably there's a good reason for that that will be clearer after seeing that history.

@topperc
Copy link
Collaborator

topperc commented Nov 29, 2023

$ git grep softPromoteHalfType
llvm/include/llvm/CodeGen/TargetLowering.h:  virtual bool softPromoteHalfType() const { return false; }
llvm/lib/CodeGen/TargetLoweringBase.cpp:    if (softPromoteHalfType()) {
llvm/lib/Target/RISCV/RISCVISelLowering.h:  bool softPromoteHalfType() const override { return true; }
llvm/lib/Target/X86/X86ISelLowering.h:    bool softPromoteHalfType() const override { return true; }

I think that means ARM doesn't use it.

Thanks for that. I'll look through the history there, it seems a bit weird at first glance to leave something broken, and then for specific targets, implement it differently in a correct way, rather than just fixing what's broken, but presumably there's a good reason for that that will be clearer after seeing that history.

It required a lot of test updates that I didn't have the bandwidth to be responsible for at the time.

@hvdijk
Copy link
Contributor Author

hvdijk commented Nov 30, 2023

It required a lot of test updates that I didn't have the bandwidth to be responsible for at the time.

Thanks. If that is the reason, I think this means, in theory, we want

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 77ee6b89ed8a..3cccf99f5c19 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -509,7 +509,7 @@ public:
   // to float around arithmetic. The default behavior is to pass around as
   // float and convert around loads/stores/bitcasts and other places where
   // the size matters.
-  virtual bool softPromoteHalfType() const { return false; }
+  virtual bool softPromoteHalfType() const { return true; }
 
   // There are two general methods for expanding a BUILD_VECTOR node:
   //  1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 45200b54595a..a8eeba8d579d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -474,8 +474,6 @@ public:
 
   bool preferScalarizeSplat(SDNode *N) const override;
 
-  bool softPromoteHalfType() const override { return true; }
-
   /// Return the register type for a given MVT, ensuring vectors are treated
   /// as a series of gpr sized integers.
   MVT getRegisterTypeForCallingConv(LLVMContext &Context, CallingConv::ID CC,
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 3b1b2603fd8f..e05ca92195d9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1518,8 +1518,6 @@ namespace llvm {
     /// Customize the preferred legalization strategy for certain types.
     LegalizeTypeAction getPreferredVectorAction(MVT VT) const override;
 
-    bool softPromoteHalfType() const override { return true; }
-
     MVT getRegisterTypeForCallingConv(LLVMContext &Context, CallingConv::ID CC,
                                       EVT VT) const override;
 

and later on remove softPromoteHalfType entirely. In practice, it's never that simple and this results in a lot of errors for different targets that need target-specific knowledge to fix. Plus it would alter the ABI. I'll have a look at what's required to get it working.

@hvdijk
Copy link
Contributor Author

hvdijk commented Nov 30, 2023

For ARM, things almost work with

diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 2648c16bcd8d..490c5d64b76d 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1426,13 +1426,12 @@ void TargetLoweringBase::computeRegisterProperties(
       NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::i16];
       RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::i16];
       TransformToType[MVT::f16] = MVT::f32;
-      ValueTypeActions.setTypeAction(MVT::f16, TypeSoftPromoteHalf);
     } else {
       NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::f32];
       RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::f32];
       TransformToType[MVT::f16] = MVT::f32;
-      ValueTypeActions.setTypeAction(MVT::f16, TypePromoteFloat);
     }
+    ValueTypeActions.setTypeAction(MVT::f16, TypeSoftPromoteHalf);
   }
 
   // Decide how to handle bf16. If the target does not have native bf16 support,

A whole lot of tests need updating, but overall, the generated code looks good.

What doesn't work is frexp. frexp was added knowing full well that it would break things: D149715 was landed with a bunch of FIXME disabled tests, where those tests result in internal compiler errors. Making other targets use TypeSoftPromoteHalf means they encounter the same internal compiler errors that X86 already encounters.

D149715 should, in my opinion, never have been even submitted for review in that state, let alone pushed. Aside from the issues already known at the time of the commit, it also likely resulted in #64426 and #67735. If this had been caught and investigated at the time, the commit would have been reverted promptly. I think it is too late for that now.

Since the decision of D149715 was to not bother testing, the obvious "solution" here is to do the same and disable the affected tests for ARM and any other platform that sees the same errors. (Edit: To be clear, I am not seriously proposing that.)

@hvdijk
Copy link
Contributor Author

hvdijk commented Dec 1, 2023

The D149715 problems are sufficiently improved by #74076. After that, #74147 and hvdijk@softpromotehalf-fpreg-arm fix this for ARM.

@EugeneZelenko EugeneZelenko added the floating-point Floating-point math label Dec 1, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 2, 2023

@llvm/issue-subscribers-backend-arm

Author: Harald van Dijk (hvdijk)

```c++ #include <stdio.h>

attribute((noinline))
float f(_Float16 a, _Float16 b) {
a += b;
return a;
}

int main(void) {
printf("%f\n", f(32, 65504));
}

`float` is capable of representing 65536, but `_Float16` is not, and the assignment to `a` means that holding the value in a larger type is not permitted: casts and assignments are supposed to discard excess precision and range. This is supposed to print `inf` on all targets where `_Float16` is supported.

However, on ARM (testing on QEmu), I am seeing:

```sh
$ bin/clang -O0 --target=armv7a-linux-gnueabihf test.c &amp;&amp; qemu-arm -L /usr/arm-linux-gnueabihf a.out
inf
$ bin/clang -O1 --target=armv7a-linux-gnueabihf test.c &amp;&amp; qemu-arm -L /usr/arm-linux-gnueabihf a.out
65536.000000

Same as -O1 for other optimisation levels too.

Clang initially generates:

; Function Attrs: noinline nounwind
define dso_local float @<!-- -->f(half noundef %a, half noundef %b) #<!-- -->0 {
entry:
  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  store half %a, ptr %a.addr, align 2, !tbaa !5
  store half %b, ptr %b.addr, align 2, !tbaa !5
  %0 = load half, ptr %b.addr, align 2, !tbaa !5
  %ext = fpext half %0 to float
  %1 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv = fpext half %1 to float
  %add = fadd float %conv, %ext
  %conv1 = fptrunc float %add to half
  store half %conv1, ptr %a.addr, align 2, !tbaa !5
  %2 = load half, ptr %a.addr, align 2, !tbaa !5
  %conv2 = fpext half %2 to float
  ret float %conv2
}

That is, it performs the addition in float, and converts the result to half. This is a perfectly valid way of doing this.

InstCombine then transforms that as

-*** IR Dump Before InstCombinePass on f ***
+*** IR Dump After InstCombinePass on f ***
 ; Function Attrs: noinline nounwind
 define dso_local float @<!-- -->f(half noundef %a, half noundef %b) local_unnamed_addr #<!-- -->0 {
 entry:
-  %ext = fpext half %b to float
-  %conv = fpext half %a to float
-  %add = fadd float %conv, %ext
-  %conv1 = fptrunc float %add to half
+  %conv1 = fadd half %a, %b
   %conv2 = fpext half %conv1 to float
   ret float %conv2
 }

Turning it into an fadd on type half directly. This is not necessarily a useful transformation when fadd half is not a legal operation, but it is a valid transformation that preserves the behaviour.

During type legalisation, though, because of it not being a legal operation, it is again converted back to an fadd on type float:

@@ -1,13 +1,12 @@
   t0: ch,glue = EntryToken
             t2: f32,ch = CopyFromReg t0, Register:f32 %0
           t5: i32 = bitcast t2
-          t6: i16 = truncate t5
-        t7: f16 = bitcast t6
+        t18: i32 = and t5, Constant:i32&lt;65535&gt;
+      t16: f32 = fp16_to_fp t18
             t4: f32,ch = CopyFromReg t0, Register:f32 %1
           t8: i32 = bitcast t4
-          t9: i16 = truncate t8
-        t10: f16 = bitcast t9
-      t11: f16 = fadd t7, t10
-    t12: f32 = fp_extend t11
-  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t12
+        t21: i32 = and t8, Constant:i32&lt;65535&gt;
+      t19: f32 = fp16_to_fp t21
+    t20: f32 = fadd t16, t19
+  t14: ch,glue = CopyToReg t0, Register:f32 $s0, t20
   t15: ch = ARMISD::RET_GLUE t14, Register:f32 $s0, t14:1

However, in this promotion back to float, the truncation is lost, hence the changed result. There seem to be no relevant flags (beyond general "don't optimise") to control this.

I initially thought this was another instance of #44218 but this seems like a completely different part of the compiler where things go wrong. But attempts to fix #44218 may, depending on how they are done, result in this issue.

@momchil-velikov
Copy link
Collaborator

IMHO we have UB in the original test case (notwithstanding issues down the line)
There are two options:

  • the addition in a += b is executed in _Float16 precision: then if the result is not representable in the type, we have UB (C11 6.5 p5)
  • the addition in a += b is executed in higher precision (say, float or double): then the assignment does a conversion (C11 6.5.16.1 p3) . If the value being converted is outside the representable range we have UB (C11 6.3.1.5 p1)

@hvdijk
Copy link
Contributor Author

hvdijk commented Dec 18, 2023

IMHO we have UB in the original test case (notwithstanding issues down the line) There are two options:

* the addition in `a += b` is executed in `_Float16` precision: then if the result is not representable in the type, we have UB (C11 6.5 p5)

* the addition in `a += b` is executed in higher precision (say, `float` or `double`): then the assignment does a conversion (C11 6.5.16.1 p3) . If the value being converted is outside the representable range we have UB (C11 6.3.1.5 p1)

The rule about being outside the representable range applies when the result (after rounding) is less than the minimum value representable in _Float16, or more than the maximum value. Since _Float16 supports infinities, neither is ever the case, regardless of whether the addition was performed in higher precision.

@momchil-velikov
Copy link
Collaborator

Fair enough, C11 5.2.4.2.2 p5.

hvdijk added a commit to hvdijk/llvm-project that referenced this issue Feb 2, 2024
The traditional promotion is known to generate wrong code.

Fixes llvm#73805.
hvdijk added a commit that referenced this issue Feb 2, 2024
The traditional promotion is known to generate wrong code.

Fixes #73805.
agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024
The traditional promotion is known to generate wrong code.

Fixes llvm#73805.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants