-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong floating point promotion during type legalisation #73805
Comments
Does ARM use the Promote action or the SoftPromoteHalf action in the type legalizer? I know X86 and RISC-V use SoftPromoteHalf. I added SoftPromoteHalf a few years ago to fix issues with Promote. |
$ git grep softPromoteHalfType
llvm/include/llvm/CodeGen/TargetLowering.h: virtual bool softPromoteHalfType() const { return false; }
llvm/lib/CodeGen/TargetLoweringBase.cpp: if (softPromoteHalfType()) {
llvm/lib/Target/RISCV/RISCVISelLowering.h: bool softPromoteHalfType() const override { return true; }
llvm/lib/Target/X86/X86ISelLowering.h: bool softPromoteHalfType() const override { return true; } I think that means ARM doesn't use it. Thanks for that. I'll look through the history there, it seems a bit weird at first glance to leave something broken, and then for specific targets, implement it differently in a correct way, rather than just fixing what's broken, but presumably there's a good reason for that that will be clearer after seeing that history. |
It required a lot of test updates that I didn't have the bandwidth to be responsible for at the time. |
Thanks. If that is the reason, I think this means, in theory, we want diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 77ee6b89ed8a..3cccf99f5c19 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -509,7 +509,7 @@ public:
// to float around arithmetic. The default behavior is to pass around as
// float and convert around loads/stores/bitcasts and other places where
// the size matters.
- virtual bool softPromoteHalfType() const { return false; }
+ virtual bool softPromoteHalfType() const { return true; }
// There are two general methods for expanding a BUILD_VECTOR node:
// 1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 45200b54595a..a8eeba8d579d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -474,8 +474,6 @@ public:
bool preferScalarizeSplat(SDNode *N) const override;
- bool softPromoteHalfType() const override { return true; }
-
/// Return the register type for a given MVT, ensuring vectors are treated
/// as a series of gpr sized integers.
MVT getRegisterTypeForCallingConv(LLVMContext &Context, CallingConv::ID CC,
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 3b1b2603fd8f..e05ca92195d9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1518,8 +1518,6 @@ namespace llvm {
/// Customize the preferred legalization strategy for certain types.
LegalizeTypeAction getPreferredVectorAction(MVT VT) const override;
- bool softPromoteHalfType() const override { return true; }
-
MVT getRegisterTypeForCallingConv(LLVMContext &Context, CallingConv::ID CC,
EVT VT) const override;
and later on remove |
For ARM, things almost work with diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 2648c16bcd8d..490c5d64b76d 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1426,13 +1426,12 @@ void TargetLoweringBase::computeRegisterProperties(
NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::i16];
RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::i16];
TransformToType[MVT::f16] = MVT::f32;
- ValueTypeActions.setTypeAction(MVT::f16, TypeSoftPromoteHalf);
} else {
NumRegistersForVT[MVT::f16] = NumRegistersForVT[MVT::f32];
RegisterTypeForVT[MVT::f16] = RegisterTypeForVT[MVT::f32];
TransformToType[MVT::f16] = MVT::f32;
- ValueTypeActions.setTypeAction(MVT::f16, TypePromoteFloat);
}
+ ValueTypeActions.setTypeAction(MVT::f16, TypeSoftPromoteHalf);
}
// Decide how to handle bf16. If the target does not have native bf16 support, A whole lot of tests need updating, but overall, the generated code looks good. What doesn't work is D149715 should, in my opinion, never have been even submitted for review in that state, let alone pushed. Aside from the issues already known at the time of the commit, it also likely resulted in #64426 and #67735. If this had been caught and investigated at the time, the commit would have been reverted promptly. I think it is too late for that now. Since the decision of D149715 was to not bother testing, the obvious "solution" here is to do the same and disable the affected tests for ARM and any other platform that sees the same errors. (Edit: To be clear, I am not seriously proposing that.) |
The D149715 problems are sufficiently improved by #74076. After that, #74147 and hvdijk@softpromotehalf-fpreg-arm fix this for ARM. |
@llvm/issue-subscribers-backend-arm Author: Harald van Dijk (hvdijk)
```c++
#include <stdio.h>
attribute((noinline)) int main(void) {
Same as Clang initially generates: ; Function Attrs: noinline nounwind
define dso_local float @<!-- -->f(half noundef %a, half noundef %b) #<!-- -->0 {
entry:
%a.addr = alloca half, align 2
%b.addr = alloca half, align 2
store half %a, ptr %a.addr, align 2, !tbaa !5
store half %b, ptr %b.addr, align 2, !tbaa !5
%0 = load half, ptr %b.addr, align 2, !tbaa !5
%ext = fpext half %0 to float
%1 = load half, ptr %a.addr, align 2, !tbaa !5
%conv = fpext half %1 to float
%add = fadd float %conv, %ext
%conv1 = fptrunc float %add to half
store half %conv1, ptr %a.addr, align 2, !tbaa !5
%2 = load half, ptr %a.addr, align 2, !tbaa !5
%conv2 = fpext half %2 to float
ret float %conv2
} That is, it performs the addition in InstCombine then transforms that as -*** IR Dump Before InstCombinePass on f ***
+*** IR Dump After InstCombinePass on f ***
; Function Attrs: noinline nounwind
define dso_local float @<!-- -->f(half noundef %a, half noundef %b) local_unnamed_addr #<!-- -->0 {
entry:
- %ext = fpext half %b to float
- %conv = fpext half %a to float
- %add = fadd float %conv, %ext
- %conv1 = fptrunc float %add to half
+ %conv1 = fadd half %a, %b
%conv2 = fpext half %conv1 to float
ret float %conv2
} Turning it into an During type legalisation, though, because of it not being a legal operation, it is again converted back to an @@ -1,13 +1,12 @@
t0: ch,glue = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
t5: i32 = bitcast t2
- t6: i16 = truncate t5
- t7: f16 = bitcast t6
+ t18: i32 = and t5, Constant:i32<65535>
+ t16: f32 = fp16_to_fp t18
t4: f32,ch = CopyFromReg t0, Register:f32 %1
t8: i32 = bitcast t4
- t9: i16 = truncate t8
- t10: f16 = bitcast t9
- t11: f16 = fadd t7, t10
- t12: f32 = fp_extend t11
- t14: ch,glue = CopyToReg t0, Register:f32 $s0, t12
+ t21: i32 = and t8, Constant:i32<65535>
+ t19: f32 = fp16_to_fp t21
+ t20: f32 = fadd t16, t19
+ t14: ch,glue = CopyToReg t0, Register:f32 $s0, t20
t15: ch = ARMISD::RET_GLUE t14, Register:f32 $s0, t14:1 However, in this promotion back to I initially thought this was another instance of #44218 but this seems like a completely different part of the compiler where things go wrong. But attempts to fix #44218 may, depending on how they are done, result in this issue. |
IMHO we have UB in the original test case (notwithstanding issues down the line)
|
The rule about being outside the representable range applies when the result (after rounding) is less than the minimum value representable in |
Fair enough, C11 5.2.4.2.2 p5. |
The traditional promotion is known to generate wrong code. Fixes llvm#73805.
The traditional promotion is known to generate wrong code. Fixes #73805.
The traditional promotion is known to generate wrong code. Fixes llvm#73805.
float
is capable of representing 65536, but_Float16
is not, and the assignment toa
means that holding the value in a larger type is not permitted: casts and assignments are supposed to discard excess precision and range. This is supposed to printinf
on all targets where_Float16
is supported.However, on ARM (testing on QEmu), I am seeing:
Same as
-O1
for other optimisation levels too.Clang initially generates:
That is, it performs the addition in
float
, and converts the result tohalf
. This is a perfectly valid way of doing this.InstCombine then transforms that as
Turning it into an
fadd
on typehalf
directly. This is not necessarily a useful transformation whenfadd half
is not a legal operation, but it is a valid transformation that preserves the behaviour.During type legalisation, though, because of it not being a legal operation, it is again converted back to an
fadd
on typefloat
:However, in this promotion back to
float
, the truncation is lost, hence the changed result. There seem to be no relevant flags (beyond general "don't optimise") to control this.I initially thought this was another instance of #44218 but this seems like a completely different part of the compiler where things go wrong. But attempts to fix #44218 may, depending on how they are done, result in this issue.
The text was updated successfully, but these errors were encountered: