LLVM miscompiles passing/returning `half` on several backends by using lossy conversions #97981

beetrees · 2024-07-08T02:57:20Z

Consider the following IR:

define half @to_half(i16 %bits) {
    %f = bitcast i16 %bits to half
    ret half %f
}

define i16 @from_half(half %f) {
    %bits = bitcast half %f to i16
    ret i16 %bits
}

As the only operation involved is a bitcast (in particular, there are no floating point type conversions in the LLVM IR), the values returned from to_half and from_half should be bit-for-bit identical to the value passed to them as their only argument (just a different type). However, several targets pass/return half as a float. On these targets, LLVM will use the default float conversion builtins (such as __gnu_h2f_ieee and __gnu_f2h_ieee) to convert between half and float. The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of half -> float -> half is not lossless and causes the generated ASM to violate the semantics of LLVM IR. This miscompilation is similar to #66803.
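The effect described above can be modelled on raw bit patterns. The following is a minimal Python sketch, not the source of `__gnu_h2f_ieee` or `__gnu_f2h_ieee`: the helper names `h2f_bits`, `f2h_bits` and `roundtrip` are hypothetical, and the narrowing helper assumes its input is exactly representable as a half (it truncates instead of rounding). Both helpers set the quiet bit on NaNs, as a conversion that silences signalling NaNs would.

```python
def h2f_bits(h: int) -> int:
    """Widen binary16 bits to binary32 bits, quieting signalling NaNs."""
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0x1F:
        if frac == 0:
            return (sign << 31) | 0x7F800000  # infinity
        # NaN: keep the payload, but force the quiet bit (bit 22) on.
        return (sign << 31) | 0x7F800000 | (frac << 13) | 0x00400000
    if exp == 0:
        if frac == 0:
            return sign << 31  # signed zero
        # Subnormal half: normalise so the implicit leading bit is set.
        shift = 0
        while frac & 0x400 == 0:
            frac <<= 1
            shift += 1
        frac &= 0x3FF
        return (sign << 31) | ((113 - shift) << 23) | (frac << 13)
    return (sign << 31) | ((exp + 112) << 23) | (frac << 13)


def f2h_bits(f: int) -> int:
    """Narrow binary32 bits to binary16 bits, quieting signalling NaNs.

    Assumption: the input is exactly representable as a half, so the low
    fraction bits are truncated rather than rounded.
    """
    sign = (f >> 31) & 0x1
    exp = (f >> 23) & 0xFF
    frac = f & 0x7FFFFF
    if exp == 0xFF:
        if frac == 0:
            return (sign << 15) | 0x7C00  # infinity
        # NaN: keep the top payload bits, force the quiet bit (bit 9) on.
        return (sign << 15) | 0x7C00 | 0x0200 | (frac >> 13)
    if exp == 0:
        return sign << 15  # zero (float subnormals are far below half range)
    e = exp - 112  # rebias from 127 to 15
    if e >= 0x1F:
        return (sign << 15) | 0x7C00  # too large for half: infinity
    if e <= 0:
        # Result is a subnormal half: shift the full significand down.
        return (sign << 15) | ((frac | 0x800000) >> (14 - e))
    return (sign << 15) | (e << 10) | (frac >> 13)


def roundtrip(h: int) -> int:
    """Model of half -> float -> half as done for argument passing."""
    return f2h_bits(h2f_bits(h))


if __name__ == "__main__":
    print(hex(roundtrip(0x3C00)))  # 1.0 survives: prints 0x3c00
    print(hex(roundtrip(0x7C01)))  # signalling NaN is quieted: prints 0x7e01
```

In this model every bit pattern survives the roundtrip except signalling NaNs, which come back with bit 9 set, so `from_half(to_half(0x7C01))` would observe `0x7E01` instead of `0x7C01`.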

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (the specific target triple I checked is given in brackets):

C-SKY (csky-unknown-linux-gnuabiv2 with -mcpu=ck860fv -mattr=+hard-float)
Hexagon (hexagon-unknown-linux-musl)
LoongArch (loongarch64-unknown-linux-gnu with -mattr=+f): Fixed by #107791 ([loongarch][DAG][FREEZE] Fix crash when FREEZE a half(f16) type on loongarch)
MIPS (mips64el-unknown-linux-gnuabi64)
PowerPC (powerpc64le-unknown-linux-gnu)
SPARC (sparc64-unknown-linux-gnu)
WASM (wasm32-unknown-wasi): Already reported in #96438 (wasm32 queitens half signalling NaNs in some situations when passing/returning them)

As none of these targets' ABI specifications (that I've been able to find) specify how half should be passed (nor does Clang support _Float16 on any of these targets), and given that these targets are a subset of those affected by #97975, I'm filing this as a single issue: the ABI has probably been selected as an automatic default by LLVM rather than by a deliberate choice of the backends. Ultimately there are two possible solutions: either fix LLVM to codegen lossless conversions between half and float when needed for the ABI (one way to do this would be a new pair of builtins that don't silence signalling NaNs), or change the ABIs to pass/return half without converting it to float (probably using the same ABI as i16, though some targets might have better options).

Related to #97975.


llvmbot · 2024-07-08T03:01:35Z

@llvm/issue-subscribers-backend-powerpc


llvmbot · 2024-07-08T03:01:36Z

@llvm/issue-subscribers-backend-mips


llvmbot · 2024-07-08T03:01:36Z

@llvm/issue-subscribers-backend-hexagon


llvmbot · 2024-07-08T03:01:37Z

@llvm/issue-subscribers-backend-webassembly


vchuravy · 2024-07-08T18:32:39Z

FYI: On PPC at least, I think you need to add zext (the zeroext parameter attribute). (On PPC, i16 is passed as i32, and that can cause issues.)

define half @to_half(i16 zeroext %bits) {
    %f = bitcast i16 %bits to half
    ret half %f
}

programmerjake · 2024-07-08T20:31:51Z

> FYI: On PPC at least, I think you need to add zext. (On PPC, i16 is passed as i32, and that can cause issues.)

I remember seeing somewhere that s390x requires either zext or sext for all integer arguments and returns, since they have different ABIs and the generated code assumes they've been sign/zero extended properly.
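To illustrate why the extension matters, here is a small Python sketch that models a 32-bit register holding an `i16` argument. The helper names are hypothetical, and the sketch only shows the two bit patterns a callee might see; it does not model any particular target's calling convention.

```python
def zext16_to_32(x: int) -> int:
    """Zero-extend a 16-bit value into a 32-bit register image."""
    return x & 0xFFFF


def sext16_to_32(x: int) -> int:
    """Sign-extend a 16-bit value into a 32-bit register image."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if (x & 0x8000) else x


if __name__ == "__main__":
    bits = 0xFC00  # half -infinity: the top bit of the i16 is set
    print(hex(zext16_to_32(bits)))  # prints 0xfc00
    print(hex(sext16_to_32(bits)))  # prints 0xfffffc00
```

A callee that assumes one form and uses the whole register without masking sees a different value if the caller produced the other form, which is why the IR should state the extension explicitly.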

arsenm · 2024-07-10T08:33:09Z

> The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of half -> float -> half is not lossless and causes the generated ASM to violate the semantics of LLVM IR.

The handling here treats the ABI as implicitly adding an fpext/fptrunc in the argument/return lowering (or a strict_fpext in the strictfp case). LLVM semantics do not guarantee signaling NaN quieting, but ideally there wouldn't be an implicit cast in the first place. This is a consequence of SelectionDAG's type legalization logic, which, without a legal f16 type, assumes promotion to float in all contexts.

I agree this isn't the most sensible behavior. AMDGPU suffers the same issue for the antique targets without legal f16, which I've always found annoying. I think it would make more sense to treat it as i16, or if we really want to keep it in float registers, as the low bits (or high on big endian) that just happen to be stored in a float that need to be extracted as an integer.
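A minimal Python sketch of the "bits that just happen to be stored in a float" idea, working on the 32-bit register image as an integer. The function names and the `big_endian` flag are hypothetical, chosen only to mirror the low-bits/high-bits distinction above; no floating-point conversion is involved, so every half bit pattern, including signalling NaNs, survives unchanged.

```python
def half_to_f32_container(h_bits: int, big_endian: bool = False) -> int:
    """Place 16 half bits into a 32-bit float register image, unconverted."""
    h_bits &= 0xFFFF
    # Low 16 bits by default, high 16 bits in the big-endian variant.
    return (h_bits << 16) if big_endian else h_bits


def f32_container_to_half(f_bits: int, big_endian: bool = False) -> int:
    """Extract the 16 half bits from the register image as an integer."""
    f_bits &= 0xFFFFFFFF
    return ((f_bits >> 16) if big_endian else f_bits) & 0xFFFF


if __name__ == "__main__":
    snan = 0x7C01  # signalling NaN half
    reg = half_to_f32_container(snan)
    print(hex(f32_container_to_half(reg)))  # prints 0x7c01
```

The register contents are not a meaningful float value in this scheme; they are only a carrier for the integer bits, which is what makes the transfer lossless.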

beetrees · 2024-08-03T13:21:45Z

I've confirmed that the experimental C-SKY backend also appears to experience this issue.

beetrees · 2024-08-22T04:30:26Z

The LoongArch backend also appears to have this issue.

github-actions bot added the new issue label Jul 8, 2024

EugeneZelenko added backend:Hexagon backend:MIPS backend:PowerPC backend:Sparc backend:WebAssembly and removed new issue labels Jul 8, 2024

beetrees mentioned this issue Jul 8, 2024

Tracking Issue for f16 and f128 float types rust-lang/rust#116909

Open

82 tasks

dtcxzyw added miscompilation floating-point Floating-point math labels Jul 8, 2024

This was referenced Aug 2, 2024

Infinite recursion in __extendhfsf2 and __truncsfhf2 on "no-f16-f128" platforms rust-lang/compiler-builtins#651

Open

Presence of f16 in signatures causes missing symbols on PPC rust-lang/compiler-builtins#655

Closed

beetrees mentioned this issue Aug 3, 2024

LLVM miscompiles consecutive half operations by using too much precision on several backends #97975

Open

This was referenced Aug 22, 2024

Enable f16 tests on loongarch rust-lang/rust#129384

Closed

Enable f16 tests on platforms that were missing conversion symbols rust-lang/rust#129385

Open

heiher mentioned this issue Sep 20, 2024

[loongarch][DAG][FREEZE] Fix crash when FREEZE a half(f16) type on loongarch #107791

Merged
