Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potentially unsound uses of simd_select_bitmask in stdarch #137942

Open
RalfJung opened this issue Mar 3, 2025 · 11 comments
Open

Potentially unsound uses of simd_select_bitmask in stdarch #137942

RalfJung opened this issue Mar 3, 2025 · 11 comments
Labels
E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Mar 3, 2025

Looking for potential violations of simd_* intrinsic preconditions, I found this in stdarch:

/// Compute dot-product of BF16 (16-bit) floating-point pairs in a and b,
/// accumulating the intermediate single-precision (32-bit) floating-point elements
/// with elements in src, and store the results in dst using zeromask k
/// (elements are zeroed out when the corresponding mask bit is not set).
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=1769,1651,1654,1657,1660&avx512techs=AVX512_BF16&text=_mm_maskz_dpbf16_ps)
#[inline]
#[target_feature(enable = "avx512bf16,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr("vdpbf16ps"))]
pub fn _mm_maskz_dpbf16_ps(k: __mmask8, src: __m128, a: __m128bh, b: __m128bh) -> __m128 {
    unsafe {
        let rst = _mm_dpbf16_ps(src, a, b).as_f32x4();
        let zero = _mm_set1_ps(0.0_f32).as_f32x4();
        transmute(simd_select_bitmask(k, rst, zero))
    }
}

simd_select_bitmask is documented to require that all the "extra"/"padding" bits in the mask (not corresponding to a vector element) must be 0. Here, rst and zero are vectors of length 4, and the mask k is a u8, meaning there are 4 bits in k that must be 0. However, nothing in the function actually ensures that.

I don't know the intended behavior of the intrinsic for that case (probably intel promises to just ignore the extra bits?), but this function recently got marked as safe (in rust-lang/stdarch#1714) and that is clearly in contradiction with our intrinsic docs. I assume the safety is correct as probably the intrinsic should have no precondition; in that case we have to

  • either explicitly mask out the higher bits
  • or figure out if we can remove the UB from simd_select_bitmask

Cc @usamoi @Amanieu @workingjubilee

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 3, 2025
@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2025

This is a similar case:

/// For each packed 32-bit integer maps the value to the number of logical 1 bits.
///
/// Uses the writemask in k - elements are zeroed in the result if the corresponding mask bit is not set.
/// Otherwise the computation result is written into the result.
///
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_maskz_popcnt_epi32)
#[inline]
#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr(vpopcntd))]
pub fn _mm_maskz_popcnt_epi32(k: __mmask8, a: __m128i) -> __m128i {
    unsafe {
        transmute(simd_select_bitmask(
            k,
            simd_ctpop(a.as_i32x4()),
            i32x4::ZERO,
        ))
    }
}

I did not do an exhaustive search.

@Amanieu
Copy link
Member

Amanieu commented Mar 3, 2025

__mmask8 is type alias for u8 where each bit represents one vector element.

(misread your comment)

@Amanieu
Copy link
Member

Amanieu commented Mar 3, 2025

Right, I believe the intent is to ignore the unused bits here.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2025

Right, I believe the intent is to ignore the unused bits here.

A quick look at Intel's docs confirms this.

So, the question is, can we change the implementation to use k & 0xF to mask out the higher bits (assuming I got the bit order right)? Will LLVM know that it can contract simd_select_bitmask(k & 0xF, ...) into a single instruction on x86 based on how that architecture behaves?

Or do we have to dig into the simd_select_bitmask implementation and see if we can remove the UB? If I understand correctly what our LLVM backend does, it truncs the i8 to an i4 and then bitcasts that to <4 x i1>, so it does indeed ignore the other bits. But that also means it is likely the bitwise-and followed by trunc would get optimized to the Right Thing by LLVM.

@jhorstmann
Copy link
Contributor

That is also my understanding of the llvm implementation. It will truncate to an integer with the number of bits corresponding to the number of lanes, then bitcast that to a vector:

let m_im = bx.trunc(mask, im);

So any higher bits will be ignored.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2025

I wonder if our other backends work the same; Cc @bjorn3 @GuillaumeGomez

@bjorn3
Copy link
Member

bjorn3 commented Mar 3, 2025

cg_clif currently just checks if the respective lane in the mask is equal to 0 or not: https://github.com/rust-lang/rustc_codegen_cranelift/blob/0f9c09fb3a64ff11ea81446a96907cd5e86490c2/src/intrinsics/simd.rs#L788-L790

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2025

Okay so that is also fine with arbitrary data in the "extra" bits.

@bjorn3
Copy link
Member

bjorn3 commented Mar 3, 2025

Yeah, extra bits are ignored.

@lolbinarycat lolbinarycat added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status labels Mar 4, 2025
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 4, 2025
@Amanieu
Copy link
Member

Amanieu commented Mar 5, 2025

So it seems we could just change the simd_select_bitmask definition to ignore padding bits instead of requiring them to be 0?

@RalfJung
Copy link
Member Author

RalfJung commented Mar 5, 2025

We haven't heard about the GCC backend yet, but yeah it seems like that should work. Only Miri and the docs would need adjustments for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants