Potentially unsound uses of simd_select_bitmask in stdarch #137942

RalfJung · 2025-03-03T14:21:09Z

Looking for potential violations of simd_* intrinsic preconditions, I found this in stdarch:

/// Compute dot-product of BF16 (16-bit) floating-point pairs in a and b,
/// accumulating the intermediate single-precision (32-bit) floating-point elements
/// with elements in src, and store the results in dst using zeromask k
/// (elements are zeroed out when the corresponding mask bit is not set).
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=1769,1651,1654,1657,1660&avx512techs=AVX512_BF16&text=_mm_maskz_dpbf16_ps)
#[inline]
#[target_feature(enable = "avx512bf16,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr("vdpbf16ps"))]
pub fn _mm_maskz_dpbf16_ps(k: __mmask8, src: __m128, a: __m128bh, b: __m128bh) -> __m128 {
    unsafe {
        let rst = _mm_dpbf16_ps(src, a, b).as_f32x4();
        let zero = _mm_set1_ps(0.0_f32).as_f32x4();
        transmute(simd_select_bitmask(k, rst, zero))
    }
}

simd_select_bitmask is documented to require that all the "extra"/"padding" bits in the mask (not corresponding to a vector element) must be 0. Here, rst and zero are vectors of length 4, and the mask k is a u8, meaning there are 4 bits in k that must be 0. However, nothing in the function actually ensures that.
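To illustrate the documented precondition, here is a minimal scalar sketch on plain arrays. It is not the real intrinsic: the name `select_bitmask_strict` is hypothetical, the lane order (bit `i`, counted from the least significant bit, controls lane `i`) is an assumption that matches little-endian targets such as x86, and a violated precondition is modelled as `None` rather than UB.

```rust
/// Scalar model of a 4-lane select driven by a `u8` bitmask (hypothetical helper).
/// Assumption: bit `i`, counted from the least significant bit, controls lane `i`.
/// Returns `None` when a padding bit is set; the documented intrinsic contract
/// would make that case undefined behavior.
fn select_bitmask_strict(mask: u8, if_set: [f32; 4], if_clear: [f32; 4]) -> Option<[f32; 4]> {
    const LANES: usize = 4;
    // Bits 4..=7 of the mask do not correspond to any lane of a 4-element vector.
    if mask >> LANES != 0 {
        return None;
    }
    let mut out = if_clear;
    for lane in 0..LANES {
        if (mask >> lane) & 1 == 1 {
            out[lane] = if_set[lane];
        }
    }
    Some(out)
}
```

With this model, a mask such as `0b0001_0101` is rejected even though its low four bits describe a perfectly sensible selection, which is exactly the situation a safe caller of `_mm_maskz_dpbf16_ps` can create.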

I don't know the intended behavior of the intrinsic in that case (Intel probably promises to just ignore the extra bits?), but this function was recently marked as safe (in rust-lang/stdarch#1714), which clearly contradicts our intrinsic docs. I assume the safety is correct, as the intrinsic probably should have no precondition; in that case we have to

- either explicitly mask out the higher bits,
- or figure out if we can remove the UB from simd_select_bitmask.

Cc @usamoi @Amanieu @workingjubilee


RalfJung · 2025-03-03T14:24:21Z

This is a similar case:

/// For each packed 32-bit integer maps the value to the number of logical 1 bits.
///
/// Uses the writemask in k - elements are zeroed in the result if the corresponding mask bit is not set.
/// Otherwise the computation result is written into the result.
///
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_maskz_popcnt_epi32)
#[inline]
#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr(vpopcntd))]
pub fn _mm_maskz_popcnt_epi32(k: __mmask8, a: __m128i) -> __m128i {
    unsafe {
        transmute(simd_select_bitmask(
            k,
            simd_ctpop(a.as_i32x4()),
            i32x4::ZERO,
        ))
    }
}

I did not do an exhaustive search.

Amanieu · 2025-03-03T14:25:15Z

~~__mmask8 is type alias for u8 where each bit represents one vector element.~~

(misread your comment)

Amanieu · 2025-03-03T14:26:21Z

Right, I believe the intent is to ignore the unused bits here.

RalfJung · 2025-03-03T14:39:06Z

> Right, I believe the intent is to ignore the unused bits here.

A quick look at Intel's docs confirms this.

So, the question is, can we change the implementation to use k & 0xF to mask out the higher bits (assuming I got the bit order right)? Will LLVM know that it can contract simd_select_bitmask(k & 0xF, ...) into a single instruction on x86 based on how that architecture behaves?
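The masking option could look roughly like the sketch below. The helper name `clear_padding_bits` is hypothetical, and the code assumes lane 0 maps to the least significant bit (as on x86); for 4 lanes it reduces to `k & 0xF`.

```rust
/// Clears every bit of `k` that does not correspond to one of the first `lanes` lanes.
/// Hypothetical helper; assumes lane 0 maps to the least significant bit.
fn clear_padding_bits(k: u8, lanes: u32) -> u8 {
    assert!(lanes >= 1 && lanes <= 8, "a u8 mask covers at most 8 lanes");
    // Build the lane mask in u16 so that `lanes == 8` does not overflow the shift.
    let lane_bits = ((1u16 << lanes) - 1) as u8;
    k & lane_bits
}
```

A wrapper such as `_mm_maskz_dpbf16_ps` would then pass `clear_padding_bits(k, 4)` instead of `k`, which satisfies the documented precondition for any input. Whether the extra AND disappears in the generated code is a separate question.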

Or do we have to dig into the simd_select_bitmask implementation and see if we can remove the UB? If I understand correctly what our LLVM backend does, it truncs the i8 to an i4 and then bitcasts that to <4 x i1>, so it does indeed ignore the other bits. But that also means it is likely the bitwise-and followed by trunc would get optimized to the Right Thing by LLVM.

jhorstmann · 2025-03-03T16:38:01Z

That is also my understanding of the llvm implementation. It will truncate to an integer with the number of bits corresponding to the number of lanes, then bitcast that to a vector:

rust/compiler/rustc_codegen_llvm/src/intrinsic.rs, line 1284 in 81d8edc:

let m_im = bx.trunc(mask, im);

So any higher bits will be ignored.
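As a sanity check of that reasoning, the truncation can be modelled on scalars. This is only a sketch of the described lowering (trunc to i4, then bitcast to `<4 x i1>`), not the backend code; the function names are hypothetical and lane `i` is assumed to receive bit `i` of the truncated value.

```rust
/// Models `trunc i8 -> i4` followed by a bitcast to four `i1` lanes.
/// Only bits 0..=3 of `mask` are ever read, mirroring the truncation.
/// Assumption: lane `i` receives bit `i` (little-endian layout).
fn trunc_to_lane_bits(mask: u8) -> [bool; 4] {
    std::array::from_fn(|lane| (mask >> lane) & 1 == 1)
}

/// Checks every possible `u8`: masking with `0xF` before the truncation
/// never changes the resulting lane bits.
fn and_before_trunc_is_redundant() -> bool {
    (0..=u8::MAX).all(|k| trunc_to_lane_bits(k & 0x0F) == trunc_to_lane_bits(k))
}
```

Since the AND is a no-op under this model, it is plausible (though not verified here) that an optimizer would fold `k & 0xF` followed by the trunc into the trunc alone.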

RalfJung · 2025-03-03T16:42:36Z

I wonder if our other backends work the same; Cc @bjorn3 @GuillaumeGomez

bjorn3 · 2025-03-03T17:20:32Z

cg_clif currently just checks if the respective lane in the mask is equal to 0 or not: https://github.com/rust-lang/rustc_codegen_cranelift/blob/0f9c09fb3a64ff11ea81446a96907cd5e86490c2/src/intrinsics/simd.rs#L788-L790

RalfJung · 2025-03-03T18:16:41Z

Okay so that is also fine with arbitrary data in the "extra" bits.

bjorn3 · 2025-03-03T19:02:00Z

Yeah, extra bits are ignored.
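A per-lane check of this kind can be sketched on scalars as follows, using the zero-masked popcount from earlier in the thread as the example. This is a hypothetical model, not the cg_clif code, and it assumes lane `i` is controlled by bit `i` of `k`.

```rust
/// Scalar sketch of a zero-masked popcount over four `i32` lanes.
/// Each lane tests only its own mask bit, so bits 4..=7 of `k` are never read.
/// Assumption: lane `i` is controlled by bit `i` of `k`.
fn maskz_popcnt_lanes(k: u8, a: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for lane in 0..4 {
        if k & (1u8 << lane) != 0 {
            // `count_ones` counts the set bits of the two's-complement representation.
            out[lane] = a[lane].count_ones() as i32;
        }
    }
    out
}
```

Two masks that differ only in the upper four bits, for example `0b0000_0011` and `0b1111_0011`, give identical results under this model.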

Amanieu · 2025-03-05T17:06:10Z

So it seems we could just change the simd_select_bitmask definition to ignore padding bits instead of requiring them to be 0?

RalfJung · 2025-03-05T17:10:44Z

We haven't heard about the GCC backend yet, but yeah it seems like that should work. Only Miri and the docs would need adjustments for that.
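One way to see why relaxing the definition should be backward compatible: in a scalar model, the relaxed rule gives the same answer as the strict rule for every mask the strict rule accepts. The sketch below uses hypothetical names and assumes lane `i` is controlled by bit `i`; `None` stands in for the UB of the current documented contract.

```rust
/// Relaxed rule: padding bits are ignored (hypothetical model).
fn select_relaxed<T: Copy>(mask: u8, if_set: [T; 4], if_clear: [T; 4]) -> [T; 4] {
    std::array::from_fn(|lane| {
        if (mask >> lane) & 1 == 1 { if_set[lane] } else { if_clear[lane] }
    })
}

/// Currently documented rule: padding bits must be 0, modelled as `None` otherwise.
fn select_strict<T: Copy>(mask: u8, if_set: [T; 4], if_clear: [T; 4]) -> Option<[T; 4]> {
    if mask >> 4 != 0 {
        return None;
    }
    Some(select_relaxed(mask, if_set, if_clear))
}

/// True if the relaxed rule matches the strict rule for every mask
/// on which the strict rule is defined.
fn relaxed_refines_strict() -> bool {
    let (a, b) = ([1, 2, 3, 4], [0, 0, 0, 0]);
    (0..=u8::MAX).all(|m| match select_strict(m, a, b) {
        Some(expected) => select_relaxed(m, a, b) == expected,
        None => true,
    })
}
```

Under this model, existing callers that already respect the precondition would observe no change; only the previously undefined cases gain a meaning.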

rustbot added the needs-triage label Mar 3, 2025

RalfJung mentioned this issue Mar 3, 2025

mark x86 intrinsics as safe rust-lang/stdarch#1714

Merged

rustbot added the I-prioritize label Mar 4, 2025
