Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simd_relaxed_fma intrinsic #133395

Merged
merged 2 commits into from
Dec 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion compiler/rustc_codegen_cranelift/src/intrinsics/simd.rs
Original file line number Diff line number Diff line change
Expand Up @@ -415,7 +415,8 @@ pub(super) fn codegen_simd_intrinsic_call<'tcx>(
});
}

sym::simd_fma => {
// FIXME: simd_relaxed_fma doesn't relax to non-fused multiply-add
sym::simd_fma | sym::simd_relaxed_fma => {
intrinsic_args!(fx, args => (a, b, c); intrinsic);

if !a.layout().ty.is_simd() {
Expand Down
1 change: 1 addition & 0 deletions compiler/rustc_codegen_gcc/src/intrinsic/simd.rs
Original file line number Diff line number Diff line change
Expand Up @@ -772,6 +772,7 @@ pub fn generic_simd_intrinsic<'a, 'gcc, 'tcx>(
sym::simd_flog => "log",
sym::simd_floor => "floor",
sym::simd_fma => "fma",
sym::simd_relaxed_fma => "fma", // FIXME: this should relax to non-fused multiply-add when necessary
sym::simd_fpowi => "__builtin_powi",
sym::simd_fpow => "pow",
sym::simd_fsin => "sin",
Expand Down
2 changes: 2 additions & 0 deletions compiler/rustc_codegen_llvm/src/intrinsic.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1534,6 +1534,7 @@ fn generic_simd_intrinsic<'ll, 'tcx>(
sym::simd_flog => ("log", bx.type_func(&[vec_ty], vec_ty)),
sym::simd_floor => ("floor", bx.type_func(&[vec_ty], vec_ty)),
sym::simd_fma => ("fma", bx.type_func(&[vec_ty, vec_ty, vec_ty], vec_ty)),
sym::simd_relaxed_fma => ("fmuladd", bx.type_func(&[vec_ty, vec_ty, vec_ty], vec_ty)),
sym::simd_fpowi => ("powi", bx.type_func(&[vec_ty, bx.type_i32()], vec_ty)),
sym::simd_fpow => ("pow", bx.type_func(&[vec_ty, vec_ty], vec_ty)),
sym::simd_fsin => ("sin", bx.type_func(&[vec_ty], vec_ty)),
Expand Down Expand Up @@ -1572,6 +1573,7 @@ fn generic_simd_intrinsic<'ll, 'tcx>(
| sym::simd_fpowi
| sym::simd_fsin
| sym::simd_fsqrt
| sym::simd_relaxed_fma
| sym::simd_round
| sym::simd_trunc
) {
Expand Down
4 changes: 3 additions & 1 deletion compiler/rustc_hir_analysis/src/check/intrinsic.rs
Original file line number Diff line number Diff line change
Expand Up @@ -641,7 +641,9 @@ pub fn check_intrinsic_type(
| sym::simd_round
| sym::simd_trunc => (1, 0, vec![param(0)], param(0)),
sym::simd_fpowi => (1, 0, vec![param(0), tcx.types.i32], param(0)),
sym::simd_fma => (1, 0, vec![param(0), param(0), param(0)], param(0)),
sym::simd_fma | sym::simd_relaxed_fma => {
(1, 0, vec![param(0), param(0), param(0)], param(0))
}
sym::simd_gather => (3, 0, vec![param(0), param(1), param(2)], param(0)),
sym::simd_masked_load => (3, 0, vec![param(0), param(1), param(2)], param(2)),
sym::simd_masked_store => (3, 0, vec![param(0), param(1), param(2)], tcx.types.unit),
Expand Down
1 change: 1 addition & 0 deletions compiler/rustc_span/src/symbol.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1840,6 +1840,7 @@ symbols! {
simd_reduce_mul_unordered,
simd_reduce_or,
simd_reduce_xor,
simd_relaxed_fma,
simd_rem,
simd_round,
simd_saturating_add,
Expand Down
14 changes: 14 additions & 0 deletions library/core/src/intrinsics/simd.rs
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,20 @@ extern "rust-intrinsic" {
#[rustc_nounwind]
pub fn simd_fma<T>(x: T, y: T, z: T) -> T;

/// Computes `(x*y) + z` for each element, non-deterministically executing either
/// a fused multiply-add or two operations with rounding of the intermediate result.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we guarantee that the choice made is consistent across all lanes? Or could it happen that some lanes get fused and others not?

I assume this can happen because we want to allow the backend to split a big SIMD op into multiple smaller SIMD ops, and then some may be fused and some may not.

///
/// The operation is fused if the code generator determines that target instruction
/// set has support for a fused operation, and that the fused operation is more efficient
/// than the equivalent, separate pair of mul and add instructions. It is unspecified
/// whether or not a fused operation is selected, and that may depend on optimization
/// level and context, for example.
///
/// `T` must be a vector of floats.
#[cfg(not(bootstrap))]
#[rustc_nounwind]
pub fn simd_relaxed_fma<T>(x: T, y: T, z: T) -> T;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is like fmuladd on scalars? That should be mentioned, and probably it makes sense to copy the doc comment from there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't realize there was a scalar version... is it used anywhere?

Copy link
Member

@programmerjake programmerjake Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to match the scalar version, imo it should be renamed to simd_fmuladd, also to avoid confusion with any fast-math semantics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used yet, it was added in preparation for exposing corresponding methods on the float types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to match the scalar version, imo it should be renamed to simd_fmuladd, also to avoid confusion with any fast-math semantics

That's a pretty bad name though IMO, it is used for the scalar version only because that's how LLVM calls them.

I like relaxed_fma. I don't think it is confusing with fast-math semantics, we don't call those "relaxed" after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe...though I want to say I've seen relaxed suggested for fast math functions...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though, since it isn't always fused, I think simd_something_mul_add is better than simd_something_fma

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simd_mul_add? simd_relaxed_mul_add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, either of those would be ok-ish -- simd_mul_add makes one think of f32::mul_add which is always fused (imo naming f32::mul_add mul_add is a mistake, but there's nothing we can do now...).


// Computes the sine of each element.
///
/// `T` must be a vector of floats.
Expand Down
4 changes: 4 additions & 0 deletions tests/ui/simd/intrinsic/float-math-pass.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ extern "rust-intrinsic" {
fn simd_fexp<T>(x: T) -> T;
fn simd_fexp2<T>(x: T) -> T;
fn simd_fma<T>(x: T, y: T, z: T) -> T;
fn simd_relaxed_fma<T>(x: T, y: T, z: T) -> T;
fn simd_flog<T>(x: T) -> T;
fn simd_flog10<T>(x: T) -> T;
fn simd_flog2<T>(x: T) -> T;
Expand Down Expand Up @@ -77,6 +78,9 @@ fn main() {
let r = simd_fma(x, h, h);
assert_approx_eq!(x, r);

let r = simd_relaxed_fma(x, h, h);
assert_approx_eq!(x, r);

let r = simd_fsqrt(x);
assert_approx_eq!(x, r);

Expand Down
Loading