Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nontemporal_store: make sure that the intrinsic is truly just a hint #128149

Merged
merged 3 commits into from
Aug 12, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion compiler/rustc_codegen_cranelift/src/intrinsics/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -725,7 +725,8 @@ fn codegen_regular_intrinsic_call<'tcx>(

// Cranelift treats stores as volatile by default
// FIXME correctly handle unaligned_volatile_store
// FIXME actually do nontemporal stores if requested
// FIXME actually do nontemporal stores if requested (but do not just emit MOVNT on x86;
// see the LLVM backend for details)
let dest = CPlace::for_ptr(Pointer::new(ptr), val.layout());
dest.write_cvalue(fx, val);
}
Expand Down
2 changes: 2 additions & 0 deletions compiler/rustc_codegen_gcc/src/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1127,6 +1127,8 @@ impl<'a, 'gcc, 'tcx> BuilderMethods<'a, 'tcx> for Builder<'a, 'gcc, 'tcx> {
self.llbb().add_assignment(self.location, aligned_destination, val);
// TODO(antoyo): handle align and flags.
// NOTE: dummy value here since it's never used. FIXME(antoyo): API should not return a value here?
// When adding support for NONTEMPORAL, make sure to not just emit MOVNT on x86; see the
// LLVM backend for details.
self.cx.context.new_rvalue_zero(self.type_i32())
}

Expand Down
31 changes: 24 additions & 7 deletions compiler/rustc_codegen_llvm/src/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -727,13 +727,30 @@ impl<'a, 'll, 'tcx> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
llvm::LLVMSetVolatile(store, llvm::True);
}
if flags.contains(MemFlags::NONTEMPORAL) {
// According to LLVM [1] building a nontemporal store must
// *always* point to a metadata value of the integer 1.
//
// [1]: https://llvm.org/docs/LangRef.html#store-instruction
let one = self.cx.const_i32(1);
let node = llvm::LLVMMDNodeInContext(self.cx.llcx, &one, 1);
llvm::LLVMSetMetadata(store, llvm::MD_nontemporal as c_uint, node);
// Make sure that the current target architectures supports "sane" non-temporal
// stores, i.e., non-temporal stores that are equivalent to regular stores except
// for performance. LLVM doesn't seem to care about this, and will happily treat
// `!nontemporal` stores as-if they were normal stores (for reordering optimizations
// etc) even on x86, despite later lowering them to MOVNT which do *not* behave like
// regular stores but require special fences.
// So we keep a list of architectures where `!nontemporal` is known to be truly just
// a hint, and use regular stores everywhere else.
// (In the future, we could alternatively ensure that an sfence gets emitted after a sequence of movnt
// before any kind of synchronizing operation. But it's not clear how to do that with LLVM.)
const WELL_BEHAVED_NONTEMPORAL_ARCHS: &[&str] =
&["aarch64", "arm", "riscv32", "riscv64"];
Comment on lines +742 to +743
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question [REV1 (2/2)]: If I understand it correctly, the codegen test is modified such that it only codegens for aarch64-unknown-linux-gnu target, but here we have 4 "well-behaved" architectures that we do emit !nontemporal for.


let use_nontemporal =
WELL_BEHAVED_NONTEMPORAL_ARCHS.contains(&&*self.cx.tcx.sess.target.arch);
if use_nontemporal {
// According to LLVM [1] building a nontemporal store must
// *always* point to a metadata value of the integer 1.
//
// [1]: https://llvm.org/docs/LangRef.html#store-instruction
let one = self.cx.const_i32(1);
let node = llvm::LLVMMDNodeInContext(self.cx.llcx, &one, 1);
llvm::LLVMSetMetadata(store, llvm::MD_nontemporal as c_uint, node);
}
}
store
}
Expand Down
10 changes: 5 additions & 5 deletions library/core/src/intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2386,12 +2386,12 @@ extern "rust-intrinsic" {
#[rustc_nounwind]
pub fn catch_unwind(try_fn: fn(*mut u8), data: *mut u8, catch_fn: fn(*mut u8, *mut u8)) -> i32;

/// Emits a `!nontemporal` store according to LLVM (see their docs).
/// Probably will never become stable.
/// Emits a `nontemporal` store, which gives a hint to the CPU that the data should not be held
/// in cache. Except for performance, this is fully equivalent to `ptr.write(val)`.
///
/// Do NOT use this intrinsic; "nontemporal" operations do not exist in our memory model!
/// It exists to support current stdarch, but the plan is to change stdarch and remove this intrinsic.
/// See <https://github.com/rust-lang/rust/issues/114582> for some more discussion.
/// Not all architectures provide such an operation. For instance, x86 does not: while `MOVNT`
/// exists, that operation is *not* equivalent to `ptr.write(val)` (`MOVNT` writes can be reordered
/// in ways that are not allowed for regular writes).
#[rustc_nounwind]
pub fn nontemporal_store<T>(ptr: *mut T, val: T);

Expand Down
19 changes: 17 additions & 2 deletions tests/codegen/intrinsics/nontemporal.rs
jieyouxu marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -1,13 +1,28 @@
//@ compile-flags: -O
//@ compile-flags: --target aarch64-unknown-linux-gnu
//@ needs-llvm-components: aarch64
jieyouxu marked this conversation as resolved.
Show resolved Hide resolved

#![feature(core_intrinsics)]
#![feature(no_core, lang_items, intrinsics)]
#![no_core]
#![crate_type = "lib"]

#[lang = "sized"]
pub trait Sized {}
#[lang = "copy"]
pub trait Copy {}

impl Copy for u32 {}
impl<T> Copy for *mut T {}

extern "rust-intrinsic" {
pub fn nontemporal_store<T>(ptr: *mut T, val: T);
}

#[no_mangle]
pub fn a(a: &mut u32, b: u32) {
// CHECK-LABEL: define{{.*}}void @a
// CHECK: store i32 %b, ptr %a, align 4, !nontemporal
unsafe {
std::intrinsics::nontemporal_store(a, b);
nontemporal_store(a, b);
}
}
Loading