Skip to content

Commit

Permalink
Fix nonlocal unsoundness vs. atomic store
Browse files Browse the repository at this point in the history
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, so LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states. If using `&mut [u8]` to write uninitialized
memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it
could even result in safe code incorrectly reading uninitialized memory!

To guarantee program soundness, code using nontemporal stores must
currently use sfence in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and a motivation to choose lowerings that do not require explicit sfence.
  • Loading branch information
workingjubilee committed Aug 8, 2023
1 parent 748a7a0 commit d55c6b7
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion src/x86_64.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,12 @@ pub fn memset(slice: &mut [u8], byte: u8) {

let ptr = get_impl().load(Ordering::Relaxed);
let ptr = unsafe { mem::transmute::<usize, FnSig>(ptr) };
ptr(slice, byte)
ptr(slice, byte);

// Required before returning to code that may set atomic flags that invite concurrent reads,
// as LLVM will not lower `atomic store ... release`, thus `AtomicBool::store(true, Release)`
// on x86-64 to emit sfence, even though it is required in the presence of nontemporal stores.
unsafe { _mm_sfence() };
}

#[repr(packed)]
Expand Down

0 comments on commit d55c6b7

Please sign in to comment.