nontemporal stores behave incorrectly in their interaction with concurrency primitives #64521
Comments
To be clear, this is slightly more than just a documentation bug, as
And Ralf's primary concern is about load and store reorderings due to optimizations not being mindful of these weaker-than-TSO orderings described by MOVNTDQ, MOVNTI, and so on, and the need for SFENCE to not be moved in some cases. It's not even clear how much we can rely on inline assembly (when emitted as separate statements or present in separate functions) not being reordered or elided by such compiler optimizations. Thus, Ralf saying, "The LLVM LangRef doesn't document how so-and-so interacts with concurrency..." might, because he is a concurrency expert yet does not feel he knows how to resolve the concurrency problems here, be rendered in a more conversational dialect of English more like, "So, how fucked are we? Because this situation seems fucked up beyond all recognition." Ideally, there would be a way to force
Yeah, the current status seems to be that a
I haven't looked into that at all -- how "strangely" do nontemporal stores behave on those ISAs? Are they somehow weaker than regular non-atomic accesses, or is it truly just a hint to the CPU to not keep this cacheline cached?
I also haven't looked into nontemporal loads at all, and whether they can cause any havoc.
re: nontemporal stores on AArch64:
re: nontemporal loads on AArch64:
Wasn't it noted that it wasn't just dependencies that don't interact properly, but a stale value of the address register could be used?
Rollup merge of rust-lang#128149 - RalfJung:nontemporal_store, r=jieyouxu,Amanieu,Jubilee nontemporal_store: make sure that the intrinsic is truly just a hint The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](llvm/llvm-project#64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores. ~~Blocked on rust-lang/stdarch#1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~ Fixes rust-lang#114582 Cc `@Amanieu` `@workingjubilee`
…ieu,Jubilee nontemporal_store: make sure that the intrinsic is truly just a hint The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](llvm/llvm-project#64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores. ~~Blocked on rust-lang/stdarch#1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~ Fixes rust-lang/rust#114582 Cc `@Amanieu` `@workingjubilee`
We definitely need fences there, see: https://doc.rust-lang.org/core/arch/x86/fn._mm_sfence.html Even more interesting, the discussion on NT stores in Rust: rust-lang/rust#114582 And on broken NT stores in LLVM: llvm/llvm-project#64521 Oh boy ...
The LLVM LangRef doesn't document how `!nontemporal` stores are intended to interact with concurrency primitives. The current interactions are extremely surprising, basically making `!nontemporal` stores even less ordered than "non-atomic" stores:

According to all the usual concurrency rules, that last load must see the store. However, the way LLVM compiles this program on x86, it has a data race: the fences become NOPs and the relaxed accesses become regular MOV, so we end up with `MOVNT; MOV` in thread A, which the CPU is allowed to reorder (see e.g. this long and detailed post on MOVNT) -- meaning that thread B might see the flag write but then fail to see the data store!

In other words, `MOVNT` violates TSO, but the compilation scheme LLVM (and everyone else) uses for release/acquire synchronization relies on TSO. Together this leads to rather unpredictable semantics. Are nontemporal stores meant to completely bypass normal memory model rules (in which case they are super dangerous to use anywhere), or are they meant to follow the usual rules (in which case LLVM needs to ensure there is an sfence between each nontemporal store and later release operations)?
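For illustration, here is a minimal Rust sketch of the message-passing pattern this issue is about, written in the "follow the usual rules" style: the non-temporal store is followed by an explicit `sfence` before the release fence and flag write. This is not the issue's original example. The helper names `nt_store_then_sfence` and `publish_and_read` are hypothetical, and the sketch assumes an x86-64 target for the `_mm_stream_si32` / `_mm_sfence` path (other targets fall back to a plain relaxed store).

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicI32, Ordering};
use std::thread;

/// Hypothetical helper: write `value` with a non-temporal hint, then issue
/// SFENCE so the write cannot be reordered after later stores.
fn nt_store_then_sfence(slot: &AtomicI32, value: i32) {
    #[cfg(target_arch = "x86_64")]
    // SAFETY: `slot.as_ptr()` is a valid, aligned pointer to an i32, and the
    // SFENCE is issued before this thread performs any synchronization.
    unsafe {
        use std::arch::x86_64::{_mm_sfence, _mm_stream_si32};
        _mm_stream_si32(slot.as_ptr(), value); // MOVNTI
        _mm_sfence(); // without this, MOVNTI may be reordered past the flag store
    }
    // Assumption: on other targets we simply use a regular store.
    #[cfg(not(target_arch = "x86_64"))]
    slot.store(value, Ordering::Relaxed);
}

/// Hypothetical driver: thread A publishes `value`, thread B waits for the
/// flag and then reads the data. Returns what thread B observed.
fn publish_and_read(value: i32) -> i32 {
    let data = AtomicI32::new(0);
    let flag = AtomicBool::new(false);

    thread::scope(|s| {
        // Thread A: data store, release fence, relaxed flag store.
        s.spawn(|| {
            nt_store_then_sfence(&data, value);
            fence(Ordering::Release);
            flag.store(true, Ordering::Relaxed);
        });

        // Thread B: relaxed flag load, acquire fence, data load.
        let reader = s.spawn(|| {
            while !flag.load(Ordering::Relaxed) {
                std::hint::spin_loop();
            }
            fence(Ordering::Acquire);
            data.load(Ordering::Relaxed)
        });

        reader.join().unwrap()
    })
}
```

Calling `publish_and_read(42)` should return `42`. If the `_mm_sfence()` call were dropped, thread A on x86 would be left with the `MOVNT; MOV` sequence described above, and the program would depend on hardware ordering that non-temporal stores do not provide.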