-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add safe wrapper for atomic_compilerfence intrinsics #41092
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
4deac84
to
2598e45
Compare
Can these be added in a normal function? It seems like the intrinsic itself would have to be placed inline, right? Will the inline pass do the right thing? |
That's a good point. You may be right that |
Also, travis fails because I added this under a new feature, and it says
I'll hold on with adding it there until we've figured out the basics. |
Looks good to me! I'd be surprised if these were required to be inlined, but I also am unfamiliar with where you'd use them and the LLVM implementation. |
I wouldn't think it needs to be always inlined, since if LLVM doesn't have visibility into this function it would be incorrect to optimize memory accesses around the function since it couldn't prove there are no barriers inside it (which is true of any function). If it does have visibility this will probably get inlined, which is fine. |
@tari Yeah, that matches my expectations too. |
807f236
to
f6d262a
Compare
`v1 == 1` (since `thread2`'s store happened before its `load`, and its | ||
load did not see `thread1`'s store). However, the code as written does | ||
*not* guarantee this, because the compiler is allowed to re-order the | ||
store and load within each thread. To enforce this particular behavior, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardware can still make this reordering regardless of what the compiler does. A better example might be something like
static FLAG: AtomicUsize = ATOMIC_USIZE_INIT;
// thread1:
// write some stuff non atomically
// need single thread fence here
FLAG.store(1, Ordering::Relaxed);
// thread1 signal handler
if (FLAG.load(Ordering::Relaxed) == 1) {
// need single thread fence here
// read stuff
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. In fact, even the example above is wrong for the same reason (you would need to use volatile_*
). Maybe I'm finding myself agreeing more with the original RFC's conclusion that this is not something that people should generally need (at least until Rust start supporting signal handlers)...
@alexcrichton Following his comments, I'm pretty sure all the examples I gave in the entry for the book are actually wrong, because the hardware is still free to re-arrange the loads and stores (you need to use |
Excellent docs @jonhoo. Thanks for writing them. |
Ultimately it might actually be good to move some of the explanation/examples from the unstable book entry into the documentation of |
While most (possibly all?) operations you might do that need a single-threaded fence will involve unsafe code, this change also implies a future state where use of these functions is stable whereas current use of them requires a nightly compiler- that alone makes it worthwhile in my view. Whether the new function should be A few other situations where you might have preemption on a single thread might be if you're implementing green threads or handling interrupts (say, with an |
Once it's made stable, these docs would have to go somewhere. Large docs on a function is fine with me, personally. You could also move some of it to the module itself. This seems fine for now though. |
@steveklabnik is there a documented process for how we transition from an unstable to a stable feature? It seems Travis is also happy. Unless there are further objections (@tari? @parched?), I think this is ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct in your original first example that people would normally use volatile_*
but that relies on the implicit assumption that the locations are in strongly ordered memory which is usually the case for memory mapped peripherals like you described. So your compiler barrier would have worked and been better than volatile because if your code made 2 reads of the address the compiler could've just made it 1.
I was also wondering if you had a reason for choosing the name compiler_barrier
rather than single_thread_fence
like LLVM calls it. I'm unsure which I prefer.
semantics, the compiler may be disallowed from moving reads or writes | ||
from before or after the call to the other side of the call to | ||
`compiler_barrier`. Note that it does **not** prevent the *hardware* | ||
from doing such re-orderings -- for that, the `volatile_*` class of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
volatile doesn't help in that regard, it's just another form of compiler barrier really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, you're right. I've modified the text locally, but waiting to push until we settle on the things below.
`AtomicPtr` that starts as `NULL`), it may also be unsafe for the | ||
compiler to hoist code using `IMPORTANT_VARIABLE` above the | ||
`IS_READY.load`. In that case, a `compiler_barrier(Ordering::Acquire)` | ||
should be placed at the top of the `if` to prevent this optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you need second compiler barrier regardless of the type. Also IMPORTANT_VARIABLE
doesn't need to be atomic only the variable that you are synchronzing on (IS_READY
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, come to think of it, I'm not sure either variable needs to be atomic? It's convenient because Rust doesn't easily have mutable statics otherwise, but this should also work correctly even if they're both "normal" memory locations.
As for the need for the second compiler barrier, I'm not sure I see why? It's fine for the load
of IMPORTANT_VARIABLE
to occur before the load
of IS_READY
as long as IMPORTANT_VARIABLE
isn't used (which it wouldn't be if IS_READY=0
), no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IS_READY
needs to be atomic otherwise it could be partially written by main leaving it in some indeterminate state when the signal handler runs and tries to read it.
Yes it's fine if it isn't used, but what if it is used when IS_READY == 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine for it to be used if IS_READY == 1
— the write to IMPORTANT_VARIABLE
must have succeeded, so it should contain the right value. Even if you read it before you read IS_READY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see what you mean. I was thinking in the more general case where the context could switch back again between the two. Something like an RTOS on a single core, but in that case you should just use a memory fence instead because they will basically be free on single core, and when you go to multi core your code won't break.
For a signal handler like this the whole thing is effectively atomic with respect to thread1 which also means the load of IS_READY
doesn't need to be atomic either, only the store does.
I'm not sure what you mean about the pointer though. The compiler can't hoist out the pointer dereference unless it already know it's a valid pointer, otherwise just normal single threaded code would break, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, of course, you're right. I'll remove the point about pointers.
With regards to the old device memory location example, I'm not sure that's quite right. The compiler barrier doesn't prevent the hardware from loading the data before storing the address (I think?), because they aren't dependent, and thus their strict ordering isn't required. You are right that For |
Yes I mean if volatile was sufficient (because it was known to strongly ordered memory and hence no hardware reordering) then your example should have worked too and potentially had some advantage over volatile. Yeah, I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonhoo it all looks good to me now.
I'm guessing @sfackler is supposed to be the one to merge this? |
@bors: r+ |
📌 Commit f093d59 has been approved by |
Add safe wrapper for atomic_compilerfence intrinsics This PR adds a proposed safe wrapper for the `atomic_singlethreadfence_*` intrinsics introduced by [RFC #888](rust-lang/rfcs#888). See #41091 for further discussion.
☀️ Test successful - status-appveyor, status-travis |
Pasting a concern found elsewhere:
|
I'm not attached to any particular name; @bstrie provides the most compelling case I've seen either way, so the change seems good. |
This addresses concerns raised following the merge of rust-lang#41092. Specifically: > The naming of these seems surprising: the multithreaded functions (and > both the single and multithreaded intrinsics themselves) are fences, > but this is a barrier. It's not incorrect, but the latter is both > inconsistent with the existing functions and slightly confusing with > another type in std (e.g., `Barrier`). `compiler_fence` carries the same semantic implication that this is a compiler-only operation, while being more in line with the fence/barrier concepts already in use in `std`.
…lexcrichton Rename compiler_barrier to compiler_fence This addresses concerns raised following the merge of rust-lang#41092. Specifically: > The naming of these seems surprising: the multithreaded functions (and both the single and multithreaded intrinsics themselves) are fences, but this is a barrier. It's not incorrect, but the latter is both inconsistent with the existing functions and slightly confusing with another type in std (e.g., `Barrier`). `compiler_fence` carries the same semantic implication that this is a compiler-only operation, while being more in line with the fence/barrier concepts already in use in `std`.
This PR adds a proposed safe wrapper for the
atomic_singlethreadfence_*
intrinsics introduced by RFC #888. See #41091 for further discussion.