Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safe wrapper for atomic_compilerfence intrinsics #41092

Merged
merged 7 commits into from
Apr 9, 2017

Conversation

jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Apr 5, 2017

This PR adds a proposed safe wrapper for the atomic_singlethreadfence_* intrinsics introduced by RFC #888. See #41091 for further discussion.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@jonhoo jonhoo force-pushed the std-fence-intrinsics branch from 4deac84 to 2598e45 Compare April 5, 2017 19:47
@sfackler
Copy link
Member

sfackler commented Apr 5, 2017

Can these be added in a normal function? It seems like the intrinsic itself would have to be placed inline, right? Will the inline pass do the right thing?

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 5, 2017

That's a good point. You may be right that #[inline(always)] is necessary on the wrapper. It'd be good to loop in @tari who filed the original intrinsics PR, since I don't have enough in-depth knowledge about how the LLVM integration works to say for sure.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 5, 2017

Also, travis fails because I added this under a new feature, and it says

Unstable feature 'std_compiler_fences' needs to have a section in The Unstable Book

I'll hold on with adding it there until we've figured out the basics.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Apr 5, 2017
@alexcrichton
Copy link
Member

Looks good to me!

I'd be surprised if these were required to be inlined, but I also am unfamiliar with where you'd use them and the LLVM implementation.

@tari
Copy link
Contributor

tari commented Apr 6, 2017

I wouldn't think it needs to be always inlined, since if LLVM doesn't have visibility into this function it would be incorrect to optimize memory accesses around the function since it couldn't prove there are no barriers inside it (which is true of any function). If it does have visibility this will probably get inlined, which is fine.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 6, 2017

@tari Yeah, that matches my expectations too.
@alexcrichton and @sfackler, I just pushed an entry for the unstable book, and in the process renamed the unstable feature to compiler_barriers to match the function name. If Travis doesn't complain, I think I'm happy with this.

@jonhoo jonhoo force-pushed the std-fence-intrinsics branch from 807f236 to f6d262a Compare April 6, 2017 07:37
`v1 == 1` (since `thread2`'s store happened before its `load`, and its
load did not see `thread1`'s store). However, the code as written does
*not* guarantee this, because the compiler is allowed to re-order the
store and load within each thread. To enforce this particular behavior,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardware can still make this reordering regardless of what the compiler does. A better example might be something like

static FLAG: AtomicUsize = ATOMIC_USIZE_INIT;

// thread1:
// write some stuff non atomically
// need single thread fence here
FLAG.store(1, Ordering::Relaxed);

// thread1 signal handler
if (FLAG.load(Ordering::Relaxed) == 1) {
    // need single thread fence here
    // read stuff
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. In fact, even the example above is wrong for the same reason (you would need to use volatile_*). Maybe I'm finding myself agreeing more with the original RFC's conclusion that this is not something that people should generally need (at least until Rust start supporting signal handlers)...

@alexcrichton
Copy link
Member

Wow thanks for the thorough documentation @jonhoo! Looks great to me, but did you want to update with @parched's comment?

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 7, 2017

@alexcrichton Following his comments, I'm pretty sure all the examples I gave in the entry for the book are actually wrong, because the hardware is still free to re-arrange the loads and stores (you need to use volatile_read and volatile_write to make them correct). I'll try and come up with some better text to push, though with that realization (and as mentioned in the follow-up comment to @parched), I'm not as sure any more that this is sufficiently useful to have its own safe wrapper... Thoughts on that would be welcome.

@brson
Copy link
Contributor

brson commented Apr 7, 2017

Excellent docs @jonhoo. Thanks for writing them.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 7, 2017

Ultimately it might actually be good to move some of the explanation/examples from the unstable book entry into the documentation of compiler_barrier. Not sure if that's something we want to do now though? It would also make the documentation for that one function quite large (which maybe is okay?).

@tari
Copy link
Contributor

tari commented Apr 7, 2017

I'm not as sure any more that this is sufficiently useful to have its own safe wrapper... Thoughts on that would be welcome.

While most (possibly all?) operations you might do that need a single-threaded fence will involve unsafe code, this change also implies a future state where use of these functions is stable whereas current use of them requires a nightly compiler- that alone makes it worthwhile in my view.

Whether the new function should be unsafe or not, I'd say it should be safe because it does not directly perform any potentially-unsafe operations (even though it will almost always accompany very unsafe operations).


A few other situations where you might have preemption on a single thread might be if you're implementing green threads or handling interrupts (say, with an extern "msp430-interrupt" or extern "x86-interrupt" function) and passing state on the same CPU.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 7, 2017

@tari Sorry, I meant whether adding a wrapper in std is necessary/useful. But yes, I suppose you're right. As for interrupts/green threads, I added a sentence to that effect in e6597e1.

@steveklabnik
Copy link
Member

Ultimately it might actually be good to move some of the explanation/examples from the unstable book entry into the documentation of compiler_barrier. Not sure if that's something we want to do now though? It would also make the documentation for that one function quite large (which maybe is okay?).

Once it's made stable, these docs would have to go somewhere. Large docs on a function is fine with me, personally. You could also move some of it to the module itself. This seems fine for now though.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 7, 2017

@steveklabnik is there a documented process for how we transition from an unstable to a stable feature?

It seems Travis is also happy. Unless there are further objections (@tari? @parched?), I think this is ready to merge.

Copy link
Contributor

@parched parched left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct in your original first example that people would normally use volatile_* but that relies on the implicit assumption that the locations are in strongly ordered memory which is usually the case for memory mapped peripherals like you described. So your compiler barrier would have worked and been better than volatile because if your code made 2 reads of the address the compiler could've just made it 1.

I was also wondering if you had a reason for choosing the name compiler_barrier rather than single_thread_fence like LLVM calls it. I'm unsure which I prefer.

semantics, the compiler may be disallowed from moving reads or writes
from before or after the call to the other side of the call to
`compiler_barrier`. Note that it does **not** prevent the *hardware*
from doing such re-orderings -- for that, the `volatile_*` class of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volatile doesn't help in that regard, it's just another form of compiler barrier really.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, you're right. I've modified the text locally, but waiting to push until we settle on the things below.

`AtomicPtr` that starts as `NULL`), it may also be unsafe for the
compiler to hoist code using `IMPORTANT_VARIABLE` above the
`IS_READY.load`. In that case, a `compiler_barrier(Ordering::Acquire)`
should be placed at the top of the `if` to prevent this optimizations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you need second compiler barrier regardless of the type. Also IMPORTANT_VARIABLE doesn't need to be atomic only the variable that you are synchronzing on (IS_READY).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, come to think of it, I'm not sure either variable needs to be atomic? It's convenient because Rust doesn't easily have mutable statics otherwise, but this should also work correctly even if they're both "normal" memory locations.

As for the need for the second compiler barrier, I'm not sure I see why? It's fine for the load of IMPORTANT_VARIABLE to occur before the load of IS_READY as long as IMPORTANT_VARIABLE isn't used (which it wouldn't be if IS_READY=0), no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IS_READY needs to be atomic otherwise it could be partially written by main leaving it in some indeterminate state when the signal handler runs and tries to read it.

Yes it's fine if it isn't used, but what if it is used when IS_READY == 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for it to be used if IS_READY == 1 — the write to IMPORTANT_VARIABLE must have succeeded, so it should contain the right value. Even if you read it before you read IS_READY.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I see what you mean. I was thinking in the more general case where the context could switch back again between the two. Something like an RTOS on a single core, but in that case you should just use a memory fence instead because they will basically be free on single core, and when you go to multi core your code won't break.

For a signal handler like this the whole thing is effectively atomic with respect to thread1 which also means the load of IS_READY doesn't need to be atomic either, only the store does.

I'm not sure what you mean about the pointer though. The compiler can't hoist out the pointer dereference unless it already know it's a valid pointer, otherwise just normal single threaded code would break, wouldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, of course, you're right. I'll remove the point about pointers.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 7, 2017

With regards to the old device memory location example, I'm not sure that's quite right. The compiler barrier doesn't prevent the hardware from loading the data before storing the address (I think?), because they aren't dependent, and thus their strict ordering isn't required. You are right that volatile might not have been sufficient either though.

For compiler_barrier vs single_thread_fence, I deemed the former to be more unambiguous. To me, single_thread_fence suggests that it actually works somewhat like a fence, in that it also prevents hardware re-ordering, whereas compiler_barrier very clearly does not have such an implication.

@parched
Copy link
Contributor

parched commented Apr 7, 2017

Yes I mean if volatile was sufficient (because it was known to strongly ordered memory and hence no hardware reordering) then your example should have worked too and potentially had some advantage over volatile.

Yeah, I think compiler_barrier probably is better too. Well I guess i take 'fence' to mean 'stops observable reordering' which it does for single threads. I was just wondering if single_thread_fence might be more likely to stop people accidentally using them in a multithreaded context.

Copy link
Contributor

@parched parched left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonhoo it all looks good to me now.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 8, 2017

I'm guessing @sfackler is supposed to be the one to merge this?

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Apr 8, 2017

📌 Commit f093d59 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Apr 8, 2017

⌛ Testing commit f093d59 with merge b2d9b63...

bors added a commit that referenced this pull request Apr 8, 2017
Add safe wrapper for atomic_compilerfence intrinsics

This PR adds a proposed safe wrapper for the `atomic_singlethreadfence_*` intrinsics introduced by [RFC #888](rust-lang/rfcs#888). See #41091 for further discussion.
@bors
Copy link
Contributor

bors commented Apr 9, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing b2d9b63 to master...

@bors bors merged commit f093d59 into rust-lang:master Apr 9, 2017
@bstrie
Copy link
Contributor

bstrie commented Apr 12, 2017

Pasting a concern found elsewhere:

The naming of these seems surprising: the multithreaded functions (and both the single and multithreaded intrinsics themselves) are fences, but this is a barrier. It's not incorrect, but the latter is both inconsistent with the existing functions and slightly confusing with another type in std: https://doc.rust-lang.org/std/sync/struct.Barrier.html. Alternatives would be the "obvious" compiler_fence, or C++'s atomic_signal_fence, or singlethread_fence (or fence_singlethread, etc.).

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 12, 2017

I'd be happy to rename it to compiler_fence. Submitted a PR for the rename over at #41262. @parched @tari ?

@tari
Copy link
Contributor

tari commented Apr 13, 2017

I'm not attached to any particular name; @bstrie provides the most compelling case I've seen either way, so the change seems good.

jonhoo added a commit to jonhoo/rust that referenced this pull request Apr 13, 2017
This addresses concerns raised following the merge of rust-lang#41092.
Specifically:

> The naming of these seems surprising: the multithreaded functions (and
> both the single and multithreaded intrinsics themselves) are fences,
> but this is a barrier. It's not incorrect, but the latter is both
> inconsistent with the existing functions and slightly confusing with
> another type in std (e.g., `Barrier`).

`compiler_fence` carries the same semantic implication that this is a
compiler-only operation, while being more in line with the fence/barrier
concepts already in use in `std`.
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 18, 2017
…lexcrichton

Rename compiler_barrier to compiler_fence

This addresses concerns raised following the merge of rust-lang#41092. Specifically:

> The naming of these seems surprising: the multithreaded functions (and both the single and multithreaded intrinsics themselves) are fences, but this is a barrier. It's not incorrect, but the latter is both inconsistent with the existing functions and slightly confusing with another type in std (e.g., `Barrier`).

`compiler_fence` carries the same semantic implication that this is a compiler-only operation, while being more in line with the fence/barrier concepts already in use in `std`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants