
ArcCell: wait-free get() #81

Closed
wants to merge 2 commits

Conversation

@Vtec234 (Member) commented Aug 2, 2016

As an idea I got from @schets in #80, I wrote a slightly less locking implementation of ArcCell. In this version:

  • get() is always wait-free.
  • set() is wait-free with respect to other set()s. Whenever any get() is in flight, set() blocks and has to wait until no concurrent get()s are active, behaving like the original implementation would in the degenerate case of the spinlock always going to get()s.

Since this will scale better for many readers, but most likely worse for many writers, I may also PR this as a new type, even though the change is backwards-compatible in terms of behaviour.
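
A minimal sketch of that scheme, assuming a pointer word plus a reader counter; the names are invented and Arc::into_raw/from_raw stand in for the PR's transmutes. The SeqCst-everywhere orderings are the naive reading that the discussion below picks apart:

use std::mem;
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct ArcCell<T> {
    ptr: AtomicUsize,     // the current Arc<T>, stored as a raw usize
    readers: AtomicUsize, // number of in-flight get() calls
}

impl<T> ArcCell<T> {
    pub fn new(t: Arc<T>) -> ArcCell<T> {
        ArcCell {
            ptr: AtomicUsize::new(Arc::into_raw(t) as usize),
            readers: AtomicUsize::new(0),
        }
    }

    // Wait-free: a bounded number of atomic operations, no loops.
    pub fn get(&self) -> Arc<T> {
        self.readers.fetch_add(1, Ordering::SeqCst); // announce the read
        let ret = unsafe {
            let arc = Arc::from_raw(self.ptr.load(Ordering::SeqCst) as *const T);
            let ret = arc.clone(); // take a reference for the caller
            mem::forget(arc);      // the cell keeps its own reference
            ret
        };
        self.readers.fetch_sub(1, Ordering::SeqCst);
        ret
    }

    // Wait-free against other set()s; waits out in-flight get()s
    // before releasing the old value.
    pub fn set(&self, t: Arc<T>) -> Arc<T> {
        let old = self.ptr.swap(Arc::into_raw(t) as usize, Ordering::SeqCst);
        while self.readers.load(Ordering::SeqCst) != 0 {} // drain readers
        unsafe { Arc::from_raw(old as *const T) }
    }
}

The reader counter is what lets set() release the old Arc safely: it cannot drop to zero while a get() might still be between loading the pointer and cloning.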

@arthurprs left a comment

This is very nice; maybe we should extend it with a CAS-like function too.

debug_assert_eq!(self.0.load(Ordering::SeqCst), 0);
self.0.store(unsafe { mem::transmute(t) }, Ordering::Release);
/// Create a new `ArcCell` from the given `Arc` interior.
pub fn with_val(v: T) -> ArcCell<T> {


I see no reason to shorten `value` to `val`.

@arthurprs

I was looking at this again today and was wondering if it makes sense to set the high bit if the writer can't acquire the lock after a while. A reader seeing that should back off.
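
In sketch form, that suggestion might look like the following (hypothetical names, building on the reader counter from this PR; it trades away get()'s wait-freedom for writer progress):

use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical writer-preference flag in the high bit of the reader counter.
const WRITER_WAITING: usize = 1 << (usize::BITS - 1);

// Writer, after failing to drain readers for a while: announce itself.
fn writer_announce(readers: &AtomicUsize) {
    readers.fetch_or(WRITER_WAITING, Ordering::SeqCst);
}

// Reader: back off (spin or yield) while a writer is waiting.
fn reader_should_back_off(readers: &AtomicUsize) -> bool {
    readers.load(Ordering::SeqCst) & WRITER_WAITING != 0
}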

@mstewartgallus

If you already have an Arc then isn't what you want to implement https://en.wikipedia.org/wiki/Read-copy-update ?

@schets (Member) commented Mar 12, 2017

@mstewartgallus RCU requires a memory management scheme to go along with the read-copy-update part. Imagine that you load the Arc pointer, stall, somebody drops the Arc, and then you resume trying to acquire the pointer. The Linux kernel's scheme for handling this is often what "RCU" refers to as a whole. Crossbeam's EBR could also be used to manage the memory. This PR manages it independently of EBR, with what's essentially a second refcounting scheme.
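
The hazard in sketch form (hypothetical code, not from this PR):

use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// UNSOUND: between the raw pointer load and the refcount bump, the last
// Arc may be dropped and the value freed, leaving `p` dangling.
unsafe fn naive_get<T>(ptr: &AtomicUsize) -> Arc<T> {
    let p = ptr.load(Ordering::SeqCst) as *const T;
    // <-- if another thread swaps `ptr` and drops the old Arc here, boom
    let arc = Arc::from_raw(p);
    let ret = arc.clone();
    std::mem::forget(arc); // leave the cell's own reference in place
    ret
}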

The orderings here are subtly incorrect. In get, the SeqCst load doesn't prevent the refcount increment from reordering past it; see the generated code here. SeqCst on a load is essentially an acquire load that's also ordered with respect to all other SeqCst operations and observes a global order of all SeqCst operations.

In an ideal world an Acquire on the fetch_add should work, but on aarch64 the load could still reorder behind the store part of the ll/sc pair (it's basically load-linked-acquire -> store-conditional). aarch64 rmw orderings are extremely screwy and a major stretch of the C++11 atomic memory model, but oh well. I guess Acquire technically only applies to the load half. I'm not sure of a performance-friendly way to do this without platform-specific code.
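
Reduced to its core, the problematic get() prologue is roughly this (illustrative names):

use std::sync::atomic::{AtomicUsize, Ordering};

// On aarch64 the fetch_add compiles to a load-linked/store-conditional
// loop; Acquire constrains only its load half, so the pointer load below
// can still be satisfied before the store-conditional completes.
fn get_prologue(readers: &AtomicUsize, ptr: &AtomicUsize) -> usize {
    readers.fetch_add(1, Ordering::Acquire);
    ptr.load(Ordering::SeqCst)
}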

unsafe {
    let t: usize = mem::transmute(t);
    let old: Arc<T> = mem::transmute(self.ptr.swap(t, Ordering::Acquire));

This ordering is subtly incorrect; see my comment in the main thread for why. There's a crate, atomic_utilities, that has a fence/ordering combo that will create an optimal rmw setup that is ordered-before future loads and stores. I'm adding docs to it now, but there's an example in the code.

@arthurprs

@schets I was checking this thread again; what's wrong with the linked code? I get:

load(std::atomic<int>&, std::atomic<int>&):
        nop
        nop
        nop
.L2:
        ldxr    w2, [x1]
        add     w2, w2, 1
        stxr    w3, w2, [x1]
        cbnz    w3, .L2
        nop
        nop
        nop
        ldar    w0, [x0]
        nop
        nop
        nop
.L3:
        ldxr    w2, [x1]
        sub     w2, w2, #1
        stxr    w3, w2, [x1]
        cbnz    w3, .L3
        nop
        nop
        nop
        ret

I know little about arm64 assembly but it doesn't look unordered to me.

@schets (Member) commented Apr 17, 2017

@arthurprs ldar prevents memory accesses after the load from happening before it, but doesn't do anything about memory accesses before it. Arm also allows loads to speculatively move before branches, and loads are allowed to move before an stxr (the stxr can signal success before the store is finished, meaning speculative execution isn't even required for this problem to manifest).

Given all that, it's possible for the ldar instruction to be ordered before the stxr, and I'm pretty sure before the ldxr instruction too, so it's possible for the SeqCst load to happen partially or completely before the fetch_add/swap. That's definitely a problem in get(), and I believe a problem in set() as well (less clear on that one).

@arthurprs commented Apr 17, 2017

Ahh, I misunderstood the "reorder"; you meant inside the execution units. Like I said, my arm64 skills suck.

You suggest using this, right? https://docs.rs/crate/atomic_utilities/0.5.0/source/src/fence_rmw.rs

@schets (Member) commented Apr 19, 2017

That's what I've used for similar cases. It basically does:

  • an Acquire rmw on x86, since an rmw there is already a full fence all in one
  • a Relaxed rmw + SeqCst fence everywhere else, since every other architecture either needs that sort of fencing or already does it for Acquire
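
A sketch of that dispatch (the real implementation lives in atomic_utilities' fence_rmw module; this only illustrates the idea, with an invented function name):

use std::sync::atomic::{fence, AtomicUsize, Ordering};

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn fetch_add_fenced(a: &AtomicUsize, v: usize) -> usize {
    // On x86 every rmw is already a full barrier, so Acquire costs nothing extra.
    a.fetch_add(v, Ordering::Acquire)
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn fetch_add_fenced(a: &AtomicUsize, v: usize) -> usize {
    // Elsewhere, do the rmw relaxed and order it with an explicit SeqCst fence.
    let old = a.fetch_add(v, Ordering::Relaxed);
    fence(Ordering::SeqCst);
    old
}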

@Vtec234 mentioned this pull request Dec 11, 2017
@Vtec234 (Member, Author) commented Mar 7, 2018

More general types have been proposed in crossbeam-rs/rfcs#28.

@Vtec234 closed this Mar 7, 2018