Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a MarkableArcCell to allow building reference-counted concurrent collections #80

Closed
wants to merge 9 commits into from

Conversation

Vtec234
Copy link
Member

@Vtec234 Vtec234 commented Aug 1, 2016

What?

This PR adds a new type to crossbeam::sync - the MarkableArcCell. This type provides all the functionality of ArcCell and more. In particular, it allows one to mark the cell with a boolean value and then perform atomic compare_exchange operations on the (Arc<T>, bool) pair. It is based on a type from the java.util.concurrent.atomic module called the AtomicMarkableReference (docs) and is supposed to provide the same kind of functionality.

I also made a tiny addition to ArcCell - ArcCell::with_val(v) is equivalent to ArcCell::new(Arc::new(v)). I think it's nicer not to have to expliciitly initialize the Arc each time, but if the change is unwanted I don't mind removing it as it's not too important.

Why?

This type allows us to build concurrent lock-free collections as described in "The Art of Multiprocessor Programming" by M. Herlihy and N. Shavit. The book provides code in Java, which is a garbage-collected language. The big problem in writing such structures in non-GC languages is memory cleanup. In my opinion, the easiest to use equivalent of a GC is atomic reference counting, provided by Arc<T>.

I have already written a lock-free linked-list-based map and will publish it after this PR or its equivalent is merged, as the map depends on the MarkableArcCell. This will allow us to then create more performant maps, such as skiplist-based ones and stabilize the interface for a concurrent map in Rust, as discussed in #41, after which work on a lock-free hashmap could begin in crossbeam.

I have made some decisions regarding the interface of MarkableArcCell which might raise questions, so please ask and I will try to justify them.

// Locks the internal spinlock.
fn take(&self) -> Arc<T> {
loop {
match self.0.swap(0, Ordering::Acquire) {
Copy link
Member

@schets schets Aug 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't lock free, a suspended thread between a take/put operation can block others here arbitrarily.

While this isn't inherently wrong, this does mean that collections written with this won't be lock-free either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, and we should consider other ways of creating a markable pointer.

@schets
Copy link
Member

schets commented Aug 1, 2016

What's the use case for this? The only literature that I found used this as some sort of atomic Optional. Are there uses for this where the nullness of the pointer and the marker status differ?

Is there a reason this must be reference counted instead of using the epoch memory management?

If there are any guarantees on heap pointer alignment in rust, this could also be done by using unused bits and would be implementable in a lock-free manner. If/when double word atomics are added they could make this current implementation lock free as well.

@Vtec234
Copy link
Member Author

Vtec234 commented Aug 1, 2016

If there are any guarantees on heap pointer alignment in rust, this could also be done by using unused bits and would be implementable in a lock-free manner.

Yes, a tagged pointer would be an optimization over two values, but the reason I use a spinlock is not that there are two values. When you look at ArcCell, it also spinlocks. The problem is in the Arc->AtomicUsize conversion.

Consider the situation:
Thread 1 reads the usize and transmutes it to an Arc. Assuming no other Arc copies are present, the refcount is 1. Then it suspends. Thread 2 comes in, and uses set_arc (or set in ArcCell) to set a new value. set_arc returns the old value, which Thread 2 then drops. The refcount is 0, and Thread 1 holds an invalid value.
The spinlock protects from this, since between a take and put the value is guaranteed to be valid.

The ability to make this lock-free therefore depends not on squashing the boolean into the unused bits of the Arc pointer, but on making ArcCell lock-free. It would be great if somebody could do that, unfortunately I have no idea how at the moment.

If/when double word atomics are added they could make this current implementation lock free as well.

I wouldn't count on hardware support for this, unfortunately, but as explained above a DCAS is not necessary.

What's the use case for this?

The problem I am trying to solve is that we need a stronger guarantee than just knowing that the memory we have a pointer to can be accessed, because while the memory may be valid, it might've been logically deleted from the list.

When holding a mem::epoch::Shared pointer, we know that the node has not been deallocated, but we are not sure whether it is still in the list. Inserting a new node after a node that has been logically deleted will have no effect, and would therefore be invalid. A marker bit solves this, as we can use an atomic operation to make sure that we are setting the next pointer of the previous node to the new one only if the previous node's marker is false, which indicates still being in the list. A remove procedure would switch that bit to true.

In other words, we need a way to make sure that the pointer we hold is still accessible from the list when modifying it.

My explanation is probably a bit wonky, so please see 4.3.3.1, p. 57 in this.

Is there a reason this must be reference counted instead of using the epoch memory management?

Probably not, but to my knowledge epoch memory management would require a similar addition of a pointer-boolean pair for this to work, unless it already contains something like this. Due to the MarkableArcCell not actually being lock-free as you remarked, we might consider adding a markable pointer to mem::epoch instead.

EDIT: In fact, I think this could be done with some additions mem::epoch::Atomic. I'll think about it some more, but that might be a better option.

@Vtec234 Vtec234 mentioned this pull request Aug 2, 2016
@schets
Copy link
Member

schets commented Aug 2, 2016

I'm at work right now so this will be short, but I think that this commit is really doing two things at once - it's adding the idea of a markable pointer, and also the idea of an Atomic Arc. Would it make sense to separate them? If only a markable pointer was needed it would be far cheaper than the AtomicMarkableArc for loads/stores, and the AtomicArc also wouldn't have to deal with the overhead of being markable.

@Vtec234
Copy link
Member Author

Vtec234 commented Aug 2, 2016

it's adding the idea of a markable pointer, and also the idea of an Atomic Arc.

If by AtomicArc you mean a type that provides atomic storage and retrieval of an Arc, then, well, it's already in crossbeam and it's called ArcCell. It's true that it doesn't provide CAS, but it might be extended to do that as well. Anyhow, this PR doesn't add that idea, as it's already there.

A markable pointer type is something that is indeed not yet present, and it's the aim of this PR to add one.

If only a markable pointer was needed it would be far cheaper than the AtomicMarkableArc for loads/stores, and the AtomicArc also wouldn't have to deal with the overhead of being markable.

The aim of this PR is to build a markable pointer that is already integrated with a cleanup scheme (not simply a ptr-boolean pair that has to be freed like a normal ptr) and the current choice is ref-counting or the epoch GC. It's built on top of ArcCell, because I want to try to build some ref-counted structures and see how they hold up.

An alternative to ref-counting, as you mentioned, is a GC (e.g. the epoch GC), and a similar marked pointer type might be added to mem::epoch, but it's a matter of benchmarking to see whether it is really cheaper - no refcounts, but GC sweeps, etc.
You also rightly said that it's not lock-free, so I made an ArcCell with a wait-free get() and I'll try to rewrite the markable variant on top of that.

@Vtec234
Copy link
Member Author

Vtec234 commented Aug 25, 2016

Finally got around to writing a more concurrency-friendly version of MarkableArcCell. In this one, all read methods are wait-free and all methods which write the Arc part are wait-free with respect to each other (other writes), but will lock until all readers are done. compare_arc_exchange_mark, which writes only the marker bit, may loop a little due to a CAS operation, so i'm not sure whether it can be classified as wait-free, but it's definitely lock-free.

With these guarantees, the guarantees that a skiplist as described in "The Art of Multiprocessor Programming" by M. Herlihy and N. Shavit implemented using this type would provide are:
find/contains - always lock-free
concurrent removes and inserts - won't block
concurrent writes with finds - writes may block until reads are done

What do you think?

@arthurprs
Copy link

I think this would be more appealing if you could link the code of the use cases you said to have. Code can be enlightening.

@Vtec234
Copy link
Member Author

Vtec234 commented Nov 10, 2016

@arthurprs You're right. I have uploaded a repository containing a concurrent linked list based on the MarkableArcCell: SortedLazyList. It's just a PoC, so I might've gotten my lock-freedom guarantees wrong, there might be undetected race conditions (even though the tests seem to pass), etc., but it demonstrates how data structures can be built based on this primitive. It's also terribly slow, but it could be extended relatively easily into the skiplist I mentioned earlier, and skiplists are not bad at all.
I'm sorry for taking so long, but I've been busy with uni and other silly things.

@arthurprs
Copy link

That's great. But I'm confused, you say the linked list is terribly slow so how the skip list would fare better?

@arthurprs
Copy link

Oh I see it's not actually a linked list, just similar name.

@Vtec234
Copy link
Member Author

Vtec234 commented Nov 10, 2016

Yes, skiplists are essentialy made by using several linked lists internally, each one with a different number of elements. Thanks to this and some other properties, skiplists scale much better than linked lists.

@arthurprs
Copy link

arthurprs commented Nov 23, 2016

I see the use case more clearly now, thanks. Does this scheme perform reasonably well in practice though?

@Vtec234
Copy link
Member Author

Vtec234 commented Jan 1, 2017

The linked list itself is not that useful, but good as a starting point. I used it in that way to write a map implemented as a skiplist using pretty much the same techniques as the linked list does. The problem with that map is that it is only lock-free if the underlying primitive (MarkableArcCell) is lock-free, and at the moment it is not (Please disregard the documentation on my repo, it assumes that MarkableArcCell is lock-free. Also, I was wrong when stating here earlier that it is). It nevertheless seems to perform reasonably well, maintaining the O(log n) complexity of a skiplist, although I hadn't yet done any significant optimizations or benchmarked it properly, since I focused on correctness when writing it. Like the linked list, it's based on "The Art of Multiprocessor Programming".

Now, even though the skiplist is more or less fast, we want it to perform well under contention, which can only be guaranteed by lock-freedom. I can see two ways of achieving this. One is to fix MarkableArcCell to be lock-free. I am unsure whether that's even possible, but feel free to try.

The other way is to replace the memory recollection scheme with crossbeam's epoch GC. That seems like the way to go for now, since plain pointers (epoch::Atomic) can trivially be paired with a bool (we cannot avoid the Markable (ptr, bool) primitive for skiplists) in a lock-free atomic primitive. I will probably PR something like MarkableAtomic soon and then try to rewrite the skiplist on top of that. This will have the additional advantage of being most likely faster. (Link taken from aturon's excellent blog post), but some disadvantages with respect to interface design (&ValueType cannot be returned, only ValueType).

TL;DR concurrency is hard

@Vtec234 Vtec234 mentioned this pull request Jan 6, 2017
@arthurprs
Copy link

@Vtec234 should this be closed (same as #102)?

@Vtec234
Copy link
Member Author

Vtec234 commented Jan 26, 2017

@arthurprs This primitive is integrated with a different memory-recollection scheme - reference counting. If the authors of crossbeam do not want to look into this scheme, since it's quite a bit slower, this should be closed, but i'll keep it open for now as reference counting has some advantages over garbage collection, primarily having more predictable recollection times.

@Vtec234
Copy link
Member Author

Vtec234 commented Mar 7, 2018

More general types have been proposed in crossbeam-rs/rfcs#28.

@Vtec234 Vtec234 closed this Mar 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants