
ArcCell not lock-free #160

Open
jethrogb opened this issue Dec 11, 2017 · 17 comments

Comments

@jethrogb

The current implementation of ArcCell is not lock-free. It uses a spin lock. I very much want to use this functionality, but I don't think this implementation should be included in crossbeam right now (and frankly, the fact that it's included right now is quite misleading).

@ghost

ghost commented Dec 11, 2017

You're right. We should probably implement ArcCell<T> using differential reference counting, although that would also require implementing a custom Arc<T> type (the one from the standard library wouldn't be compatible).

Do you perhaps have an example of how you want to use ArcCell<T>? I'm asking to see what exactly you need so that we can accommodate your use case as well as possible.

@jethrogb
Author

jethrogb commented Dec 11, 2017

I've needed something like this over and over again to store program-global configuration: gets are frequent and sets are extremely rare. For example: https://docs.rs/log4rs/0.7.0/src/log4rs/lib.rs.html#446-449

Alternatively, I think you could implement this fairly easily using 128-bit atomics on platforms that support them, but I don't think those are available on stable Rust.
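For reference, the read-mostly configuration pattern described above can already be approximated with nothing but the standard library. This is only a sketch of the use case, not a lock-free design; `Config` and `ConfigCell` are hypothetical names:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical program-global configuration: reads are frequent,
// writes are rare, so readers just clone the Arc under a read lock.
struct Config {
    level: u32,
}

struct ConfigCell {
    inner: RwLock<Arc<Config>>,
}

impl ConfigCell {
    fn new(c: Config) -> Self {
        Self { inner: RwLock::new(Arc::new(c)) }
    }

    // Cheap for readers: one short read-lock plus an Arc clone.
    fn get(&self) -> Arc<Config> {
        self.inner.read().unwrap().clone()
    }

    // Rare: swap in a freshly built configuration.
    fn set(&self, c: Config) {
        *self.inner.write().unwrap() = Arc::new(c);
    }
}

fn main() {
    let cell = ConfigCell::new(Config { level: 1 });
    let before = cell.get();
    cell.set(Config { level: 2 });
    assert_eq!(before.level, 1); // old readers keep their old snapshot
    assert_eq!(cell.get().level, 2); // new readers see the update
}
```

The drawback, of course, is that every `get` takes the lock, which is exactly what this issue is about avoiding.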

@Vtec234
Member

Vtec234 commented Dec 11, 2017

I've attempted a partially wait-free version of ArcCell in #81, but note that it is currently slightly incorrect. Changing the faulty ordering (see PR thread) should fix it, so you could try it out.

That said, crossbeam is currently undergoing a major reorganisation, and new types with similar functionality might appear in the future (and will, if people need them).

@ghost

ghost commented Dec 11, 2017

@Vtec234 I'm slightly worried that wait-free get() operations might starve set() operations forever. Wouldn't it be better to simply use parking_lot::RwLock<Arc<T>> instead?

@jethrogb Actually, I believe a primitive tentatively called HazardBox<T> might be a better fit for your use case. But it doesn't exist yet. :) The idea is to protect the shared object by thread-local hazard pointers to avoid cache invalidation (instead of updating a globally shared value on every read).
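A heavily simplified sketch of the hazard-pointer idea: a reader advertises the pointer it is about to dereference, re-checks that it is still current, and only then reads; a writer waits until no reader still advertises the old pointer before freeing it. This toy uses a single global hazard slot instead of per-thread ones and a plain `i32` instead of a generic payload; `HazardBox` and its API are hypothetical:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// A single global hazard slot; a real implementation keeps one per thread.
static HAZARD: AtomicPtr<i32> = AtomicPtr::new(ptr::null_mut());

struct HazardBox {
    ptr: AtomicPtr<i32>,
}

impl HazardBox {
    fn new(value: i32) -> Self {
        Self { ptr: AtomicPtr::new(Box::into_raw(Box::new(value))) }
    }

    // Reader: announce the pointer in the hazard slot, then re-check that
    // it is still current; retry if a writer swapped it in the meantime.
    fn get(&self) -> i32 {
        loop {
            let p = self.ptr.load(Ordering::Acquire);
            HAZARD.store(p, Ordering::SeqCst);
            if self.ptr.load(Ordering::SeqCst) == p {
                let value = unsafe { *p };
                HAZARD.store(ptr::null_mut(), Ordering::Release);
                return value;
            }
        }
    }

    // Writer: publish the new value, then wait until no reader still
    // advertises the old pointer before freeing it.
    fn set(&self, value: i32) {
        let new = Box::into_raw(Box::new(value));
        let old = self.ptr.swap(new, Ordering::SeqCst);
        while HAZARD.load(Ordering::SeqCst) == old {
            std::hint::spin_loop();
        }
        unsafe { drop(Box::from_raw(old)) };
    }
}

fn main() {
    // (the final value is leaked when HazardBox goes away; a real type needs Drop)
    let cell = HazardBox::new(1);
    assert_eq!(cell.get(), 1);
    cell.set(2);
    assert_eq!(cell.get(), 2);
}
```

Note how a fast read never touches a shared reference count, which is the cache-invalidation point made above.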

@Vtec234
Member

Vtec234 commented Dec 11, 2017

@stjepang That should work, yes. My implementation is essentially a very simple rw-lock, and I'm sure much better implementations are possible (maybe based on hazard pointers, like you said).

@vorner
Contributor

vorner commented Apr 1, 2018

Hello

I played with this a bit and I have an implementation that is lock-free (wait-free for readers), and it shouldn't be possible to starve the writers. It also works with the standard Arc. I could release it as a separate crate, but I think the code would be better cared for somewhere in crossbeam (I'm not sure where exactly, or whether there's a specific sub-crate where it fits).

So, if you'd like to accept it, I can create a pull request and place it somewhere (and rename it to ArcCell, to keep the consistency with the current implementation).

What do you think about it?

http://github.com/vorner/arc-swap

@ghost

ghost commented Apr 3, 2018

@vorner Hey, that looks neat! Designing an atomic shared pointer (like AtomicReference in Java) in a language without GC is a surprisingly difficult problem, and your approach is quite interesting.

I think the biggest drawback of ArcSwap (and of atomic_shared_ptr in C++, for that matter) is that readers update the same atomic counters and thus suffer from contention. In my benchmarks, loading from an uncontended ArcSwap takes 28 ns, but that climbs to 180 ns when another thread is loading from the same ArcSwap at the same time. The numbers would probably be similar with atomic_shared_ptr.

Consider the fact that loading from AtomicReference in Java takes less than 10 ns (it's just a typical load instruction on x86 machines), and our numbers quickly pale in comparison.

The solution is to not update any shared counters when loading from ArcSwap. In other words, loading shouldn't return an Arc<T>, but something like ArcSwapGuard<'a, T> instead, which can be dereferenced into a &T for reading or "upgraded" into a real Arc<T> if the user wishes. I'm prototyping something similar here, if you're interested. Loading takes around 25 ns and doesn't degrade at all in the presence of concurrent readers. And there is no locking whatsoever - reading and updating are always lock-free.
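Ignoring the atomics entirely, the API surface being described would look roughly like the following. The `Guard` body here is only an illustration of the Deref/upgrade shape, not the actual prototype:

```rust
use std::ops::Deref;
use std::sync::Arc;

// Illustrative guard: dereferences to &T without touching any shared
// counter; cloning the Arc happens only when the caller asks for it.
struct Guard<'a, T> {
    inner: &'a Arc<T>,
}

impl<'a, T> Deref for Guard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.inner // &Arc<T> coerces to &T
    }
}

impl<'a, T> Guard<'a, T> {
    // "Upgrade" into a real Arc<T>, paying the refcount bump only here.
    fn upgrade(&self) -> Arc<T> {
        Arc::clone(self.inner)
    }
}

fn main() {
    let shared = Arc::new(42);
    let guard = Guard { inner: &shared };
    assert_eq!(*guard, 42); // cheap read through the guard
    let owned: Arc<i32> = guard.upgrade();
    assert_eq!(*owned, 42); // full Arc only when requested
}
```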

@vorner
Contributor

vorner commented Apr 7, 2018

Reading through the hazard-pointer approach, it does look interesting. But I'm a bit nervous about two things ‒ the fact that an update may force a reader to loop (the slow-path operation hurting the hot-path one, possibly leaving it stuck for a long time, or forever, given a steady inflow of changes), and the fact that the chains of registries and thread entries can grow long and, as far as I can tell, never shrink back in the code. Anyway, it is well possible this will be faster than my approach.

Is there any estimate on when your approach could be ready? I ask because I want to know if it makes sense for me to clean it up, add all the license files, and publish it, or (in case hazard-cell will be ready soon in crossbeam) if it's not worth it.

@ghost

ghost commented Apr 8, 2018

the fact that an update may force a reader to loop (therefore the slow-path operation hurting the hot-path one, possibly making it stuck for a long time/forever with a steady inflow of changes)

If updates are so frequent that they manage to force reads to spin in a loop, then updates kind of are the hot-path. :) But I see what you're saying, it's a fair concern that readers aren't wait-free.

that the chains of registries and thread entries can grow to long ones and I don't think they ever shrink back in the code.

Yes, the current prototype isn't tackling this problem. It's in the plan to shrink back the linked lists.

Is there any estimate on when your approach could be ready?

There are other things on my schedule at the moment, but I'd say in a few weeks.

I ask because I want to know if it makes sense for me to clean it up, add all the license files, and publish it, or (in case hazard-cell will be ready soon in crossbeam) if it's not worth it.

If you have a use case for ArcSwap, go ahead! In any case, we'll definitely want to benchmark our approaches and see how they compare. I don't expect either one to be a win in all scenarios (for example, ArcSwap will almost surely have faster uncontended updates).

@acsearle

I'll throw my hat in the ring too. I have a reasonably complete AtomicArc now using differential reference counting, inspired by Folly::AtomicSharedPtr. I suspect it is close to what @stjepang was envisioning in the second comment. It has the same problem of reads degrading under contention, has to use its own reimplementation of Arc, and will only work on x86_64 and (hopefully) AArch64 using their spare pointer bits. On the brighter side, it also provides AtomicWeak, has no thread_local or global structures, and no requirements for the user to call anything outside the std::sync::atomic interface.

https://github.com/acsearle/barc

Might be an interesting comparison point if nothing else.
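The spare-bit trick mentioned above relies on x86-64 currently using only 48 bits of virtual address, so a small counter fits in the top 16 bits of a single pointer-sized word that one CAS can update. A minimal sketch of the packing arithmetic; `pack`/`unpack` are illustrative stand-ins, not barc's actual `CountedPtr` code:

```rust
// x86-64 currently uses 48-bit virtual addresses, leaving 16 spare
// high bits per pointer-sized word to hold a local reference count.
const COUNT_SHIFT: u32 = 48;
const PTR_MASK: u64 = (1u64 << COUNT_SHIFT) - 1;

// Combine a pointer value and a count into one word.
fn pack(ptr: u64, count: u16) -> u64 {
    debug_assert_eq!(ptr & !PTR_MASK, 0, "pointer must fit in 48 bits");
    ((count as u64) << COUNT_SHIFT) | ptr
}

// Split the word back into (pointer, count).
fn unpack(word: u64) -> (u64, u16) {
    (word & PTR_MASK, (word >> COUNT_SHIFT) as u16)
}

fn main() {
    let word = pack(0xdead_beef, 3);
    assert_eq!(unpack(word), (0xdead_beef, 3));
}
```

This is exactly the layout that breaks once hardware starts handing out addresses above 2^48, which is the portability concern raised in the next comment.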

@ghost

ghost commented Sep 13, 2018

@acsearle Thanks for sharing -- that looks really good, I'm impressed!

So I've just whipped up a simple set of benchmarks to compare the performance of the load operation with barc, with a hazard pointer-based Arc, and with a mutex-protected Arc. Results:

running 6 tests
test conc_load_barc   ... bench:         141 ns/iter (+/- 28)
test conc_load_hazard ... bench:          19 ns/iter (+/- 1)
test conc_load_mutex  ... bench:          39 ns/iter (+/- 25)
test load_barc        ... bench:          17 ns/iter (+/- 1)
test load_hazard      ... bench:          19 ns/iter (+/- 2)
test load_mutex       ... bench:          13 ns/iter (+/- 2)

I wonder what you think about using spare pointer bits in light of near-future migrations to 56-bit address spaces on x86-64 (and maybe even 64-bit). That seems like an unfortunate portability nightmare. :( There was an interesting discussion about this topic in the LuaJIT repo.

@jethrogb
Author

@acsearle have you considered an implementation using double-word atomics? You should be able to test with 64-bit atomics on 32-bit platforms while we wait for rust-lang/rust#53514 to land.
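A rough sketch of what a double-word differential-counting load could look like, scaled down so it runs anywhere: an `AtomicU64` stands in for the double-word atomic (as it would on a 32-bit target), and a 32-bit index stands in for the real pointer. Everything here is hypothetical, not a real ArcCell:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pack a pointer-sized half and an external reference count into one
// double-word atomic; on 32-bit targets AtomicU64 plays that role.
fn pack(ptr: u32, count: u32) -> u64 {
    ((count as u64) << 32) | ptr as u64
}

fn unpack(word: u64) -> (u32, u32) {
    (word as u32, (word >> 32) as u32)
}

// Differential-counting load: a single CAS bumps the external count,
// handing the loader a reference without touching the Arc's own counter.
fn load(cell: &AtomicU64) -> u32 {
    loop {
        let cur = cell.load(Ordering::Acquire);
        let (ptr, count) = unpack(cur);
        if cell
            .compare_exchange_weak(cur, pack(ptr, count + 1), Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return ptr;
        }
    }
}

fn main() {
    let cell = AtomicU64::new(pack(7, 0));
    assert_eq!(load(&cell), 7);
    let (_, count) = unpack(cell.load(Ordering::Relaxed));
    assert_eq!(count, 1); // the external count absorbed the load
}
```

A store would swap in a new packed word and fold the accumulated external count back into the old object's real counter, which is the part that needs a custom Arc.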

@acsearle

Wow, the performance of the parking_lot Mutex is phenomenal! Is it enough to just .lock() the mutex, or would the better comparison be .lock().clone(), so we get and destroy a usable object per loop, as returned by AtomicWeightedArc::load() and (I think) hazard's .get()? The disheartening result on contested load is consistent with what I saw in my own jury-rigged benchmarks: it's inherently slow, since each load has to win a compare_exchange.

I wasn't aware that losing those bits was imminent (there's a relatively recent talk by Chandler Carruth somewhere in which he discusses how LLVM uses them). That's unfortunate. Regarding other architectures, I have thought about it a little. For both of these questions, it should be fairly easy to experiment, since each requires only a platform-specific reimplementation of CountedPtr and CountedNonNull. Another option is using alignment bits, perhaps over-aligning the ArcInner allocation to get more of them (if the Rust allocator supports that yet). But yes, as we lose bits, we have to replenish more frequently, and the number of threads that can remain lock-free degrades until we are effectively a spinlock.

Anyway, my concern (having learned a lot from implementing it) is that this style of implementation, independent of whether the count bits are in the pointer or a second word, can never be competitive on contested loads with implementations that manage to use an atomic read for that purpose, and so is a bit of a dead end :(

bors bot added a commit that referenced this issue Jan 25, 2019
301: Remove ArcCell, MsQueue, and TreiberStack r=stjepang a=stjepang

These three pieces are being removed, each for its own reason:

1. `ArcCell<T>` uses a spinlock and provides little value over just `Mutex<Arc<T>>`. There are much better alternatives like [arc-swap](https://docs.rs/arc-swap). We should really implement a better version of this primitive, but it's not trivial. See [here](#160) for a discussion.

2. `MsQueue<T>` has no reason to exist in presence of `SegQueue<T>`, which is better in almost every way. The implementation has gotten really old and could use some cleanup. A selling point of this queue is blocking pop operation, but channels are much better at that job anyway.

3. `TreiberStack<T>` is the least used piece of Crossbeam, if used at all. The implementation has poor performance and is only interesting for its educational value - it's the *hello world* of lock-free programming. For that reason, I'm moving it into `crossbeam-epoch/examples`.

All of these are practically useless and only confuse users. People assume there's something smarter in `ArcCell` than a spinlock, or struggle with choosing `MsQueue` vs `SegQueue`. As such, these three pieces are a net *negative* for the library so it's best to just remove them and come up with better replacements in the future.

Co-authored-by: Stjepan Glavina <stjepang@gmail.com>
zeeshanlakhani pushed a commit to williamofockham/NetBricks that referenced this issue Apr 30, 2019
huitseeker added a commit to huitseeker/diem that referenced this issue Jul 25, 2019
* This modifies the following behavior:
  - updates `crossbeam` in `proptest_helpers` to 0.7.2 (which was [just released](crossbeam-rs/crossbeam#401 (comment)))
  - removes usage of `crossbeam` in `common::logger` as:
    + `ArcCell` was [removed](crossbeam-rs/crossbeam#301) in more recent versions of crossbeam,
    + it was [spin-locking](crossbeam-rs/crossbeam#160) anyway

* Why this is better
  - closes #279
  - [`arc-swap`](https://docs.rs/arc-swap/) is (mostly) lock-free

* Why this is worse
  - [`arc-swap`](https://github.com/vorner/arc-swap) is young

* Tests

`cargo audit` + CI
huitseeker added a commit to diem/diem that referenced this issue Jul 25, 2019
@jonhoo
Contributor

jonhoo commented Feb 17, 2020

ArcCell was removed in #301, so I assume this should be closed?

@jethrogb
Author

From #301:

We should really implement a better version of this primitive

Alternatively, this issue could become the feature request?

@acsearle

acsearle commented Sep 9, 2020

I've put in a draft pull request of an AtomicArc using deferred coalesced reference counting, for discussion, but I seem to have screwed up linking it to this issue. It should be considerably faster than my 2018 attempt: atomic loads are pure reads; it uses guards and scoping to prevent temporaries from needing to manipulate the counts at all; and it uses a thread-local ledger to coalesce (and often cancel out) the pending increments and decrements before finally applying the increments at the end of the guard's life and deferring the decrements to the collector.

@jethrogb
Author

jethrogb commented Sep 9, 2020

#564

exrook pushed a commit to exrook/crossbeam that referenced this issue Oct 7, 2020
I was looking through old issues, and crossbeam-rs#160 mentioned that crc32
checksumming for zip files was showing up in profiles.  zip-rs 0.4 uses
a byte-at-a-time version of crc32; zip-rs 0.5 relies on the `crc32fast`
crate, which features faster implementations.