ArcCell not lock-free #160
Comments
You're right. We should probably implement a lock-free version. Do you perhaps have an example of how you want to use `ArcCell`?
I've needed something like this over and over again to store program-global configuration. Alternatively, I think you can easily implement this using 128-bit atomics on platforms that support them. But I don't think those are available on stable Rust.
I've attempted a partially wait-free version of `ArcCell`. That said, crossbeam is currently undergoing a major reorganisation, and new types with similar functionality might (and, if people need them, will) appear in the future.
@Vtec234 I'm slightly worried that a fully wait-free `ArcCell` might cost too much in the common case.

@jethrogb Actually, I believe a primitive tentatively called `HazardCell` is in the works that should cover this use case.
@stjepang That should work, yes. My implementation is essentially a very simple rw-lock, and I'm sure much better implementations are possible (maybe based on HP, like you said).
Hello, I played with this a bit and I have an implementation that is lock-free (wait-free in the case of the readers), and it shouldn't be possible to starve the writers. It also works with the standard `Arc`.

So, if you'd like to accept it, I can create a pull request and place it somewhere (and rename it to ArcCell, to keep consistency with the current implementation). What do you think about it?
@vorner Hey, that looks neat! Designing an atomic shared pointer (like this one) is surprisingly tricky.

I think the biggest drawback of this implementation is contention on shared counters.

Consider the fact that loading from the cell has to update a shared counter, so concurrent loads all fight over the same cache line.

The solution is to not update any shared counters when loading from the cell.
Reading the hazard-pointer approach, it does look interesting. But I'm a bit nervous about two things: the fact that an update may force a reader to loop (the slow-path operation hurting the hot-path one, possibly leaving it stuck for a long time, or forever, under a steady inflow of changes), and that the chains of registries and thread entries can grow long and, as far as I can tell, never shrink back in the code. Anyway, it may well be faster than my approach.

Is there any estimate of when your approach could be ready? I ask because I want to know whether it makes sense for me to clean mine up, add all the license files, and publish it, or (in case hazard-cell will be ready soon in crossbeam) whether it's not worth it.
If updates are so frequent that they manage to force reads to spin in a loop, then updates kind of are the hot-path. :) But I see what you're saying, it's a fair concern that readers aren't wait-free.
Yes, the current prototype isn't tackling this problem. It's in the plan to shrink back the linked lists.
There are other things on my schedule at the moment, but I'd say in a few weeks.
If you have a use case for `HazardCell`, it would help to share it here.
I'll throw my hat in the ring too. I have a reasonably complete AtomicArc now using differential reference counting, inspired by Folly::AtomicSharedPtr. I suspect it is close to what @stjepang was envisioning in the second comment.

It has the same problem of reads degrading under contention, has to use its own reimplementation of Arc, and will only work on x86_64 and (hopefully) AArch64 using their spare pointer bits. On the brighter side, it also provides AtomicWeak, has no thread_local or global structures, and no requirements for the user to call anything outside the std::sync::atomic interface.

https://github.com/acsearle/barc

Might be an interesting comparison point if nothing else.
@acsearle Thanks for sharing -- that looks really good, I'm impressed! So I've just whipped up a simple set of benchmarks to compare the performance of the load operation across the candidate implementations.
I wonder what you think about using spare pointer bits in light of near-future migrations to 56-bit address spaces on x86-64 (and maybe even 64-bit). That seems like an unfortunate portability nightmare. :( There was an interesting discussion about this topic in the LuaJIT repo. |
@acsearle have you considered an implementation using double-word atomics? You should be able to test with 64-bit atomics on 32-bit while we wait for rust-lang/rust#53514 to land |
Wow, the performance of the parking_lot Mutex is phenomenal! Is it enough to just .lock() the mutex, or would the better comparison be with .lock().clone(), so we get and destroy a usable object per loop, as returned by AtomicWeightedArc::load() and (I think) hazard's .get()?

The disheartening result on contested load is consistent with what I saw in my own jury-rigged benchmarks: inherently slow, since each load has to win a compare_exchange.

I wasn't aware that losing those bits was imminent (there's a relatively recent talk by Chandler Carruth somewhere where he talks about how LLVM uses them). That's unfortunate. Re other architectures, I have thought about it a little. For both of these questions it should be fairly easy to experiment, since it requires only a platform-specific reimplementation of CountedPtr and CountedNonNull. Another option is using alignment bits, perhaps over-aligning the ArcInner allocation to get more of them (if the Rust allocator supports this yet). But yes, as we lose bits, we have to replenish more frequently, and the number of threads that stay lock-free degrades until we become a spinlock.

Anyway, my concern (having learned a lot from implementing it) is that this style of implementation, independent of whether the count bits are in the pointer or in a second word, can never be competitive on contested loads with implementations that manage to use a plain atomic read for that purpose, and so is a bit of a dead end :(
301: Remove ArcCell, MsQueue, and TreiberStack r=stjepang a=stjepang

These three pieces are being removed, each for its own reason:

1. `ArcCell<T>` uses a spinlock and provides little value over just `Mutex<Arc<T>>`. There are much better alternatives like [arc-swap](https://docs.rs/arc-swap). We should really implement a better version of this primitive, but it's not trivial. See [here](#160) for a discussion.
2. `MsQueue<T>` has no reason to exist in the presence of `SegQueue<T>`, which is better in almost every way. The implementation has gotten really old and could use some cleanup. A selling point of this queue is its blocking pop operation, but channels are much better at that job anyway.
3. `TreiberStack<T>` is the least used piece of Crossbeam, if used at all. The implementation has poor performance and is only interesting for its educational value - it's the *hello world* of lock-free programming. For that reason, I'm moving it into `crossbeam-epoch/examples`.

All of these are practically useless and only confuse users. People assume there's something smarter in `ArcCell` than a spinlock, or struggle with choosing between `MsQueue` and `SegQueue`. As such, these three pieces are a net *negative* for the library, so it's best to just remove them and come up with better replacements in the future.

Co-authored-by: Stjepan Glavina <stjepang@gmail.com>
… to issues: crossbeam-rs/crossbeam#160 --- and move to arcswap lib: https://docs.rs/arc-swap/0.3.11/arc_swap/index.html
* This modifies the following behavior:
  - updates `crossbeam` in `proptest_helpers` to 0.7.2 (which was [just released](crossbeam-rs/crossbeam#401 (comment)))
  - removes usage of `crossbeam` in `common::logger` as:
    + `ArcCell` was [removed](crossbeam-rs/crossbeam#301) in more recent versions of crossbeam,
    + it was [spin-locking](crossbeam-rs/crossbeam#160) anyway
* Why this is better
  - closes #279
  - [`arc-swap`](https://docs.rs/arc-swap/) is (mostly) lock-free
* Why this is worse
  - [`arc-swap`](https://github.com/vorner/arc-swap) is young
* Tests
  - `cargo audit` + CI
From #301:
Alternatively, this issue could become the feature request?
I've put in a draft pull request of an AtomicArc using deferred coalesced reference counting, for discussion, but I seem to have screwed up linking it to this issue. It should be considerably faster than my 2018 attempt: atomic loads are pure reads; it uses guards and scoping to prevent temporaries from needing to manipulate the counts at all; and it uses a thread-local ledger to coalesce (and often cancel out) the pending increments and decrements before finally applying the increments at the end of the guard's life and deferring the decrements to the collector.
The current implementation of `ArcCell` is not lock-free. It uses a spin lock. I very much want to use this functionality, but I don't think this implementation should be included in crossbeam right now (and frankly, the fact that it's included right now is quite misleading).