Introduce crossbeam-circbuf #277

Open
wants to merge 2 commits into
base: master
from

Conversation

jeehoonkang
Contributor

@jeehoonkang jeehoonkang commented Dec 25, 2018

I'm proposing to introduce crossbeam-circbuf, which implements bounded/unbounded SPSC/SPMC queues based on a circular buffer. Bounded queues are implemented with a fixed-size circular buffer. Unbounded queues are implemented with a dynamically growable/shrinkable circular buffer.

Currently I'm not sure how performant this crate is. A few benchmarks would help in evaluating the performance. I'm thinking of re-purposing the benchmarks used in crossbeam-channel.

The implementation is sub-optimal in that it has an unnecessary indirection when accessing the underlying buffer. To remove the indirection we need support for dynamically sized types (DSTs), which is not yet implemented in Crossbeam.

We discussed introducing crossbeam-circbuf in another PR, where we also discussed tentative crate organizations. Speaking of organization, I think crossbeam-deque and crossbeam-arrayqueue can also be incorporated into this crate for the following reasons:

  • All of them share the same buffer struct.

  • crossbeam-arrayqueue is actually a bounded MPMC queue based on a circular buffer.

  • crossbeam-deque is actually a slight variant of an unbounded SPMC queue based on a circular buffer.

@jeehoonkang
Contributor Author

It seems CI fails due to an unrelated issue: https://travis-ci.org/crossbeam-rs/crossbeam/jobs/472150001#L455

bors retry

@jeehoonkang
Contributor Author

In fceff1f, I added the array queue a la #189.

@ghost

ghost commented Jan 5, 2019

By extending the crossbeam-channel benchmarks, I get these results for crossbeam-circbuf:

bounded_mpmc              Rust crossbeam-circbuf   0.496 sec
bounded_mpsc              Rust crossbeam-circbuf   0.400 sec
bounded_seq               Rust crossbeam-circbuf   0.138 sec
bounded_spmc              Rust crossbeam-circbuf   0.233 sec
bounded_spsc              Rust crossbeam-circbuf   0.123 sec

And here is the MPMC queue:

unbounded_mpmc            Rust queue        0.179 sec
unbounded_mpsc            Rust queue        0.185 sec
unbounded_seq             Rust queue        0.242 sec
unbounded_spmc            Rust queue        0.204 sec
unbounded_spsc            Rust queue        0.112 sec

By bumping the block size from 32 to 256 we get even better results:

unbounded_mpmc            Rust queue        0.166 sec
unbounded_mpsc            Rust queue        0.190 sec
unbounded_seq             Rust queue        0.223 sec
unbounded_spmc            Rust queue        0.162 sec
unbounded_spsc            Rust queue        0.101 sec

And this is crossbeam-deque:

unbounded_seq             Rust crossbeam-deque   0.185 sec
unbounded_spmc            Rust crossbeam-deque   0.286 sec
unbounded_spsc            Rust crossbeam-deque   0.196 sec

My takeaways are:

  • The bounded SPSC should be a very simple typical SPSC queue instead, not something based on Chase-Lev.

  • Unbounded SPSC might be faster with something like this, and wouldn't need to use crossbeam-epoch. We could try allocating nodes in blocks to squeeze out more performance from it.

  • The MPMC queue seems to be faster in the MPMC and MPSC cases.

  • The case where crossbeam-circbuf excels is SPMC, which is unsurprising because that's pretty much the use case Chase-Lev was invented for. But even then the MPMC queue is competitive (maybe because it doesn't need crossbeam-epoch).

  • Currently, crossbeam-circbuf is a clear winner only in the "seq" (single thread) case, which is uninteresting for a concurrent queue anyway. :( I wonder if we could optimize things some more to find its niche.

I should also add that it's possible that these benchmarks are not measuring the right thing and might be unfair to crossbeam-deque. If you have a suggestion on how to do benchmarks differently, do say!
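To make the first takeaway concrete, here is a minimal sketch of what a "very simple typical SPSC queue" over a fixed-size circular buffer could look like. It is only an illustration: the `Spsc` type and the `demo` helper are hypothetical names, the power-of-two capacity is an assumption made to keep index wrapping simple, and none of this is the code in this PR or in any Crossbeam crate.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical bounded SPSC queue (illustration only, not a Crossbeam API).
struct Spsc<T> {
    buf: Box<[UnsafeCell<MaybeUninit<T>>]>,
    head: AtomicUsize, // next index to pop; written only by the consumer
    tail: AtomicUsize, // next index to push; written only by the producer
}

// Safety: sound only if at most one thread pushes and at most one thread pops.
// The type system does not enforce that here; a real API would split the queue
// into separate producer and consumer handles.
unsafe impl<T: Send> Sync for Spsc<T> {}

impl<T> Spsc<T> {
    fn new(cap: usize) -> Self {
        // Assumption: power-of-two capacity, so masking stays correct on wraparound.
        assert!(cap.is_power_of_two());
        let buf: Box<[UnsafeCell<MaybeUninit<T>>]> =
            (0..cap).map(|_| UnsafeCell::new(MaybeUninit::uninit())).collect();
        Spsc { buf, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Producer side. Returns the value back if the queue is full.
    fn push(&self, value: T) -> Result<(), T> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == self.buf.len() {
            return Err(value);
        }
        let slot = self.buf[tail & (self.buf.len() - 1)].get();
        unsafe { slot.write(MaybeUninit::new(value)) };
        // Release publishes the slot contents to the consumer.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Consumer side. Returns `None` if the queue is empty.
    fn pop(&self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let slot = self.buf[head & (self.buf.len() - 1)].get();
        let value = unsafe { slot.read().assume_init() };
        // Release hands the now-empty slot back to the producer.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}

impl<T> Drop for Spsc<T> {
    fn drop(&mut self) {
        // Drop any elements that were pushed but never popped.
        while self.pop().is_some() {}
    }
}

/// Sends 0..n through the queue from one producer thread and sums them on the caller.
fn demo(n: u64) -> u64 {
    let q = Arc::new(Spsc::new(64));
    let producer = {
        let q = Arc::clone(&q);
        thread::spawn(move || {
            for i in 0..n {
                let mut item = i;
                while let Err(back) = q.push(item) {
                    item = back;
                    std::hint::spin_loop();
                }
            }
        })
    };
    let (mut sum, mut received) = (0u64, 0u64);
    while received < n {
        match q.pop() {
            Some(v) => { sum += v; received += 1; }
            None => std::hint::spin_loop(),
        }
    }
    producer.join().unwrap();
    sum
}
```

There is no epoch-based reclamation and no compare-and-swap on either path: each side does one relaxed load of its own index, one acquire load of the other side's index, and one release store.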

@jeehoonkang
Contributor Author

In my unscientific experiment, the numbers seem to become much better if we remove an indirect pointer access using #209. The result would then be roughly on par with #338 for the bounded SPSC case.

@ghost

ghost commented Mar 3, 2019

@jeehoonkang

Do you have any numbers or perhaps benchmarks I could run to reproduce them?

In my first comment on this PR, I noted that the bounded SPSC variant of crossbeam-circbuf takes 0.123 sec to finish the spsc benchmark. 5 million messages are sent, so sending and receiving a single message takes about 25 ns.

The new bounded SPSC queue spends 0.029 sec on that same benchmark, which is 6 ns per message. That's 4x faster than crossbeam-circbuf.

I honestly doubt the double indirection costs much, if anything (my guess is less than 1 ns per message). The new bounded SPSC queue uses double indirection too, so it definitely cannot cost more than 6 ns per message.

Even if we conservatively assume the cost of double indirection is the whole 6 ns per message, that would mean removing it brings the result for crossbeam-circbuf from 25 ns down to 19 ns, which is still 3x slower than the new queue.
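The per-message arithmetic above can be checked with a few lines of Rust. This is only a sketch of the calculation: `ns_per_message` and `summary` are hypothetical helper names, and the inputs are the timings and the 5 million message count quoted in this thread.

```rust
/// Average cost of one send + receive pair, in nanoseconds.
fn ns_per_message(total_secs: f64, messages: u64) -> f64 {
    total_secs * 1e9 / messages as f64
}

/// Returns (circbuf ns/msg, new queue ns/msg, speedup, worst-case remaining slowdown).
fn summary() -> (f64, f64, f64, f64) {
    const MESSAGES: u64 = 5_000_000;
    // Timings reported in this thread for the spsc benchmark.
    let circbuf = ns_per_message(0.123, MESSAGES);
    let new_spsc = ns_per_message(0.029, MESSAGES);
    let speedup = circbuf / new_spsc;
    // Conservative bound: charge the new queue's entire per-message cost to the
    // double indirection and subtract it from crossbeam-circbuf.
    let remaining = (circbuf - new_spsc) / new_spsc;
    (circbuf, new_spsc, speedup, remaining)
}
```

Without rounding, the figures are 24.6 ns and 5.8 ns per message, a speedup of about 4.2x, and a remaining gap of about 3.2x under the conservative assumption.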

@jeehoonkang
Contributor Author

jeehoonkang commented Mar 4, 2019

@stjepang I prepared the DST branch a long time ago, so maybe I'm wrong to attribute the performance difference to pointer indirection. There may be orthogonal improvements I forgot about. I'll provide the benchmark setup and numbers soon. Sorry for the delay.

@jeehoonkang jeehoonkang force-pushed the circbuf branch 2 times, most recently from f6e30f5 to cc18736 Compare May 26, 2020 04:42