Use a single thread pool for all libraries #411

Closed
torkleyy opened this issue Mar 10, 2017 · 13 comments

@torkleyy

It's very impractical to work with multiple threads if every library uses a different thread pool. Please consider using one thread pool that can be shared between frameworks.

Here are some examples:

@alexcrichton
Member

Seems reasonable to me! The CPU pool here, though, is largely for demonstration purposes, and it shouldn't be too hard to create your own thread pool locally (or reuse one of these existing ones with futures).

@carllerche
Member

One thing of note: as with all things, there is no one-size-fits-all solution. Rayon has vastly different semantics than futures-cpupool. You probably wouldn't want to use Rayon to drive I/O tasks (due to fairness).

That said, something like http://github.com/carllerche/futures-spawn will hopefully help with writing code that is abstract over the executor.

@leoyvens
Contributor

@carllerche I'm curious, how do futures-cpupool and Rayon's futures support differ? When should one be recommended over the other?

@alexcrichton
Member

I think @nikomatsakis may actually be the best person to comment on that.

@liranringel
Contributor

By "a single thread pool", do you mean a single instance or just the same code?

@carllerche
Member

@leodasvacas The main difference is scheduling heuristics. Rayon has no fairness guarantees (in fact, it is the opposite of fair), because it is geared towards parallel computation of a single result. In other words, you should use it when you only care about the final result and not about how each individual task gets scheduled.

futures-cpupool is 100% fair: futures are executed strictly in the order in which they are submitted.

There are also other variants of scheduling logic with different trade-offs.
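
To make the contrast concrete, here is a minimal sketch of the same two-part computation submitted to each pool. It assumes the futures 0.1-era APIs of the futures-cpupool and rayon crates; the fib workload is made up for illustration:

```rust
extern crate futures;
extern crate futures_cpupool;
extern crate rayon;

use futures::Future;
use futures_cpupool::CpuPool;

fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    // futures-cpupool: fair scheduling; spawned futures run in
    // submission order.
    let pool = CpuPool::new_num_cpus();
    let a = pool.spawn_fn(|| Ok::<_, ()>(fib(30)));
    let b = pool.spawn_fn(|| Ok::<_, ()>(fib(31)));
    let (ra, rb) = a.join(b).wait().unwrap();

    // Rayon: unfair, locality-driven scheduling geared toward
    // producing a single final result as fast as possible.
    let (rc, rd) = rayon::join(|| fib(30), || fib(31));

    assert_eq!((ra, rb), (rc, rd));
}
```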

@nikomatsakis
Contributor

I think @carllerche's summary is reasonable; Rayon certainly doesn't try to guarantee fairness (and, at least for some use cases, fairness is not particularly desirable). But I agree with the overall gist of this issue: there should be a way to use "the default" CPU pool, and ideally end-users could change it without requiring their dependencies to be updated.
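
One hypothetical shape for such a "default pool that the end application can override" is a process-wide slot that libraries read lazily. This sketch uses std's OnceLock (which postdates this thread); all names are made up for illustration:

```rust
use std::sync::OnceLock;

// Stand-in for a real pool type.
pub struct ThreadPool {
    pub threads: usize,
}

static DEFAULT_POOL: OnceLock<ThreadPool> = OnceLock::new();

/// Optionally called once by the application, before any library
/// touches the default pool. Fails if the pool is already set.
pub fn install_default_pool(pool: ThreadPool) -> Result<(), ThreadPool> {
    DEFAULT_POOL.set(pool)
}

/// Called by libraries: returns the installed pool, or lazily
/// builds a fallback.
pub fn default_pool() -> &'static ThreadPool {
    DEFAULT_POOL.get_or_init(|| ThreadPool { threads: 4 })
}

fn main() {
    install_default_pool(ThreadPool { threads: 8 }).ok();
    println!("default pool has {} threads", default_pool().threads);
}
```

The key property: dependencies only ever call default_pool(), so the application can swap the pool without any of them being updated.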

@HadrienG2

HadrienG2 commented Nov 3, 2017

Just wondering: assuming availability of Rust language features which make this easy (such as const generics), would it be possible and desirable to create a generic thread pool crate which can fit all current use cases in the Rust ecosystem, given only a little bit of compile-time configuration?

I'm asking because writing a good thread pool takes a sizeable amount of effort, and as far as I can see there are really only so many ways to do it. Consider the design constraints under which a modern thread pool must operate:

  • Modern CPUs can have a lot of hardware threads, easily more than 100: check out Intel's Xeon Phi (up to 72 cores, 4-way SMT) or IBM's POWER9 (up to 24 cores, 8-way SMT) if you are not convinced. If you want to scale that far (and future CPUs will likely go further), a design based on a single shared work queue will fail to achieve good performance; one work queue per CPU thread, with a load-balancing algorithm like work stealing, becomes necessary.
  • Unless some form of prioritization is applied (which, AFAIK, no current Rust thread pool crate provides), a worker thread reaching for more work has only one choice to make: which end of the queue should work be fetched from? Fetching in FIFO order is better for fairness, whereas fetching in LIFO order is better for cache locality (and thus performance). There is a real design choice here, but with const generics it could be a compile-time parameter of the implementation. Even without const generics, we can make this work today using zero-sized marker types (see the sketch after this list).
  • When work is submitted to the thread pool, the caller must have a way to synchronize with its completion. This is exactly what Rust futures were designed for, and they were built to be very lightweight, so except for very specific use cases that require coarser-grained synchronization (e.g. flushing the work queue on application termination), I think everyone would be happy with a future-based synchronization interface.
  • Being able to submit extra work from inside the thread pool (i.e. recursive parallelism) is very convenient for many recursive algorithms, and it is also a way to reduce work-queue pressure by dividing work lazily rather than eagerly. I think we can agree that any performant thread pool should support this use case, and that it can be implemented without harming the others.
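
As an illustration of the FIFO/LIFO bullet above, here is a minimal sketch (all names hypothetical) of selecting the dequeue order at compile time with zero-sized marker types, so that the choice is monomorphized away instead of branching at runtime:

```rust
use std::collections::VecDeque;
use std::marker::PhantomData;

trait PopOrder {
    fn pop<T>(queue: &mut VecDeque<T>) -> Option<T>;
}

/// FIFO: pop the oldest task; better fairness.
struct Fifo;
impl PopOrder for Fifo {
    fn pop<T>(queue: &mut VecDeque<T>) -> Option<T> {
        queue.pop_front()
    }
}

/// LIFO: pop the newest task; better cache locality.
struct Lifo;
impl PopOrder for Lifo {
    fn pop<T>(queue: &mut VecDeque<T>) -> Option<T> {
        queue.pop_back()
    }
}

struct WorkQueue<T, O: PopOrder> {
    tasks: VecDeque<T>,
    _order: PhantomData<O>,
}

impl<T, O: PopOrder> WorkQueue<T, O> {
    fn new() -> Self {
        WorkQueue { tasks: VecDeque::new(), _order: PhantomData }
    }
    fn push(&mut self, task: T) {
        self.tasks.push_back(task);
    }
    fn pop(&mut self) -> Option<T> {
        O::pop(&mut self.tasks) // resolved at compile time
    }
}

fn main() {
    let mut fair: WorkQueue<u32, Fifo> = WorkQueue::new();
    fair.push(1);
    fair.push(2);
    assert_eq!(fair.pop(), Some(1)); // oldest first

    let mut local: WorkQueue<u32, Lifo> = WorkQueue::new();
    local.push(1);
    local.push(2);
    assert_eq!(local.pop(), Some(2)); // newest first
}
```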

Is there any design constraint which I am overlooking that would prevent the creation of a common generic thread pool which fulfills everyone's needs without being excessively over-engineered and hard to use? Otherwise, I might be interested in exploring this path further.

@carllerche
Member

I have been working in my spare time on futures-pool, which is meant to be a general-purpose, shareable pool for futures. It is not ready yet, though. It is being designed for fairness, which IMO is what you usually want when working with networking-related scenarios.

@HadrienG2

HadrienG2 commented Nov 4, 2017

@carllerche This would be part of what I'm thinking about. The other part, which may be crazier and intractable in practice, would be to make the thread pool library generic over its scheduling policy, in the sense that you could configure it at compile time to use FIFO, LIFO, or maybe even priority-driven scheduling algorithms like EDF.

In this way, people who care most about fairness can configure it in FIFO mode, people who care most about throughput can configure it in LIFO mode, people who have real-time latency constraints can configure it in EDF mode... you get the idea. Any algorithm from the OS community's abundant literature on non-preemptive scheduling is potentially applicable, since a thread pool is basically a parallel batch scheduler.
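
To make the EDF case concrete, the worker-local ready queue for that mode could be as simple as a deadline-ordered heap. A hypothetical sketch, not taken from any existing crate:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

struct Task {
    deadline: Instant,
    name: &'static str,
}

// BinaryHeap is a max-heap, so invert the comparison to pop the
// earliest deadline first.
impl Ord for Task {
    fn cmp(&self, other: &Task) -> Ordering {
        other.deadline.cmp(&self.deadline)
    }
}
impl PartialOrd for Task {
    fn partial_cmp(&self, other: &Task) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Task {
    fn eq(&self, other: &Task) -> bool {
        self.deadline == other.deadline
    }
}
impl Eq for Task {}

fn main() {
    let now = Instant::now();
    let mut ready = BinaryHeap::new();
    ready.push(Task { deadline: now + Duration::from_millis(50), name: "b" });
    ready.push(Task { deadline: now + Duration::from_millis(10), name: "a" });
    ready.push(Task { deadline: now + Duration::from_millis(99), name: "c" });

    // Pops in deadline order: a, b, c.
    while let Some(task) = ready.pop() {
        println!("running {}", task.name);
    }
}
```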

I'm currently doodling some code to see how crazy that idea is. But if it's workable and usable, it would be a way to achieve the OP's goal of having one unified thread pool library for all Rust multithreaded programming environments.

@carllerche
Member

I'm not sure what the value is of making a thread pool implementation generic over the scheduler, vs. making user code generic over T: future::Executor.

I'm also skeptical that one could write a single thread pool that handles all cases as efficiently as specialized implementations do.

@HadrienG2

HadrienG2 commented Nov 7, 2017

A well-implemented executor is actually quite a nontrivial object. Among other things, it needs to handle:

  • Worker thread initialization and CPU pinning (mildly OS-specific)
  • Distribution of incoming work across workers
  • Load balancing when a worker is starving
  • Putting workers to sleep when there has been no work for a while (CPU time is expensive, especially on battery-powered devices), and waking them up when new work arrives (see the sketch after this list)
  • Correct recursive parallelism (which should avoid unnecessary synchronization with other workers when they are already busy, and should not be prioritized like incoming work, since it is effectively part of the executing task)
  • Termination of workers at the end of the program
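
For the sleep/wake bullet, here is a minimal std-only sketch (no particular crate assumed) of workers parking on a condition variable while the shared queue is empty:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

struct Shared {
    // (pending jobs, shutting down)
    state: Mutex<(VecDeque<Job>, bool)>,
    available: Condvar,
}

fn worker(shared: Arc<Shared>) {
    loop {
        let job = {
            let mut state = shared.state.lock().unwrap();
            // Sleep (consuming no CPU) until a job arrives or we
            // are told to shut down.
            while state.0.is_empty() && !state.1 {
                state = shared.available.wait(state).unwrap();
            }
            match state.0.pop_front() {
                Some(job) => job,
                None => return, // shutdown requested, queue drained
            }
        };
        job(); // run outside the lock
    }
}

fn main() {
    let shared = Arc::new(Shared {
        state: Mutex::new((VecDeque::new(), false)),
        available: Condvar::new(),
    });

    let workers: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || worker(shared))
        })
        .collect();

    for i in 0..8 {
        shared.state.lock().unwrap().0
            .push_back(Box::new(move || println!("job {}", i)));
        shared.available.notify_one(); // wake one sleeping worker
    }

    // Signal termination and wake everyone so they can exit.
    shared.state.lock().unwrap().1 = true;
    shared.available.notify_all();
    for w in workers {
        w.join().unwrap();
    }
}
```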

My point is that there is a good default for almost all of these operations; the only thing that truly needs to remain customizable is the worker-local task scheduling policy, because in the end that is the only fundamental difference between a thread pool designed for network packet processing (like Tokio's) and one designed for maximal compute performance (like Rayon's).

So far, my early experiments with an executor that is generic over its scheduling policy do not match your conclusion that it must be inefficient. But once I have a reasonably complete prototype to show (the development of which may reveal further issues), we can discuss this matter more.

@aturon
Member

aturon commented Mar 20, 2018

I'm going to close out this issue. With 0.2's default executors, sharing a thread pool should be much easier.
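
For illustration, here is how sharing a single pool looks with the later futures 0.3 API (the 0.2 API referenced above was since superseded); this requires the crate's "thread-pool" feature:

```rust
use futures::executor::{block_on, ThreadPool};
use futures::task::SpawnExt;

fn main() {
    // One pool, shared by everything that needs to spawn work.
    let pool = ThreadPool::new().unwrap();
    let handle = pool.spawn_with_handle(async { 40 + 2 }).unwrap();
    assert_eq!(block_on(handle), 42);
}
```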

@aturon aturon closed this as completed Mar 20, 2018