-
enkiTS is able to schedule tasks efficiently when the main thread is busy. Some thoughts from your traces:

In the first approach it looks to me like each enkiTS worker thread goes to sleep, wakes up and does some work, then goes to sleep again. This is inefficient, because there is a cost to putting a thread to sleep and waking it, and a cost for the task stealing search which threads do when they have no work. In the second approach there is sufficient work for all threads.

There are also frequency scaling effects which can occur. The OS can scale the frequency of a single active CPU core higher than when more CPU cores are working. This would mean that in approach 2 the main core/thread might experience a significant frequency boost.

You also seem to be in the unlucky spot on your system where your parallel workload takes roughly as long as the serial workload, so you cannot add tasks faster than they are being completed. A CPU with fewer cores would likely not see this problem, or at least the difference would be smaller.

I do have some ongoing research into minimizing the cost of waking threads, but I'm not sure this will improve the performance of the first approach: there is insufficient work at any one time for all threads to be active, and new work can appear whilst some threads are busy, so another thread will then be woken. This means approach 1 will always have more threads active but not doing useful work than approach 2.

A heuristic which might work is to batch your tasks into groups of roughly some multiple of the number of enkiTS worker threads. I also note that approach 2 (or the batching variant) seems an ideal use case for using a set size equal to the number of islands found. Batching with a single task with a range equal to the batch size would likely be the best approach across all CPU configurations.

I don't think enkiTS can do much automatically to resolve this (beyond some small optimizations for waking threads etc.), since batching tasks in the scheduler would kill performance for longer running tasks.
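To make the batching heuristic concrete, here is a minimal sketch of how islands could be collected as the main thread finds them and handed off in groups sized as a multiple of the worker thread count. It uses only the standard library. `IslandBatcher`, `SubmitFn`, the `multiple` parameter and the use of `int` island ids are all hypothetical names chosen for illustration, not part of enkiTS or Box2D. The callback stands in for whatever submits one task whose range equals the batch size.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Collects islands as the main thread finds them and hands them off in groups.
// SubmitFn is a placeholder for "submit one task whose range is batch.size()".
class IslandBatcher
{
public:
    using SubmitFn = std::function<void( std::vector<int> batch )>;

    // Batch size is workerThreads * multiple; `multiple` is a tuning knob
    // assumed for this sketch.
    IslandBatcher( uint32_t workerThreads, uint32_t multiple, SubmitFn submit )
        : m_BatchSize( workerThreads * multiple ), m_Submit( std::move( submit ) )
    {
        assert( m_BatchSize > 0 );
        m_Pending.reserve( m_BatchSize );
    }

    // Called each time an island is found; submits only when a batch is full.
    void Add( int islandId )
    {
        m_Pending.push_back( islandId );
        if( m_Pending.size() >= m_BatchSize )
        {
            Flush();
        }
    }

    // Submits whatever is left. Call once after the last island, before waiting.
    void Flush()
    {
        if( m_Pending.empty() )
        {
            return;
        }
        std::vector<int> batch;
        batch.swap( m_Pending );
        m_Pending.reserve( m_BatchSize );
        m_Submit( std::move( batch ) );
    }

private:
    uint32_t         m_BatchSize;
    SubmitFn         m_Submit;
    std::vector<int> m_Pending;
};
```

With 8 worker threads and a multiple of 4, 200 islands would go out as six batches of 32 plus a final batch of 8, so worker threads are woken a handful of times rather than once per island. The best multiple is likely to depend on the hardware and on how long each island takes to solve, so it is worth measuring.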
-
I'm setting up a parallel island solver in Box2D (erincatto/box2c#32).
I tried two approaches:
In both cases I call WaitforAll after all the island tasks have been submitted. The performance of these two approaches is not what I expect. Approach 1 takes ~3.5ms while approach 2 takes ~2ms for my benchmark (200 islands). Is enkiTS unable to schedule the tasks efficiently while the main thread is busy? Here are my Tracy profiles. The first is approach 1 with the island tasks spread out on the right. The second is approach 2 with all the island tasks grouped on the right.