-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible data race in blocking_get{_for} #397
Comments
It looks like wait should check the condition before waiting, so my theory is not correct. |
Are you using the portable executor? There is a potential deadlock with blocking_get() in that case. You have tasks If |
We aren't using the portable executor (as we use tbb and have a custom executor mapping onto tbb workers). Which makes the code path much simpler and so more perplexing that it hangs sometimes. On another note, is there any documentation of the portable executor? I couldn't find any. |
The tbb thread pool doesn’t scale - so I you block all the threads in the pool you will deadlock. This is surprisingly easy to do. TBB also can’t wake a thread to steel and I don’t know how their steeling algorithm works so if you block you may end up waiting on a task scheduled on the same queue. |
Yes, it looks like it might be something in that direction. The thread being blocked is the main thread, which hasn't had tbb parallel calls and so wasn't thought to be problematic in terms of deadlocking. But having added more logging, the future's work gets stuck somewhere between being submitted to the tbb task group and being executed. BTW, our motivation for sending the future's work to the tbb thread pool as that we have a lot of underlying code that uses tbb parallel constructs and want to be able to mix that with futures-based code without oversubscribing. |
Anyway, while I'm still flummoxed, it doesn't appear that it is likely to be due to stlab code, and is perhaps more likely to be due to not understanding what tbb is doing underneath. |
If you can distill down the basics of what you are doing with the futures I can take another look... |
We just hit the same issue with the sleep version of blocking get. So while it is dramatically less likely, it is still there. I'll see if I can distill things down to something simpler to reproduce it. |
@drussel How's that distillation coming along? |
I'd made an attempt that failed to reproduce it. And haven't had a chance to spend more time on it sorry. |
Some of our tests have been occasionally hanging in stlab::blocking_get when called on (often) already ready futures. Perhaps this is this is due to a race on the call to
notify_one
. In particular, if thenotify_one
call is made before the outer thread waits on it, then the thread will not get unblocked. Does this make sense?That said, I would expect that patching blocking_get to check flag while holding the lock before waiting on the condition variable would fix this, but it does not. So clearly I don't entirely understand the problem.
Going back to the simpler check, sleep implementation of blocking_get does work and and since we only use it in test code, is just fine for us.
Stlab versions 1.6.X. Gcc, clang and vc++.
The text was updated successfully, but these errors were encountered: