Introduce a ThreadPool and parallel execution of some of the compilation work items #7462
Conversation
Some API changes to consider:
Some implementation changes to consider:
Edit: added specifics on some points
src/ThreadPool.zig
Outdated
```zig
    break :blk nodes;
};

while (idle_nodes.popFirst()) |node|
```
OK, here's a subtle bug that's hard to see. These `idle_nodes` are singly linked list nodes allocated on the stack of each worker (see the `runWorker` function). However, by this line we have set `is_shutdown` to true and have released the mutex, which means all workers are now free to exit. So all these idle_nodes on the stack of each worker can be freed at any time, including before we reach this loop that accesses them. Also note that it's not just the `next` pointer that's invalid: the `AutoResetEvent` itself is allocated on the stack and could also be invalidated here. To fix this issue, and also to handle a thread that has to exit (e.g. via an abort), the `AutoResetEvent` should probably be allocated on the heap.
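A minimal sketch of that direction, assuming the std API used elsewhere in this PR (`std.AutoResetEvent`, `*std.mem.Allocator`); `IdleWaiter` and the helper functions are hypothetical names for illustration, not code from the PR:

```zig
const std = @import("std");

// Heap-allocated wait state: it outlives the worker's stack frame, so the thread
// calling shutdown()/schedule() never dereferences another worker's stack memory.
const IdleWaiter = struct {
    next: ?*IdleWaiter = null,
    event: std.AutoResetEvent = std.AutoResetEvent{},
};

// The pool would create one of these before parking a worker in the idle queue...
fn createIdleWaiter(gpa: *std.mem.Allocator) !*IdleWaiter {
    const waiter = try gpa.create(IdleWaiter);
    waiter.* = IdleWaiter{};
    return waiter;
}

// ...and destroy it only after the waking thread has finished calling waiter.event.set(),
// so a worker that dies early cannot invalidate memory that other threads still touch.
fn destroyIdleWaiter(gpa: *std.mem.Allocator, waiter: *IdleWaiter) void {
    gpa.destroy(waiter);
}
```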
> So all these idle_nodes on the stack of each worker can be freed at any time

The worker thread that owns each node in the idle list is blocked on its node's `AutoResetEvent`, so it cannot exit until it is unblocked by `.set()`. This means that even when the lock is released, the blocked threads keep the nodes alive, and they remain valid until the line below wakes them up. That wake-up is followed by the shutdown chain of: deallocating the Node from the stack, re-acquiring the pool lock, seeing shutdown, releasing the lock and returning, and finally being joined by the main pool thread.
> The worker thread that owns each node in the idle list is blocked on its node's AutoResetEvent, so it cannot exit until it is unblocked by .set()

A worker can exit at any time from an abort. In that case, we don't want a single worker that has to exit for some reason to corrupt the shared `idle_queue`.
Also, if shutdown gets called after a worker's `AutoResetEvent` has been set, but before that worker has checked whether shutdown has been called on line 114, then the worker can exit before the shutdown thread has accessed its idle_node.
If you don't want a worker exit/abort to corrupt the `idle_queue` (and I think it shouldn't), then you pretty much have to allocate this metadata on the heap; there's really no other choice. Relying on the stack of each worker to keep our metadata intact is always going to be brittle.
> A worker can exit at any time from an abort

The worker wouldn't be able to call abort() while it's still in the `idle_queue`, because it would be blocked on `AutoResetEvent.wait()`. It is only ever unblocked once it has been removed from the `idle_queue` (with the lock held), as shown in `schedule()` and `shutdown()`. If another thread were to call abort(), however, then I'd imagine that would kill all threads in the process rather than produce a race.

> checked whether shutdown has been called on line 114, then the worker can exit before the shutdown thread has accessed its idle_node

A worker's idle_node doesn't exist while it's checking for shutdown. It only exists before and during its time in the `idle_queue`, while blocked on the `AutoResetEvent`.

> Relying on the stack of each worker to keep our metadata intact is always going to be brittle.

Why is this the case? The stack data always remains valid in the current code, as explained above.
@kprotty ah sorry, I misread the code. I was under the false impression that the thread removed itself from the `idle_queue`.
So the two cases I mentioned should not be possible. However, there are still other cases where a worker thread can get killed, for example by OOM or manually by a user. With the code as it exists, this would result in a use after free. The only way I can think of to solve this use after free is to allocate this data on the heap.
@marler8997 That is indeed a possibility, but I'm of the view that a user messing with the internal state of the code is already undefined behavior, and that we shouldn't introduce runtime overhead to protect against it.
OOM isn't the user intervening (the Out of Memory Killer); I've gotten OOM errors with the Zig compiler, so this is a real use case. By using references to the stacks of other threads, errors in one thread cascade to others in the form of use after free, which makes them extremely difficult to debug/triage/reproduce. Memory corruption can cause almost anything to happen. We can avoid these errors by putting shared data on the heap instead of the stack. That way, if/when a thread gets killed, its errors stay local to it instead of also causing use-after-free errors on other threads.
Doesn't OOM kill the entire process rather than a given thread? If all threads die, then there's nothing to guard against. In the current code, a thread being randomly killed would only be an issue if it's killed while its idle_node is in the idle_queue. This currently can't happen without killing the entire process, since the user has no access to the thread's id to send it a sigkill/sigterm, and the thread is only vulnerable while sleeping on the AutoResetEvent (which the user also doesn't have access to).
Edit: made the response specific to this case rather than to generic panics.
After thinking on this more and playing with another thread pool implementation, I've decided I'm more OK with this solution.
As to why: the first reason is that this stack pointer to the worker thread's `AutoResetEvent` is stored in the idle queue for a very short "time", at least in terms of the number of instructions executed on the worker thread. The stack reference is put into the shared idle queue just before the worker calls `wait`, then removed by another thread before it wakes up, so the worker will only execute a few instructions during this time window. The chances of it getting killed and us losing its stack are low, so I think in this case it's OK to make an exception to the general avoidance of cross-referencing stack memory between threads.
The second reason is that I realize the alternative solution comes with its own issues. If we stored all the `AutoResetEvent`s in an array on the heap, then false sharing becomes a performance concern. False sharing could be addressed by spacing the `AutoResetEvent`s far enough apart to ensure they land on separate cache lines, but we can avoid this altogether by storing them on each thread's stack instead.
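For comparison, here is a rough sketch of the heap-array alternative, assuming 64-byte cache lines; `PaddedEvent` and `allocEvents` are illustrative names, not anything from this PR:

```zig
const std = @import("std");

// Pad each event out to its own (assumed) 64-byte cache line so that waking or
// sleeping on one worker's event does not invalidate the line holding its neighbor's.
const PaddedEvent = struct {
    event: std.AutoResetEvent align(64) = std.AutoResetEvent{},
};

// One heap-allocated, padded event per worker instead of one on each worker's stack.
fn allocEvents(gpa: *std.mem.Allocator, worker_count: usize) ![]PaddedEvent {
    const events = try gpa.alloc(PaddedEvent, worker_count);
    for (events) |*padded| padded.* = PaddedEvent{};
    return events;
}
```

The padding buys isolation at the cost of memory and an allocation, which is exactly the overhead the stack-based approach avoids.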
src/Compilation.zig
Outdated
```
@@ -1366,6 +1375,21 @@ pub fn getAllErrorsAlloc(self: *Compilation) !AllErrors {
    };
}

const WaitGroup = struct {
```
There might be a race here (edit: cc @marler8997)
- WaitGroup.done() decrements the counter to 0, gets preempted
- WaitGroup.wait() sees the 0 counter, exits, and deallocates itself (if on the stack)
- WaitGroup.done() gets rescheduled; having seen the decrement to 0, it tries to set the event, but the event memory was already deallocated
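Condensed into code, the hazard looks roughly like this; a sketch with hypothetical field names (using `std.AutoResetEvent` as elsewhere in the PR), not the actual WaitGroup:

```zig
const std = @import("std");

const RacyWaitGroup = struct {
    counter: usize,
    event: ?*std.AutoResetEvent = null,

    fn done(self: *RacyWaitGroup) void {
        // @atomicRmw returns the previous value, so 1 means this call dropped the count to zero.
        if (@atomicRmw(usize, &self.counter, .Sub, 1, .SeqCst) == 1) {
            // <-- if preempted here, wait() can see a zero counter and return,
            //     and its caller may deallocate the event (it lives on wait()'s stack) ...
            if (@atomicLoad(?*std.AutoResetEvent, &self.event, .SeqCst)) |ev|
                ev.set(); // ... making this a potential use-after-free
        }
    }

    fn wait(self: *RacyWaitGroup) void {
        var event = std.AutoResetEvent{};
        @atomicStore(?*std.AutoResetEvent, &self.event, &event, .SeqCst);
        if (@atomicLoad(usize, &self.counter, .SeqCst) != 0)
            event.wait();
        // returning invalidates `event`, but done() may still be about to call set() on it
    }
};
```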
Feels like you'd need some sort of futex-style wait comparison/validation mechanism to resolve this. Wrapping all the state checking in a Mutex would be the cross-platform equivalent. Might be a good reason to introduce parking_lot in the stdlib.
Yeah I haven't reviewed all this code, but based on your comment it does look like a race.
P.S. It seems like reviewing multi-threaded scheduler code takes about 10 times longer than it does to review normal code. It's going to take me some time to really review all this code, even though it's less than 200 lines!
Yea, I plan to do a rewrite of this later today with the goal of simplifying the review process at the possible expense of concurrent perf.
edit: fwiw, this was written while racing andrew during the stream so I'm partially surprised it even worked for demo purposes given the previous bugs.
Force-pushed from ee98c9c to 160f800
```zig
var event = std.AutoResetEvent{};
@atomicStore(?*std.AutoResetEvent, &self.event, &event, .SeqCst);
if (@atomicLoad(usize, &self.counter, .SeqCst) != 0)
    event.wait();
```
rip, another race on my part:
- stop() decrements to 0, gets preempted
- wait() loads the counter, sees 0, exits and invalidates the event memory
- stop() gets rescheduled, swaps in null and sees a valid event, tries to set it and accesses invalid memory

The fix is to ensure that stop() no longer has access to the event pointer by the time wait() returns:
```zig
// Create and publish the event preemptively before checking the counter.
// If we check before setting the event, then:
// - we could see non-zero, get preempted
// - stop() could set to zero, see no event, continue
// - we get rescheduled, and sleep on the event after seeing non-zero above (deadlock)
var event = std.AutoResetEvent{};
@atomicStore(?*std.AutoResetEvent, &self.event, &event, .SeqCst);

// if the counter is non-zero, then the last stop() will see our event stored above and wake us up
if (@atomicLoad(usize, &self.counter, .SeqCst) != 0)
    return event.wait();

// If the counter is zero, then we need to make sure we take out our event.
// stop() could set zero, get preempted, we see 0, return, stop() reschedules and tries to wake invalid event.
//
// We address this by trying to remove the event from access to stop().
// If we failed to remove it, that means that stop() has it and will try to wake us up.
// If so, we wait for the wake up, which is the indicator that stop() is no longer referencing this event.
if (@atomicRmw(?*std.AutoResetEvent, &self.event, .Xchg, null, .SeqCst) == null)
    event.wait();
```
Given it's not a hot path in Compilation, we could probably switch back to the lock-based version for auditing purposes.
Force-pushed from 160f800 to b2ac6c6
Force-pushed from 3ef7644 to ab01aab
```zig
while (true) {
    const held = self.lock.acquire();

    if (self.run_queue.popFirst()) |run_node| {
```
Shouldn't you check `is_running` before checking for the next task? Couldn't shutdown be called before all tasks are completed? For example, if an error occurs we may want to shut down all the threads.
Force-pushed from ef0bd4d to be3638e
If this errdefer did get run it would constitute a race condition. So I deleted the dead code for clarity.
We generally get away with atomic primitives; however, a lock is required around the refresh function, since it traverses the Node graph and we need to be sure no references to Nodes remain after end() is called.
thx king protty
And enable it for Drone CI. I hate to do this, but I need to make progress on other fronts.
Force-pushed from ad50ec6 to 1d94a68
This is the result of a live coding stream; it needs to be cleaned up before merging.
Merge checklist:
- Make std.Progress thread-safe
- Parallelize more kinds of work items (this will be follow-up work)
- Troubleshoot the aarch64 deadlock (I worked around it. Will address in a follow-up issue.)

Edit: forgot to mention, thanks to @kprotty for the ThreadPool implementation :D