Task ringbuffer #112
Conversation
This is exciting! My major concern, however, is that an allocation-free implementation like this is closely tied to stalling the main thread on submission, but that goal is only partially addressed by spin-waiting inside `reserve_new_tid` (we explicitly want to throttle submission to avoid overwhelming MPI, as discussed earlier).

Moving task deletion to the scheduler thread (inside `notify...`) unnecessarily introduces a second mutator to the ring buffer. We should really merge #86 first, which gives us the `epoch_monitor` that allows the task manager to block on a horizon being executed (and becoming the new effective epoch) if it would otherwise overrun the ring buffer. `reserve_new_tid` would then return an optional, and `wait_for_available_slot` becomes unnecessary.
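To illustrate, the epoch-based approach could look roughly like this. This is a hypothetical sketch based purely on the discussion above, not the actual Celerity code: `epoch_monitor`, `reserve_new_tid`, `delete_up_to` and the blocking pattern are all assumed names with simplified bookkeeping.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>

using task_id = uint64_t;

// Stand-in for the epoch_monitor from #86 (assumed interface): lets a thread
// block until the effective epoch reaches a given task id.
class epoch_monitor {
  public:
    void await(task_id tid) {
        std::unique_lock lk{mutex_};
        cv_.wait(lk, [&] { return epoch_ >= tid; });
    }
    void advance_to(task_id tid) {
        { std::lock_guard lk{mutex_}; epoch_ = tid; }
        cv_.notify_all();
    }
  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    task_id epoch_ = 0;
};

class task_ring_buffer {
  public:
    explicit task_ring_buffer(size_t capacity) : capacity_(capacity) {}

    // Producer thread only. Returns nullopt instead of spin-waiting when the
    // buffer would overflow.
    std::optional<task_id> reserve_new_tid() {
        if (next_tid_ - deleted_ >= capacity_) return std::nullopt;
        return next_tid_++;
    }

    // Producer thread only: free all slots up to (and including) tid.
    void delete_up_to(task_id tid) { deleted_ = tid; }

  private:
    size_t capacity_;
    task_id next_tid_ = 0;
    task_id deleted_ = 0;
};

// Caller-side pattern: block on the epoch monitor instead of spinning.
task_id reserve_blocking(task_ring_buffer& buf, epoch_monitor& epochs, task_id last_horizon) {
    std::optional<task_id> tid;
    while (!(tid = buf.reserve_new_tid())) {
        epochs.await(last_horizon);      // sleep until a horizon becomes the new epoch
        buf.delete_up_to(last_horizon);  // slots up to the epoch are now reusable
    }
    return *tid;
}
```

The key point is that a full buffer puts the producer to sleep on a condition rather than burning a core, while still providing the back-pressure on submission discussed above.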
Also, a couple of smaller things:
- `task_ring_buffer` should rigorously document which member functions can be invoked only by the producer thread (main / task manager) and which ones are thread-safe.
- I don't believe we need to separate `next_active_tid` from `next_task_id` if we extend `has_task` to check for `!= nullptr` on the task pointer. Then we can also make the API of `emplace()` less restrictive. Edit: On second thought, that is not true, since we need the release-store on `next_active_id` to make the non-atomic insertion into the `unique_ptr` visible.
- Once the task manager thread becomes the only mutator, `number_of_deleted_tasks` does not need to be atomic any more. Also, all atomic accesses can be either `acquire` or `release`; no need for `seq_cst` anywhere.
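The release/acquire pairing in the second bullet can be sketched as follows. This is a simplified stand-in, not the real `task_ring_buffer`: the producer writes the `unique_ptr` slot non-atomically and then publishes it with a release-store on `next_active_tid`; any thread that acquire-loads `next_active_tid` is then guaranteed to see the pointer, with no `seq_cst` anywhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

struct task { std::string debug_name; };

constexpr size_t capacity = 1024;
std::unique_ptr<task> slots[capacity];
std::atomic<uint64_t> next_active_tid{0};

// Producer (task manager) thread only.
void emplace(std::unique_ptr<task> t) {
    // Sole writer, so its own load can be relaxed.
    const uint64_t tid = next_active_tid.load(std::memory_order_relaxed);
    slots[tid % capacity] = std::move(t);                       // non-atomic write...
    next_active_tid.store(tid + 1, std::memory_order_release);  // ...published here
}

// Safe to call from any thread.
bool has_task(uint64_t tid) {
    // Synchronizes with the release-store in emplace(): if this returns true,
    // the slot contents written before the store are visible to this thread.
    return tid < next_active_tid.load(std::memory_order_acquire);
}

const task* get_task(uint64_t tid) {
    assert(has_task(tid));
    return slots[tid % capacity].get();
}
```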
Force-pushed from d0df1b2 to fec38b9.
This latest version now uses the epoch monitor for synchronization. The performance impact of that compared to spinning is measurable but seems ok. There are now also unit tests for the borderline scenarios of the ring buffer (running out of buffer space due to long-running active tasks but eventually recovering; and running into a situation which would deadlock).
Thanks! Some general remarks in addition to the line comments:

- I would prefer to avoid implicitly-`seq_cst` atomic operations in favor of those with explicit memory ordering. Even though the former is always correct, it keeps the reader in the dark about when atomics are used to synchronize other memory accesses. E.g. most loads in the task manager thread can be `relaxed`.
- `task_ring_buffer` should rigorously document which member functions can be invoked only by the producer thread (main / task manager) and which ones are thread-safe.
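Both remarks could be combined into a documentation convention along these lines. The class and its members are hypothetical illustrations, not the actual `task_ring_buffer`: every member function states its threading contract, and every atomic access spells out its memory order instead of defaulting to `seq_cst`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class deletion_counter {
  public:
    // Producer (task manager) thread only. The producer is the sole writer of
    // count_, so its own loads do not synchronize anything: relaxed is enough.
    uint64_t next_deletion_id() const { return count_.load(std::memory_order_relaxed); }

    // Producer thread only. Release so that readers observing the new value
    // also observe all writes the producer made before it.
    void record_deletion() { count_.fetch_add(1, std::memory_order_release); }

    // Thread-safe. Acquire pairs with the release in record_deletion().
    uint64_t approximate_count() const { return count_.load(std::memory_order_acquire); }

  private:
    std::atomic<uint64_t> count_{0};
};
```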
Force-pushed from f6ee72d to c7def2d.
LGTM
LGTM!
Based on reviews & discussion:
* now uses epoch monitor
* tracks in-flight horizons/epochs to be able to report deadlock scenarios
* unit tests
* document how member functions may be called
* explicit memory semantics on atomics
This is a performance refactoring replacing the `unordered_map` in the `task_manager` with a ring buffer. Far more importantly than the difference in data structure (though that is also measurable), this completely eliminates any locking contention for accessing tasks by id.
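The reason the contention disappears can be sketched in a few lines. With dense, monotonically increasing task ids, `tid % capacity` is a direct index into the buffer, so a lookup needs neither a hash map nor a lock. This is an assumed simplification for illustration, not the actual Celerity implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct task { uint64_t tid; };

class task_store {
  public:
    explicit task_store(size_t capacity) : slots_(capacity) {}

    // With the old unordered_map this would be an insert under a mutex;
    // here the id determines the slot directly. Note that ids further apart
    // than the capacity alias to the same slot, which is why the real ring
    // buffer must bound the number of live tasks.
    void put(std::unique_ptr<task> t) { slots_[t->tid % slots_.size()] = std::move(t); }

    // O(1) and lock-free by construction: no hashing, no shared mutable
    // bucket state, just an index computation.
    const task* get(uint64_t tid) const { return slots_[tid % slots_.size()].get(); }

  private:
    std::vector<std::unique_ptr<task>> slots_;
};
```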
It also introduces a new microbenchmark for task handling, in two variants: one with pure task handling, and one in a more realistic scenario with another thread accessing tasks.
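The rough shape of the two variants could look as follows. This is an assumed structure for illustration, not the actual microbenchmark: variant 1 times pure production on one thread, variant 2 times production while a second thread concurrently reads tasks by id.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr size_t capacity = 1 << 14;
std::atomic<uint64_t> payload[capacity];  // stand-in for the task slots
std::atomic<uint64_t> next_tid{0};

// Producer: create n tasks (release-publish each one).
void produce(uint64_t n) {
    for (uint64_t i = 0; i < n; ++i) {
        const uint64_t tid = next_tid.load(std::memory_order_relaxed);
        payload[tid % capacity].store(tid, std::memory_order_relaxed);
        next_tid.store(tid + 1, std::memory_order_release);
    }
}

// Concurrent reader: repeatedly look up the most recent task until stopped.
void consume_live(std::atomic<bool>& stop) {
    while (!stop.load(std::memory_order_relaxed)) {
        const uint64_t n = next_tid.load(std::memory_order_acquire);
        if (n > 0) payload[(n - 1) % capacity].load(std::memory_order_relaxed);
    }
}

// Variant 1: pure task handling.
auto bench_pure(uint64_t n) {
    const auto t0 = std::chrono::steady_clock::now();
    produce(n);
    return std::chrono::steady_clock::now() - t0;
}

// Variant 2: task handling with another thread accessing tasks.
auto bench_contended(uint64_t n) {
    std::atomic<bool> stop{false};
    std::thread reader([&] { consume_live(stop); });
    const auto t0 = std::chrono::steady_clock::now();
    produce(n);
    const auto dt = std::chrono::steady_clock::now() - t0;
    stop.store(true, std::memory_order_relaxed);
    reader.join();
    return dt;
}
```

Comparing the two durations gives a rough picture of how much a concurrent reader costs the producer, which is the number the PR description refers to as "measurable".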