Task ringbuffer #112

PeterTh · 2022-04-01T00:58:02Z

This is a performance refactoring replacing the unordered_map in the task_manager with a ringbuffer.

More important than the change in data structure itself (although that is also measurable), this completely eliminates lock contention when accessing tasks by id.

It also introduces a new microbenchmark for task handling in two variants: one with pure task handling, and one in a more realistic scenario where another thread accesses tasks:

benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
------------------------------------------------------------------------------- Master
generating and deleting tasks                  100             1    590.461 ms 
                                        5.74591 ms    5.67021 ms    5.80049 ms 
                                        324.677 us    251.264 us    397.061 us 
generating and deleting tasks with                                             
access thread                                  100             1     3.78352 s 
                                        37.0397 ms    35.3406 ms    38.6955 ms 
                                        8.59095 ms    7.65626 ms    9.70456 ms 
------------------------------------------------------------------------------- This Branch
generating and deleting tasks                  100             1    464.589 ms 
                                        4.56318 ms    4.50633 ms    4.60211 ms 
                                        237.555 us    174.134 us    308.134 us 
generating and deleting tasks with                                             
access thread                                  100             1    878.879 ms 
                                        8.39569 ms    8.29195 ms    8.48651 ms 
                                        495.973 us    419.964 us    580.127 us
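
To illustrate the general idea of looking up tasks by id without a hash map or mutex, here is a minimal sketch. It is not the code from this PR: `tiny_task_ring`, `put`, `drop_before`, `find`, and the `task` / `task_id` stand-ins are hypothetical names, and the sketch assumes that ids are handed out consecutively and that only one producer thread inserts and deletes.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the project's task and task id types.
using task_id = std::size_t;
struct task {
    task_id id;
};

// Fixed-capacity buffer in which the slot of a task is simply `id % N`.
template <std::size_t N>
class tiny_task_ring {
  public:
    // Producer thread only. Task ids must be inserted consecutively, starting at 0.
    void put(std::unique_ptr<task> t) {
        const task_id tid = t->id;
        assert(tid == m_next.load(std::memory_order_relaxed));
        assert(tid - m_first < N && "buffer full, delete old tasks first");
        m_slots[tid % N] = std::move(t);
        // Publish the new upper bound only after the slot has been written.
        m_next.store(tid + 1, std::memory_order_release);
    }

    // Producer thread only. Frees all tasks with an id below `tid`.
    // The caller must guarantee that no other thread still looks those ids up.
    void drop_before(task_id tid) {
        assert(tid <= m_next.load(std::memory_order_relaxed));
        for(; m_first < tid; ++m_first) { m_slots[m_first % N].reset(); }
    }

    // Any thread. No hashing and no mutex: one atomic load plus an array index.
    // Returns nullptr for ids that are unpublished or whose slot was freed or reused.
    const task* find(task_id tid) const {
        if(tid >= m_next.load(std::memory_order_acquire)) return nullptr;
        const task* t = m_slots[tid % N].get();
        return (t != nullptr && t->id == tid) ? t : nullptr;
    }

  private:
    std::array<std::unique_ptr<task>, N> m_slots;
    task_id m_first = 0;            // oldest live id, touched by the producer only
    std::atomic<task_id> m_next{0}; // one past the newest published id
};
```

The sketch deliberately leaves out what happens when the buffer is full; that question is what most of the review discussion below is about.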

fknorr

This is exciting! My major concern, however, is that an allocation-free implementation like this is closely tied to stalling the main thread on submission, and that goal is only partially addressed by spin-waiting inside reserve_new_tid. (We explicitly want to throttle submission to avoid overwhelming MPI, as discussed earlier.)

Moving task deletion to the scheduler thread (inside notify...) unnecessarily introduces a second mutator to the ring buffer. We should really merge #86 first, which gives us the epoch_monitor that allows the task manager to block on a horizon being executed (and becoming the new effective epoch) if it would otherwise overrun the ring buffer. reserve_new_tid would then return an optional, and wait_for_available_slot becomes unnecessary.
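
A rough sketch of what an optional-returning reservation could look like, reduced to the slot bookkeeping. Only `reserve_new_tid` is a name from this discussion; `slot_tracker`, `delete_up_to`, and the member names are assumptions made for illustration, and the interaction with the epoch_monitor from #86 is not shown because its API is not part of this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

using task_id = std::size_t; // hypothetical stand-in for the project's id type

// Only the bookkeeping needed to illustrate the reservation API.
template <std::size_t N>
class slot_tracker {
  public:
    // Producer thread only. Returns the next id, or std::nullopt if handing it
    // out would overwrite a slot whose task has not been deleted yet.
    std::optional<task_id> reserve_new_tid() {
        if(m_next_tid - m_num_deleted >= N) return std::nullopt;
        return m_next_tid++;
    }

    // Producer thread only. Marks all tasks with an id below `tid` as deleted.
    void delete_up_to(task_id tid) {
        assert(tid <= m_next_tid);
        if(tid > m_num_deleted) m_num_deleted = tid;
    }

  private:
    task_id m_next_tid = 0;
    std::size_t m_num_deleted = 0; // plain integer: with a single mutator no atomic is needed
};
```

On `std::nullopt`, the caller would block until a later horizon has been executed, delete the tasks that became obsolete, and then retry. That keeps all mutation on the producer thread and removes the need for a spin-wait.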

Also, a couple of smaller things:

task_ring_buffer should rigorously document which member functions can be invoked only by the producer thread (main / task manager) and which ones are thread-safe.
I don't believe we need to separate next_active_tid from next_task_id if we extend has_task to check for != nullptr on the task pointer. Then we can also make the API of emplace() less restrictive. Edit: On second thought, that is not true, since we need the release-store on next_active_tid to make the non-atomic insertion into the unique_ptr visible.
Once the task manager thread becomes the only mutator, number_of_deleted_tasks does not need to be atomic any more. Also, all atomic accesses can be either acquire or release, no need for seq_cst anywhere.
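
The point about the release-store can be shown with a small, self-contained sketch. Checking a plain `unique_ptr` for `!= nullptr` from another thread would itself be a data race with the producer's write, whereas an acquire-load of a counter that the producer release-stores afterwards orders the two accesses. All names below (`published_slot`, `publish`, `try_read`, `run_publication_demo`) are hypothetical and only model a single slot.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical single-slot example; names are illustrative only.
struct published_slot {
    std::unique_ptr<int> payload;    // written non-atomically by the producer
    std::atomic<int> next_active{0}; // number of published payloads

    // Producer thread only.
    void publish(int value) {
        payload = std::make_unique<int>(value);          // plain, non-atomic write
        next_active.store(1, std::memory_order_release); // makes the write above visible
    }

    // Any thread. Returns nullptr until the payload has been published.
    const int* try_read() const {
        if(next_active.load(std::memory_order_acquire) == 0) return nullptr;
        return payload.get(); // ordered after the producer's plain write
    }
};

// Runs one producer and one consumer and returns the value the consumer saw.
inline int run_publication_demo() {
    published_slot slot;
    int seen = -1;
    std::thread consumer([&] {
        const int* p = nullptr;
        while((p = slot.try_read()) == nullptr) { std::this_thread::yield(); }
        seen = *p;
    });
    std::thread producer([&] { slot.publish(42); });
    producer.join();
    consumer.join();
    return seen;
}
```

Neither side needs seq_cst here: the release/acquire pair on the counter is what establishes the happens-before relation for the payload.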

include/task_ring_buffer.h

include/task_manager.h


PeterTh · 2022-06-17T12:49:23Z

This latest version now uses the epoch monitor for synchronization. The performance impact of that compared to spinning is measurable but seems ok:

There are now also unit tests for the borderline scenarios of the ringbuffer (running out of buffer space due to long-running active tasks but eventually recovering; and running into a situation which would deadlock).

fknorr

Thanks! Some general remarks in addition to the line comments:

I would prefer to avoid implicitly-seq_cst atomic operations in favor of those with explicit memory ordering. Even though the former is always correct, it keeps the reader in the dark about when atomics are used to synchronize other memory accesses. For example, most loads in the task manager thread can be relaxed.
task_ring_buffer should rigorously document which member functions can be invoked only by the producer thread (main / task manager) and which ones are thread-safe.
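
As an illustration of both remarks, a small sketch of a single-writer counter with explicit memory orderings and documented thread affinity. `single_writer_counter` and its members are hypothetical and not taken from task_ring_buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical counter with a single writer thread and any number of readers.
class single_writer_counter {
  public:
    /// Writer thread only.
    void increment() {
        // Relaxed: this thread is the only one that ever stores to m_value,
        // so it always observes its own latest value.
        const std::size_t current = m_value.load(std::memory_order_relaxed);
        // Release: writes made before this store become visible to readers
        // that acquire-load the new value.
        m_value.store(current + 1, std::memory_order_release);
    }

    /// Thread-safe.
    std::size_t get() const { return m_value.load(std::memory_order_acquire); }

  private:
    std::atomic<std::size_t> m_value{0};
};
```

Spelling out the ordering at each call site tells the reader which accesses synchronize other memory and which merely read a value the thread wrote itself.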

include/task_manager.h

include/task_ring_buffer.h

src/task_manager.cc

test/task_ring_buffer_tests.cc


psalz

LGTM

include/task_manager.h

src/task_manager.cc

test/task_ring_buffer_tests.cc


fknorr

LGTM!


PeterTh force-pushed the task-ringbuffer branch from ac758ef to 6666223 Compare April 1, 2022 09:13

fknorr requested changes Apr 1, 2022


psalz reviewed Apr 4, 2022


PeterTh self-assigned this Apr 4, 2022

PeterTh force-pushed the task-ringbuffer branch 6 times, most recently from d0df1b2 to fec38b9 Compare May 19, 2022 11:51

PeterTh requested a review from fknorr June 17, 2022 13:23

fknorr reviewed Jun 22, 2022


PeterTh force-pushed the task-ringbuffer branch from ca5fe59 to eefb8dc Compare June 28, 2022 13:24

fknorr reviewed Jun 28, 2022


PeterTh force-pushed the task-ringbuffer branch from eefb8dc to 75d56ef Compare June 29, 2022 08:42

PeterTh force-pushed the task-ringbuffer branch 2 times, most recently from f6ee72d to c7def2d Compare July 18, 2022 10:27

psalz approved these changes Jul 18, 2022


fknorr approved these changes Jul 18, 2022


PeterTh force-pushed the task-ringbuffer branch from c7def2d to bb3b9ca Compare July 18, 2022 13:54

PeterTh added 8 commits July 18, 2022 16:40

Add task handling benchmark

3b72c0c

Replace task map with a simple ringbuffer

35738b7

More elegant and correct task_ring_buffer::has_task implementation

c5d78e8

Allow revoking reserved tids, add exception handler to do so

162f990

Use RAII for task buffer reservation slot management

02e8a0b

Update benchmark results for post-ringbuffer

99c5105

task_ring_buffer improvements

dac161d

Based on reviews & discussion:
* now uses epoch monitor
* tracks in-flight horizons/epochs to be able to report deadlock scenarios
* unit tests
* document how member functions may be called
* explicit memory semantics on atomics

Replace latest epoch tracking with on-demand lookup

72ad57c

PeterTh force-pushed the task-ringbuffer branch from bb3b9ca to 72ad57c Compare July 18, 2022 14:40

PeterTh merged commit 5139256 into master Jul 18, 2022

PeterTh deleted the task-ringbuffer branch July 18, 2022 14:42

fknorr mentioned this pull request Aug 25, 2022

Remove task mutex #137

Merged
