Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In-graph synchronization with Epochs #86

Merged
merged 21 commits into from
Apr 27, 2022
Merged

In-graph synchronization with Epochs #86

merged 21 commits into from
Apr 27, 2022

Conversation

fknorr
Copy link
Contributor

@fknorr fknorr commented Feb 7, 2022

So far, queue::slow_full_sync() and the shutdown of worker executors are implemented by sending SYNC and SHUTDOWN commands through broadcast_control_command() and then busy-waiting on the main thread. Since we already introduced synchronization logic to task and command graphs before as part of horizons, we now have the opportunity to integrate SYNC and SHUTDOWN into the graphs as well and reduce specialized code paths. This will also be a stepping stone towards using explicit synchronization to exchange data in celerity buffers with the main thread.

This PR represents synchronization points in the graph with epochs, which themselves are a generalization of INIT/NOP tasks and commands. Each graph node (except the first epoch) has exactly one preceding epoch, and no node can ever depend on a node before its epoch. In that sense they are similar to horizons, except that they fully serialize execution and dependency tracking:

TaskGraph

To ensure correct temporal ordering, a new epoch node receives a true-dependency on the entire execution front (orange edges); and all nodes without other true-dependencies (pure producers, tasks a b d e) receive a dependency on their epoch (pink edges). A new epoch will immediately become the last-writer and host-buffer initializer for all successor tasks. Like any other node, epochs can be subsumed by horizons, after which the subsuming horizon becomes the epoch for all following nodes.

Inserting a new epoch task with task_manager::finish_epoch() will create one epoch command per node, with dependencies on the current execution front. Each epoch_command will issue one epoch_job, which awaits the completion of said dependencies, optionally issues an MPI_Barrier and then notifies the task manager via notify_epoch_completed(). Other threads can await this notification to block the main thread in queue::slow_full_sync() and ~runtime().

The nodes preceding a task epoch are pruned once the first task is created after notify_epoch_completed(). In the command graph generator, nodes can be deleted as soon as they lie behind the last epoch, as they can't be depended on by any successor task. In that regard, epochs act like immediately-applied horizons.

This PR also recognizes that the task manager can prune all nodes preceding a horizon A as soon as it has been notified that horizon B with A → B has been reached. This notification signifies that all commands of tasks preceding A have finished executing, so even though they might still be referenced from already-executed commands in the command graph, the task data never has to be accessed again.

To aid debugging, Celerity will now also track the origin of a dependency in addition to its kind. This is currently visualized in generated dot graphs.

The old synchronization infrastructure (broadcast_control_command(), sync_id, ...) has been removed.

@fknorr fknorr requested review from psalz and PeterTh February 7, 2022 13:54
@fknorr fknorr force-pushed the epochs branch 4 times, most recently from 583104d to 3a53747 Compare February 21, 2022 14:57
@fknorr fknorr force-pushed the epochs branch 2 times, most recently from 59307c8 to 24f4c12 Compare February 22, 2022 10:06
@fknorr
Copy link
Contributor Author

fknorr commented Feb 22, 2022

From live discussion with @PeterTh: We need to re-visit an earlier concern about race conditions on tasks returned form task_manager::get_task(): The assumption was that this method can return a const task* that can be accessed without a lock since tasks don't change after creation. However, their dependencies are modified from ~intrusive_graph_node(), so that is not actually true. This is relevant for the PR in question since reducing the pruning delay from 2 horizons to 1 might cause conflicting accesses on the current epoch task.

include/command.h Outdated Show resolved Hide resolved
include/task.h Outdated Show resolved Hide resolved
include/task_manager.h Outdated Show resolved Hide resolved
src/graph_generator.cc Outdated Show resolved Hide resolved
src/graph_generator.cc Show resolved Hide resolved
src/graph_serializer.cc Outdated Show resolved Hide resolved
src/runtime.cc Outdated Show resolved Hide resolved
src/task_manager.cc Outdated Show resolved Hide resolved
src/task_manager.cc Outdated Show resolved Hide resolved
test/graph_compaction_tests.cc Show resolved Hide resolved
include/log.h Show resolved Hide resolved
include/log.h Outdated Show resolved Hide resolved
Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only about a third through the changes so far; I've added some minor notes. However I did a quick test and it seems like there is a ~10% performance drop when running wave_sim -N 4096 -T 200 with 4 workers on gpuc1: Current master takes 522ms on average (over 5 runs), where this PR takes 585ms on average.

include/command.h Outdated Show resolved Hide resolved
include/command_graph.h Outdated Show resolved Hide resolved
include/graph_generator.h Outdated Show resolved Hide resolved
include/graph_generator.h Outdated Show resolved Hide resolved
include/log.h Show resolved Hide resolved
src/executor.cc Outdated Show resolved Hide resolved
include/command.h Outdated Show resolved Hide resolved
src/runtime.cc Outdated Show resolved Hide resolved
src/graph_serializer.cc Outdated Show resolved Hide resolved
src/graph_serializer.cc Outdated Show resolved Hide resolved
@fknorr fknorr force-pushed the epochs branch 12 times, most recently from 36454f2 to c19e7f9 Compare March 14, 2022 14:03
@fknorr
Copy link
Contributor Author

fknorr commented Mar 15, 2022

I have collected longer-running benchmark results on the wave_sim behavior, since previous graph-generation timings turned out to be spurious. It appears that the runtime is usually pretty stable but produces rare and pronounced outliers. This happens on both master and this PR, so I believe we cannot attribute their existence to the changes in this PR.

The following numbers were collected by running wave_sim on 4 nodes, repeated 1000 times, while measuring the run time surrounding all kernel submissions after slow_full_sync():

wave_sim_stats2

wave_sim

The outliers skew the mean, but by eliminating them (or using the median), we see that the "normal" run-time behavior has not changed noticeably.

fknorr and others added 17 commits March 30, 2022 20:01
ORDER_DEP was handled like TRUE_DEP everywhere, and it does not add any value except for context information in printed graphs.
This adds a dependency_origin enum to distinguish data flow dependencies from epoch/horizon/collective temporal dependencies.
… members in task_manager and graph_generator
Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, I've added a few more notes.

The only thing that I still find somewhat confusing is that epochs are both an abstract concept and a concrete type of task/command; in that horizons can also act as epochs, and there are some fields that are named *epoch*, but can also point to a horizon task/command. Can you outline to me again briefly why you decided against treating horizons as a type of epoch?

test/test_utils.h Outdated Show resolved Hide resolved
include/command.h Outdated Show resolved Hide resolved
include/graph_generator.h Outdated Show resolved Hide resolved
include/graph_generator.h Show resolved Hide resolved
src/graph_generator.cc Outdated Show resolved Hide resolved
src/task_manager.cc Show resolved Hide resolved
test/graph_generation_tests.cc Outdated Show resolved Hide resolved
src/graph_generator.cc Outdated Show resolved Hide resolved
src/graph_generator.cc Show resolved Hide resolved
src/graph_generator.cc Outdated Show resolved Hide resolved
Copy link

@BlackMark29A BlackMark29A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The only thing I found was a typo that wasn't even introduced by this PR.

test/graph_compaction_tests.cc Outdated Show resolved Hide resolved
@fknorr
Copy link
Contributor Author

fknorr commented Apr 27, 2022

Thanks to all of you for the continued endurance on this!

@fknorr fknorr merged commit 61dd07e into master Apr 27, 2022
@fknorr fknorr deleted the epochs branch April 27, 2022 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants