-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IDAG] Communication & Receive Arbitration #252
Conversation
Check-perf-impact results: (19e9c0e63e2eff8d602cc7a81b622c19) ❓ No new benchmark data submitted. ❓ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
Pull Request Test Coverage Report for Build 8340717687Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
Check-perf-impact results: (7b849de16ff11660b98988ab0b032db7) ❓ No new benchmark data submitted. ❓ |
...if corresponding fragments have already been received.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I particularly like the combinatorial testing setup for the receive_arbiter
, very elegant!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
CI is unreliable at the moment (either a Github or slurmactiond issue or both). Everything passes locally for me and formatting is also correct, so I'm doing a blind merge here. |
This is the second PR in the Instruction Graph series, and the first one to touch the execution side of the new runtime. It adds the necessary infrastructure to perform in-place sends and receives from (host) buffers and to match pilots to their corresponding payloads. See #249 for context.
This does not really interfere with #249 nor is it based on that PR, the only overlap being
pilot.h
.communicator
class communicator
is an abstraction ofMPI_Comm
that provides all operations to transmit and receive pilots and perform asynchronous sends / receives on strided buffer-allocation data with known message ids. It does not spawn its own thread but relies on polling from the executor thread (which currently is the main thread in tests).Edit: clang-tidy gives me a heap of MPI check warnings about re-using requests and memory leaks, but I'm convinced all of these are false positives (see the second github-actions review below).
receive_arbiter
Instead of linearizing and packing transfer data into
frame
s, the IDAG runtime transfers metadata ahead-of-time throughpilot_messages
and performs in-placeMPI_Isend
/Irecv
operations (via the communicator abstraction). Sincereceive_instruction
s and pilots, andMPI_Recv
'd data can appear in any order, a state machine must be maintained for every transfer at execution time to drive every receive operation to completion.I call this method receive arbitration. It requires some added complexity to deal with the fact that incoming send-boxes do not need to match received regions in the IDAG exactly. In this regard, it is similar to the legacy
buffer_transfer_manager
. Like communicators, the receive arbiter is also polled from the executor thread and does not add its own concurrency.The receive-arbiter tests mock the communicator to generate all possible sequences of events observable at execution time for each test case, to account for the non-deterministic behavior in the cluster.
async_event
Type-erased events are needed for many different situations in IDAG execution. Since this PR is the first to require them, it adds
async_event
as a generic construct.