RFC: central broadcasting entity #171

Open
devreal opened this issue Oct 18, 2021 · 0 comments
devreal commented Oct 18, 2021

At the moment, each Op is passed a datum and one or more keys to broadcast along with the data. However, the new ttg::broadcast allows broadcasting to multiple output terminals. As a result, the Ops connected to each output terminal broadcast individually. In POTRF, that means we send the same data up to three times (from TRSM to SYRK and to two instances of GEMM).

I propose a new entity that sits on top of the output terminals and broadcasts the data first to each relevant process and from there to each Op. That requires the following additions:

  1. Output terminals have to provide the key-to-process mapping and the op_id of the Op to the broadcast entity (an extension to the interface between ttg::Out and the Op).
  2. Backends that support distributed execution have to expose either a broadcasting capability (could use PaRSEC's broadcast API for that) with callbacks back into the broadcast entity at the targets, or an AM layer to implement broadcasts manually.
    • This can be made optional so that we decide at compile time whether to use the centralized broadcast, depending on whether the backend provides what we need. If not, we fall back to what we have today.
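
To make the two-stage idea concrete, here is a minimal sketch of how such an entity might group targets by process before sending. It is not TTG code: `Key`, `Rank`, `OpId`, `TerminalTargets`, `plan_broadcast` and `messages_per_terminal` are hypothetical names invented for illustration, and the keymap is assumed to be a plain function from key to rank.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names are part of the TTG API.
using Key = int;
using Rank = int;
using OpId = int;

// What an output terminal would hand to the broadcast entity (addition 1):
// the op_id of the connected Op, its key-to-process mapping, and the keys.
struct TerminalTargets {
  OpId op_id;
  std::function<Rank(const Key&)> keymap;
  std::vector<Key> keys;
};

// Delivery plan: for each target rank, the (op_id, key) pairs to serve there.
using Plan = std::map<Rank, std::vector<std::pair<OpId, Key>>>;

// Stage 1 of the proposal: group all targets of all terminals by process,
// so the datum has to travel to each rank only once. Stage 2 (the local
// fan-out to each Op on the target) would walk plan[rank] on that rank.
Plan plan_broadcast(const std::vector<TerminalTargets>& terminals, int num_ranks) {
  Plan plan;
  for (const auto& t : terminals) {
    for (const auto& k : t.keys) {
      const Rank r = t.keymap(k);
      assert(r >= 0 && r < num_ranks);  // keymap must return a valid rank
      plan[r].emplace_back(t.op_id, k);
    }
  }
  return plan;
}

// Baseline for comparison: if every terminal broadcasts on its own, the
// datum is sent once per (terminal, distinct target rank) pair.
std::size_t messages_per_terminal(const std::vector<TerminalTargets>& terminals) {
  std::size_t count = 0;
  for (const auto& t : terminals) {
    std::set<Rank> ranks;
    for (const auto& k : t.keys) ranks.insert(t.keymap(k));
    count += ranks.size();
  }
  return count;
}
```

With the grouped plan, the number of inter-process sends is `plan.size()`, which depends only on the set of target ranks and not on how many output terminals are involved. How the per-rank list is shipped and unpacked on the target is exactly what item 2 asks the backend to provide, so it is left out here.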