ReductionToBand: Dynamic number of workers for Panel computation (bulk) #1232
These are the results of the (limited) benchmarking I had the chance to do. Results are grouped by matrix size; within each group, configurations are sorted from fastest to slowest, and each value is the slowdown in percent (1 = 1%) relative to the fastest configuration of the group (ranked 1st).
EDIT: a guide on naming is actually required (thanks @msimberg for pointing that out)
e.g. |
cscs-ci run |
Do you have a short explanation for the names? I assume |
Until now, the number of workers set in the configuration was used as is: that many workers were spawned for the panel computation, each in charge of an integral number of tiles.
When not enough tiles were available, some workers could end up taking part in the synchronisation points without having any work to do. This always happens with small matrices but, because of the iterative nature of the reduction to band algorithm, it also happened during the "tail" iterations of bigger matrices.
With this PR, the configured value is treated as the maximum number of workers available, and the algorithm ensures that only workers with at least 1 tile to work on get spawned.
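The idea can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `num_panel_workers` and `tiles_per_worker` are hypothetical, and the even split of tiles across workers is an assumption made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the library's API): treat the configured value as
// an upper bound, so that every spawned worker owns at least one tile.
inline std::size_t num_panel_workers(std::size_t ntiles, std::size_t max_workers) {
  assert(max_workers >= 1);
  return std::min(ntiles, max_workers);
}

// Hypothetical helper: number of tiles assigned to each spawned worker,
// assuming tiles are split as evenly as possible (an integral number each).
inline std::vector<std::size_t> tiles_per_worker(std::size_t ntiles, std::size_t max_workers) {
  const std::size_t nworkers = num_panel_workers(ntiles, max_workers);
  std::vector<std::size_t> counts(nworkers);
  for (std::size_t i = 0; i < nworkers; ++i) {
    // The first (ntiles % nworkers) workers take one extra tile.
    counts[i] = ntiles / nworkers + (i < ntiles % nworkers ? 1 : 0);
  }
  return counts;
}
```

With a maximum of 8 workers, a panel of 3 tiles would spawn 3 workers under this sketch instead of 8, so no idle worker takes part in the synchronisation points. As the panel shrinks during the "tail" iterations, the worker count shrinks with it.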
TODO:
Notes: