ReductionToBand: Dynamic number of workers for Panel computation (bulk) #1232

albestro · 2024-12-06T14:11:21Z

Until now, the number of workers available (per configuration) was used as-is to spawn an equal number of workers for the panel computation, each of which was in charge of an integral number of tiles.

When there were not enough tiles, some workers could end up participating in the synchronisation points without any work to do. This always happens with small matrices, but due to the iterative nature of the reduction to band algorithm, it also happened during the "tail" iterations with bigger matrices.
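To illustrate the "tail" effect, here is a minimal sketch under simplifying assumptions that are not taken from the PR: a single rank, a matrix of `m` tile rows, and a band equal to the tile size, so that the panel of iteration `k` is made of the `m - k - 1` tiles below the diagonal. The helper names are hypothetical; the sketch counts how many workers would sit idle at each iteration when always spawning the configured number (the fixed strategy).

```python
def panel_tiles(m: int, k: int) -> int:
    # Simplified model (assumption): one rank, band == tile size, so the panel
    # at iteration k consists of the tiles below the diagonal in tile column k.
    return m - k - 1


def idle_workers_fixed(m: int, nworkers: int) -> list[int]:
    """Idle workers per iteration when all `nworkers` are always spawned."""
    idle = []
    for k in range(m - 1):  # the last tile column has no tiles below the diagonal
        ntiles = panel_tiles(m, k)
        idle.append(max(0, nworkers - ntiles))
    return idle


# 40 tile rows and 10 workers: early iterations keep everyone busy,
# but the tail leaves more and more workers without any tile.
idle = idle_workers_fixed(m=40, nworkers=10)
print(idle[:3], idle[-3:])  # [0, 0, 0] [7, 8, 9]
```

In this model, with 10 workers the last 9 iterations all have idle workers, regardless of how large `m` is.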

With this PR, the configured value is interpreted as the maximum number of workers available, and the algorithm ensures that only workers with at least 1 tile to work on are spawned.
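A minimal sketch of this policy, assuming the number of workers is derived by floor division of the panel's tile count by a minimum number of tiles per worker and then capped by the configured maximum. Both helper names are hypothetical, and the even split of tiles among workers is an assumed policy, not necessarily what the implementation does.

```python
def dynamic_nworkers(ntiles: int, max_workers: int, min_tiles_per_worker: int = 1) -> int:
    """Workers to spawn so that each one gets at least `min_tiles_per_worker` tiles.

    The configured `max_workers` only acts as an upper bound.
    """
    if ntiles <= 0:
        return 0  # empty panel: nothing to spawn
    # Floor division keeps every worker at or above the minimum; a panel smaller
    # than the minimum still gets a single worker so that it is computed at all.
    return min(max_workers, max(1, ntiles // min_tiles_per_worker))


def split_tiles(ntiles: int, nworkers: int) -> list[int]:
    """Spread `ntiles` tiles as evenly as possible over `nworkers` workers."""
    base, extra = divmod(ntiles, nworkers)
    return [base + 1 if i < extra else base for i in range(nworkers)]


for ntiles in (100, 12, 3):
    n = dynamic_nworkers(ntiles, max_workers=30)
    print(ntiles, n, split_tiles(ntiles, n))
```

With the default minimum of 1 tile, a panel of 3 tiles spawns only 3 workers instead of 30, so no worker joins the synchronisation without work.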

TODO:

- set a different default in the benchmarking scripts for GPU systems

Notes:

- the minimum amount of work is set to 1 tile per worker (it might eventually become a configuration parameter as well); in some limited benchmarking, a minimum of 2 tiles per worker resulted in worse performance
- I also took the chance to rename the config getter to snake_case

albestro · 2024-12-06T14:19:04Z

These are the results of the (limited) benchmark I had the chance to run.

Results are grouped by matrix size; each group is sorted from fastest to slowest, and the value is the slowdown in percent (1 = 1%) relative to the fastest run (ranked 1st) in the group.

| method | matrix_rows | slowdown vs. fastest (%) |
|---|---|---|
| min1-local-50 | 10240 | 0.000000 |
| min1-local-30 | 10240 | 0.100732 |
| min1-local-60 | 10240 | 1.169981 |
| fixed-local-10 | 10240 | 1.469779 |
| min1-local-40 | 10240 | 1.944966 |
| fixed-local-20 | 10240 | 3.498074 |
| fixed-local-30 | 10240 | 4.496047 |
| fixed-local-40 | 10240 | 9.679262 |
| fixed-local-50 | 10240 | 12.253712 |
| min2-local-20 | 10240 | 22.462525 |
| min2-local-30 | 10240 | 23.538846 |

| method | matrix_rows | slowdown vs. fastest (%) |
|---|---|---|
| min1-local-30 | 20480 | 0.000000 |
| min1-local-40 | 20480 | 0.478761 |
| min1-local-60 | 20480 | 0.607473 |
| min1-local-50 | 20480 | 1.277703 |
| fixed-local-20 | 20480 | 1.588817 |
| fixed-local-30 | 20480 | 1.955221 |
| fixed-local-40 | 20480 | 6.059442 |
| fixed-local-50 | 20480 | 6.587626 |
| fixed-local-10 | 20480 | 8.844894 |
| min2-local-20 | 20480 | 12.976138 |
| min2-local-30 | 20480 | 14.305041 |

| method | matrix_rows | slowdown vs. fastest (%) |
|---|---|---|
| fixed-local-30 | 30097 | 0.000000 |
| min1-local-50 | 30097 | 0.006670 |
| min1-local-40 | 30097 | 0.320485 |
| min1-local-30 | 30097 | 0.326179 |
| min1-local-60 | 30097 | 0.506757 |
| fixed-local-20 | 30097 | 3.273669 |
| fixed-local-40 | 30097 | 3.341264 |
| fixed-local-50 | 30097 | 5.344298 |
| min2-local-30 | 30097 | 9.589429 |
| min2-local-20 | 30097 | 9.634085 |
| fixed-local-10 | 30097 | 11.407736 |

| method | matrix_rows | slowdown vs. fastest (%) |
|---|---|---|
| min1-local-50 | 40960 | 0.000000 |
| min1-local-40 | 40960 | 0.653660 |
| min1-local-60 | 40960 | 0.985014 |
| fixed-local-30 | 40960 | 1.104801 |
| min1-local-30 | 40960 | 1.355148 |
| fixed-local-40 | 40960 | 2.310430 |
| fixed-local-20 | 40960 | 2.483217 |
| min2-local-30 | 40960 | 4.856126 |
| fixed-local-50 | 40960 | 5.234016 |
| min2-local-20 | 40960 | 6.197485 |
| fixed-local-10 | 40960 | 10.533097 |
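For reference, the ranking and slowdown percentages in each group can be reproduced as (t - t_fastest) / t_fastest × 100. A stdlib-only sketch, assuming the raw metric is one execution time per method for a given matrix size; the timings below are made up for illustration and are not the measurements behind the tables.

```python
def slowdown_percent(times: dict[str, float]) -> list[tuple[str, float]]:
    """Rank methods from fastest to slowest with their % slowdown vs. the fastest."""
    fastest = min(times.values())
    ranked = sorted(times.items(), key=lambda item: item[1])
    return [(method, (t - fastest) / fastest * 100.0) for method, t in ranked]


# Made-up timings in seconds for a single matrix size (illustrative only).
example = {"fixed-local-10": 10.5, "min1-local-30": 10.0, "min2-local-20": 12.0}
for method, pct in slowdown_percent(example):
    print(f"{method:15s} {pct:10.6f}")
```

With several matrix sizes, the same function would simply be applied once per group.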

EDIT: a guide to the naming is actually needed (thanks @msimberg for pointing that out)

{strategy}-local-{nworkers}

- `nworkers` is the number of workers given via CLI/configuration
- `strategy` is either:
  - `fixed`: as on master, all `nworkers` workers always participate in the bulk
  - dynamic (`min{x}`): `nworkers` is interpreted as the maximum number of workers available, and the actual number is computed from the workload (i.e. the number of tiles) so that each worker gets a minimum amount of work (`x` tiles, the value in `min{x}`)

e.g. min2-local-30: minimum 2 tiles per worker and at most 30 workers
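The naming scheme can be made concrete with a small sketch that decodes a label and computes how many workers would join the bulk for a panel of a given size. The regular expression, the helper names, and the floor-division rule for the dynamic case are illustrative assumptions based on the description above, not code from the PR.

```python
import re
from typing import Optional, Tuple

# Labels of the form {strategy}-local-{nworkers}, with strategy "fixed" or "min{x}".
# Assumption: x >= 1, so a zero minimum is rejected by the pattern.
LABEL_RE = re.compile(r"^(?:fixed|min([1-9]\d*))-local-(\d+)$")


def parse_label(label: str) -> Tuple[str, Optional[int], int]:
    """Return (strategy, min tiles per worker or None, nworkers) for a label."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    min_tiles = int(match.group(1)) if match.group(1) is not None else None
    strategy = "fixed" if min_tiles is None else "dynamic"
    return strategy, min_tiles, int(match.group(2))


def workers_spawned(label: str, ntiles: int) -> int:
    """Workers taking part in the bulk for a panel of `ntiles` tiles (sketch)."""
    strategy, min_tiles, nworkers = parse_label(label)
    if strategy == "fixed":
        return nworkers  # every configured worker joins, even those without tiles
    # Dynamic: as many workers as the minimum work allows, capped by nworkers.
    return min(nworkers, max(1, ntiles // min_tiles))


print(parse_label("min2-local-30"))
print(workers_spawned("min2-local-30", ntiles=25), workers_spawned("fixed-local-30", ntiles=25))
```

For a 25-tile panel, `min2-local-30` would use 12 workers (each with at least 2 tiles), while `fixed-local-30` would still synchronise all 30.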

albestro · 2024-12-06T18:05:57Z

cscs-ci run

msimberg · 2024-12-09T09:12:20Z

> Grouped by matrix size, each group is sorted from fastest to slower and the value represents the % (1 = 1%) of slowdown compared to the fastest (ranked 1st) of the group.


Do you have a short explanation for the names? I assume min1 means at least one tile per worker, min2 at least two tiles per worker, and fixed is probably what is on master. What does the -30 etc. suffix represent?


albestro added 2 commits December 6, 2024 14:54

- dynamic number of workers for reduction to band panel bulk (dc6ffce)
- snake_case for config getter (c72c143)

albestro added the Type:Optimization label Dec 6, 2024

albestro added this to the Optimizations milestone Dec 6, 2024

albestro self-assigned this Dec 6, 2024

albestro marked this pull request as ready for review December 6, 2024 15:44

albestro requested a review from rasolca December 6, 2024 15:44

rasolca approved these changes Dec 6, 2024


msimberg approved these changes Dec 9, 2024


rasolca merged commit e8c7f2c into master Dec 10, 2024
5 checks passed

rasolca deleted the alby/dynamic-r2b branch December 10, 2024 17:56

github-actions bot pushed a commit that referenced this pull request Dec 10, 2024

Doc: ReductionToBand: Dynamic number of workers for Panel computation (bulk) (#1232) (e866ba0)
