ReductionToBand: Dynamic number of workers for Panel computation (bulk) #1232

Merged
merged 2 commits into master from alby/dynamic-r2b
Dec 10, 2024

Conversation

albestro
Collaborator

@albestro albestro commented Dec 6, 2024

Currently, the number of workers available (from the configuration) is used as is: an equal number of workers is spawned for the panel computation, and each worker is in charge of an integral number of tiles.

When not enough tiles are available, some workers end up participating in the synchronisation points without having any work to do. This always happens with small matrices but, because the reduction to band algorithm is iterative, it also happens during the "tail" iterations of bigger matrices.

With this PR, the configured value is interpreted as the maximum number of workers available, and the algorithm ensures that only workers with at least 1 tile to work on get spawned.
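
To illustrate the rule described above, here is a minimal sketch in Python (not the actual C++ code of this PR). The helper name `panel_workers` and its parameters are hypothetical, and the use of floor division plus a lower bound of one worker is an assumption about how "at least `min_tiles_per_worker` tiles per worker" would be enforced.

```python
def panel_workers(n_tiles: int, max_workers: int, min_tiles_per_worker: int = 1) -> int:
    """Number of workers to spawn for a panel made of n_tiles tiles.

    Hypothetical helper: max_workers plays the role of the configured value,
    min_tiles_per_worker the minimum amount of work per worker.
    """
    if max_workers < 1 or min_tiles_per_worker < 1:
        raise ValueError("max_workers and min_tiles_per_worker must be >= 1")
    # Workers that can each receive at least min_tiles_per_worker tiles
    # (assumption: floor division, so no worker gets less than the minimum).
    useful = n_tiles // min_tiles_per_worker
    # Never exceed the configured maximum; always keep at least one worker.
    return max(1, min(max_workers, useful))


if __name__ == "__main__":
    # A panel shrinking across "tail" iterations, with at most 8 workers configured.
    configured = 8
    for n_tiles in (40, 8, 3, 1):
        spawned = panel_workers(n_tiles, max_workers=configured)
        idle_if_fixed = configured - spawned
        print(f"{n_tiles:>2} tiles: spawn {spawned}, fixed strategy would idle {idle_if_fixed}")
```

With the fixed strategy all configured workers reach every synchronisation point, so the difference between the configured and the spawned count is a rough measure of the workers that would only wait.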

TODO:

  • set a different default in benchmarking scripts for GPU systems

Notes:

  • the minimum amount of work is set to 1 tile per worker (it might eventually become a config parameter as well); in a not very extensive benchmark, requiring at least 2 tiles per worker resulted in slightly worse performance
  • I also took the chance to rename the configuration getter to snake_case

@albestro albestro added this to the Optimizations milestone Dec 6, 2024
@albestro albestro self-assigned this Dec 6, 2024
@albestro
Collaborator Author

albestro commented Dec 6, 2024

This is the result of the (limited) benchmark I had the chance to do.

Results are grouped by matrix size; each group is sorted from fastest to slowest, and the value is the slowdown in % (1 = 1%) relative to the fastest (ranked 1st) of the group.

method          matrix_rows  
min1-local-50   10240         0.000000
min1-local-30   10240         0.100732
min1-local-60   10240         1.169981
fixed-local-10  10240         1.469779
min1-local-40   10240         1.944966
fixed-local-20  10240         3.498074
fixed-local-30  10240         4.496047
fixed-local-40  10240         9.679262
fixed-local-50  10240        12.253712
min2-local-20   10240        22.462525
min2-local-30   10240        23.538846
Name: time, dtype: float64

method          matrix_rows  
min1-local-30   20480         0.000000
min1-local-40   20480         0.478761
min1-local-60   20480         0.607473
min1-local-50   20480         1.277703
fixed-local-20  20480         1.588817
fixed-local-30  20480         1.955221
fixed-local-40  20480         6.059442
fixed-local-50  20480         6.587626
fixed-local-10  20480         8.844894
min2-local-20   20480        12.976138
min2-local-30   20480        14.305041
Name: time, dtype: float64

method          matrix_rows  
fixed-local-30  30097         0.000000
min1-local-50   30097         0.006670
min1-local-40   30097         0.320485
min1-local-30   30097         0.326179
min1-local-60   30097         0.506757
fixed-local-20  30097         3.273669
fixed-local-40  30097         3.341264
fixed-local-50  30097         5.344298
min2-local-30   30097         9.589429
min2-local-20   30097         9.634085
fixed-local-10  30097        11.407736
Name: time, dtype: float64

method          matrix_rows  
min1-local-50   40960         0.000000
min1-local-40   40960         0.653660
min1-local-60   40960         0.985014
fixed-local-30  40960         1.104801
min1-local-30   40960         1.355148
fixed-local-40  40960         2.310430
fixed-local-20  40960         2.483217
min2-local-30   40960         4.856126
fixed-local-50  40960         5.234016
min2-local-20   40960         6.197485
fixed-local-10  40960        10.533097
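
For reference, the slowdown values in the tables above can be obtained along these lines. This is only a sketch: the function name `slowdown_percent` is hypothetical, the formula (t / t_fastest - 1) * 100 is an assumption about how "% of slowdown" was computed, and the timings in the example are made up for illustration, not measured.

```python
def slowdown_percent(times: dict[str, float]) -> list[tuple[str, float]]:
    """Rank methods from fastest to slowest within one matrix-size group.

    Returns (method, slowdown) pairs, where slowdown is the percentage by which
    a method is slower than the fastest one of the group (the fastest gets 0).
    """
    fastest = min(times.values())
    ranked = sorted(times.items(), key=lambda item: item[1])
    return [(method, (t / fastest - 1.0) * 100.0) for method, t in ranked]


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    group = {"min1-local-30": 10.0, "fixed-local-30": 10.5, "min2-local-30": 12.0}
    for method, pct in slowdown_percent(group):
        print(f"{method:<16}{pct:10.6f}")
```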

EDIT: a guide on naming is actually required (thanks @msimberg for pointing that out)

{strategy}-local-{nworkers}

  • nworkers is the number of workers given via CLI/configuration
  • strategy is either:
    • fixed, as it is on master: all nworkers always participate in the bulk
    • dynamic ("min{x}"): nworkers is interpreted as the maximum number of workers available. The actual number is computed from the workload (= number of tiles), given a minimum amount of work per worker (x tiles, the value in "min{x}").

e.g. min2-local-30: minimum 2 tiles per worker and at most 30 workers
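
The naming scheme above can also be decoded mechanically, which may help when post-processing the benchmark output. The sketch below is hypothetical (the `Method` class and `parse_method` function are not part of the benchmarking scripts) and assumes labels always follow `{strategy}-local-{nworkers}` exactly.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "fixed" or "min{x}", then "-local-", then the configured number of workers.
_LABEL = re.compile(r"^(fixed|min(\d+))-local-(\d+)$")


@dataclass(frozen=True)
class Method:
    strategy: str                        # "fixed" or "dynamic"
    min_tiles_per_worker: Optional[int]  # x in "min{x}", None for "fixed"
    nworkers: int                        # exact count (fixed) or maximum (dynamic)


def parse_method(label: str) -> Method:
    """Decode a benchmark label such as "min2-local-30" (hypothetical helper)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised method label: {label!r}")
    strategy_token, min_tiles, nworkers = match.groups()
    if strategy_token == "fixed":
        return Method("fixed", None, int(nworkers))
    return Method("dynamic", int(min_tiles), int(nworkers))


if __name__ == "__main__":
    print(parse_method("min2-local-30"))
    print(parse_method("fixed-local-10"))
```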

@albestro albestro marked this pull request as ready for review December 6, 2024 15:44
@albestro albestro requested a review from rasolca December 6, 2024 15:44
@albestro
Collaborator Author

albestro commented Dec 6, 2024

cscs-ci run

@msimberg
Collaborator

msimberg commented Dec 9, 2024

Grouped by matrix size, each group is sorted from fastest to slower and the value represents the % (1 = 1%) of slowdown compared to the fastest (ranked 1st) of the group.

method          matrix_rows  
min1-local-50   10240         0.000000
min1-local-30   10240         0.100732
min1-local-60   10240         1.169981
fixed-local-10  10240         1.469779
min1-local-40   10240         1.944966
fixed-local-20  10240         3.498074
fixed-local-30  10240         4.496047
fixed-local-40  10240         9.679262
fixed-local-50  10240        12.253712
min2-local-20   10240        22.462525
min2-local-30   10240        23.538846
...

Do you have a short explanation for the names? I assume min1 means at least one tile per worker, min2 at least two tiles per worker, and fixed is probably what is on master. What does the -30 etc. suffix represent?

@rasolca rasolca merged commit e8c7f2c into master Dec 10, 2024
5 checks passed
@rasolca rasolca deleted the alby/dynamic-r2b branch December 10, 2024 17:56
3 participants