TridiagSolver (dist): STEP2b Make rank1 work just on local non-deflated eigenvectors (multi-threaded) #997
Develop: TridiagSolver (dist): STEP1 rank-independent sort of eigenvalues by column type for rank1 solver (#967)
Develop: TridiagSolver (dist): STEP2 Make rank1 work just on local non-deflated eigenvectors (#996)
Develop: TridiagSolver (dist): STEP2b Make rank1 work just on local non-deflated eigenvectors (multi-threaded) (#997)
Develop: TridiagSolver (dist): STEP3 reduce GEMM step computational cost (#998)
const std::size_t nthreads = [dist_sub, k_lc] {
  const std::size_t workload = to_sizet(dist_sub.localSize().rows() * k_lc);
  const std::size_t workload_unit = 2 * to_sizet(dist_sub.tile_size().linear_size());

  const std::size_t min_workers = 1;
  const std::size_t available_workers = getTridiagRank1NWorkers();

  const std::size_t ideal_workers = util::ceilDiv(to_sizet(workload), workload_unit);
  return std::clamp(ideal_workers, min_workers, available_workers);
}();
Just pointing out the logic, which might deserve a comment/note.
The "ideal" number of threads is computed dynamically as a function of how much work the rank1 solver has to do. The work is measured in terms of the local problem size, and each thread should have at least 2 tiles to work on.
This ideal number is then clamped to the range defined by the configuration to get the number of threads actually used: at least 1, and at most the configured number of workers.
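To make the logic above concrete, here is a minimal standalone sketch of the same computation. It is not the code from the PR: `rank1_workers` and `ceil_div` are hypothetical names, the tile size is passed in as a plain element count (assuming `linear_size()` is rows times columns of a tile), `k_lc` is assumed to be the local number of non-deflated eigenvectors, and the configured upper bound is passed as a parameter instead of being read from `getTridiagRank1NWorkers()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for util::ceilDiv: integer division rounded up.
constexpr std::size_t ceil_div(std::size_t num, std::size_t den) {
  return (num + den - 1) / den;
}

// Hypothetical helper mirroring the lambda under review.
//   local_rows  : rows of the local part of the sub-problem
//   k_lc        : local number of non-deflated eigenvectors (assumed meaning)
//   tile_elems  : number of elements in one tile (assumed rows * cols)
//   max_workers : upper bound coming from the configuration
std::size_t rank1_workers(std::size_t local_rows, std::size_t k_lc,
                          std::size_t tile_elems, std::size_t max_workers) {
  assert(tile_elems > 0);    // avoid division by zero below
  assert(max_workers >= 1);  // std::clamp requires lo <= hi

  const std::size_t workload = local_rows * k_lc;
  // Each thread should get at least two tiles' worth of elements.
  const std::size_t workload_unit = 2 * tile_elems;

  const std::size_t min_workers = 1;
  const std::size_t ideal_workers = ceil_div(workload, workload_unit);
  return std::clamp(ideal_workers, min_workers, max_workers);
}
```

For example, with 8 local rows, 3 local columns and 4-element tiles, the workload is 24 and the unit is 8, so the ideal count is 3. An empty local workload still yields 1 thread, and a large workload saturates at `max_workers`. The second assertion makes explicit that a configured value of 0 would violate the precondition of `std::clamp`.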
cscs-ci run
Codecov Report
@@            Coverage Diff             @@
##           master     #997      +/-   ##
==========================================
+ Coverage   93.98%   94.04%   +0.05%
==========================================
  Files         145      145
  Lines        9034     9050      +16
  Branches     1159     1157       -2
==========================================
+ Hits         8491     8511      +20
  Misses        321      321
+ Partials      222      218       -4
…eflated eigenvectors (multi-threaded) (#997)
This PR makes the changes from #996 multithreaded.
Main changes:
Suggestion for reviewers: turn on "hide whitespace" feature (due to the different indentation introduced with an additional nesting for dynamic nthreads).
TODO: