-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TridiagSolver (dist): STEP1 rank-independent sort of eigenvalues by column type for rank1 solver #967
Conversation
b9fddda
to
36527f5
Compare
02bd56a
to
31503bb
Compare
36527f5
to
0a9482d
Compare
3095aa4
to
bd90d4b
Compare
0a9482d
to
86cc933
Compare
b1d0c7e
to
e80f01a
Compare
Develop: TridiagSolver (dist): STEP1 rank-independent sort of eigenvalues by column type for rank1 solver (#967) Develop: TridiagSolver (dist): STEP2 Make rank1 work just on local non-deflated eigenvectors (#996) Develop: TridiagSolver (dist): STEP2b Make rank1 work just on local non-deflated eigenvectors (multi-threaded) (#997) Develop: TridiagSolver (dist): STEP3 reduce GEMM step computational cost (#998)
f9b5825
to
7c45e8f
Compare
d7b2b41
to
d26d9f4
Compare
e80f01a
to
65349b5
Compare
cscs-ci run |
// TODO remove this branch. It exists just because GPU permuteJustLocal is not implemented yet | ||
copy(idx_loc_begin, sz_loc_tiles, ws.e0, ws_hm.e0); | ||
dlaf::permutations::internal::permuteJustLocal<Backend::MC, Device::CPU, T, Coord::Col>( | ||
i_begin, i_end, ws_hm.i5, ws_hm.e0, ws_hm.e2); | ||
copy(idx_loc_begin, sz_loc_tiles, ws_hm.e2, ws.e1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About permuteJustLocal
: currently just the MC backend has been implemented. The GPU variant of the tridiag_solver
uses it by copying forth-and-back the data.
Since there is the chance that we will drop it in favour of re-using the "classic" permute
, as per discussion with @rasolca, we wouldn't spend time on implementing tests for it in this PR.
sort eigenvectors by column type (non-deflated{upper,dense,lower}, non-deflated)
61b3265
to
51109c9
Compare
@@ -432,54 +373,187 @@ auto stablePartitionIndexForDeflationArrays(const SizeType n, const ColType* typ | |||
return std::tuple(k, std::move(n_udl)); | |||
} | |||
|
|||
// This function returns number of global non-deflated eigenvectors, together with two permutations: | |||
// - @p index_sorted (sort(non-deflated)|sort(deflated)) -> initial. | |||
// - @p index_sorted_coltype (sort(upper)|sort(dense)|sort(lower)|sort(deflated)) -> initial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This describes how it is implemented, but actually the fact that UDL are sorted is an implementation detail (as specified in the code line 456-459), while X (deflated) should be sorted (IIRC @rasolca can you confirm?).
I would change the doc accordingly, i.e. removing the sort
for UDL while leaving it for X (also below in the parameter list).
Note: the example I used below in the doc is not the most generic one, because it is given already sorted. Let me know if you want to have a more generic example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for U, D and L to be sorted.
Up to you if you want to leave it in the doc.
cscs-ci run |
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #967 +/- ##
=======================================
Coverage 94.04% 94.05%
=======================================
Files 145 145
Lines 9008 9068 +60
Branches 1156 1160 +4
=======================================
+ Hits 8472 8529 +57
- Misses 319 321 +2
- Partials 217 218 +1 ☔ View full report in Codecov by Sentry. |
Matrix<const T, Device::CPU>& evals, | ||
Matrix<const SizeType, Device::CPU>& in, | ||
Matrix<SizeType, Device::CPU>& out) { | ||
SizeType stablePartitionIndexForDeflationArrays(const matrix::Distribution& dist_sub, const SizeType n, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note-to-self: probably n
parameter can be removed, since it is implicitly defined in dist_sub
.
… by column type for rank1 solver (#967)
This is a starting point towards getting a gemm cost reduction as we did for the local variant of the algorithm. Unfortunately, it will require more work than the local one, so this is just the very first step of the entire work that will come.
ImplementpermuteJustLocal
for GPUDistribution
helpers in a separate PR (@rasolca)Note:
A
stablePartition
function has been dropped and one has been added, resulting in a difficult to read diff (sorry for that, after merge-squashing I don't have a way to isolate it and make it easier for your review). In addition to that, I took also the chance to actually update the test with the new functionalities of the newstablePartition
implementation.