Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TridiagSolver (dist): STEP1 rank-independent sort of eigenvalues by column type for rank1 solver #967

Merged
merged 5 commits into from
Dec 11, 2023

Conversation

albestro
Copy link
Collaborator

@albestro albestro commented Sep 4, 2023

This is a starting point towards getting a gemm cost reduction as we did for the local variant of the algorithm. Unfortunately, it will require more work than the local one, so this is just the very first step of the entire work that will come.

Note:
A stablePartition function has been dropped and one has been added, resulting in a difficult to read diff (sorry for that, after merge-squashing I don't have a way to isolate it and make it easier for your review). In addition to that, I took also the chance to actually update the test with the new functionalities of the new stablePartition implementation.

@albestro albestro added this to the Optimizations milestone Sep 4, 2023
@albestro albestro self-assigned this Sep 4, 2023
@albestro albestro force-pushed the alby/trisolver-gemm-sub branch from b9fddda to 36527f5 Compare September 4, 2023 14:03
@albestro albestro force-pushed the alby/trisolver-sort-for-gemm-dist branch from 02bd56a to 31503bb Compare September 4, 2023 14:06
@albestro albestro force-pushed the alby/trisolver-gemm-sub branch from 36527f5 to 0a9482d Compare September 5, 2023 09:16
@albestro albestro force-pushed the alby/trisolver-sort-for-gemm-dist branch 2 times, most recently from 3095aa4 to bd90d4b Compare September 6, 2023 12:40
@albestro albestro force-pushed the alby/trisolver-gemm-sub branch from 0a9482d to 86cc933 Compare September 6, 2023 12:41
@albestro albestro changed the title TridiagSolver (dist): rank-independent sort of eigenvalues by column type for rank1 solver TridiagSolver (dist) STEP-1: rank-independent sort of eigenvalues by column type for rank1 solver Sep 29, 2023
@albestro albestro changed the title TridiagSolver (dist) STEP-1: rank-independent sort of eigenvalues by column type for rank1 solver TridiagSolver (dist): STEP1 rank-independent sort of eigenvalues by column type for rank1 solver Sep 29, 2023
@albestro albestro force-pushed the alby/trisolver-sort-for-gemm-dist branch from b1d0c7e to e80f01a Compare September 29, 2023 10:13
albestro added a commit that referenced this pull request Oct 16, 2023
Develop: TridiagSolver (dist): STEP1 rank-independent sort of eigenvalues by column type for rank1 solver (#967)
Develop: TridiagSolver (dist): STEP2 Make rank1 work just on local non-deflated eigenvectors (#996)
Develop: TridiagSolver (dist): STEP2b Make rank1 work just on local non-deflated eigenvectors (multi-threaded) (#997)
Develop: TridiagSolver (dist): STEP3 reduce GEMM step computational cost (#998)
@albestro albestro force-pushed the alby/trisolver-gemm-sub branch 2 times, most recently from f9b5825 to 7c45e8f Compare November 27, 2023 16:36
@albestro albestro force-pushed the alby/trisolver-gemm-sub branch from d7b2b41 to d26d9f4 Compare November 30, 2023 08:21
Base automatically changed from alby/trisolver-gemm-sub to master November 30, 2023 10:24
@albestro albestro force-pushed the alby/trisolver-sort-for-gemm-dist branch from e80f01a to 65349b5 Compare November 30, 2023 16:29
@albestro
Copy link
Collaborator Author

cscs-ci run

Comment on lines 1693 to 1698
// TODO remove this branch. It exists just because GPU permuteJustLocal is not implemented yet
copy(idx_loc_begin, sz_loc_tiles, ws.e0, ws_hm.e0);
dlaf::permutations::internal::permuteJustLocal<Backend::MC, Device::CPU, T, Coord::Col>(
i_begin, i_end, ws_hm.i5, ws_hm.e0, ws_hm.e2);
copy(idx_loc_begin, sz_loc_tiles, ws_hm.e2, ws.e1);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About permuteJustLocal: currently just the MC backend has been implemented. The GPU variant of the tridiag_solver uses it by copying forth-and-back the data.

Since there is the chance that we will drop it in favour of re-using the "classic" permute, as per discussion with @rasolca, we wouldn't spend time on implementing tests for it in this PR.

sort eigenvectors by column type (non-deflated{upper,dense,lower}, non-deflated)
@albestro albestro force-pushed the alby/trisolver-sort-for-gemm-dist branch from 61b3265 to 51109c9 Compare December 4, 2023 15:17
@@ -432,54 +373,187 @@ auto stablePartitionIndexForDeflationArrays(const SizeType n, const ColType* typ
return std::tuple(k, std::move(n_udl));
}

// This function returns number of global non-deflated eigenvectors, together with two permutations:
// - @p index_sorted (sort(non-deflated)|sort(deflated)) -> initial.
// - @p index_sorted_coltype (sort(upper)|sort(dense)|sort(lower)|sort(deflated)) -> initial
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This describes how it is implemented, but actually the fact that UDL are sorted is an implementation detail (as specified in the code line 456-459), while X (deflated) should be sorted (IIRC @rasolca can you confirm?).

I would change the doc accordingly, i.e. removing the sort for UDL while leaving it for X (also below in the parameter list).

Note: the example I used below in the doc is not the most generic one, because it is given already sorted. Let me know if you want to have a more generic example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for U, D and L to be sorted.
Up to you if you want to leave it in the doc.

@albestro albestro requested review from rasolca and msimberg December 4, 2023 17:24
@albestro albestro marked this pull request as ready for review December 4, 2023 17:24
@albestro
Copy link
Collaborator Author

albestro commented Dec 4, 2023

cscs-ci run

@codecov-commenter
Copy link

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (e311f6a) 94.04% compared to head (0d16047) 94.05%.

Files Patch % Lines
include/dlaf/permutations/general/impl.h 93.93% 1 Missing and 1 partial ⚠️
include/dlaf/eigensolver/tridiag_solver/merge.h 98.30% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #967   +/-   ##
=======================================
  Coverage   94.04%   94.05%           
=======================================
  Files         145      145           
  Lines        9008     9068   +60     
  Branches     1156     1160    +4     
=======================================
+ Hits         8472     8529   +57     
- Misses        319      321    +2     
- Partials      217      218    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Matrix<const T, Device::CPU>& evals,
Matrix<const SizeType, Device::CPU>& in,
Matrix<SizeType, Device::CPU>& out) {
SizeType stablePartitionIndexForDeflationArrays(const matrix::Distribution& dist_sub, const SizeType n,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note-to-self: probably n parameter can be removed, since it is implicitly defined in dist_sub.

@rasolca rasolca merged commit b99bb16 into master Dec 11, 2023
4 checks passed
@rasolca rasolca deleted the alby/trisolver-sort-for-gemm-dist branch December 11, 2023 10:15
github-actions bot pushed a commit that referenced this pull request Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants