-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supernodal SpTRSV, improve symbolic performance #899
Conversation
…a few more timers
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
5 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
6 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iyamazaki, Please see my review comments. What do you think about using bit-wise operators to more efficiently utilize the memory allocated in check
and completely avoid resetting it at the end of the outer loop? Another option would be to make check an array of bools.
int nblocks = 0; | ||
for (int s = 0; s < nsuper; s++) { | ||
int j1 = nb[s]; | ||
int j2 = j1+1; // based on the first row | ||
|
||
size_type nidxs = 0; | ||
for (size_type i = row_map_host (j1); i < row_map_host (j2); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is row_map_host
an array of row indices indicating where rows that contain non-zero scalars exist in a given 2-rank matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it points to the location of the nonzero, but even though we operate on supernodal blocks, the matrix is still stored in the standard CSR format with rank-1 arrays.
int nblocks = 0; | ||
for (int s = 0; s < nsuper; s++) { | ||
int j1 = nb[s]; | ||
int j2 = j1+1; // based on the first row | ||
|
||
size_type nidxs = 0; | ||
for (size_type i = row_map_host (j1); i < row_map_host (j2); i++) { | ||
int s2 = map (entries_host (i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is s2
the index of a 2-rank matrix (supernodal block?) that contains non-zero scalars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, s2 is the index of the supernodal block, but the matrix is stored in the standard CSR format with rank-1 arrays.
int nblocks = 0; | ||
for (int s = 0; s < nsuper; s++) { | ||
int j1 = nb[s]; | ||
int j2 = j1+1; // based on the first row | ||
|
||
size_type nidxs = 0; | ||
for (size_type i = row_map_host (j1); i < row_map_host (j2); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the maximum value in row_map_host(:)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the number of nonzero entries in the matrix.
Kokkos::deep_copy (check, 0); | ||
//Kokkos::deep_copy (check, 0); | ||
for (size_type i = 0; i < nidxs; i++) { | ||
check (idxs(i)) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making check
an array of 64bit integers and using bit-wise operators to toggle bits. In this way, it may be reasonable to allocate check
to be large enough so that it does not need to be overwritten before every iteration of the inner loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nsup is the number of supernodal blocks. So, in the worst case, it can be number of columns, and in order not to be overwritten, I need to allocate n^2 entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the size of check
in bytes now? What is the maximum value of n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the size of the local matrix and is an input from a user. So, it is difficult to say the max. Maybe, typically, it is around O(10^5)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If n
is O(10^5), then check
would need to be up to ~1GB if each element of check
is a single bit? To reduce memory usage, a fall-back approach could be to have two or more limited-size check
arrays; when one is used up, zero the used one out in parallel; this would require some synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @e10harvey. Any saving in the memory is good (though nsup is usually much smaller than n, and this setup is still done on the host).
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
6 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
4 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for your review, @e10harvey !!
int nblocks = 0; | ||
for (int s = 0; s < nsuper; s++) { | ||
int j1 = nb[s]; | ||
int j2 = j1+1; // based on the first row | ||
|
||
size_type nidxs = 0; | ||
for (size_type i = row_map_host (j1); i < row_map_host (j2); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it points to the location of the nonzero, but even though we operate on supernodal blocks, the matrix is still stored in the standard CSR format with rank-1 arrays.
int nblocks = 0; | ||
for (int s = 0; s < nsuper; s++) { | ||
int j1 = nb[s]; | ||
int j2 = j1+1; // based on the first row | ||
|
||
size_type nidxs = 0; | ||
for (size_type i = row_map_host (j1); i < row_map_host (j2); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the number of nonzero entries in the matrix.
int nblocks = 0; | ||
for (int s = 0; s < nsuper; s++) { | ||
int j1 = nb[s]; | ||
int j2 = j1+1; // based on the first row | ||
|
||
size_type nidxs = 0; | ||
for (size_type i = row_map_host (j1); i < row_map_host (j2); i++) { | ||
int s2 = map (entries_host (i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, s2 is the index of the supernodal block, but the matrix is stored in the standard CSR format with rank-1 arrays.
Kokkos::deep_copy (check, 0); | ||
//Kokkos::deep_copy (check, 0); | ||
for (size_type i = 0; i < nidxs; i++) { | ||
check (idxs(i)) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nsup is the number of supernodal blocks. So, in the worst case, it can be number of columns, and in order not to be overwritten, I need to allocate n^2 entries?
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
2 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @iyamazaki ! Sorry for the delayed approval.
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]! |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
Thank you @e10harvey so much for reviewing this PR!! |
This PR aims to reduce the SpTRSV symbolic time by keep track of nonzero entries in generate_supernodal_graph (using deep-copy of the whole vector can be expensive for a large matrix).
Spot-checks on White: