-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream support for Gauss-Seidel: Symbolic, Numeric, Apply (PSGS and Team_PSGS) #1906
Conversation
// std::cout << "num_big_rows:" << num_big_rows << std::endl; | ||
|
||
if (KokkosKernels::Impl::kk_is_gpu_exec_space<MyExecSpace>()) { | ||
// check if we have enough memory for this. lower the concurrency if | ||
// we do not have enugh memory. | ||
// TODO: Need to account for number of streams here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the shared memory byte calculations, we may need to update the kk routines that query the exe space. If we can do that, I think the implementation can remain stream-agnostic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the memory queries are independent of streams since the GPU memory is not split and allocated to each stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming
void set_execution_space(const HandleExecSpace exec_space) { | ||
static bool is_exec_space_set = false; | ||
if (!is_exec_space_set) { | ||
this->execution_space = exec_space; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user wants to change the execution space, this can only be done once per handle instantiation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes are needed for consistency with the rest of the library.
I am okay with the execution space instance being stored in the handle if that is something done consistently for all sparse algorithms.
@@ -151,10 +151,10 @@ inline void kk_exclusive_parallel_prefix_sum( | |||
template <typename forward_array_type, typename MyExecSpace> | |||
void kk_inclusive_parallel_prefix_sum( | |||
typename forward_array_type::value_type num_elements, | |||
forward_array_type arr) { | |||
typedef Kokkos::RangePolicy<MyExecSpace> my_exec_space; | |||
forward_array_type arr, MyExecSpace my_exec_space = MyExecSpace()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The normal pattern when allowing an execution space instance in a kernel is to do the following:
// Add a new overload with the execution space instance as first parameter
// this follows the patter of Kokkos Core and makes our library uniform, i.e.
// user friendly
template <typename forward_array_type, typename MyExecSpace>
void kk_inclusive_parallel_prefix_sum(MyExecSpace space, ...) {}
// Replace the old implementation with a simple wrapper that
// creates an instance of the execution space and passes it to
// the new overload, this is for backward compatibility
template <typename forward_array_type, typename MyExecSpace>
void kk_inclusive_parallel_prefix_sum(...) {
MyExecSpace space{};
kk_inclusive_parallel_prefix_sum(pace, ...);
}
In the case of kk_inclusive_parallel_prefix_sum
since the function is in the impl namespace we can be a bit more lose and we can remove backward compatibility since it should only be used in the library but you might want to check for any usage in Trilinos...
common/src/KokkosKernels_Utils.hpp
Outdated
@@ -457,9 +457,9 @@ struct Fill_Reverse_Map { | |||
template <typename forward_array_type, typename MyExecSpace> | |||
void inclusive_parallel_prefix_sum( | |||
typename forward_array_type::value_type num_elements, | |||
forward_array_type arr) { | |||
forward_array_type arr, MyExecSpace my_exec_space = MyExecSpace()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment about kk_inclusive_parallel_prefix_sum
|
||
typedef typename reverse_array_type::value_type lno_t; | ||
typedef typename forward_array_type::value_type reverse_lno_t; | ||
|
||
const lno_t MINIMUM_TO_ATOMIC = 64; | ||
|
||
typedef Kokkos::RangePolicy<MyExecSpace> my_exec_space; | ||
typedef Kokkos::RangePolicy<MyExecSpace> range_policy_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, that previous name was really poor thanks for catching that...
common/src/KokkosKernels_Utils.hpp
Outdated
size_type &max_row_size) { | ||
typedef Kokkos::RangePolicy<MyExecSpace> my_exec_space; | ||
size_type &max_row_size, | ||
MyExecSpace my_exec_space = MyExecSpace()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment about kk_inclusive_parallel_prefix_sum
// std::cout << "num_big_rows:" << num_big_rows << std::endl; | ||
|
||
if (KokkosKernels::Impl::kk_is_gpu_exec_space<MyExecSpace>()) { | ||
// check if we have enough memory for this. lower the concurrency if | ||
// we do not have enugh memory. | ||
// TODO: Need to account for number of streams here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the memory queries are independent of streams since the GPU memory is not split and allocated to each stream.
@@ -1591,8 +1600,8 @@ class PointGaussSeidel { | |||
this->IterativePSGS(gs, numColors, h_color_xadj, numIter, apply_forward, | |||
apply_backward); | |||
|
|||
// Kokkos::parallel_for( range_pol(0,nr), PermuteVector(x_lhs_output_vec, | |||
// Permuted_Xvector, color_adj)); | |||
// Kokkos::parallel_for( range_policy_t(0,nr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be removed?
@@ -1673,8 +1682,8 @@ class PointGaussSeidel { | |||
apply_backward); | |||
} | |||
|
|||
// Kokkos::parallel_for( range_pol(0,nr), PermuteVector(x_lhs_output_vec, | |||
// Permuted_Xvector, color_adj)); | |||
// Kokkos::parallel_for( range_policy_t(0,nr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove this?
sparse/src/KokkosSparse_Utils.hpp
Outdated
typename view_type::non_const_value_type &sum_reduction) { | ||
typedef Kokkos::RangePolicy<MyExecSpace> my_exec_space; | ||
typename view_type::non_const_value_type &sum_reduction, | ||
MyExecSpace my_exec_space = MyExecSpace()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment about kk_inclusive_parallel_prefix_sum
Thanks for the early feedback, @lucbv. It's a big help. I read through all your comments. Why don't we take the same overload approach via the user-facing sparse handle creation overload but store a private reference in the kernel handle to the user's execution space? In this way, we avoid passing the execution space instance as a new parameter deep into the internal sparse call stacks while also providing a similar look and feel to the blas interfaces. For spiluk, sptrsv and GS we will have that new interleaving interface as well. But, if user's want to launch a sparse kernel in a serial for-loop instead, that would also be supported under-the-hood via this handle creation overload. |
- This change also slightly improves performance perf_test/sparse: Add launch and compute timers
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: e10harvey |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight # 799 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10 # 390 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021 # 63 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GNU1021_Light_LayoutRight # 62 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GNU1021 # 62 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL19_solo # 68 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_CLANG1001_solo # 57 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110 # 568 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_A64FX_GCC1020 # 563 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_ROCM520 # 562 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 # 84 (click to expand)
|
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: e10harvey |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Remove a couple fences from the unit tests.
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: e10harvey |
NOTICE: The AutoTester has encountered an internal error (usually a Communications Timeout), testing will be restarted, previous tests may still be running but will be ignored by the AutoTester... |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
Using Repos:
Pull Request Author: e10harvey |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
3 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
* @param gs_algorithm Specifies which algorithm to use: | ||
* | ||
* KokkosSpace::GS_DEFAULT PointGaussSeidel | ||
* KokkosSpace::GS_PERMUTED ?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brian-kelley or @lucbv: Can you help me complete in this documentation please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GS_DEFAULT is actually for both point and block. It turns into GS_PERMUTED for CPUs and GS_TEAM for GPUs. PERMUTED and TEAM are the same basic algorithm, the difference is that PERMUTED uses a RangePolicy over rows, but TEAM uses a TeamPolicy over batches of rows and ThreadVector parallelism within rows.
"PERMUTED" refers to the fact that rows/columns of the matrix are reordered into colors for better locality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @brian-kelley. I'll update these docs in a follow-on PR.
@brian-kelley, @lucbv: I've addressed all your feedback. Are you ready to merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'm reviewing this now then
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ brian-kelley ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
TODOs:
Related to #1860