-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Axpby using less deep copy (solves issue #2080) #2081
Conversation
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA90A_ROCM560
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA90A_Tpls_ROCM560
Jenkins Parameters
Using Repos:
Pull Request Author: eeprude |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GNU1021
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_CLANG1001_solo
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_A64FX_GCC1020
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA90A_ROCM560
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_VEGA90A_Tpls_ROCM560
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
5 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
4 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eeprude , this looks like a good way to solve the Trilinos issue.
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ brian-kelley ]! |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
3 similar comments
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
For performance reasons, Trilinos has tests that count the number of calls to Kokkos::deep_copy(). The counts increased with the last changes in axpby(), always when X and Y were rank 2 on device (cuda in the current tests) and 'a' and 'b' were scalars. In such cases, I should not make the internal types of 'a' and 'b' to become Kokkos views, but rather make the internal types to be scalars as well. The current PR makes such performance improvement, not only for the situation of X and Y being rank 2 on device, but also for the situation of X and Y being rank 1 on device. The unit tests at the axpby() level were already very thorough, and they all continue to pass (I temporarily put a printout out in the logic change and confirmed that the new logic is being properly exercised).
Thanks to Brian Kelley for explaining me the test results and the necessity of keeping 'a' and 'b' scalars when (i) X and Y are rank-2 or rank-1 on device, and (ii) both inputs 'a' and 'b' are already scalars.