Fix #1891 #1927
(HashmapAccumulator data races on Ada and Hopper architectures). To avoid checking for every architecture at every location, add a new macro KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS which is defined if we're targeting an architecture with independent thread scheduling.
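To illustrate the "define once, check everywhere" idea, here is a minimal sketch; it is not the actual diff from this PR. The `DEMO_ARCH_*` macros are hypothetical stand-ins for whatever per-architecture macros the build configuration provides, and `has_independent_thread_scheduling()` is a helper invented for the example. Only the name `KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS` comes from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for a build-provided architecture macro.
// For this sketch we pretend the build targets Hopper.
#define DEMO_ARCH_HOPPER

// The architecture list lives in exactly one place (assumed condition,
// not the PR's real one).
#if defined(DEMO_ARCH_VOLTA) || defined(DEMO_ARCH_TURING) || \
    defined(DEMO_ARCH_AMPERE) || defined(DEMO_ARCH_ADA) ||   \
    defined(DEMO_ARCH_HOPPER)
#define KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS
#endif

// Call sites test the single macro instead of repeating the list.
// Hypothetical helper: reports whether the macro was defined.
constexpr bool has_independent_thread_scheduling() {
#if defined(KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS)
  return true;
#else
  return false;
#endif
}

// Compile-time check that the Hopper stand-in selected the macro.
static_assert(has_independent_thread_scheduling(),
              "expected the macro to be defined for the Hopper stand-in");
```

Code that relies on all lanes of a warp running in lockstep would branch on this one macro, so adding an architecture means touching only the single `#if` above.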
In general this looks fine to me but also prompts the question: is this something unique to NVIDIA GPUs or is it also the case on some AMD/Intel GPUs?
AFAIK it's only NVIDIA hardware that can do this; wavefronts/subgroups behave like SIMDs (all lanes are executing the same instruction at any given time).
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Test Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Test Name: KokkosKernels_PullRequest_GNU1021
Test Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Test Name: KokkosKernels_PullRequest_Tpls_GNU1021
Test Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Test Name: KokkosKernels_PullRequest_CLANG1001_solo
Test Name: KokkosKernels_PullRequest_A64FX_Tpls_ARMPL2110
Test Name: KokkosKernels_PullRequest_A64FX_GCC1020
Test Name: KokkosKernels_PullRequest_VEGA908_ROCM520
Test Name: KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
Pull Request Author: brian-kelley
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED. Pull Request Auto Testing has PASSED for the same 11 test configurations listed in the STARTING report above.
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]!
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge
Verified that this fixes my Trilinos bug. Thanks
Thanks @bathmatt, then I'll merge this and we can work on a Trilinos snapshot if @brian-kelley has not started on it already!
Awesome!!! I've got it snapped into my local branch of Trilinos.
@lucbv I just put up a patch here: trilinos/Trilinos#12097
Thanks @brian-kelley. In general, we could assume that all new NVIDIA GPUs will support this thread-level scheduling and reverse the logic of your macro, something like
This would avoid having to update the macro every time a new GPU comes out?
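The snippet from the comment above is not preserved here, so the following is only a hedged sketch of what "reversing the logic" could look like, not the commenter's code. It assumes the architectures that predate independent thread scheduling are Kepler, Maxwell, and Pascal (the feature arrived with Volta). `DEMO_ENABLE_CUDA`, the `DEMO_ARCH_*` macros, `DEMO_CUDA_LOCKSTEP_WARPS`, and `independent_threads_assumed()` are all hypothetical names made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins: pretend CUDA is enabled and the target is a
// future architecture that no existing list mentions.
#define DEMO_ENABLE_CUDA
#define DEMO_ARCH_FUTURE

// Reversed logic: enumerate only the older architectures, which is a
// closed list that never needs to grow.
#if defined(DEMO_ARCH_KEPLER) || defined(DEMO_ARCH_MAXWELL) || \
    defined(DEMO_ARCH_PASCAL)
#define DEMO_CUDA_LOCKSTEP_WARPS
#endif

// Any CUDA target that is not on the "old" list is assumed to have
// independent thread scheduling.
#if defined(DEMO_ENABLE_CUDA) && !defined(DEMO_CUDA_LOCKSTEP_WARPS)
#define KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS
#endif

// Hypothetical helper: reports whether the macro ended up defined.
constexpr bool independent_threads_assumed() {
#if defined(KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS)
  return true;
#else
  return false;
#endif
}

// The unknown future architecture is covered without editing any list.
static_assert(independent_threads_assumed(),
              "expected the macro to be defined for an unlisted architecture");
```

The trade-off is that the default flips: an architecture nobody listed is treated as having independent thread scheduling, which is the conservative choice if the guarded code only adds extra synchronization.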
Marking with TrilinosPatchMatch to track so the Trilinos changes are not clobbered in the event we issue a patch release (see PR trilinos/Trilinos#12097).
(HashmapAccumulator data races on Ada and Hopper architectures). To avoid checking for every architecture at every location, add a new macro KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS which is defined if we're targeting an architecture with independent thread scheduling (Volta, Ampere, Turing, Ada, Hopper, and any future ones).