Introduce new utilities for writing Alpaka kernels #43205
enable gpu |
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43205/37529
|
A new Pull Request was created by @fwyzard (Andrea Bocci) for master. It involves the following packages:
@fwyzard, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
-1 Failed Tests: RelVals-GPU GpuUnitTests RelVals-GPU
GPU Unit Tests: I found 1 errors in the following unit tests: ---> test alpakaTestKernelCudaAsync had ERRORS
|
The original test fails to run, but exits with a non-error status:
I thought SCRAM would not run the CUDA tests if |
That is the default behavior, but for GPU_X IBs and the PR GPU tests all tests depending on
So the node where the GPU tests were run did not have a GPU (setup compatible with CUDA 12.2?) after all? |
CERN GPU nodes will be updated to CUDA 12.2 between tomorrow and Wednesday:
https://cern.service-now.com/service-portal?id=outage&n=OTG0145266 .
|
|
Looks ok to me |
+heterogenous |
+heterogeneous |
This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43205/37682
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35811/summary.html
|
+heterogeneous |
blocks_with_stride and elements_in_block ranges
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
Introduce four new utilities for writing Alpaka kernels:
blocks_with_stride(acc, size)
elements_in_block(acc, block, size)
once_per_grid(acc)
once_per_block(acc)
Simplify the unit tests, and extend them to cover the newly introduced functionality.
blocks_with_stride
blocks_with_stride(acc, size) returns a range that spans the (virtual) block indices required to cover the given problem size.
For example, if size is 1000 and the block size is 16, it will return the range from 0 to 62 (63 blocks of 16 elements cover 1008 elements, enough for a total size of 1000).
If the work division has more than 63 blocks, only the first 63 will perform one iteration of the loop, and the others will exit immediately.
If the work division has fewer than 63 blocks, some of the blocks will perform more than one iteration, in order to cover the whole problem space.
All threads in a block see the same loop iterations, while threads in different blocks may see a different number of iterations.
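The arithmetic described above can be checked on the host with a small model. This is only a sketch, not the Alpaka or CMSSW implementation: the function name virtual_blocks and its parameters are hypothetical, and it assumes that each physical block starts at its own index and advances by the number of blocks in the grid.

```cpp
#include <cstddef>
#include <vector>

// Host-side model (assumption, not the CMSSW code): the virtual block indices
// that one physical block would visit for a given work division.
//   physical_block : index of the block within the grid
//   blocks_in_grid : number of blocks in the work division
//   block_size     : number of elements covered by one block
//   size           : total problem size
std::vector<std::size_t> virtual_blocks(std::size_t physical_block,
                                        std::size_t blocks_in_grid,
                                        std::size_t block_size,
                                        std::size_t size) {
  // Number of virtual blocks needed to cover `size`, rounded up.
  const std::size_t needed = (size + block_size - 1) / block_size;

  std::vector<std::size_t> visited;
  // Assumed stride: start at this block's index, advance by the grid size.
  for (std::size_t b = physical_block; b < needed; b += blocks_in_grid) {
    visited.push_back(b);
  }
  return visited;
}
```

With size 1000 and a block size of 16, a grid of 100 blocks gives one iteration to blocks 0 to 62 and none to the rest, while a grid of 32 blocks gives two iterations to blocks 0 to 30 and one to block 31.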
elements_in_block
elements_in_block(acc, block, size) returns a range that spans all the elements within the given block. Iterating over the range yields values of type ElementIndex, which contain both the .global and .local indices of the corresponding element.
If the work division has only one element per thread, the loop will perform at most one iteration.
If the work division has more than one element per thread, the loop will perform that number of iterations, or fewer if it reaches size.
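The iteration counts described above can be illustrated with a host-side model of a single thread. This is a sketch under stated assumptions, not the CMSSW implementation: the ElementIndex struct here is a hypothetical stand-in that only mirrors the .global and .local members named in the description, the function elements_for_thread is invented for illustration, and it assumes that each thread handles a contiguous run of elements within its block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ElementIndex type named in the description.
struct ElementIndex {
  std::size_t global;  // index within the whole problem space
  std::size_t local;   // index within the block
};

// Host-side model (assumption): the elements one thread would visit
// when iterating over the elements of the given (virtual) block.
std::vector<ElementIndex> elements_for_thread(std::size_t block,
                                              std::size_t thread,
                                              std::size_t threads_per_block,
                                              std::size_t elements_per_thread,
                                              std::size_t size) {
  // Number of elements covered by one block.
  const std::size_t block_size = threads_per_block * elements_per_thread;

  std::vector<ElementIndex> visited;
  for (std::size_t i = 0; i < elements_per_thread; ++i) {
    const std::size_t local = thread * elements_per_thread + i;
    const std::size_t global = block * block_size + local;
    if (global >= size) {
      break;  // stop early once the problem size is reached
    }
    visited.push_back({global, local});
  }
  return visited;
}
```

With one element per thread the loop body runs at most once. With four elements per thread it runs four times, or fewer for the threads of the last block that would step past size.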
once_per_grid
once_per_grid(acc) evaluates to true for a single thread within the kernel execution grid.
Usually the condition is true for block 0 and thread 0, but these indices should not be relied upon.
once_per_block
once_per_block(acc) evaluates to true for a single thread within the block.
Usually the condition is true for thread 0, but this index should not be relied upon.
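The guarantee given by the two predicates is about how many threads see true, not which ones. The host-side sketch below counts this over a whole grid. All names in it are hypothetical, and the choice of block 0 and thread 0 is only the usual case mentioned above, assumed here for illustration.

```cpp
#include <cstddef>

// Assumed convention for this model only: the first thread of the first block.
bool once_per_grid_model(std::size_t block, std::size_t thread) {
  return block == 0 && thread == 0;
}

// Assumed convention for this model only: the first thread of each block.
bool once_per_block_model(std::size_t thread) {
  return thread == 0;
}

struct Counts {
  std::size_t per_grid = 0;   // threads for which the grid predicate held
  std::size_t per_block = 0;  // threads for which the block predicate held
};

// Visit every (block, thread) pair of a grid and count how often each
// predicate evaluates to true.
Counts count_once(std::size_t blocks, std::size_t threads_per_block) {
  Counts counts;
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < threads_per_block; ++t) {
      if (once_per_grid_model(b, t)) {
        ++counts.per_grid;
      }
      if (once_per_block_model(t)) {
        ++counts.per_block;
      }
    }
  }
  return counts;
}
```

For a grid of 8 blocks with 32 threads each, the grid-level predicate holds for exactly one thread and the block-level predicate holds for 8 threads, one per block.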
PR validation:
The updated unit tests compile and pass.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
If the master branch is moved to CMSSW_14_0_X, this PR will be backported to CMSSW_13_3_X.