Introduce new utilities for writing Alpaka kernels #43205

fwyzard · 2023-11-06T17:27:18Z

PR description:

Introduce four new utilities for writing Alpaka kernels:

blocks_with_stride(acc, size)
elements_in_block(acc, block, size)
once_per_grid(acc)
once_per_block(acc)

Simplify the unit tests, and extend them to cover the newly introduced functionality.

blocks_with_stride

blocks_with_stride(acc, size) returns a range that spans the (virtual) block indices required to cover the given problem size.

For example, if size is 1000 and the block size is 16, it will return the range from 0 to 62 (63 blocks of 16 elements cover 1008 elements, enough for a total size of 1000).
If the work division has more than 63 blocks, only the first 63 will perform one iteration of the loop, and the others will exit immediately.
If the work division has fewer than 63 blocks, some of the blocks will perform more than one iteration, in order to cover the whole problem space.

All threads in a block see the same loop iterations, while threads in different blocks may see a different number of iterations.
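
The index arithmetic described above can be illustrated with a small host-side model. This is only a sketch of the striding logic, not the code added by this PR: the function name virtual_blocks and its parameters are hypothetical, and it assumes that each physical block starts at its own index and advances by the number of blocks in the grid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a block-strided loop (not the CMSSW code).
// Returns the virtual block indices visited by the physical block `block`
// in a grid of `grid_blocks` blocks, each covering `block_size` elements.
std::vector<std::size_t> virtual_blocks(std::size_t block,
                                        std::size_t grid_blocks,
                                        std::size_t block_size,
                                        std::size_t size) {
  assert(grid_blocks > 0 && block_size > 0);
  // number of virtual blocks needed to cover `size` elements, rounded up
  const std::size_t needed = (size + block_size - 1) / block_size;
  std::vector<std::size_t> visited;
  // assumption: start at the physical block index and stride by the grid size
  for (std::size_t b = block; b < needed; b += grid_blocks) {
    visited.push_back(b);
  }
  return visited;
}
```

With size 1000 and a block size of 16, this model needs 63 virtual blocks: in a grid of 100 blocks, block 62 visits only index 62 and block 63 visits nothing, while in a grid of 32 blocks, block 30 visits indices 30 and 62.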

elements_in_block

elements_in_block(acc, block, size) returns a range that spans all the elements within the given block. Iterating over the range yields values of type ElementIndex, which contain both the .global and .local indices of the corresponding element.

If the work division has only one element per thread, the loop will perform at most one iteration.
If the work division has more than one element per thread, the loop will perform that number of iterations, or fewer if it reaches size.
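
As a rough illustration of the behaviour described above, the following host-side model lists the elements that a single thread would see. It is a sketch under stated assumptions, not the implementation in this PR: the ElementIndex struct here is a stand-in with the two members named above, elements_for_thread and its parameters are hypothetical, and the model assumes that each thread owns a run of consecutive elements inside its block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the ElementIndex type mentioned above (hypothetical layout).
struct ElementIndex {
  std::size_t global;  // index within the whole problem space
  std::size_t local;   // index within the block
};

// Hypothetical host-side model: elements seen by thread `thread` of the
// (virtual) block `block`, clipped to the problem size `size`.
std::vector<ElementIndex> elements_for_thread(std::size_t block,
                                              std::size_t thread,
                                              std::size_t threads_per_block,
                                              std::size_t elements_per_thread,
                                              std::size_t size) {
  assert(thread < threads_per_block);
  // assumption: each thread owns `elements_per_thread` consecutive elements
  const std::size_t block_elements = threads_per_block * elements_per_thread;
  const std::size_t local_first = thread * elements_per_thread;
  const std::size_t global_first = block * block_elements + local_first;
  std::vector<ElementIndex> elements;
  // at most `elements_per_thread` iterations, fewer if `size` is reached
  for (std::size_t i = 0; i < elements_per_thread && global_first + i < size; ++i) {
    elements.push_back(ElementIndex{global_first + i, local_first + i});
  }
  return elements;
}
```

In this model, with 16 threads per block, one element per thread and size 1000, thread 7 of block 62 sees the single element with global index 999 and local index 7, and thread 8 sees none. With 4 threads per block, 4 elements per thread and size 998, thread 1 of block 62 sees only global indices 996 and 997, because the loop stops when it reaches size.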

once_per_grid

once_per_grid(acc) evaluates to true for a single thread within the kernel execution grid.

Usually the condition is true for block 0 and thread 0, but these indices should not be relied upon.

once_per_block

once_per_block(acc) evaluates to true for a single thread within the block.

Usually the condition is true for thread 0, but this index should not be relied upon.
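
The guarantee given by once_per_grid and once_per_block is about how many threads are selected, not which ones. The host-side model below sketches that property. All names in it are hypothetical, and the predicates simply assume the usual choice of block 0 and thread 0, which real code should not rely on.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical identifier of a thread within a one-dimensional grid.
struct ThreadId {
  std::size_t block;
  std::size_t thread;
};

// Assumption: model the usual choice (block 0, thread 0); the real utilities
// only promise that exactly one thread is selected.
bool once_per_grid_model(ThreadId id) { return id.block == 0 && id.thread == 0; }
bool once_per_block_model(ThreadId id) { return id.thread == 0; }

// Count the threads of a `blocks` x `threads` grid for which `pred` is true.
template <typename Pred>
std::size_t count_selected(std::size_t blocks, std::size_t threads, Pred pred) {
  assert(blocks > 0 && threads > 0);
  std::size_t selected = 0;
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < threads; ++t) {
      if (pred(ThreadId{b, t})) {
        ++selected;
      }
    }
  }
  return selected;
}
```

For a grid of 4 blocks with 8 threads each, the grid-level model selects 1 thread in total and the block-level model selects 4, one per block. This is the property that makes the predicates suitable for guarding work that must happen once, such as initialising a result or a block-wide counter.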

PR validation:

The updated unit tests compile and pass.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

If the master branch is moved to CMSSW_14_0_X, this PR will be backported to CMSSW_13_3_X.

fwyzard · 2023-11-06T17:28:06Z

enable gpu

fwyzard · 2023-11-06T17:28:10Z

please test

cmsbuild · 2023-11-06T17:34:11Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43205/37529

This PR adds an extra 24KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File HeterogeneousCore/AlpakaInterface/interface/workdivision.h modified in PR(s): Adding HeterogeneousCore/AlpakaUtilities and changes into HeterogeneousCore/AlpakaInterface #40932, Porting Pixel Tracks to Alpaka [Not to Merge] #41117, Pixel Alpaka Migration: AlpakaInterface Updates [II] #41284, Ports prefixScan, OneToManyAssoc and HistoContainer from CUDAUtilities. #43064

cmsbuild · 2023-11-06T17:34:35Z

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

HeterogeneousCore/AlpakaInterface (heterogeneous)

@fwyzard, @makortel can you please review it and eventually sign? Thanks.
@missirol, @makortel, @rovere this is something you requested to watch as well.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

cmsbuild · 2023-11-06T20:13:39Z

-1

Failed Tests: RelVals-GPU GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35643/summary.html
COMMIT: 1798b10
CMSSW: CMSSW_13_3_X_2023-11-06-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43205/35643/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

12434.587: 12434.587_TTbar_14TeV+2023_Patatrack_AllTripletsGPU_Validation/step2_TTbar_14TeV+2023_Patatrack_AllTripletsGPU_Validation.log

GPU Unit Tests

I found 1 errors in the following unit tests:

---> test alpakaTestKernelCudaAsync had ERRORS

Comparison Summary

Summary:

You potentially added 10 lines to the logs
Reco comparison results: 136 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3363010
DQMHistoTests: Total failures: 1790
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3361198
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
Checked 214 log files, 167 edm output root files, 50 DQM output files
TriggerResults: no differences found

fwyzard · 2023-11-06T22:36:13Z

GPU Unit Tests

I found 1 errors in the following unit tests:

---> test alpakaTestKernelCudaAsync had ERRORS

The original test fails to run, but exits with a non-error status:

$ $CMSSW_FULL_RELEASE_BASE/test/$SCRAM_ARCH/alpakaTestKernelCudaAsync
No devices available on the platform alpaka_cuda_async, the test will be skipped.
No devices available on the platform alpaka_cuda_async, the test will be skipped.
No devices available on the platform alpaka_cuda_async, the test will be skipped.
===============================================================================
test cases: 3 | 3 passed
assertions: - none -

I thought SCRAM would not run the CUDA tests if cudaIsEnabled fails?

makortel · 2023-11-07T07:03:13Z

I thought SCRAM would not run the CUDA tests if cudaIsEnabled fails?

That is the default behavior, but for GPU_X IBs and the PR GPU tests all tests depending on cuda are explicitly run (this is visible e.g. in the PR GPU unit test log

+ eval USER_UNIT_TESTS=cuda timeout 7320 scram b -v -k -j 4 unittests
++ USER_UNIT_TESTS=cuda
++ timeout 7320 scram b -v -k -j 4 unittests

https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35643/gpuUnitTests/log.txt
)

So the node where the GPU tests were run did not have a GPU (setup compatible with CUDA 12.2?) after all?

fwyzard · 2023-11-07T07:45:00Z

CERN GPU nodes will be updated to CUDA 12.2 between tomorrow and Wednesday: https://cern.service-now.com/service-portal?id=outage&n=OTG0145266 .

fwyzard · 2023-11-07T07:46:55Z

@makortel

Do you think the way the test is written is OK, or should we have a different behaviour?
Do the other changes look good?

makortel · 2023-11-07T09:39:14Z

Looks ok to me

fwyzard · 2023-11-07T09:44:20Z

+heterogenous

fwyzard · 2023-11-07T13:00:48Z

+heterogeneous

cmsbuild · 2023-11-07T13:01:11Z

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

fwyzard · 2023-11-14T17:08:33Z

please test

cmsbuild · 2023-11-14T17:11:02Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43205/37682

This PR adds an extra 24KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File HeterogeneousCore/AlpakaInterface/interface/workdivision.h modified in PR(s): Adding HeterogeneousCore/AlpakaUtilities and changes into HeterogeneousCore/AlpakaInterface #40932, Porting Pixel Tracks to Alpaka [Not to Merge] #41117, Pixel Alpaka Migration: AlpakaInterface Updates [II] #41284, Ports prefixScan, OneToManyAssoc and HistoContainer from CUDAUtilities. #43064, ECAL unpacker and ECAL multifit algorithm migration to alpaka #43257
- File HeterogeneousCore/AlpakaInterface/test/alpaka/testKernel.dev.cc modified in PR(s): ECAL unpacker and ECAL multifit algorithm migration to alpaka #43257

cmsbuild · 2023-11-14T17:11:23Z

Pull request #43205 was updated. @fwyzard, @makortel can you please check and sign again.

cmsbuild · 2023-11-14T19:56:04Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35811/summary.html
COMMIT: 8c859bc
CMSSW: CMSSW_14_0_X_2023-11-14-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43205/35811/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially removed 256 lines from the logs
Reco comparison results: 136 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3363028
DQMHistoTests: Total failures: 2392
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3360614
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
Checked 214 log files, 167 edm output root files, 50 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 48 differences found in the comparisons
DQMHistoTests: Total files compared: 3
DQMHistoTests: Total histograms compared: 39740
DQMHistoTests: Total failures: 1835
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 37905
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
Checked 8 log files, 10 edm output root files, 3 DQM output files
TriggerResults: no differences found

fwyzard · 2023-11-14T22:26:52Z

+heterogeneous

cmsbuild · 2023-11-14T22:27:16Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

antoniovilela · 2023-11-15T21:03:54Z

+1

Simplify and extend the alpaka kernel tests

3eed8ad

cmsbuild added this to the CMSSW_14_0_X milestone Nov 6, 2023

cmsbuild added pending-signatures tests-pending orp-pending code-checks-pending heterogeneous-pending labels Nov 6, 2023

cmsbuild added tests-started and removed tests-pending labels Nov 6, 2023

cmsbuild added code-checks-approved and removed code-checks-pending labels Nov 6, 2023

cmsbuild added tests-rejected and removed tests-started labels Nov 6, 2023

cmsbuild added fully-signed heterogeneous-approved and removed pending-signatures heterogeneous-pending labels Nov 7, 2023

cmsbuild added tests-pending code-checks-pending and removed tests-approved code-checks-approved labels Nov 14, 2023

cmsbuild added tests-started and removed tests-pending labels Nov 14, 2023

cmsbuild added code-checks-approved and removed code-checks-pending labels Nov 14, 2023

cmsbuild added tests-approved and removed tests-started labels Nov 14, 2023

fwyzard changed the title ~~Add the blocks_with_stride and elements_in_block ranges~~ Introduce new utilities for writing Alpaka kernels Nov 14, 2023

cmsbuild added fully-signed heterogeneous-approved and removed pending-signatures heterogeneous-pending labels Nov 14, 2023

fwyzard mentioned this pull request Nov 14, 2023

Introduce new utilities for writing Alpaka kernels [13.3.x] #43280

Merged

cmsbuild mentioned this pull request Nov 15, 2023

Porting Pixel Tracks to Alpaka [Not to Merge] #41117

Closed

fwyzard mentioned this pull request Nov 15, 2023

Add Alpaka Implementation of PFClusterProducer #43130

Merged

cmsbuild added orp-approved and removed orp-pending labels Nov 15, 2023

cmsbuild merged commit 18bde74 into cms-sw:master Nov 15, 2023
24 checks passed

cmsbuild mentioned this pull request Nov 16, 2023

Add recipe for pytorch (C++ interface only) cms-sw/cmsdist#8388

Merged

fwyzard deleted the implement_blocks_with_stride branch January 30, 2024 11:12
