Add a test for Alpaka libraries and build rules #44622

Merged

@fwyzard commented Apr 4, 2024

PR description:

The package HeterogeneousTest/AlpakaDevice implements a header-only library that defines Alpaka device-only functions, and a plugin and test that use them.

The package HeterogeneousTest/AlpakaKernel implements a header-only library that imports device functions from HeterogeneousTest/AlpakaDevice to define Alpaka kernels, and a plugin and test that use them.

The package HeterogeneousTest/AlpakaWrapper implements a library that imports kernels from HeterogeneousTest/AlpakaKernel to define and export host-only wrappers around them, usable by non-Alpaka libraries, plugins and applications, and implements a plugin and test that use them.

The package HeterogeneousTest/AlpakaOpaque implements a library that imports kernels from HeterogeneousTest/AlpakaKernel to define and export host-only wrappers around the whole Alpaka section, usable by libraries, plugins and applications that are not explicitly Alpaka-aware, and implements a plugin and test that use them.

In addition, fix the CUDA and ROCm documentation, and various typos in the ROCm tests.
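To make the layering of the four packages described above more concrete, here is a minimal plain-C++ sketch. It does not use Alpaka: `SerialAcc` is a hypothetical stand-in for an accelerator type, plain host memory stands in for device memory, and the function names (loosely borrowed from the CUDA test function `add_vectors_f` that appears in the build log later in this thread) are illustrative only, not the actual API of the HeterogeneousTest packages.

```cpp
// Plain-C++ sketch of the AlpakaDevice -> AlpakaKernel -> AlpakaWrapper -> AlpakaOpaque layering.
// Assumptions: no Alpaka headers are used; SerialAcc is a hypothetical stand-in for an
// Alpaka accelerator, and every name below is illustrative, not the packages' real API.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accelerator stand-in: identifies the "current" thread and the thread count.
struct SerialAcc {
  std::size_t thread = 0;
  std::size_t threads = 1;
};

// Device layer (header-only): a device function templated on the accelerator type.
template <typename TAcc>
void add_vectors_f(TAcc const& acc, float const* a, float const* b, float* out, std::size_t size) {
  // Each thread processes a strided subset of the elements.
  for (std::size_t i = acc.thread; i < size; i += acc.threads) {
    out[i] = a[i] + b[i];
  }
}

// Kernel layer (header-only): a function object that forwards to the device function.
struct KernelAddVectorsF {
  template <typename TAcc>
  void operator()(TAcc const& acc, float const* a, float const* b, float* out, std::size_t size) const {
    add_vectors_f(acc, a, b, out, size);
  }
};

// Wrapper layer: a non-template, host-only entry point that runs the kernel on every
// thread. Callers still manage the buffers (here plain host memory standing in for
// device memory), but never see the accelerator type.
void wrapper_add_vectors_f(std::size_t threads, float const* a, float const* b, float* out, std::size_t size) {
  for (std::size_t t = 0; t < threads; ++t) {
    KernelAddVectorsF{}(SerialAcc{t, threads}, a, b, out, size);
  }
}

// Opaque layer: hides the whole accelerator section, including buffer handling,
// behind plain host containers.
std::vector<float> opaque_add_vectors_f(std::vector<float> const& a, std::vector<float> const& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  wrapper_add_vectors_f(4, a.data(), b.data(), out.data(), out.size());
  return out;
}
```

The design choice the sketch tries to mirror is that each layer exposes less of the accelerator machinery than the one below it: only the Device and Kernel layers are templates, so only they need to be visible as headers, while the Wrapper and Opaque layers can be ordinary compiled functions.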

PR validation:

The new unit tests build and pass (on a machine with NVIDIA GPUs):

Skip    0s ... HeterogeneousTest/AlpakaDevice/testAlpakaDeviceAdditionROCmAsync (Failed to run rocmIsEnabled)
Skip    0s ... HeterogeneousTest/AlpakaKernel/testAlpakaDeviceAdditionKernelROCmAsync (Failed to run rocmIsEnabled)
Skip    0s ... HeterogeneousTest/AlpakaWrapper/testAlpakaDeviceAdditionWrapperROCmAsync (Failed to run rocmIsEnabled)
Skip    0s ... HeterogeneousTest/AlpakaOpaque/testAlpakaDeviceAdditionOpaqueROCmAsync (Failed to run rocmIsEnabled)
Pass    0s ... HeterogeneousTest/AlpakaDevice/testAlpakaDeviceAdditionSerialSync
Pass    0s ... HeterogeneousTest/AlpakaKernel/testAlpakaDeviceAdditionKernelSerialSync
Pass    0s ... HeterogeneousTest/AlpakaWrapper/testAlpakaDeviceAdditionWrapperSerialSync
Pass    0s ... HeterogeneousTest/AlpakaOpaque/testAlpakaDeviceAdditionOpaqueSerialSync
Pass    5s ... HeterogeneousTest/AlpakaDevice/testAlpakaDeviceAdditionCudaAsync
Pass    5s ... HeterogeneousTest/AlpakaOpaque/testAlpakaDeviceAdditionOpaqueCudaAsync
Pass    5s ... HeterogeneousTest/AlpakaWrapper/testAlpakaDeviceAdditionWrapperCudaAsync
Pass    6s ... HeterogeneousTest/AlpakaKernel/testAlpakaDeviceAdditionKernelCudaAsync
Pass   10s ... HeterogeneousTest/AlpakaDevice/testAlpakaTestDeviceAdditionModule
Pass   10s ... HeterogeneousTest/AlpakaKernel/testAlpakaTestKernelAdditionModule
Pass   10s ... HeterogeneousTest/AlpakaWrapper/testAlpakaTestWrapperAdditionModule
Pass   11s ... HeterogeneousTest/AlpakaOpaque/testAlpakaTestOpaqueAdditionModule

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

These packages are only used as unit tests, so a backport is not essential.
However, since we do use Alpaka-based packages for data taking in CMSSW 14.0.x, it could be a good idea to backport these unit tests as well.

@fwyzard commented Apr 4, 2024

These packages implement unit tests for, among other things, the issue described in #44506 and fixed by cms-sw/cmssw-config#107: HeterogeneousTest/AlpakaWrapper fails to build in CMSSW_14_1_0_pre2, and builds correctly in CMSSW_14_1_X_2024-04-04-1100.

@fwyzard commented Apr 4, 2024

enable gpu

@fwyzard commented Apr 4, 2024

please test

@fwyzard commented Apr 4, 2024

@cms-sw/orp-l2 or @makortel, please let me know if you think these unit tests should be backported to CMSSW_14_0_X.

@fwyzard commented Apr 4, 2024

@makortel, unlike the CUDA and ROCm cases, with Alpaka there is no significant difference between HeterogeneousTest/AlpakaDevice and HeterogeneousTest/AlpakaKernel: kernels are themselves implemented as device functions, and (almost) all device code is templated on the accelerator type, so it needs to be in header files anyway.

I kept the two packages for consistency with the CUDA and ROCm cases.
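Why the Device/Kernel split collapses can be illustrated with a short plain-C++ sketch. In Alpaka a kernel is typically a function object whose call operator is templated on the accelerator type; here `CpuSerialAcc` and `CpuBlockAcc` are hypothetical stand-ins for accelerator types and no Alpaka API is used. Both the "device function" and the "kernel" are templates, so each is instantiated once per backend and its definition must be visible wherever it is instantiated, which is what pushes such code into headers.

```cpp
// Sketch: device function and kernel are both templates on the accelerator type.
// Assumption: CpuSerialAcc and CpuBlockAcc are hypothetical stand-ins for Alpaka
// accelerator types; the launcher below is illustrative, not an Alpaka API.
#include <cassert>
#include <cstddef>
#include <vector>

struct CpuSerialAcc {
  static constexpr std::size_t threads = 1;
};
struct CpuBlockAcc {
  static constexpr std::size_t threads = 8;
};

// "Device function": templated on the accelerator, so its definition must be
// visible at every point of instantiation, i.e. it has to live in a header.
template <typename TAcc>
void scale_element(TAcc const&, float* data, std::size_t i, float factor) {
  assert(data != nullptr);
  data[i] *= factor;
}

// "Kernel": also just templated code that calls the device function; structurally
// nothing distinguishes it from a device function.
struct KernelScale {
  template <typename TAcc>
  void operator()(TAcc const& acc, std::size_t thread, float* data, std::size_t size, float factor) const {
    for (std::size_t i = thread; i < size; i += TAcc::threads) {
      scale_element(acc, data, i, factor);
    }
  }
};

// Hypothetical host-side launcher: runs every "thread" of the accelerator in turn.
template <typename TAcc>
void launch_scale(std::vector<float>& data, float factor) {
  TAcc acc{};
  for (std::size_t t = 0; t < TAcc::threads; ++t) {
    KernelScale{}(acc, t, data.data(), data.size(), factor);
  }
}
```

Calling `launch_scale<CpuSerialAcc>` and `launch_scale<CpuBlockAcc>` on copies of the same input should give identical results; only the compile-time instantiation differs.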

@cmsbuild commented Apr 4, 2024

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44622/39810

  • This PR adds an extra 32KB to the repository

Code check has found code style and quality issues which could be resolved by applying the following patch(es)

@fwyzard commented Apr 4, 2024

@makortel, these packages do not use the "alpaka framework" for simplicity and to decouple them from any specific framework implementation.

@fwyzard force-pushed the implement_HeterogeneousTest_Alpaka_141x branch from 535ef59 to 775a4bc on April 4, 2024 17:21

@fwyzard commented Apr 4, 2024

please test

@cmsbuild commented Apr 4, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44622/39811

  • This PR adds an extra 28KB to the repository

@cmsbuild commented Apr 4, 2024

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • HeterogeneousTest/AlpakaDevice (****)
  • HeterogeneousTest/AlpakaKernel (****)
  • HeterogeneousTest/AlpakaOpaque (****)
  • HeterogeneousTest/AlpakaWrapper (****)
  • HeterogeneousTest/CUDAKernel (heterogeneous)
  • HeterogeneousTest/ROCmDevice (heterogeneous)
  • HeterogeneousTest/ROCmKernel (heterogeneous)
  • HeterogeneousTest/ROCmOpaque (heterogeneous)
  • HeterogeneousTest/ROCmWrapper (heterogeneous)

The following packages do not have a category, yet:

HeterogeneousTest/AlpakaDevice
HeterogeneousTest/AlpakaKernel
HeterogeneousTest/AlpakaOpaque
HeterogeneousTest/AlpakaWrapper
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@makortel, @fwyzard can you please review it and eventually sign? Thanks.
@missirol this is something you requested to watch as well.
@rappoccio, @antoniovilela, @sextonkennedy you are the release managers for this.

cms-bot commands are listed here

@cmsbuild commented Apr 4, 2024

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4509d9/38609/summary.html
COMMIT: 775a4bc
CMSSW: CMSSW_14_1_X_2024-04-04-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44622/38609/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found a compilation error when building:

Entering library rule at src/HeterogeneousTest/CUDAKernel/plugins
>> Compiling  src/HeterogeneousTest/CUDAKernel/plugins/CUDATestKernelAdditionAlgo.cu
>> Compiling edm plugin src/HeterogeneousTest/CUDAKernel/plugins/CUDATestKernelAdditionModule.cc
>> Cuda Device Link tmp/el8_amd64_gcc12/src/HeterogeneousTest/CUDAKernel/plugins/HeterogeneousTestCUDAKernelPlugins/HeterogeneousTestCUDAKernelPlugins_cudadlink.o 
nvlink error   : Undefined reference to '_ZN3cms8cudatest13add_vectors_fEPKfS2_Pfm' in '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-04-04-1100/static/el8_amd64_gcc12/libHeterogeneousTestCUDAKernel_nv.a:DeviceAdditionKernel.cu_nv.o' (target: sm_60)
gmake: *** [tmp/el8_amd64_gcc12/src/HeterogeneousTest/CUDAKernel/plugins/HeterogeneousTestCUDAKernelPlugins/HeterogeneousTestCUDAKernelPlugins_cudadlink.o] Error 255
>> Building edm plugin tmp/el8_amd64_gcc12/src/HeterogeneousTest/CUDAKernel/plugins/HeterogeneousTestCUDAKernelPlugins/libHeterogeneousTestCUDAKernelPlugins.so
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02831/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.3.1/../../../../x86_64-redhat-linux-gnu/bin/ld.bfd: cannot find tmp/el8_amd64_gcc12/src/HeterogeneousTest/CUDAKernel/plugins/HeterogeneousTestCUDAKernelPlugins/HeterogeneousTestCUDAKernelPlugins_cudadlink.o: No such file or directory
collect2: error: ld returned 1 exit status
gmake: *** [tmp/el8_amd64_gcc12/src/HeterogeneousTest/CUDAKernel/plugins/HeterogeneousTestCUDAKernelPlugins/libHeterogeneousTestCUDAKernelPlugins.so] Error 1
Leaving library rule at src/HeterogeneousTest/CUDAKernel/plugins


@cmsbuild commented Apr 5, 2024

Pull request #44622 was updated. @fwyzard, @makortel can you please check and sign again.

@fwyzard force-pushed the implement_HeterogeneousTest_Alpaka_141x branch from 18d66a9 to f963c4c on April 5, 2024 20:44

@fwyzard commented Apr 5, 2024

please test

@fwyzard commented Apr 5, 2024

+heterogeneous

@cmsbuild commented Apr 5, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44622/39832

@cmsbuild commented Apr 5, 2024

Pull request #44622 was updated. Can you please check and sign again?

@cmsbuild commented Apr 5, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4509d9/38640/summary.html
COMMIT: f963c4c
CMSSW: CMSSW_14_1_X_2024-04-05-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44622/38640/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 1226
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 38514
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found

@antoniovilela commented

Quoting @fwyzard: "@cms-sw/orp-l2 or @makortel, please let me know if you think these unit tests should be backported to CMSSW_14_0_X."

I am not sure I see a need for the backport. Maybe @makortel has some insight. Thanks.

I agree backporting the tests would not be strictly necessary, but maybe having them in 14_0_X would make it easier to ensure we have all the build rule fixes backported to 14_0_X? I don't have strong feeling to either direction.

Thanks. Ok for the backport.

@antoniovilela commented

+1


4 participants