Roll back to CUDA 12.1.1 [gcc 12] #8527

smuzaffar · 2023-05-30T05:48:33Z

Reverts #8515

cmsbuild · 2023-05-30T05:48:58Z

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_13_2_X/g12.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

cmsbuild · 2023-05-30T05:50:13Z

Pull request #8527 was updated.

smuzaffar · 2023-05-30T05:50:45Z

test parameters:

full_cmssw = true
enable = gpu

smuzaffar · 2023-05-30T05:50:55Z

please test

fwyzard · 2023-05-30T05:55:29Z

Do we gain anything from CUDA 12.1 over 12.0 ?
If not, I would prefer to stay with 12.0 and keep the --src-in-ptx option.

smuzaffar · 2023-05-30T06:16:28Z

@fwyzard , I am just testing that if cuda 12.1.1 and no --src-in-ptx flag allow us to move forward with gcc12/cuda12. Note that cuda 12.0.1 with gcc12 has build errors

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc12/CMSSW_13_2_X_2023-05-29-1100/RecoLocalTracker/SiStripClusterizer

gcc/x86_64-redhat-linux-gnu/12.2.1/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type

cmsbuild · 2023-05-30T12:10:22Z

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-02c814/32849/summary.html
COMMIT: c852f8a
CMSSW: CMSSW_13_2_X_2023-05-29-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8527/32849/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-02c814/32849/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-02c814/32849/git-merge-result

Build

I found compilation error when building:

>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HFNoseRawToDigiFake.cc
>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HGCalRawToDigi.cc
>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HGCalRawToDigiFake.cc
>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HGCalSlinkEmulator.cc
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HGCalRawToDigi.cc: In member function 'virtual void HGCalRawToDigi::produce(edm::Event&, const edm::EventSetup&)':
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HGCalRawToDigi.cc:68:37: error: comparing the result of pointer addition '(ptr + ((sizetype)i))' and NULL [-Werror=address]
   68 |       data_32bit.emplace_back(((ptr + i) ? ((*(ptr + i) & 0xff) << 0) : 0) +
      |                                ~~~~~^~~~
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_2_X_2023-05-29-1100/src/EventFilter/HGCalRawToDigi/plugins/HGCalRawToDigi.cc:69:41: error: comparing the result of pointer addition '(ptr + (((sizetype)i) + 1))' and NULL [-Werror=address]
   69 |                               ((ptr + i + 1) ? ((*(ptr + i + 1) & 0xff) << 8) : 0) +
      |                                ~~~~~~~~~^~~~

smuzaffar · 2023-05-30T12:30:09Z

even removing --source-in-ptx does not work for cuda12/gcc12 due to error [a] (which is there for both cuda 12.0.1 and 12.1.1)

[a]

>> Compiling  RecoLocalTracker/SiStripClusterizer/plugins/SiStripRawToClusterGPUKernel.cu
gcc/12.2.1/lib/gcc/x86_64-redhat-linux-gnu/12.2.1/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
                  __v8hf __attribute__ ((__vector_size__ (16)));

fwyzard · 2023-05-30T21:38:53Z

hi @smuzaffar , as a workaround you can add

#undef _Float16

right after

#include <cub/cub.cuh>

in RecoLocalTracker/SiStripClusterizer/plugins/SiStripRawToClusterGPUKernel.cu.

While this should be safe for us (we don't use _Float16 in SiStripRawToClusterGPUKernel.cu) I do not know if this can have any consequences in general.

I'll open a bug report with NVIDIA.

fwyzard · 2023-05-31T07:11:11Z

See

fwyzard · 2023-05-31T08:45:56Z

After feedback from NVIDIA, the workaround should be safe, see cms-sw/cmssw#41819 .

In the meantime, a fix is being prepared upstream, see NVIDIA/libcudacxx#476 .

smuzaffar · 2023-05-31T09:59:10Z

thanks a lot @fwyzard for following this up. As you like to stay with cuda 12.0.1 with --source-in-ptx so I am closing this PR

jrhemstad · 2023-05-31T11:14:40Z

@fwyzard I wanted to go out of my way to thank you for providing such an excellent bug triage and reproducer in https://github.com/fwyzard/nvidia_bug_4139266.

We had been trying to track this down from other bug reports and it would have taken me forever to figure out the _Float16 macro conflict.

If only every bug report and reproducer was this good 😉.

Revert "Roll back to CUDA 12.0.1 [gcc 12]"

0cf5964

cmsbuild added tests-pending externals-pending pending-signatures orp-pending labels May 30, 2023

Update cuda-flags.file: drop --source-in-ptx

c852f8a

cmsbuild added tests-started and removed tests-pending labels May 30, 2023

smuzaffar mentioned this pull request May 30, 2023

[GCC] Update to GCC 12.3.1 #8526

Merged

cmsbuild added tests-rejected and removed tests-started labels May 30, 2023

smuzaffar closed this May 31, 2023

smuzaffar deleted the revert-8515-IB/CMSSW_13_2_X/g12_CUDA_12.0.1 branch June 1, 2023 09:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Roll back to CUDA 12.1.1 [gcc 12] #8527

Roll back to CUDA 12.1.1 [gcc 12] #8527

smuzaffar commented May 30, 2023

cmsbuild commented May 30, 2023

cmsbuild commented May 30, 2023

smuzaffar commented May 30, 2023

smuzaffar commented May 30, 2023

fwyzard commented May 30, 2023

smuzaffar commented May 30, 2023

cmsbuild commented May 30, 2023

smuzaffar commented May 30, 2023 •

edited

Loading

fwyzard commented May 30, 2023

fwyzard commented May 31, 2023

fwyzard commented May 31, 2023

smuzaffar commented May 31, 2023

jrhemstad commented May 31, 2023

Roll back to CUDA 12.1.1 [gcc 12] #8527

Roll back to CUDA 12.1.1 [gcc 12] #8527

Conversation

smuzaffar commented May 30, 2023

cmsbuild commented May 30, 2023

cmsbuild commented May 30, 2023

smuzaffar commented May 30, 2023

smuzaffar commented May 30, 2023

fwyzard commented May 30, 2023

smuzaffar commented May 30, 2023

cmsbuild commented May 30, 2023

Build

smuzaffar commented May 30, 2023 • edited Loading

fwyzard commented May 30, 2023

fwyzard commented May 31, 2023

fwyzard commented May 31, 2023

smuzaffar commented May 31, 2023

jrhemstad commented May 31, 2023

smuzaffar commented May 30, 2023 •

edited

Loading