
Fix uploading the EventSetup conditions to multiple CUDA devices #34948

Merged

Conversation

@fwyzard (Contributor) commented Aug 19, 2021

PR description:

When multiple CUDA devices are available, each CUDA EventSetup object uses a different CUDA event per device to keep track of whether the conditions have been uploaded to that device. However, all the events were associated with the default device instead of the correct one, leading to a runtime error when multiple devices were used.

These changes associate the events used to track whether the conditions have been transferred with the correct CUDA device.

PR validation:

Without this PR, running any GPU-enabled job on a system with multiple GPUs would fail with the error

----- Begin Fatal Exception 19-Aug-2021 10:35:57 CEST-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 6 stream: 1
   [1] Running path 'dqmoffline_step'
   [2] Prefetching for module PrimaryVertexMonitor/'pixelPVMonitor'
   [3] Prefetching for module PixelVertexProducerFromSoA/'pixelVertices'
   [4] Prefetching for module PixelVertexSoAFromCUDA/'pixelVerticesSoA@cuda'
   [5] Prefetching for module PixelVertexProducerCUDA/'pixelVerticesCUDA'
   [6] Prefetching for module CAHitNtupletCUDA/'pixelTracksCUDA'
   [7] Prefetching for module SiPixelRecHitCUDA/'siPixelRecHitsPreSplittingCUDA'
   [8] Calling method for module SiPixelRawToClusterCUDA/'siPixelClustersPreSplittingCUDA'
Exception Message:
A std::exception was thrown.

/build/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/28bc67141038e9b7c17a74564c8e6173/opt/cmssw/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_X_2021-08-15-0000/src/HeterogeneousCore/CUDACore/interface/ESProduct.h, line 80:
cudaCheck(cudaEventRecord(data.m_event.get(), cudaStream));
cudaErrorInvalidResourceHandle: invalid resource handle
----- End Fatal Exception -------------------------------------------------

With this PR, these jobs complete correctly.
For example, step 3 from runTheMatrix.py -l 10824.502 -e -t4 --what gpu:

$ cudaComputeCapabilities 
   0     6.1    GeForce GTX 1080 Ti
   1     6.1    GeForce GTX 1080 Ti
   2     6.1    GeForce GTX 1080 Ti
   3     6.1    GeForce GTX 1080 Ti
   4     6.1    GeForce GTX 1080 Ti
   5     6.1    GeForce GTX 1080 Ti
   6     6.1    GeForce GTX 1080 Ti
   7     6.1    GeForce GTX 1080 Ti

$ cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py
%MSG-i ThreadStreamSetup:  (NoModuleName) 19-Aug-2021 10:59:19 CEST pre-events
setting # threads 4
setting # streams 4
%MSG
19-Aug-2021 10:59:32 CEST  Initiating request to open file file:step2.root
19-Aug-2021 10:59:33 CEST  Successfully opened file file:step2.root
Begin processing the 1st record. Run 1, Event 6, LumiSection 1 on stream 0 at 19-Aug-2021 10:59:40.893 CEST
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 1 at 19-Aug-2021 10:59:40.918 CEST
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 3 at 19-Aug-2021 10:59:40.918 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 2 at 19-Aug-2021 10:59:40.937 CEST
Begin processing the 5th record. Run 1, Event 1, LumiSection 1 on stream 2 at 19-Aug-2021 10:59:46.703 CEST
Begin processing the 6th record. Run 1, Event 5, LumiSection 1 on stream 1 at 19-Aug-2021 10:59:46.813 CEST
Begin processing the 7th record. Run 1, Event 8, LumiSection 1 on stream 0 at 19-Aug-2021 10:59:46.851 CEST
Begin processing the 8th record. Run 1, Event 9, LumiSection 1 on stream 3 at 19-Aug-2021 10:59:46.947 CEST
Begin processing the 9th record. Run 1, Event 7, LumiSection 1 on stream 2 at 19-Aug-2021 10:59:47.269 CEST
Begin processing the 10th record. Run 1, Event 10, LumiSection 1 on stream 3 at 19-Aug-2021 10:59:47.469 CEST
19-Aug-2021 10:59:48 CEST  Closed file file:step2.root

If this PR is a backport please specify the original PR and why you need to backport that PR:

These changes should be backported to CMSSW_12_0_X.
These changes should be backported to CMSSW_11_3_X if there are plans to use that release cycle for more online tests.

@fwyzard (Contributor, Author) commented Aug 19, 2021

@makortel, what do you think of the changes?

Some other possibilities could be:

  • do not make changes to cms::cuda::ScopedSetDevice, and set/reset the device manually in cms::cuda::ESProduct;
  • only add the default constructor to cms::cuda::ScopedSetDevice, and set the device manually in cms::cuda::ESProduct;
  • add an overload of cms::cuda::EventCache::get() that takes the desired device, cms::cuda::EventCache::get(int device), and uses that device instead of the current one.

@fwyzard (Contributor, Author) commented Aug 19, 2021

please test

@cmsbuild (Contributor)

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34948/24769

  • This PR adds an extra 20KB to repository

Code checks have found code style and quality issues which could be resolved by applying the following patch(es)

@fwyzard (Contributor, Author) commented Aug 19, 2021

For what it's worth, I explicitly disagree that these changes required by clang-format

diff --git a/HeterogeneousCore/CUDAUtilities/interface/ScopedSetDevice.h b/HeterogeneousCore/CUDAUtilities/interface/ScopedSetDevice.h
index d11baa0191d4..0cf740af8f71 100644
--- a/HeterogeneousCore/CUDAUtilities/interface/ScopedSetDevice.h
+++ b/HeterogeneousCore/CUDAUtilities/interface/ScopedSetDevice.h
@@ -10,14 +10,10 @@ namespace cms {
     class ScopedSetDevice {
     public:
       // Store the original device, without setting a new one
-      ScopedSetDevice() {
-        cudaCheck(cudaGetDevice(&originalDevice_));
-      }
+      ScopedSetDevice() { cudaCheck(cudaGetDevice(&originalDevice_)); }
 
       // Store the original device, and set a new current device
-      explicit ScopedSetDevice(int device) : ScopedSetDevice() {
-        set(device);
-      }
+      explicit ScopedSetDevice(int device) : ScopedSetDevice() { set(device); }
 
       // Restore the original device
       ~ScopedSetDevice() {
@@ -29,9 +25,7 @@ namespace cms {
 
       // Set a new current device, without changing the original device
       // that will be restored when this object is destroyed
-      void set(int device) {
-        cudaCheck(cudaSetDevice(device));
-      }
+      void set(int device) { cudaCheck(cudaSetDevice(device)); }
 
     private:
       int originalDevice_;

make the code more readable and maintainable.

Quite the opposite, I would say.

Commit messages:

Add a default constructor that stores the current device, without changing it.

Add a set() method to change the current device, without affecting the stored value for the original device.

Associate the events used to track whether the conditions have been transferred to each device with the correct CUDA device.

@fwyzard force-pushed the CUDA_fix_concurrent_EventSetup_filling branch from 65d2aac to 097fe4a on August 19, 2021 at 09:45
@fwyzard (Contributor, Author) commented Aug 19, 2021

please test

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34948/24770

  • This PR adds an extra 20KB to repository

@cmsbuild (Contributor)

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

  • HeterogeneousCore/CUDACore (heterogeneous)
  • HeterogeneousCore/CUDAUtilities (heterogeneous)

@makortel, @fwyzard can you please review it and eventually sign? Thanks.
@makortel, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard (Contributor, Author) commented Aug 19, 2021

type bugfix

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-86065e/17884/summary.html
COMMIT: 097fe4a
CMSSW: CMSSW_12_1_X_2021-08-18-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34948/17884/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3000352
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3000330
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@makortel (Contributor)

Thanks @fwyzard for the fix!

what do you think of the changes?

I agree with the changes, but let me think out loud below.

Some other possibilities could be:

  • do not make changes to cms::cuda::ScopedSetDevice, and set/reset the device manually in cms::cuda::ESProduct;

I think it is better for the ESProduct constructor to set the current device back to the original. I'd expect the majority of the CUDA ESProducers not to make any CUDA API calls explicitly in their produce() functions (I didn't check, so I could easily be wrong), but if they did, I'd certainly expect the construction of the ESProduct not to change the current device.

With that requirement, I think it is better to use ScopedSetDevice than to call cudaGetDevice()/cudaSetDevice() explicitly.

  • only add the default constructor to cms::cuda::ScopedSetDevice, and set the device manually in cms::cuda::ESProduct;

I prefer the device setting to go through ScopedSetDevice.

  • add an overload of cms::cuda::EventCache::get() that takes the desired device, cms::cuda::EventCache::get(int device), and uses that device instead of the current one.

That would mean that EventCache::get() would internally use a ScopedSetDevice, which would lead to more calls to cudaSetDevice() (via the ScopedSetDevice constructor). Probably not a big deal in practice, though.

The ScopedContext classes are the other users of EventCache, and they rely on the "current device" idiom already (although in principle I'd like to pass around the device explicitly).

@makortel (Contributor)

enable gpu

@makortel (Contributor)

@cmsbuild, please test

@fwyzard (Contributor, Author) commented Aug 19, 2021

what do you think of the changes?

I agree with the changes, but let me think out loud below.

👍
I agree with your conclusions.

@fwyzard (Contributor, Author) commented Aug 19, 2021

+heterogeneous

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-86065e/17893/summary.html
COMMIT: 097fe4a
CMSSW: CMSSW_12_1_X_2021-08-19-1100/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34948/17893/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19735
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19729
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3000352
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3000318
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@perrotta (Contributor)

+1

Review comment on HeterogeneousCore/CUDACore/interface/ESProduct.h:

  @@ -19,23 +20,24 @@ namespace cms {
      class ESProduct {
      public:
        ESProduct() : gpuDataPerDevice_(numberOfDevices()) {
          cms::cuda::ScopedSetDevice scopedDevice;

This needs to be protected for 0 devices, I'll prepare a fix.

On the other hand, I need to re-educate myself why exactly this happens.

On the other hand, I need to re-educate myself why exactly this happens.

It seems that the PixelCPEFast ESProduct is used for both GPU modules and CPU modules, and it holds a cms::cuda::ESProduct<...> as a member. This should really be refactored, but it might be most efficient to leave that until we have a pattern for issuing the CUDA operations from the ESProducer instead of the ESProduct (hopefully later this fall).
