Fix cms::cuda::ESProduct for zero devices #29061

Merged on Mar 2, 2020 (2 commits)


makortel
Contributor

PR description:

This PR

  • renames the numberOfCUDADevices() function to cms::cuda::numberOfDevices() to follow the convention established in "Tools for CUDA modules" (#28537); somehow this function slipped through then
  • enables cms::cuda::ESProduct to be constructed on a machine without GPUs
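
To illustrate the second point, here is a minimal sketch of a per-device holder that can be constructed when the device count is zero. It is not the actual cms::cuda::ESProduct code: the namespace `sketch_product`, the class `PerDeviceProduct`, and the stubbed `numberOfDevices(int)` are hypothetical names chosen for this example, and the device count is passed in by hand instead of being queried from the CUDA runtime or a framework service.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch_product {
  // Hypothetical stand-in for a device-count query. A negative input models
  // a failed query; it is reported as "zero devices" instead of throwing.
  inline std::size_t numberOfDevices(int detected) {
    return detected > 0 ? static_cast<std::size_t>(detected) : 0;
  }

  // Hypothetical per-device holder: one slot per device, possibly none.
  template <typename T>
  class PerDeviceProduct {
  public:
    // Constructing with zero devices is valid and yields an empty holder.
    explicit PerDeviceProduct(std::size_t nDevices) : slots_(nDevices) {}

    std::size_t size() const { return slots_.size(); }

    // Access is where a missing device becomes an error, not construction.
    T& dataForDevice(int device) {
      if (device < 0 || static_cast<std::size_t>(device) >= slots_.size()) {
        throw std::out_of_range("no such device");
      }
      return slots_[static_cast<std::size_t>(device)];
    }

  private:
    std::vector<T> slots_;
  };
}  // namespace sketch_product
```

The design choice in this sketch is to defer the error from construction to access, so a CPU-only job that never touches device data can still build the object.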

PR validation:

Unit tests run.

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29061/13945

@cmsbuild
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

HeterogeneousCore/CUDACore
HeterogeneousCore/CUDAServices

@makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks.
@davidlange6, @silviodonato, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor Author

test parameters
enable_tests = gpu

@makortel
Contributor Author

@cmsbuild, please test

@cmsbuild
Contributor

cmsbuild commented Feb 28, 2020

The tests are being triggered in jenkins.
Test Parameters:

@makortel
Contributor Author

@fwyzard Please review.

I was trying to run the Patatrack pixel tracking on CPU on a machine without GPUs, and it failed because some EventSetup code (IIRC it was PixelCPEFast) tried to construct a cms::cuda::ESProduct, whose constructor asked for the number of CUDA devices and threw an exception. Let me know if you'd prefer this PR to be cherry-picked to cms-patatrack/cmssw.

@cmsbuild
Contributor

+1
Tested at: e752b87
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0b1657/4936/summary.html
CMSSW: CMSSW_11_1_X_2020-02-28-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0b1657/4936/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2679707
  • DQMHistoTests: Total failures: 40
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2679348
  • DQMHistoTests: Total skipped: 319
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@fwyzard
Contributor

fwyzard commented Mar 2, 2020

Going over the code, it looks good to me.
I'm fine with getting the information from the CUDAService, but I thought from the framework side you wanted to reduce the reliance on it?

By the way, do we have any other uses of numberOfCUDADevices, possibly in the Patatrack branch?
I was surprised that I didn't find any.

Let me know if you'd prefer this PR to be cherry-picked to cms-patatrack/cmssw.

No, if this enters pre4 I'll pick it up when I update Patatrack to it.

@makortel
Contributor Author

makortel commented Mar 2, 2020

I'm fine with getting the information from the CUDAService, but I thought from the framework side you wanted to reduce the reliance on it?

Yeah, this PR is sort of a quick&easy(&dirty?) fix to get the default constructor working without devices.

The ESProducts in device memory can be done in a better way that does not require device-targeting ESProducts to be constructed for a CPU-only job. So eventually this behavior will go away.

By the way, do we have any other uses of numberOfCUDADevices, possibly in the Patatrack branch?

We do not. The cms::cuda::chooseDevice() (used by cms::cuda::ScopedContext) calls CUDAService::numberOfDevices() directly, so it is effectively the same.
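
As a rough sketch of why the two paths are effectively the same: a free function that forwards to the service and a device chooser that calls the service directly both read one number from one place. The names below (`sketch_select`, `DeviceService`) are hypothetical stand-ins, not the CMSSW classes, and the round-robin modulo policy is an assumption made for illustration only.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch_select {
  // Hypothetical service that owns the device count.
  class DeviceService {
  public:
    explicit DeviceService(int n) : n_(n) {}
    int numberOfDevices() const { return n_; }

  private:
    int n_;
  };

  // Free-function wrapper: forwards to the service, adds nothing of its own.
  inline int numberOfDevices(const DeviceService& service) {
    return service.numberOfDevices();
  }

  // Assumed policy: spread streams over devices round-robin. Calls the
  // service directly, so it sees the same count as the wrapper above.
  inline int chooseDevice(const DeviceService& service, unsigned streamId) {
    const int n = service.numberOfDevices();
    if (n <= 0) {
      throw std::runtime_error("no devices available");
    }
    return static_cast<int>(streamId % static_cast<unsigned>(n));
  }
}  // namespace sketch_select
```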

No, if this enters pre4 I'll pick it up when I update Patatrack to it.

Let's try that then.

@makortel
Contributor Author

makortel commented Mar 2, 2020

+1

@cmsbuild
Contributor

cmsbuild commented Mar 2, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @silviodonato, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Contributor

+1

About

DQMHistoTests: Total failures: 40

see #29076

@cmsbuild cmsbuild merged commit 88badf0 into cms-sw:master Mar 2, 2020
@makortel makortel deleted the fixCUDAESProduct branch March 3, 2020 15:10