
CUDAService verbosity #35117

Merged: 6 commits into cms-sw:master on Sep 20, 2021

Conversation

@fwyzard (Contributor) commented Sep 2, 2021

PR description:

  • Add the NVIDIA driver, CUDA driver, and CUDA runtime library versions to the CUDAService message.
  • Make the CUDAService less verbose by default, with an option to display the full messages, and enable it in the MessageLogger by default.
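The new verbosity option might be enabled in a job configuration along these lines. This is a sketch only: the parameter name `verbose` and the `_cfi` file path are assumptions, not confirmed by this thread.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")

# Load the CUDAService (hypothetical cfi path, shown for illustration)
process.load("HeterogeneousCore.CUDAServices.CUDAService_cfi")

# Hypothetical switch from the compact summary to the full per-device report
process.CUDAService.verbose = cms.untracked.bool(True)
```

With the option left at its default, only the compact three-line summary shown below is printed.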

PR validation:

The default, compact message on a machine with two GPUs now looks like

CUDA runtime version 11.2, driver version 11.4, NVIDIA driver version 470.57.02
CUDA device 0: Tesla T4 (sm_75)
CUDA device 1: Tesla T4 (sm_75)
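The `11.2`-style version strings in this message come from the integers returned by `cudaRuntimeGetVersion()` and `cudaDriverGetVersion()`, which CUDA encodes as `1000*major + 10*minor`. A minimal sketch of the decoding (the helper name is illustrative, not from the PR):

```python
def decode_cuda_version(version: int) -> str:
    """Decode a CUDA version integer (1000*major + 10*minor) into 'major.minor'."""
    major = version // 1000
    minor = (version % 1000) // 10
    return f"{major}.{minor}"

# cudaRuntimeGetVersion() reports 11020 for CUDA 11.2
print(decode_cuda_version(11020))  # → 11.2
print(decode_cuda_version(11040))  # → 11.4
```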

The full, verbose message on the same machine now looks like

NVIDIA driver:    470.57.02
CUDA driver API:  11.4 (compiled with 11.2)
CUDA runtime API: 11.2 (compiled with 11.2)
CUDA runtime successfully initialised, found 2 compute devices.

CUDA device 0: Tesla T4
  compute capability:          7.5 (sm_75)
  streaming multiprocessors:            40
  CUDA cores:                         2560
  single to double performance:       32:1
  compute mode:           default (shared)
  memory:  15009 MB free /  15109 MB total
  constant memory:                   64 kB
  L2 cache size:                   4096 kB
  L1 cache mode:   local and global memory

Other capabilities
  can map host memory into the CUDA address space for use with cudaHostAlloc()/cudaHostGetDevicePointer()
  does not support coherently accessing pageable memory without calling cudaHostRegister() on it
  cannot access pageable memory via the host's page tables
  can access host registered memory at the same virtual address as the host
  shares a unified address space with the host
  supports allocating managed memory on this system
  can coherently access managed memory concurrently with the host
  the host cannot directly access managed memory on the device without migration
  supports launching cooperative kernels via cudaLaunchCooperativeKernel()
  supports launching cooperative kernels via cudaLaunchCooperativeKernelMultiDevice()

CUDA flags
  thread policy:                   default
  pinned host memory allocations:  enabled
  kernel host memory reuse:       disabled

CUDA limits
  printf buffer size:                 1 MB
  stack size:                         1 kB
  malloc heap size:                   8 MB
  runtime sync depth:                    2
  runtime pending launch count:       2048

CUDA device 1: Tesla T4
  compute capability:          7.5 (sm_75)
  streaming multiprocessors:            40
  CUDA cores:                         2560
  single to double performance:       32:1
  compute mode:           default (shared)
  memory:  15009 MB free /  15109 MB total
  constant memory:                   64 kB
  L2 cache size:                   4096 kB
  L1 cache mode:   local and global memory

Other capabilities
  can map host memory into the CUDA address space for use with cudaHostAlloc()/cudaHostGetDevicePointer()
  does not support coherently accessing pageable memory without calling cudaHostRegister() on it
  cannot access pageable memory via the host's page tables
  can access host registered memory at the same virtual address as the host
  shares a unified address space with the host
  supports allocating managed memory on this system
  can coherently access managed memory concurrently with the host
  the host cannot directly access managed memory on the device without migration
  supports launching cooperative kernels via cudaLaunchCooperativeKernel()
  supports launching cooperative kernels via cudaLaunchCooperativeKernelMultiDevice()

CUDA flags
  thread policy:                   default
  pinned host memory allocations:  enabled
  kernel host memory reuse:       disabled

CUDA limits
  printf buffer size:                 1 MB
  stack size:                         1 kB
  malloc heap size:                   8 MB
  runtime sync depth:                    2
  runtime pending launch count:       2048

CUDAService fully initialized
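As a consistency check on the report above, the "CUDA cores" line is the product of the streaming-multiprocessor count and the FP32 cores per SM for the device's compute capability (64 for sm_75/Turing, so 40 SMs gives 2560). A sketch of that arithmetic; the table values come from NVIDIA's architecture documentation, and the function name is illustrative:

```python
# FP32 CUDA cores per streaming multiprocessor, by compute capability
CORES_PER_SM = {
    (7, 0): 64,   # Volta  (e.g. Tesla V100)
    (7, 5): 64,   # Turing (e.g. Tesla T4)
    (8, 0): 64,   # Ampere GA100
    (8, 6): 128,  # Ampere GA10x
}

def cuda_cores(major: int, minor: int, sm_count: int) -> int:
    """Total FP32 CUDA cores = SM count times cores-per-SM for the capability."""
    return sm_count * CORES_PER_SM[(major, minor)]

print(cuda_cores(7, 5, 40))  # Tesla T4: 40 SMs × 64 → 2560
```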

@cmsbuild (Contributor) commented Sep 2, 2021

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35117/25012

  • This PR adds an extra 28KB to repository

Code checks found code style and quality issues that could be resolved by applying the following patch(es)

@fwyzard (Contributor, Author) commented Sep 2, 2021

please test

@fwyzard (Contributor, Author) commented Sep 2, 2021

please test

@cmsbuild (Contributor) commented Sep 2, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35117/25018

  • This PR adds an extra 28KB to repository

@cmsbuild (Contributor) commented Sep 2, 2021

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

  • Configuration/StandardSequences (operations)
  • HLTrigger/Configuration (hlt)
  • HeterogeneousCore/CUDAServices (heterogeneous)

@perrotta, @makortel, @Martin-Grunewald, @fwyzard, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@fabiocos, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @Martin-Grunewald, @lecriste, @mtosi, @ebrondol, @mmusich, @dgulhan, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard (Contributor, Author) commented Sep 17, 2021

Finally everything looks good:

%MSG-i CUDAService:  (NoModuleName) 17-Sep-2021 07:48:11 UTC pre-events
CUDA runtime version 11.4, driver version 11.2, NVIDIA driver version 460.27.04
CUDA device 0: Tesla V100S-PCIE-32GB (sm_70)
%MSG

@fwyzard (Contributor, Author) commented Sep 17, 2021

+heterogeneous

@Martin-Grunewald (Contributor)

+1

@perrotta (Contributor)

@fwyzard @makortel do I understand correctly that this PR depends on #35298, which would then have to be merged first?
Otherwise, we can test it without #35298 and merge this one first, provided the tests report no issues.

@fwyzard (Contributor, Author) commented Sep 19, 2021 via email

@fwyzard (Contributor, Author) commented Sep 19, 2021 via email

@fwyzard (Contributor, Author) commented Sep 19, 2021

Once #35298 is merged, we can remove

process.load("FWCore.MessageService.MessageLogger_cfi")

from Configuration/StandardSequences/python/Services_cff.py

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-39073b/18741/summary.html
COMMIT: 1a5a345
CMSSW: CMSSW_12_1_X_2021-09-19-0000/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35117/18741/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19735
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19729
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 40
  • DQMHistoTests: Total histograms compared: 3211080
  • DQMHistoTests: Total failures: 11
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3211046
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 39 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 169 log files, 37 edm output root files, 40 DQM output files
  • TriggerResults: no differences found

@perrotta (Contributor)

+1

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit 8e41bcd into cms-sw:master Sep 20, 2021
@fwyzard fwyzard deleted the CUDAService_verbosity branch July 31, 2022 13:47