Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace remaining cuda::device operations with native CUDA calls. #408

Merged
merged 9 commits into from
Nov 26, 2019
Merged

Replace remaining cuda::device operations with native CUDA calls. #408

merged 9 commits into from
Nov 26, 2019

Conversation

waredjeb
Copy link

@waredjeb waredjeb commented Nov 9, 2019

PR description:

This PR is part of #386. It replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native cuda calls.

As in #404 HeterogeneousCore/Product and HeterogeneousCore/Producer have not been modified since they will be removed.

PR validation:

Code compiles, unit test runs.

Copy link

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments of which 1-2 are relevant for this PR.

@@ -44,8 +44,7 @@ namespace impl {
stream_ = cudautils::getCUDAStreamCache().getCUDAStream();
}

CUDAScopedContextBase::CUDAScopedContextBase(const CUDAProductBase& data)
: currentDevice_(data.device()) {
CUDAScopedContextBase::CUDAScopedContextBase(const CUDAProductBase& data) : currentDevice_(data.device()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: I apparently forgot one scram b code-format, and this one fixes the formatting. Thanks!

@@ -52,7 +53,7 @@ namespace heterogeneous {
// Create the CUDA stream for this module-edm::Stream pair
auto current_device = cuda::device::current::get();
cudaStream_ = std::make_unique<cuda::stream_t<>>(
current_device.create_stream(cuda::stream::no_implicit_synchronization_with_default_stream));
current_device.create_stream(cuda::stream::no_implicit_synchronization_with_default_stream));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there are changes in this file (not that it matters much though)?


int CUDAService::getCurrentDevice() const { return cuda::device::current::get().id(); }
int CUDAService::getCurrentDevice() const { return cudautils::currentDevice(); }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably remove these functions in favor of cudaSetDevice() and cudautils::currentDevice() (in a separate PR?).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed - could you create an issue so we don't forget ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed - could you create an issue so we don't forget ?

Done #415.

#include <cuda_runtime.h>

namespace cudautils {
inline int cudaDeviceCount() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we haven't really discussed about naming conventions, but I'd drop cuda from the name and call the function just deviceCount().

(but @waredjeb please don't change before others have commented as well)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes make sense to drop cuda. I'll wait for new comments!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that deviceCount() would be fine.
Can you make a follow up PR ?

@@ -191,7 +192,7 @@ void ClusterTPAssociationHeterogeneous::acquireGPUCuda(const edm::HeterogeneousE
edm::Handle<CUDAProduct<TrackingRecHit2DCUDA>> gh;
iEvent.getByToken(tGpuHits, gh);
// temporary check (until the migration)
assert(gd->device() == cuda::device::current::get().id());
assert(gd->device() == cudautils::currentDevice());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is obsoleted by #409 .
I think it is easier for everybody not to apply this change here

@VinInn
Copy link

VinInn commented Nov 22, 2019

I think one should also remove all
#include <cuda/api_wrappers.h>
and
the corresponding use from the BuildFiles

@makortel
Copy link

@waredjeb Thanks. Can you confirm that with the last commit the API wrappers are not used anywhere except in HeterogeneousCore/{Producer,Product} packages?

@fwyzard
Copy link

fwyzard commented Nov 26, 2019

Validation summary

Reference release CMSSW_11_0_0_pre11 at 5b0a828
Development branch CMSSW_11_0_X_Patatrack at 614ee0b
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.86452.png
zoom-136.86452.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/b99f8f9fd113ed0fe1ac14de1f0bb92e8eb70326/log .

@waredjeb
Copy link
Author

waredjeb commented Nov 26, 2019

@waredjeb Thanks. Can you confirm that with the last commit the API wrappers are not used anywhere except in HeterogeneousCore/{Producer,Product} packages?

@makortel Well, not at all. DataFormats/Math/test/cudaMathTest.cu, DataFormats/Math/test/cudaAtan2Test.cu
, HeterogeneousCore/CUDACore/interface/GPUCuda.h, HeterogeneousCore/CUDACore/interface/CUDAScopedContext.h, HeterogeneousCore/CUDACore/interface/CUDAESProduct.h, and some files in CUDAUtilities still have the API Wrappers.

(HeterogeneousCore/CUDAUtilities/interface/host_noncached_unique_ptr.h
HeterogeneousCore/CUDAUtilities/interface/launch.h
HeterogeneousCore/CUDAUtilities/src/allocate_host.cc
HeterogeneousCore/CUDAUtilities/src/allocate_device.cc
HeterogeneousCore/CUDAUtilities/BuildFile.xml)
I have errors when I try to remove the api-wrapper.

@makortel
Copy link

@waredjeb Thanks for the list. I'd leave dealing with these to a follow-up PR(s) of this PR. The remaining cases seem to be either soon-obsolete, or cuda::throw_if_error() (more details below).

HeterogeneousCore/CUDACore/interface/GPUCuda.h

This file will be removed once #409 gets merged.

HeterogeneousCore/CUDACore/interface/CUDAScopedContext.h

The only use seems to be

cuda::throw_if_error(ret, "Failed to make a stream to wait for an event");

to be replaced with cudaCheck() (as discussed before, a variant taking a message argument would be nice).

HeterogeneousCore/CUDACore/interface/CUDAESProduct.h

Also here (after this PR) the only call is

cuda::throw_if_error(ret, "Failed to make a stream to wait for an event");

to be replace with cudaCheck().

and some files in CUDAUtilities still have the API Wrappers.

(HeterogeneousCore/CUDAUtilities/interface/host_noncached_unique_ptr.h

cuda::throw_if_error() -> cudaCheck()

HeterogeneousCore/CUDAUtilities/interface/launch.h

I don't see API wrappers being used in this file.

HeterogeneousCore/CUDAUtilities/src/allocate_host.cc
HeterogeneousCore/CUDAUtilities/src/allocate_device.cc

cuda::throw_if_error() -> cudaCheck() in both

@fwyzard
Copy link

fwyzard commented Nov 26, 2019

No impact on physics or throughput (as expected).

@@ -1,5 +1,4 @@
// nvcc -O3 CholeskyDecomp_t.cu -Icuda-api-wrappers/src/ --expt-relaxed-constexpr -gencode arch=compute_61,code=sm_61 --compiler-options="-Ofast -march=native"
// add -DDOPROF to run nvprof --metrics all
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you remove this line ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't my intention, thanks for the fix.

@@ -3,8 +3,7 @@
#include "FWCore/Utilities/interface/Likely.h"

#include "getCachingHostAllocator.h"

#include <cuda/api_wrappers.h>
#include "cuda/api_wrappers.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If still needed, <cuda/api_wrappers.h> is more correct than "cuda/api_wrappers.h".

@fwyzard fwyzard merged commit 9ff6ec5 into cms-patatrack:CMSSW_11_0_X_Patatrack Nov 26, 2019
@waredjeb waredjeb deleted the replace_cudaDevice branch November 27, 2019 09:31
fwyzard pushed a commit that referenced this pull request Oct 7, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 7, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 19, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard added a commit that referenced this pull request Nov 27, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard added a commit that referenced this pull request Nov 28, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard added a commit that referenced this pull request Dec 26, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard added a commit that referenced this pull request Dec 26, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Jan 13, 2021
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
fwyzard pushed a commit that referenced this pull request Jan 15, 2021
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants