-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace remaining cuda::device operations with native CUDA calls. #408
Replace remaining cuda::device operations with native CUDA calls. #408
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments of which 1-2 are relevant for this PR.
@@ -44,8 +44,7 @@ namespace impl { | |||
stream_ = cudautils::getCUDAStreamCache().getCUDAStream(); | |||
} | |||
|
|||
CUDAScopedContextBase::CUDAScopedContextBase(const CUDAProductBase& data) | |||
: currentDevice_(data.device()) { | |||
CUDAScopedContextBase::CUDAScopedContextBase(const CUDAProductBase& data) : currentDevice_(data.device()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: I apparently forgot one scram b code-format
, and this one fixes the formatting. Thanks!
@@ -52,7 +53,7 @@ namespace heterogeneous { | |||
// Create the CUDA stream for this module-edm::Stream pair | |||
auto current_device = cuda::device::current::get(); | |||
cudaStream_ = std::make_unique<cuda::stream_t<>>( | |||
current_device.create_stream(cuda::stream::no_implicit_synchronization_with_default_stream)); | |||
current_device.create_stream(cuda::stream::no_implicit_synchronization_with_default_stream)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there are changes in this file (not that it matters much though)?
|
||
int CUDAService::getCurrentDevice() const { return cuda::device::current::get().id(); } | ||
int CUDAService::getCurrentDevice() const { return cudautils::currentDevice(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably remove these functions in favor of cudaSetDevice()
and cudautils::currentDevice()
(in a separate PR?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed - could you create an issue so we don't forget ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed - could you create an issue so we don't forget ?
Done #415.
#include <cuda_runtime.h> | ||
|
||
namespace cudautils { | ||
inline int cudaDeviceCount() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we haven't really discussed about naming conventions, but I'd drop cuda
from the name and call the function just deviceCount()
.
(but @waredjeb please don't change before others have commented as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes make sense to drop cuda
. I'll wait for new comments!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that deviceCount()
would be fine.
Can you make a follow up PR ?
@@ -191,7 +192,7 @@ void ClusterTPAssociationHeterogeneous::acquireGPUCuda(const edm::HeterogeneousE | |||
edm::Handle<CUDAProduct<TrackingRecHit2DCUDA>> gh; | |||
iEvent.getByToken(tGpuHits, gh); | |||
// temporary check (until the migration) | |||
assert(gd->device() == cuda::device::current::get().id()); | |||
assert(gd->device() == cudautils::currentDevice()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is obsoleted by #409 .
I think it is easier for everybody not to apply this change here
I think one should also remove all |
@waredjeb Thanks. Can you confirm that with the last commit the API wrappers are not used anywhere except in |
Validation summaryReference release CMSSW_11_0_0_pre11 at 5b0a828 Validation plots/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW
/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW
/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW
Throughput plots/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53logs and
|
@makortel Well, not at all. DataFormats/Math/test/cudaMathTest.cu, DataFormats/Math/test/cudaAtan2Test.cu (HeterogeneousCore/CUDAUtilities/interface/host_noncached_unique_ptr.h |
@waredjeb Thanks for the list. I'd leave dealing with these to a follow-up PR(s) of this PR. The remaining cases seem to be either soon-obsolete, or This file will be removed once #409 gets merged. The only use seems to be
to be replaced with cudaCheck() (as discussed before, a variant taking a message argument would be nice).
Also here (after this PR) the only call is
to be replace with cudaCheck() .
I don't see API wrappers being used in this file.
|
No impact on physics or throughput (as expected). |
@@ -1,5 +1,4 @@ | |||
// nvcc -O3 CholeskyDecomp_t.cu -Icuda-api-wrappers/src/ --expt-relaxed-constexpr -gencode arch=compute_61,code=sm_61 --compiler-options="-Ofast -march=native" | |||
// add -DDOPROF to run nvprof --metrics all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you remove this line ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't my intention, thanks for the fix.
@@ -3,8 +3,7 @@ | |||
#include "FWCore/Utilities/interface/Likely.h" | |||
|
|||
#include "getCachingHostAllocator.h" | |||
|
|||
#include <cuda/api_wrappers.h> | |||
#include "cuda/api_wrappers.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If still needed, <cuda/api_wrappers.h>
is more correct than "cuda/api_wrappers.h"
.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.
PR description:
This PR is part of #386. It replaces the usage of
cuda::device::count()
,cuda::device::get()
,cuda::device::set()
andcuda::device::current::get()
with native cuda calls.As in #404
HeterogeneousCore/Product
andHeterogeneousCore/Producer
have not been modified since they will be removed.PR validation:
Code compiles, unit test runs.