Replace remaining cuda::device operations with native CUDA calls. #408

waredjeb · 2019-11-09T11:13:43Z

PR description:

This PR is part of #386. It replaces the uses of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.

As in #404, HeterogeneousCore/Product and HeterogeneousCore/Producer have not been modified, since they will be removed.

PR validation:

Code compiles, unit test runs.

makortel

Some comments, of which 1-2 are relevant for this PR.

makortel · 2019-11-11T20:29:54Z

HeterogeneousCore/CUDACore/src/CUDAScopedContext.cc

@@ -44,8 +44,7 @@ namespace impl {
    stream_ = cudautils::getCUDAStreamCache().getCUDAStream();
  }

-  CUDAScopedContextBase::CUDAScopedContextBase(const CUDAProductBase& data)
-      : currentDevice_(data.device()) {
+  CUDAScopedContextBase::CUDAScopedContextBase(const CUDAProductBase& data) : currentDevice_(data.device()) {


Note to self: I apparently forgot one scram b code-format, and this one fixes the formatting. Thanks!

makortel · 2019-11-11T20:30:29Z

HeterogeneousCore/CUDACore/src/GPUCuda.cc

@@ -52,7 +53,7 @@ namespace heterogeneous {
    // Create the CUDA stream for this module-edm::Stream pair
    auto current_device = cuda::device::current::get();
    cudaStream_ = std::make_unique<cuda::stream_t<>>(
-        current_device.create_stream(cuda::stream::no_implicit_synchronization_with_default_stream));
+         current_device.create_stream(cuda::stream::no_implicit_synchronization_with_default_stream));


Why are there changes in this file (not that it matters much, though)?

makortel · 2019-11-11T20:33:42Z

HeterogeneousCore/CUDAServices/src/CUDAService.cc


-int CUDAService::getCurrentDevice() const { return cuda::device::current::get().id(); }
+int CUDAService::getCurrentDevice() const { return cudautils::currentDevice(); }


We should probably remove these functions in favor of cudaSetDevice() and cudautils::currentDevice() (in a separate PR?).

Agreed. Could you create an issue so we don't forget?

Done #415.
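The suggestion above, that callers use cudaSetDevice() and cudautils::currentDevice() directly instead of the CUDAService methods, can be sketched as follows. This is only an illustration under stated assumptions: the body of currentDevice() is a guess that it wraps cudaGetDevice(), the namespace cudautils_sketch, the function switchAndReport() and the macro SKETCH_USE_REAL_CUDA are hypothetical, and the CUDA runtime functions are replaced by stand-ins with the same signatures so that the snippet builds without the toolkit.

```cpp
#include <cassert>
#include <stdexcept>

#ifdef SKETCH_USE_REAL_CUDA  // hypothetical switch, not part of CMSSW
#include <cuda_runtime.h>
#else
// Stand-ins with the CUDA runtime signatures, so the sketch builds without the toolkit.
using cudaError_t = int;
constexpr cudaError_t cudaSuccess = 0;
constexpr cudaError_t cudaErrorInvalidDevice = 101;  // stand-in value
constexpr int fakeNumDevices = 2;                    // pretend the machine has two GPUs
inline int& fakeActiveDevice() {
  static int dev = 0;  // pretend device 0 is selected initially
  return dev;
}
inline cudaError_t cudaGetDevice(int* device) {
  *device = fakeActiveDevice();
  return cudaSuccess;
}
inline cudaError_t cudaSetDevice(int device) {
  if (device < 0 || device >= fakeNumDevices) {
    return cudaErrorInvalidDevice;
  }
  fakeActiveDevice() = device;
  return cudaSuccess;
}
#endif

namespace cudautils_sketch {  // hypothetical namespace, not the one in the PR
  // Index of the device currently selected on the calling thread.
  inline int currentDevice() {
    int dev = -1;
    if (cudaGetDevice(&dev) != cudaSuccess) {
      throw std::runtime_error("cudaGetDevice failed");
    }
    return dev;
  }
}  // namespace cudautils_sketch

// What a caller might do instead of going through the CUDAService methods:
// select the device with the native call, then query it with the helper.
inline int switchAndReport(int device) {
  if (cudaSetDevice(device) != cudaSuccess) {
    throw std::runtime_error("cudaSetDevice failed");
  }
  int now = cudautils_sketch::currentDevice();
  assert(now == device);  // the selected device should be the current one
  return now;
}
```

With the stand-ins, switchAndReport(1) returns 1 and an out-of-range index throws; with the real runtime the outcome depends on the devices present.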

makortel · 2019-11-11T20:36:54Z

HeterogeneousCore/CUDAUtilities/interface/cudaDeviceCount.h

+#include <cuda_runtime.h>
+
+namespace cudautils {
+  inline int cudaDeviceCount() {


I know we haven't really discussed naming conventions, but I'd drop cuda from the name and call the function just deviceCount().

(but @waredjeb please don't change before others have commented as well)

Yes, it makes sense to drop cuda. I'll wait for further comments!

Agreed that deviceCount() would be fine.
Can you make a follow-up PR?
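For reference, a minimal sketch of what the helper could look like under the proposed name deviceCount(). The diff above only shows the first lines of the header, so the body here is an assumption (a thin wrapper around cudaGetDeviceCount()); the namespace cudautils_sketch and the macro SKETCH_USE_REAL_CUDA are hypothetical, and the runtime call is replaced by a stand-in so that the snippet builds without the toolkit.

```cpp
#include <cassert>
#include <stdexcept>

#ifdef SKETCH_USE_REAL_CUDA  // hypothetical switch, not part of CMSSW
#include <cuda_runtime.h>
#else
// Stand-ins with the CUDA runtime signatures, so the sketch builds without the toolkit.
using cudaError_t = int;
constexpr cudaError_t cudaSuccess = 0;
// Pretend the machine has two GPUs.
inline cudaError_t cudaGetDeviceCount(int* count) {
  *count = 2;
  return cudaSuccess;
}
#endif

namespace cudautils_sketch {  // hypothetical namespace, not the one in the PR
  // Number of CUDA devices visible to the process, under the proposed shorter name.
  inline int deviceCount() {
    int ndevices = 0;
    if (cudaGetDeviceCount(&ndevices) != cudaSuccess) {
      throw std::runtime_error("cudaGetDeviceCount failed");
    }
    assert(ndevices >= 0);
    return ndevices;
  }
}  // namespace cudautils_sketch
```

Since the function already lives in a CUDA-specific namespace, the shorter name loses no information at the call site.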

VinInn · 2019-11-17T14:34:21Z

SimTracker/TrackerHitAssociation/plugins/ClusterTPAssociationHeterogeneous.cc

@@ -191,7 +192,7 @@ void ClusterTPAssociationHeterogeneous::acquireGPUCuda(const edm::HeterogeneousE
  edm::Handle<CUDAProduct<TrackingRecHit2DCUDA>> gh;
  iEvent.getByToken(tGpuHits, gh);
  // temporary check (until the migration)
-  assert(gd->device() == cuda::device::current::get().id());
+  assert(gd->device() == cudautils::currentDevice());


This is obsoleted by #409.
I think it is easier for everybody not to apply this change here.

VinInn · 2019-11-22T14:57:35Z

I think one should also remove all `#include <cuda/api_wrappers.h>` lines and the corresponding uses from the BuildFiles.

makortel · 2019-11-26T14:26:22Z

@waredjeb Thanks. Can you confirm that with the last commit the API wrappers are not used anywhere except in HeterogeneousCore/{Producer,Product} packages?

fwyzard · 2019-11-26T15:09:29Z

Validation summary

Reference release CMSSW_11_0_0_pre11 at 5b0a828
Development branch CMSSW_11_0_X_Patatrack at 614ee0b
Testing PRs:

Replace remaining cuda::device operations with native CUDA calls. #408 at 2a1084e

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/b99f8f9fd113ed0fe1ac14de1f0bb92e8eb70326/log .

waredjeb · 2019-11-26T15:14:12Z

@waredjeb Thanks. Can you confirm that with the last commit the API wrappers are not used anywhere except in HeterogeneousCore/{Producer,Product} packages?

@makortel Well, not quite. DataFormats/Math/test/cudaMathTest.cu, DataFormats/Math/test/cudaAtan2Test.cu, HeterogeneousCore/CUDACore/interface/GPUCuda.h, HeterogeneousCore/CUDACore/interface/CUDAScopedContext.h, HeterogeneousCore/CUDACore/interface/CUDAESProduct.h, and some files in CUDAUtilities still have the API Wrappers.

(HeterogeneousCore/CUDAUtilities/interface/host_noncached_unique_ptr.h
HeterogeneousCore/CUDAUtilities/interface/launch.h
HeterogeneousCore/CUDAUtilities/src/allocate_host.cc
HeterogeneousCore/CUDAUtilities/src/allocate_device.cc
HeterogeneousCore/CUDAUtilities/BuildFile.xml)
I get errors when I try to remove the API wrappers.

makortel · 2019-11-26T15:31:46Z

@waredjeb Thanks for the list. I'd leave dealing with these to one or more follow-up PRs. The remaining cases seem to be either soon-to-be-obsolete code or calls to cuda::throw_if_error() (more details below).

HeterogeneousCore/CUDACore/interface/GPUCuda.h

This file will be removed once #409 gets merged.

HeterogeneousCore/CUDACore/interface/CUDAScopedContext.h

The only use seems to be

cmssw/HeterogeneousCore/CUDACore/src/CUDAScopedContext.cc

Line 83 in 614ee0b

cuda::throw_if_error(ret, "Failed to make a stream to wait for an event");

to be replaced with cudaCheck() (as discussed before, a variant taking a message argument would be nice).

HeterogeneousCore/CUDACore/interface/CUDAESProduct.h

Also here (after this PR) the only call is

cmssw/HeterogeneousCore/CUDACore/interface/CUDAESProduct.h

Line 62 in 614ee0b

cuda::throw_if_error(ret, "Failed to make a stream to wait for an event");

to be replaced with cudaCheck().

and some files in CUDAUtilities still have the API Wrappers.

(HeterogeneousCore/CUDAUtilities/interface/host_noncached_unique_ptr.h

cuda::throw_if_error() -> cudaCheck()

HeterogeneousCore/CUDAUtilities/interface/launch.h

I don't see API wrappers being used in this file.

HeterogeneousCore/CUDAUtilities/src/allocate_host.cc
HeterogeneousCore/CUDAUtilities/src/allocate_device.cc

cuda::throw_if_error() -> cudaCheck() in both
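A variant of the check that takes a message argument, as mentioned above for CUDAScopedContext.cc, might look like the sketch below. This is not the existing cudaCheck(): the name cudaCheckMsg(), the exception type and the message format are assumptions made for illustration, and the runtime functions are replaced by stand-ins (the macro SKETCH_USE_REAL_CUDA is hypothetical) so that the snippet builds without the toolkit.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

#ifdef SKETCH_USE_REAL_CUDA  // hypothetical switch, not part of CMSSW
#include <cuda_runtime.h>
#else
// Stand-ins with the CUDA runtime signatures, so the sketch builds without the toolkit.
using cudaError_t = int;
constexpr cudaError_t cudaSuccess = 0;
constexpr cudaError_t cudaErrorInvalidValue = 1;
inline const char* cudaGetErrorName(cudaError_t error) {
  return error == cudaSuccess ? "cudaSuccess" : "cudaErrorInvalidValue";
}
inline const char* cudaGetErrorString(cudaError_t error) {
  return error == cudaSuccess ? "no error" : "invalid argument";
}
#endif

// Hypothetical message-taking check: does nothing on success, otherwise throws
// an exception that combines the caller's message with the CUDA error details.
inline void cudaCheckMsg(cudaError_t result, const std::string& message) {
  if (result == cudaSuccess) {
    return;
  }
  std::ostringstream out;
  out << message << ": " << cudaGetErrorName(result) << " (" << cudaGetErrorString(result) << ")";
  throw std::runtime_error(out.str());
}
```

A call site would then keep its context string, for example cudaCheckMsg(ret, "Failed to make a stream to wait for an event"), rather than dropping it when moving away from cuda::throw_if_error().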

fwyzard · 2019-11-26T17:31:09Z

No impact on physics or throughput (as expected).

fwyzard · 2019-11-26T17:32:58Z

DataFormats/Math/test/CholeskyInvert_t.cu

@@ -1,5 +1,4 @@
 // nvcc -O3 CholeskyDecomp_t.cu -Icuda-api-wrappers/src/ --expt-relaxed-constexpr -gencode arch=compute_61,code=sm_61 --compiler-options="-Ofast -march=native"
-// add -DDOPROF to run  nvprof --metrics all


Why do you remove this line?

It wasn't my intention, thanks for the fix.

fwyzard · 2019-11-26T17:35:10Z

HeterogeneousCore/CUDAUtilities/src/allocate_host.cc

@@ -3,8 +3,7 @@
 #include "FWCore/Utilities/interface/Likely.h"

 #include "getCachingHostAllocator.h"
-
-#include <cuda/api_wrappers.h>
+#include "cuda/api_wrappers.h"


If still needed, <cuda/api_wrappers.h> is more correct than "cuda/api_wrappers.h".

waredjeb added 4 commits November 8, 2019 18:33

Replace cuda::device::current::get() and cuda::device::count() with cudautils calls

abb1197

Replace cuda::device::get and cuda::device::set with native cuda

6a561c5

Fixed conflict with #407

e4c90ce

Apply code-format

93d2458

makortel reviewed Nov 11, 2019

Fix code-format

7273a06

VinInn reviewed Nov 17, 2019

Avoid conflicts with #409

2331234

Remove cuda-api-wrappers includes, code-format applied

2a1084e

fwyzard reviewed Nov 26, 2019

fwyzard added 2 commits November 26, 2019 18:37

Add back missing comment

95a7521

Use bracket for external header

ceb9ff2

fwyzard merged commit 9ff6ec5 into cms-patatrack:CMSSW_11_0_X_Patatrack Nov 26, 2019

This was referenced Nov 26, 2019

Remove the use of CUDA API wrappers #386

Closed

Remove getCurrentDevice() and setCurrentDevice() from CUDAService #415

Closed

migrate cluster track associator #409

Closed

waredjeb deleted the replace_cudaDevice branch November 27, 2019 09:31

fwyzard pushed a commit that referenced this pull request Oct 7, 2020

Replace cuda::device operations with native CUDA calls (#408)

faa6a43

Replaces the usage of cuda::device::count(), cuda::device::get(), cuda::device::set() and cuda::device::current::get() with native CUDA calls.

fwyzard mentioned this pull request Oct 7, 2020

Patatrack integration - calorimeters shared code (6/N) cms-sw/cmssw#31704

Merged

fwyzard pushed four further commits that referenced this pull request, each with the same title and description as above:

- ed6f206 (Oct 7, 2020)
- d771569 (Oct 8, 2020)
- fd2056b (Oct 8, 2020)
- 4168ad1 (Oct 8, 2020)

This was referenced Oct 8, 2020

Patatrack integration - Pixel local reconstruction (9/N) cms-sw/cmssw#31721

Merged

Patatrack integration - Pixel track reconstruction (10/N) cms-sw/cmssw#31722

Merged

Patatrack integration - Pixel vertex reconstruction (11/N) cms-sw/cmssw#31723

Merged

fwyzard referenced this pull request from further commits, each with the same title and description as above:

- b3d0303 (pushed Oct 19, 2020)
- 0bfd795 (pushed Oct 20, 2020)
- bc806eb (pushed Oct 20, 2020)
- 8566914 (pushed Oct 23, 2020)
- d355627 (pushed Oct 23, 2020)
- 701d20c (pushed Nov 6, 2020)
- 61e87d7 (pushed Nov 6, 2020)
- 23d259d (pushed Nov 6, 2020)
- 819def4 (pushed Nov 16, 2020)
- f64968e (pushed Nov 16, 2020)
- 530f852 (added Nov 27, 2020)
- 8ce6fe2 (added Nov 28, 2020)
- 5aa5c4d (pushed Dec 25, 2020)
- d7ee292 (added Dec 26, 2020)
- 5f9e845 (added Dec 26, 2020)
- b70aeb6 (pushed Dec 29, 2020)
- a32cd8f (pushed Dec 29, 2020)
- cfe209f (pushed Dec 29, 2020)
- 6b0c079 (pushed Dec 29, 2020)
- 44a67c1 (pushed Jan 13, 2021)
- 1ab5beb (pushed Jan 15, 2021)
