R2D: use GPU::SimpleVector for the error unpacking #14

calabria · 2018-02-16T15:06:53Z

My changes seem to work fine (always the same number and type of errors as in the serial code), but there is a crash in the calibration part. I don't know whether this is expected, given that it is still under development.

-bash-4.2$ cmsRun tkreco.py
%MSG-i ThreadStreamSetup: (NoModuleName) 16-Feb-2018 15:42:01 CET pre-events
setting # threads 8
setting # streams 8
%MSG
16-Feb-2018 15:42:12 CET Initiating request to open file file:/data/patatrack/innocent/run2017/JetHT_raw304797HL.root
16-Feb-2018 15:42:13 CET Successfully opened file file:/data/patatrack/innocent/run2017/JetHT_raw304797HL.root
Begin processing the 1st record. Run 304797, Event 105123496, LumiSection 70 on stream 2 at 16-Feb-2018 15:42:32.267 CET
%MSG-w HcalSeverityLevelComputer: HBHEPhase1Reconstructor:hbheprereco 16-Feb-2018 15:42:34 CET Run: 304797 Event: 105123496
HcalSeverityLevelComputer: Error: RecHitFlag >>HFDigiTime<< unknown. Ignoring.
%MSG
caching calibs for 1856 pixel detectors of size 1647360
sizes 1 1 2
precisions g 1.23308 0.0761627
1440 1856
cmsRun: /afs/cern.ch/work/c/calabria/private/CMSSW_10_1_0_pre1/src/EventFilter/SiPixelRawToDigi/plugins/SiPixelFedCablingMapGPU.cc:164: void processGainCalibration(const SiPixelGainCalibrationForHLT&, const TrackerGeometry&, SiPixelGainForHLTonGPU*&, SiPixelGainForHLTonGPU::DecodingStructure*&): Assertion `p!=ind.end() && p->detid==dus[i]->geographicalId()' failed.

A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

VinInn · 2018-02-16T15:12:17Z

1440 1856

This is the problem: the calibrations are the wrong ones (for phase 0). Please add

# load HLT payload
process.GlobalTag = GlobalTag(process.GlobalTag, '100X_dataRun2_asv2plusPixelGainfromHLT_v1', '')

to your config.

v.

fwyzard · 2018-02-16T16:43:18Z

EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu

+            temp_err.word = ww;
+            temp_err.fedId = fedId;
+            temp_err.rawId = rawId;
+            err->push_back(temp_err);


Can you use emplace instead of creating a temporary object?

Actually, I have already tried this:
err->emplace_back(rawId, ww, error, fedId);
and it does not compile:

Compiling /afs/cern.ch/work/c/calabria/private/DUMMY_GPU/forse/CMSSW_10_1_0_pre1/src/EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu
/afs/cern.ch/work/c/calabria/private/DUMMY_GPU/forse/CMSSW_10_1_0_pre1/src/HeterogeneousCore/CUDAUtilities/interface/GPUSimpleVector.h(68): error: this pack expansion produced more than one expression, and a single expression is needed here
detected during instantiation of "int GPU::SimpleVector::emplace_back(Ts &&...) [with T=error_obj, Ts=<uint32_t &, uint32_t &, unsigned char, unsigned char>]"
/afs/cern.ch/work/c/calabria/private/DUMMY_GPU/forse/CMSSW_10_1_0_pre1/src/EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu(491): here

/afs/cern.ch/work/c/calabria/private/DUMMY_GPU/forse/CMSSW_10_1_0_pre1/src/HeterogeneousCore/CUDAUtilities/interface/GPUSimpleVector.h(68): error: no suitable constructor exists to convert from "uint32_t" to "error"
detected during instantiation of "int GPU::SimpleVector::emplace_back(Ts &&...) [with T=error_obj, Ts=<uint32_t &, uint32_t &, unsigned char, unsigned char>]"
/afs/cern.ch/work/c/calabria/private/DUMMY_GPU/forse/CMSSW_10_1_0_pre1/src/EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu(491): here

2 errors detected in the compilation of "/tmp/calabria/tmpxft_00009396_00000000-8_RawToDigiGPU.compute_61.cpp1.ii".
config/SCRAM/GMake/Makefile.rules:2079: recipe for target 'tmp/slc7_amd64_gcc630/src/EventFilter/SiPixelRawToDigi/plugins/EventFilterSiPixelRawToDigiGPUPlugins/RawToDigiGPU.o' failed
gmake: *** [tmp/slc7_amd64_gcc630/src/EventFilter/SiPixelRawToDigi/plugins/EventFilterSiPixelRawToDigiGPUPlugins/RawToDigiGPU.o] Error 1
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

emplace_back() requires a constructor (I have stumbled on this as well).

I googled this: there has been a defect report against the C++ standard since 2011, with discussion ongoing more or less continuously ever since, a proposal to fix it since 2015, and further comments just a few weeks ago.

Bottom line: add a constructor and use emplace_back().
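To illustrate the point about constructors, here is a minimal host-only sketch. It assumes that emplace_back() builds the element with T(args...), which is what the compiler error above suggests. TinyVector and the constructors shown for error_obj are hypothetical stand-ins, not the actual GPU::SimpleVector or the struct in RawToDigiGPU.h. Without the four-argument constructor, T(args...) does not compile for a plain aggregate in the C++ standards available at the time (initializing an aggregate through parentheses only arrived in C++20).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the error record; field names follow the diff.
struct error_obj {
  uint32_t rawId;
  uint32_t word;
  unsigned char errorType;
  unsigned char fedId;

  // Needed so that arrays of error_obj can still be declared.
  error_obj() = default;

  // The explicit constructor is what lets T(args...) compile.
  error_obj(uint32_t a, uint32_t b, unsigned char c, unsigned char d)
      : rawId(a), word(b), errorType(c), fedId(d) {}
};

// Minimal fixed-capacity vector over external storage (illustration only).
template <class T>
struct TinyVector {
  T* m_data;
  int m_size;
  int m_capacity;

  // Returns the index of the new element, or -1 if the vector is full.
  template <class... Ts>
  int emplace_back(Ts&&... args) {
    if (m_size >= m_capacity)
      return -1;
    m_data[m_size] = T(std::forward<Ts>(args)...);
    return m_size++;
  }
};
```

With this in place, the four lines that fill temp_err and call push_back() collapse into a single emplace_back(rawId, ww, error, fedId) call.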

fwyzard · 2018-02-16T16:43:31Z

EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu

+          temp_err.word = ww;
+          temp_err.fedId = fedId;
+          temp_err.rawId = rawId;
+          err->push_back(temp_err);


Same here: use emplace instead of a temporary object.

fwyzard · 2018-02-16T16:45:14Z

EventFilter/SiPixelRawToDigi/plugins/SiPixelFedCablingMapGPU.cc

-    LogDebug("SiPixelFedCablingMapGPU") << i << std::setw(20) << (bool) badRocs[i] << std::setw(20) << (bool) modToUnp[i] << std::endl;
+    LogDebug("SiPixelFedCablingMapGPU") << i << std::setw(20) << fedMap[i]  << std::setw(20) << linkMap[i]  << std::setw(20) << rocMap[i] << std::endl;
+    LogDebug("SiPixelFedCablingMapGPU") << i << std::setw(20) << RawId[i]   << std::setw(20) << rocInDet[i] << std::setw(20) << moduleId[i] << std::endl;
+    LogDebug("SiPixelFedCablingMapGPU") << i << std::setw(20) << (int)badRocs[i] << std::setw(20) << (int)modToUnp[i] << std::endl;


I would leave them as bool, since that's what they are used as.

fwyzard · 2018-02-16T16:46:03Z

EventFilter/SiPixelRawToDigi/plugins/SiPixelRawToDigiGPU.cc

-  cudaMallocHost(&errRawID_h, sizeof(uint32_t)*WSIZE);
-  cudaMallocHost(&errWord_h,  sizeof(uint32_t)*WSIZE);
-  cudaMallocHost(&errFedID_h, sizeof(uint32_t)*WSIZE);
+  uint32_t VSIZE = sizeof(GPU::SimpleVector<error_obj>);


Use uppercase names only for #define macros.

fwyzard · 2018-02-16T16:47:19Z

EventFilter/SiPixelRawToDigi/plugins/SiPixelRawToDigiGPU.cc

-  cudaMallocHost(&errFedID_h, sizeof(uint32_t)*WSIZE);
+  uint32_t VSIZE = sizeof(GPU::SimpleVector<error_obj>);
+  uint32_t ESIZE = sizeof(error_obj);
+  bool success = cudaMallocHost(&error_h, VSIZE) == cudaSuccess &&


Can you add calls to cudaCheck() around all cudaMallocHost() calls, and remove the assert?
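As a rough illustration of the pattern being requested, the sketch below wraps every allocation in a check helper instead of collecting the results into a bool and asserting on it. Everything here is a host-only stand-in and an assumption: Status replaces cudaError_t, fakeMallocHost replaces cudaMallocHost, and CHECK_STATUS is a hypothetical helper, not the actual cudaCheck() from HeterogeneousCore/CUDAUtilities, which may report failures differently (for example by aborting rather than throwing).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Stand-in for cudaError_t (assumption for illustration).
enum class Status { success, outOfMemory };

inline const char* toString(Status s) {
  return s == Status::success ? "success" : "out of memory";
}

// Hypothetical check helper: reports the failing call and its location.
inline void check(Status s, const char* what, const char* file, int line) {
  if (s != Status::success)
    throw std::runtime_error(std::string(file) + ":" + std::to_string(line) +
                             ": " + what + " failed: " + toString(s));
}

#define CHECK_STATUS(call) check((call), #call, __FILE__, __LINE__)

// Stand-in for cudaMallocHost: fails when asked for zero bytes.
inline Status fakeMallocHost(void** ptr, std::size_t bytes) {
  *ptr = bytes ? std::malloc(bytes) : nullptr;
  return *ptr ? Status::success : Status::outOfMemory;
}
```

Each allocation is then written as CHECK_STATUS(fakeMallocHost(&ptr, bytes)), so a failure is reported at the call that caused it rather than at a later assert.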

fwyzard · 2018-02-16T16:50:22Z

HeterogeneousCore/CUDAUtilities/interface/GPUSimpleVector.h

@@ -83,6 +83,11 @@ template <class T> struct SimpleVector {
  __inline__ __host__ __device__ int capacity() const { return m_capacity; }

  __inline__ __host__ __device__ T *data() const { return m_data; }
+
+  __inline__ __host__ __device__ void set_size(int size) { m_size = size; }


I would call this resize(), unless people find it confusing?

I agree on resize(); the effect would be similar to std::vector anyway (or remove it, as it is not currently used).

fwyzard · 2018-02-16T16:52:29Z

HeterogeneousCore/CUDAUtilities/interface/GPUSimpleVector.h

+
+  __inline__ __host__ __device__ void set_size(int size) { m_size = size; }
+
+  __inline__ __host__ __device__ void set_data(T * data) { m_data = data; }


I don't know whether I like a set_data() method... other comments?

I don't particularly like it. If kept, it should be accompanied by a capacity parameter.

In principle I'd expect the same to be achieved with move/copy assignment. On the other hand, the pointed-to memory areas are copied CPU<->GPU outside of the SimpleVector, so there could be cases where it is handier (with the current design of SimpleVector) to use set_data().

(I must say it took me a while to figure out what exactly happens in the code that uses set_data(). I'm afraid that having the memory owned outside of the SimpleVector, and transferring the SimpleVector members and the data memory separately between CPU and GPU, may lead to difficult-to-understand code. I'm fine with that for now, but eventually we should aim to simplify.)

Yes, I agree it should set also capacity (and reduce size if it exceeds the new capacity).

And I agree long term we should come up with a better approach (or get unified memory to work).

On further thought, if one resets the data pointer, the object no longer knows what the correct size would be, so it would have to be given from outside as well (e.g. if the data already has valid elements). To me this sounds like the job of move assignment (I came to the conclusion that copying should be forbidden, as two copies would just have the same data pointers). But I can imagine that, with the outsourced memory management, the results of move operations may also become difficult to follow.
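One possible reading of the suggestions in this thread, as a host-only sketch: replace set_data() with a single reset() that takes pointer, capacity and size together, clamp the size to the capacity, forbid copies, and let moves hand over the buffer. VectorView and all of its member names are hypothetical; this is not the actual GPU::SimpleVector interface.

```cpp
#include <cassert>
#include <utility>

// Hypothetical non-owning vector over externally managed memory.
template <class T>
class VectorView {
public:
  VectorView() = default;
  VectorView(T* data, int capacity, int size = 0) { reset(data, capacity, size); }

  // Copies are forbidden: two copies would alias the same buffer.
  VectorView(const VectorView&) = delete;
  VectorView& operator=(const VectorView&) = delete;

  // A move hands over the buffer and leaves the source empty.
  VectorView(VectorView&& other) noexcept
      : m_data(std::exchange(other.m_data, nullptr)),
        m_size(std::exchange(other.m_size, 0)),
        m_capacity(std::exchange(other.m_capacity, 0)) {}

  VectorView& operator=(VectorView&& other) noexcept {
    if (this != &other) {
      m_data = std::exchange(other.m_data, nullptr);
      m_size = std::exchange(other.m_size, 0);
      m_capacity = std::exchange(other.m_capacity, 0);
    }
    return *this;
  }

  // Pointer, capacity and size always change together.
  void reset(T* data, int capacity, int size = 0) {
    m_data = data;
    m_capacity = (data != nullptr && capacity > 0) ? capacity : 0;
    resize(size);
  }

  // The size is clamped to the range [0, capacity].
  void resize(int size) {
    m_size = size < 0 ? 0 : (size > m_capacity ? m_capacity : size);
  }

  int size() const { return m_size; }
  int capacity() const { return m_capacity; }
  T* data() const { return m_data; }

private:
  T* m_data = nullptr;
  int m_size = 0;
  int m_capacity = 0;
};
```

Passing the size explicitly to reset() covers the case raised above, where the new buffer already holds valid elements.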

fwyzard · 2018-02-16T16:53:20Z

@makortel @felicepantaleo can you comment on the SimpleVector interface changes ?

makortel · 2018-02-16T21:12:53Z

EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.h

+    uint32_t word;
+    unsigned char errorType;
+    unsigned char fedId;
+    } error_obj;


Is there any reason to not be fully C++ here (i.e. drop typedef and the trailing name)?

makortel · 2018-02-16T21:16:48Z

EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu

+      cudaCheck(cudaMemcpy(error_h, c.error_d, VSIZE, cudaMemcpyDeviceToHost));
+      error_h->set_data(data_h);
+      int size = error_h->size();
+      cudaCheck(cudaMemcpy(data_h, c.data_d, size*ESIZE, cudaMemcpyDeviceToHost));


Why change to synchronous copy?

(or does cudaGetLastError() create a synchronization point?)

No, it doesn't, though it may fail to report errors from asynchronous calls.

Sorry, can you please summarize the discussion? What should I do with this? Thanks.

fwyzard · 2018-02-17T14:46:28Z

cudaMemcpy should stay cudaMemcpyAsync.

.A

felicepantaleo · 2018-02-19T13:36:15Z

EventFilter/SiPixelRawToDigi/plugins/RawToDigiGPU.cu

+      cudaCheck(cudaMemcpyAsync(error_h, c.error_d, vsize, cudaMemcpyDeviceToHost, c.stream));
+      error_h->set_data(data_h);
+      int size = error_h->size();
+      cudaCheck(cudaMemcpyAsync(data_h, c.data_d, size*esize, cudaMemcpyDeviceToHost, c.stream));
  }
  cudaStreamSynchronize(c.stream);


This synchronization should go right after
cudaCheck(cudaMemcpyAsync(error_h, c.error_d, vsize, cudaMemcpyDeviceToHost, c.stream));
Otherwise you have a race condition: the size could still be in flight from the GPU when you read it in
int size = error_h->size();
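The ordering issue can be reproduced on the host with a toy model. In the sketch below, FakeStream is an assumption made purely for illustration: it queues copies and runs them only when synchronize() is called, so a value read too early is always stale. A real CUDA stream gives no such determinism (the copy may or may not have finished), which is exactly why this is a race. Header, fetchErrors and racySize are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>
#include <vector>

// Toy model of a stream: queued copies run only inside synchronize().
class FakeStream {
public:
  void memcpyAsync(void* dst, const void* src, std::size_t bytes) {
    m_pending.push([=] { std::memcpy(dst, src, bytes); });
  }
  void synchronize() {
    while (!m_pending.empty()) {
      m_pending.front()();
      m_pending.pop();
    }
  }

private:
  std::queue<std::function<void()>> m_pending;
};

struct Header {
  int size;
};

// Correct ordering: wait for the header before using its size.
inline std::vector<int> fetchErrors(FakeStream& stream, const Header& deviceHeader,
                                    const int* deviceData) {
  Header hostHeader{0};
  stream.memcpyAsync(&hostHeader, &deviceHeader, sizeof(Header));
  stream.synchronize();  // the size is valid only after this point
  if (hostHeader.size <= 0)
    return {};
  std::vector<int> hostData(static_cast<std::size_t>(hostHeader.size));
  stream.memcpyAsync(hostData.data(), deviceData, hostData.size() * sizeof(int));
  stream.synchronize();
  return hostData;
}

// Racy ordering: the size is read before the header copy has completed.
inline int racySize(FakeStream& stream, const Header& deviceHeader) {
  Header hostHeader{0};
  stream.memcpyAsync(&hostHeader, &deviceHeader, sizeof(Header));
  int size = hostHeader.size;  // stale: the copy has not run yet in this model
  stream.synchronize();
  return size;
}
```

The cost of the fix is one extra synchronization between the two copies, because the byte count of the second copy depends on the result of the first.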

felicepantaleo · 2018-02-19T13:41:44Z

EventFilter/SiPixelRawToDigi/plugins/SiPixelRawToDigiGPU.cc

-    if (errType_h[i] != 0) {
-      SiPixelRawDataError error(errWord_h[i], errType_h[i], errFedID_h[i]+1200);
-      errors[errRawID_h[i]].push_back(error);
+  uint32_t size = error_h->size();


Can you replace uint32_t with auto, for readability? Same in the loop.

cmsbot · 2018-02-19T17:41:20Z

A new Pull Request was created by @calabria (Cesare Calabria) for CMSSW_10_1_X_Patatrack.

It involves the following packages:

EventFilter/SiPixelRawToDigi
HeterogeneousCore/CUDAUtilities

The following packages do not have a category, yet:

HeterogeneousCore/CUDAUtilities
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here


Cesare added 5 commits February 16, 2018 14:12

using simple vector + other fixes

ac73063

fixes

a3229ae

cleanup

8170be2

cleanup

7359ce4

fix

3b72af2

fwyzard requested changes Feb 16, 2018


makortel reviewed Feb 16, 2018


Cesare added 2 commits February 17, 2018 15:04

some of the requested fixes

1cb24fe

fix async copy

34ffb68

fix async copy

9111ec6

felicepantaleo requested changes Feb 19, 2018


felicepantaleo reviewed Feb 19, 2018


fwyzard approved these changes Feb 19, 2018


required changes

e6447cd

cmsbot added comparison-pending labels Feb 19, 2018

fwyzard merged commit 2f625fc into cms-patatrack:CMSSW_10_1_X_Patatrack Feb 20, 2018

fwyzard removed comparison-notrun labels Apr 10, 2018

fwyzard removed reconstruction-pending labels Apr 10, 2018

fwyzard pushed a commit that referenced this pull request Dec 7, 2018

Merge pull request #14 from andrzejnovak/switch

fc232c4

Fix indentation

fwyzard pushed a commit that referenced this pull request Jul 12, 2020

Merge pull request #14 from PFCal-dev/hgcal_eol_pulse_update_112X_bis

d7249db

Hgcal eol pulse update 112 x bis

fwyzard pushed a commit that referenced this pull request Oct 8, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

ae9e1e7

fwyzard mentioned this pull request Oct 8, 2020

Patatrack integration - Pixel local reconstruction (9/N) cms-sw/cmssw#31721

Merged

fwyzard pushed a commit that referenced this pull request Oct 19, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

5603b1c

fwyzard pushed a commit that referenced this pull request Oct 20, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

2abc3b6

fwyzard pushed a commit that referenced this pull request Oct 23, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

5b7f638

fwyzard pushed a commit that referenced this pull request Nov 6, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

808872f

fwyzard pushed a commit that referenced this pull request Nov 16, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

d46d0b1

fwyzard pushed a commit that referenced this pull request Dec 25, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

01db991

fwyzard pushed a commit that referenced this pull request Dec 29, 2020

R2D: use GPU::SimpleVector for the error unpacking (#14)

1dfe4f2
