GPU: better hits #81
Conversation
A new Pull Request was created by @VinInn (Vincenzo Innocente) for CMSSW_10_2_X_Patatrack. It involves the following packages: Geometry/TrackerGeometryBuilder. @cmsbot, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here.
```cpp
cudaCheck(cudaMalloc((void**) & gpu_d, sizeof(HitsOnGPU)));
cudaCheck(cudaMemcpy(gpu_d, &gpu_, sizeof(HitsOnGPU), cudaMemcpyDefault));
cudaCheck(cudaDeviceSynchronize());
```
Minor detail, but the `cudaMemcpy` could be made asynchronous by passing the CUDA stream from here:

cmssw/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitHeterogeneous.cc, lines 153 to 155 in 67bbc84:

```cpp
void SiPixelRecHitHeterogeneous::beginStreamGPUCuda(edm::StreamID streamId, cuda::stream_t<>& cudaStream) {
  gpuAlgo_ = std::make_unique<pixelgpudetails::PixelRecHitGPUKernel>();
}
```
```cpp
for (int i=0;i<10;++i) std::cout << phase1PixelTopology::layerName[i] << ':' << hitsLayerStart[i] << ' ';
std::cout << "end:" << hitsLayerStart[10] << std::endl;

cudaCheck(cudaMemcpyAsync(gpu_.hitsLayerStart_d, hitsLayerStart, (11) * sizeof(uint32_t), cudaMemcpyDefault, stream.id()));
```
I believe this is not safe, because `hitsLayerStart` is a local array. It should have a lifetime longer than the async copy takes (effectively it should be a member of `PixelRecHitGPUKernel`).
```cpp
output->shrink_to_fit();
output->collection.shrink_to_fit();
iEvent.put(std::move(output));
```
This should be changed to `iEvent.put<Output>(std::move(output))` to produce the `HeterogeneousProduct` for CPU.
```python
siPixelRecHitsPreSplitting = siPixelRecHits.clone(
    src = 'siPixelClustersPreSplitting'
)

from RecoLocalTracker.SiPixelRecHits.siPixelRecHitHeterogeneous_cfi import siPixelRecHitHeterogeneous
```
In #77, when doing the same for raw2cluster, I put this import in RecoLocalTracker_cff.py.
```cpp
iEvent.put(std::move(output));
iEvent.put<Output>(std::move(output), [this, hits, hclusters](const GPUProduct&, CPUProduct& cpu) {
  this->convertGPUtoCPU(hclusters, hits, cpu);
```
You could avoid capturing `hits` by using the `HitsOnCPU` as the `GPUProduct`.
What is the lifetime of the lambda? In principle, this way, once the lambda is gone the hits are gone as well, while as part of the `GPUProduct` they would stay in the event.
The lambda is stored as part of the `HeterogeneousProduct` (as is the `GPUProduct`), so it will stay until the end of the event anyway.
```python
from RecoLocalTracker.SiPixelRecHits.siPixelRecHitHeterogeneous_cfi import siPixelRecHitHeterogeneous

from RecoLocalTracker.SiPixelRecHits.siPixelRecHitHeterogeneousConverter_cfi import siPixelRecHitHeterogeneousConverter as _siPixelRecHitHeterogeneousConverter
gpu.toReplaceWith(siPixelRecHitsPreSplitting, _siPixelRecHitHeterogeneousConverter.clone())
```
Replacing `siPixelRecHits` before the `clone()` to `...PreSplitting` would work as well (then, in principle, the default `src` should be kept as `siPixelClusters`). Combining with my comment above, this file could be left untouched.
```cpp
constexpr char const * layerName[10] = {"BL1","BL2","BL3","BL4",
                                        "E+1", "E+2", "E+3",
                                        "E-1", "E-2", "E-3"
};
```
Just to note that this introduces yet another convention for naming pixel layers. It seems to be only for debug prints, so I don't object.
@makortel what are the "usual" pixel layer names, and where are they defined?
We (=CMS) have many conventions in different places:

- Seeding layers are specified with `BPixN` and `FPixN_pos`/`FPixN_neg`:

```python
PixelLayerTriplets.layerList = cms.vstring('BPix1+BPix2+BPix3',
    'BPix1+BPix2+FPix1_pos', 'BPix1+BPix2+FPix1_neg',
    'BPix1+FPix1_pos+FPix2_pos', 'BPix1+FPix1_neg+FPix2_neg')
```

- `PixelSubdetector` uses `enum`s `PixelBarrel` and `PixelEndcap`:

```cpp
enum SubDetector {PixelBarrel=1, PixelEndcap=2};
```

- `GeomDetEnumerators` uses `enum`s `PixelBarrel`, `PixelEndcap`, `P1PXB`, `P1PXEC`, `P2PXB`, `P2PXEC`:

```cpp
enum SubDetector {PixelBarrel, PixelEndcap, TIB, TOB, TID, TEC, CSC, DT, RPCBarrel, RPCEndcap, GEM, ME0, P2OTB, P2OTEC, P1PXB, P1PXEC, P2PXB, P2PXEC, TimingBarrel, TimingEndcap, invalidDet};
```

- `TrackerTopology` uses `pxb` and `pxf` in function prefixes (cmssw/DataFormats/TrackerCommon/interface/TrackerTopology.h, lines 160 to 163 in 96559f3):

```cpp
unsigned int pxbModule(const DetId &id) const { return ((id.rawId()>>pbVals_.moduleStartBit_) & pbVals_.moduleMask_); }
unsigned int pxfModule(const DetId &id) const {
```

- The `TrackingNtuple` python library uses `BPixN` and `FPixN+`/`FPixN-` in printouts (cmssw/Validation/RecoTrack/python/plotting/ntupleDataFormat.py, lines 100 to 112 in 96559f3):

```python
if subdet in [SubDet.FPix, SubDet.TID, SubDet.TEC] or isPhase2OTBarrel:
    sideNum = get("side")
    if sideNum == 1:
        side = "-"
    elif sideNum == 2:
        side = "+"
    elif isPhase2OTBarrel and sideNum == 3:
        side = ""
    else:
        side = "?"
return "%s%d%s" % (SubDet.toString(subdet), getattr(self._tree, self._prefix+"_layer")[self._index], side)
```

I don't think any of these is authoritative enough to suggest a change here (since they're for printouts only; for configuration inputs I'd probably suggest the seeding layers' convention). Anyway, these names have the nice property (on purpose?) that the barrel and forward strings have the same length, which none of the other conventions have.
Apart from the requested changes and merge conflicts, I have the impression that the GPU workflows (10824.8) do not produce any tracks any more?
From https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/42f6f40e4973cddd77d4138005948039b75ff1ca/RelValTTbar_13-CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/10824.8/plots_summary.html
impressive!
btw, what is the new baseline w.r.t. which I have to sync?
CMSSW_10_2_0_pre5_Patatrack - I think the instructions [here](https://patatrack.web.cern.ch/patatrack/wiki/patatrackdevelopment.html) are up to date.
I started from CMSSW_10_2_0_pre5_Patatrack ...
Indeed, now this PR reproduces the results of the development branch.
Validation summary: reference release CMSSW_10_2_0_pre5 at 30c7b03
Migrate PixelRecHit EDProducer to HeterogeneousEDProducer, including the cpu product. Data structures on gpu now include everything needed for Doublets, CA and fit. Layer splitting done: phi sorting (or partial sorting) requires #69. Includes some cleanup and bug fixes.
```cpp
std::cout << "hit layerStart ";
for (int i=0;i<10;++i) std::cout << phase1PixelTopology::layerName[i] << ':' << hitsLayerStart_[i] << ' ';
std::cout << "end:" << hitsLayerStart_[10] << std::endl;
```
By the way, now we have a per-event (non-thread-safe) printout from here.
It will be removed in the next iteration, when the sorting is implemented as well.
Migrated PixelRecHit producer to HeterogeneousEDProducer (including a cpu product).
Data structures on gpu now include everything needed for Doublets, CA and fit.
Layer splitting done: phi sorting (or partial sorting) requires #69.
Took the opportunity for some cleanup and bug fixes.