Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPUvsCPU: updates to modules and workflows for pixel reconstruction #37617

Merged
merged 2 commits into from
May 3, 2022

Conversation

AdrianoDee
Copy link
Contributor

@AdrianoDee AdrianoDee commented Apr 19, 2022

PR description:

Updates for pixel only workflows:

  • rearranging .502 and .503 to avoid running the CPU chain when a GPU is available (due to the fact that the DQM consumes hit SoA and that is only available on CPU);
  • introducing a module to copy GPU SoA to CPU;
  • running both CPU and GPU chains in .503;
  • introducing gpuPixelValidation modifier;
  • dropping .501 that only generates confusion.

PR validation:

Running .502 and .503.

@AdrianoDee
Copy link
Contributor Author

enable gpu

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37617/29386

  • This PR adds an extra 28KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

  • CUDADataFormats/TrackingRecHit (heterogeneous, reconstruction)
  • DQM/SiPixelPhase1Heterogeneous (dqm)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoPixelVertexing/Configuration (reconstruction)
  • RecoPixelVertexing/PixelTrackFitting (reconstruction)

@emanueleusai, @makortel, @fwyzard, @ahmad3213, @cmsbuild, @jfernan2, @clacaputo, @slava77, @jpata, @pmandrik, @micsucmed, @rvenditti can you please review it and eventually sign? Thanks.
@ferencek, @hdelanno, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @gpetruc, @OzAmram, @fioriNTU, @jandrea, @mtosi, @idebruyn, @mmusich, @dkotlins, @threus, @dgulhan, @tvami this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

view->m_hitsModuleStart = m_hitsModuleStart;

//store transfer
if constexpr (std::is_same<Traits, cms::cudacompat::GPUTraits>::value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a suggestion:

Suggested change
if constexpr (std::is_same<Traits, cms::cudacompat::GPUTraits>::value) {
if constexpr (std::is_same_v<Traits, cms::cudacompat::GPUTraits>) {

@fwyzard
Copy link
Contributor

fwyzard commented Apr 19, 2022

The c++ part looks good to me :-)
Now to understand the python part ...

@fwyzard
Copy link
Contributor

fwyzard commented Apr 19, 2022

enable gpu

@fwyzard
Copy link
Contributor

fwyzard commented Apr 19, 2022

please test

@AdrianoDee
Copy link
Contributor Author

@mmusich you are right, I hadn't pushed the modifications to Configuration package. Now it's there. For the possible conflicts we can choose how to proceed and which goes in first. No preference from my side.

@fwyzard may be helpful: find here the dependency graphs for .502 and .503 when a GPU is available (or not). The idea is that:

  • we have both siPixelRecHitsPreSplittingCUDA and siPixelRecHitsPreSplittingCPU in the task having only one of the two running (being consumed by the downstream reconstruction);
  • the siPixelRecHitsSoA is now a SwitchProducer(s) that either convert the CUDA to SoA if running on GPU (used only if requested by DQM) either is an alias for SiPixelRecHitsSoAFromLegacy

For the suggestion to move std::is_same with std::is_same_v I've supposed the suggestion was to move it everywhere.

Anyway, I'll squash all the commits at the end of the review.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37617/29389

@jpata
Copy link
Contributor

jpata commented May 3, 2022

+reconstruction

  • no reco changes (DQM-related reorganization)

1 similar comment
@jpata
Copy link
Contributor

jpata commented May 3, 2022

+reconstruction

  • no reco changes (DQM-related reorganization)

@fwyzard
Copy link
Contributor

fwyzard commented May 3, 2022

Thanks @AdrianoDee for the dependency graphs.

Here is a comparison of the .502 workflow running only on CPU before this PR
502_cpu
and after:
502_cpu

The difference is that the siPixelRecHitsPreSplittingSoA module has been replaced by a SwitchProducer, and the module doing the reconstruction on CPU now is siPixelRecHitsPreSplittingCPU.

Here is the same comparison for the .502 workflow running on GPU, before
502
and after
502

The siPixelRecHitsPreSplittingSoA module hre is replaced by the @cuda branch of the SwitchProducer, which runs the SiPixelRecHitSoAFromCUDA module instead of the SiPixelRecHitSoAFromLegacy one.

@fwyzard
Copy link
Contributor

fwyzard commented May 3, 2022

+heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Copy link
Contributor

fwyzard commented May 3, 2022

@AdrianoDee with these changes, how does the .502 workflow running on CPU compare with the .501 workflow ?

@perrotta
Copy link
Contributor

perrotta commented May 3, 2022

+1

@cmsbuild cmsbuild merged commit 300b12d into cms-sw:master May 3, 2022
@AdrianoDee
Copy link
Contributor Author

AdrianoDee commented May 3, 2022

@fwyzard, I checked the dependency graphs (in pdf here) and the tasks and there is a misalignment in the names in the local reco:

  • in .501 the siPixelRecHitsPresplitting holds the legacy and the SoA and it is a SwitchProducedCUDA with the @cpu branch only;

  • in .502 on GPU the siPixelRecHitsPresplitting holds only the legacy hits (either through and EDAlias for @cpu or with the SiPixelRecHitFromCUDA new producer) and the SoA is in siPixelRecHitsPreSplittingSoA (again either through and EDAlias for @cpu or with the SiPixelRecHitsSoAFromCUDA new producer), being both a SwitchProducedCUDA with both branches;

For the rest (and the final result) they run the same. Is this naming mismatch a possible issue? In case I'm trying to figure out a possible workaround given that, as far as I have understood, SwitchProducedCUDA can't hold a single branch with an EDAlias and a producer may not be replaced with with an EDAlias.

501vs502

@AdrianoDee
Copy link
Contributor Author

Also, do we need a backport of this? I missed the ORP today, sorry.

@fwyzard
Copy link
Contributor

fwyzard commented May 3, 2022

Also, do we need a backport of this? I missed the ORP today, sorry.

No, I don't think so.

@fwyzard
Copy link
Contributor

fwyzard commented May 3, 2022

SwitchProducedCUDA can't hold a single branch with an EDAlias

Actually, I don't know if that would be a problem -- is it something you tried and didn't work ?

@qliphy
Copy link
Contributor

qliphy commented May 4, 2022

@AdrianoDee There is a IB issue after this PR gets merged, would you please have a check?
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc10/CMSSW_12_4_X_2022-05-03-2300/pyRelValMatrixLogs/run/39434.501_TTbar_14TeV+2026D88_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D88_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal.log#/

cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/5525f5050df49f51ff7c29cd32fd5dea/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-05-03-2300/src/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc:144: virtual void SiPixelRecHitSoAFromLegacy::produce(edm::StreamID, edm::Event&, const edm::EventSetup&) const: Assertion gind < nMaxModules' failed. cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/5525f5050df49f51ff7c29cd32fd5dea/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-05-03-2300/src/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc:144: virtual void SiPixelRecHitSoAFromLegacy::produce(edm::StreamID, edm::Event&, const edm::EventSetup&) const: Assertion gind < nMaxModules' failed.

@AdrianoDee
Copy link
Contributor Author

@AdrianoDee There is a IB issue after this PR gets merged, would you please have a check?

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc10/CMSSW_12_4_X_2022-05-03-2300/pyRelValMatrixLogs/run/39434.501_TTbar_14TeV+2026D88_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D88_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+DigiTrigger+RecoGlobal+HARVESTGlobal.log#/

cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/5525f5050df49f51ff7c29cd32fd5dea/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-05-03-2300/src/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc:144: virtual void SiPixelRecHitSoAFromLegacy::produce(edm::StreamID, edm::Event&, const edm::EventSetup&) const: Assertion `gind < nMaxModules' failed.

cmsRun: /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/5525f5050df49f51ff7c29cd32fd5dea/opt/cmssw/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-05-03-2300/src/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc:144: virtual void SiPixelRecHitSoAFromLegacy::produce(edm::StreamID, edm::Event&, const edm::EventSetup&) const: Assertion `gind < nMaxModules' failed.

@qliphy on it

@AdrianoDee AdrianoDee mentioned this pull request May 4, 2022
@AdrianoDee AdrianoDee deleted the gpu_dqm_pixels_124 branch May 4, 2022 10:56
@AdrianoDee
Copy link
Contributor Author

@qliphy fixed in #37798

Actually, I don't know if that would be a problem -- is it something you tried and didn't work ?

@fwyzard I was doing some other stupid mistake. It actually works and I implemented in #37798

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants