ECAL RecHit producer Alpaka migration #46453
cms-bot internal usage |
type ecal |
enable gpu |
A new Pull Request was created by @thomreis for master. It involves the following packages:
@AdrianoDee, @Moanwar, @antoniovagnerini, @antoniovilela, @atpathak, @cmsbuild, @consuegs, @davidlange6, @fabiocos, @francescobrivio, @jfernan2, @kskovpen, @mandrenguyen, @miquork, @nothingface0, @perrotta, @rappoccio, @rvenditti, @srimanob, @subirsarkar, @sunilUIET, @syuvivida, @tjavaid can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
+1
Size: This PR adds an extra 132KB to repository
Comparison Summary:
GPU Comparison Summary:
Hello, just out of curiosity, I imagine this development will eventually enter the HLT menu for 2025. |
Hi @mmusich, the portable rechit producer in this PR does not have the full functionality of the CPU producer currently used in the HLT menu. Just like the CUDA version, it lacks the algorithms for recovery of dead channels. However, the current pp HLT menu does not seem to do any energy recovery either, so it may be possible to use this in 2025 already. This needs to be checked in more detail, however. We can add a customization function to this PR, but I think we should only activate it once it is confirmed that the portable producer gives the same results as the one currently used at the HLT. Of course we could also do this in a separate PR if that is preferred. |
assign heterogeneous ? |
Hi @cms-sw/alca-l2 @cms-sw/db-l2 @cms-sw/pdmv-l2 @cms-sw/dqm-l2 do you have any comments on this PR? |
+pdmv
+dqm |
Hi @cms-sw/alca-l2 @cms-sw/db-l2 please take a look at this PR and let us know if you have comments. |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
This broke gcc13 builds,
as the compiler helpfully tells us:
With the include, |
This PR also adds the following dicts to the DataFormats/EcalRecHit/src/classes_def.xml file.
The lost dictionary checker thinks that these should be defined in the CUDADataFormats/EcalRecHitSoA package (as the package name matches the dict names). So should we move these dicts (along with the headers) to |
Note that all |
ok, so for now I will update https://github.com/cms-sw/cmssw/blob/master/Utilities/ReleaseScripts/scripts/duplicateReflexLibrarySearch.py#L31-L80 to not complain about these dicts |
PR description:
Migration of the ECAL RecHit producer from CUDA to Alpaka, including the required portable data and conditions formats and an extension of the DQM module to compare RecHits produced on the CPU or GPU.
While being a direct replacement of the existing CUDA RecHit producer for the most part, the migrated Alpaka version adds the RecHit time variable, which the CUDA version did not calculate. In addition, the Alpaka version adds support for Phase 2, where there will no longer be any inputs from the endcaps.
In comparison with the legacy CPU producer the Alpaka algorithm still lacks the recovery of dead channels and can therefore not yet be used to replace the legacy producer in production.
PR validation:
Comparing the legacy CPU vs. CUDA comparison (12834.513) with the legacy CPU vs. Alpaka (on GPU) comparison (13834.413) on 9k TTbar events shows almost identical results between CUDA and Alpaka (with the exception of the time variables, as mentioned above) and very good agreement with the legacy CPU version for both implementations.
In addition, the Alpaka module running on CPU gives almost identical results to the same module running on GPU (NVIDIA).
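For readers unfamiliar with this kind of validation, here is a minimal standalone sketch of a per-channel comparison between two RecHit collections. It is not the DQM module extended in this PR. The `compare_rechits` helper, the tolerances, and the detector IDs and energies are all hypothetical, chosen only to illustrate the idea.

```python
import math


def compare_rechits(reference, candidate, rel_tol=1e-5, abs_tol=1e-6):
    """Compare two {detid: energy} maps (hypothetical layout, not the CMSSW SoA).

    Returns the channels present in only one collection and the channels
    whose energies differ by more than the given tolerances.
    """
    common = reference.keys() & candidate.keys()
    mismatched = sorted(
        detid
        for detid in common
        if not math.isclose(
            reference[detid], candidate[detid], rel_tol=rel_tol, abs_tol=abs_tol
        )
    )
    return {
        "only_in_reference": sorted(reference.keys() - candidate.keys()),
        "only_in_candidate": sorted(candidate.keys() - reference.keys()),
        "mismatched": mismatched,
    }


# Made-up example values; these are not results from the PR validation.
cpu = {838861313: 1.2500, 838861314: 0.4200, 838861315: 3.1000}
gpu = {838861313: 1.2500001, 838861314: 0.4300, 838861316: 0.9000}

report = compare_rechits(cpu, gpu)
print(report)
```

A tolerance-based check is used rather than exact equality because floating-point results from different backends can differ in the last bits even when the algorithm is the same.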