Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heterogeneous Workflow (CLUE) in Phase2 HLT Simplified Menu #45178

Merged
merged 5 commits into from
Jul 2, 2024

Conversation

rovere
Copy link
Contributor

@rovere rovere commented Jun 8, 2024

PR description:

This PR introduces the first heterogeneous workflow in the Phase2 HLT Simplified menu. Key features include:

  • Dedicated procModifier: The new procModifier (heterogeneousCLUE) configures the menu specifically for its timing variant, not the default one.
  • During the review it was agreed to reuse the already existing alpaka procModifier.
  • Dependency: Requires this external to compile correctly.

Currently, no workflow utilizes this feature publicly. Private tests on the HLT Phase2 timing server show an increase in throughput, consistent with CE-E clustering in HGCAL being less than 5% of overall HLT Phase2 timing.

Bug Fix Included: This PR addresses a bug affecting the position of layer clusters. Rarely, the log-weighted algorithm returns all negative (i.e., 0) weights, defaulting to (0,0) in the transverse plane. This caused issues at later stages when eta and phi of the clusters were needed. The new default in these rare cases is to use the seed position.

NOTE: This PR, via the included procModifier, only configures the timing variant of the menu, not the default one!

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 8, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 8, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45178/40533

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 8, 2024

A new Pull Request was created by @rovere for master.

It involves the following packages:

  • Configuration/ProcessModifiers (operations)
  • DataFormats/HGCalReco (reconstruction, upgrade)
  • DataFormats/Portable (heterogeneous)
  • HLTrigger/Configuration (hlt)
  • RecoLocalCalo/HGCalRecProducers (reconstruction, upgrade)

@makortel, @fwyzard, @mandrenguyen, @cmsbuild, @Martin-Grunewald, @davidlange6, @antoniovilela, @jfernan2, @rappoccio, @fabiocos, @subirsarkar, @mmusich, @srimanob can you please review it and eventually sign? Thanks.
@cseez, @sameasy, @edjtscott, @makortel, @youyingli, @Martin-Grunewald, @silviodonato, @apsallid, @hatakeyamak, @vandreev11, @lgray, @missirol, @fabiocos, @sethzenz, @SohamBhattacharya, @bsunanda, @pfs, @felicepantaleo, @lecriste this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@fwyzard
Copy link
Contributor

fwyzard commented Jun 9, 2024

please test with cms-sw/cmsdist#9231

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 9, 2024

-1

Failed Tests: Build HeaderConsistency ClangBuild
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7ed35b/39803/summary.html
COMMIT: 164e86b
CMSSW: CMSSW_14_1_X_2024-06-09-0000/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45178/39803/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7ed35b/39803/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7ed35b/39803/git-merge-result

Build

I found compilation error when building:

>> Subsystem HLTrigger built
>> Subsystem Utilities built
Copying tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync/libRecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync/libRecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync/libRecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1864: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync/libRecoLocalCaloHGCalRecProducersPluginsPortableROCmAsync_rocm.a] Error 1
>> Entering Package FWCore/Version
>> Leaving Package FWCore/Version
>> Package FWCore/Version built
Entering library rule at FWCore/Version
>> Compiling  src/FWCore/Version/src/GetFileFormatVersion.cc


Clang Build

I found compilation error while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

>> Entering Package FWCore/Version
>> Entering Package DataFormats/HGCalReco
>> Entering Package DataFormats/Portable
>> Entering Package RecoLocalCalo/HGCalRecProducers
>> Compile sequence completed for CMSSW CMSSW_14_1_X_2024-06-09-0000
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1
+ eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-06-09-0000/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)'
++ scram build outputlog
>> Entering Package FWCore/Version
Entering library rule at FWCore/Version
>> Compiling  src/FWCore/Version/src/GetFileFormatVersion.cc


@smuzaffar
Copy link
Contributor

test parameters:

@smuzaffar
Copy link
Contributor

please test

@mandrenguyen
Copy link
Contributor

Looking at the comparisons here we see small changes to quantities downstream from TICL.

Is that expected? From the PR description there is a bug fix mentioned, but I had the impression that should be rare.

@rovere
Copy link
Contributor Author

rovere commented Jun 26, 2024

Looking at the comparisons here we see small changes to quantities downstream from TICL.

Is that expected? From the PR description, there is a bug fix mentioned, but I had the impression that should be rare.

Ciao @mandrenguyen, thanks for looking into this. There are two aspects at play here. First, there's the bug fix mentioned in the PR description. While this bug is rare, it is indeed a genuine issue. Second, there's a change in the logic inside the calculateDistanceToHigher function. Although not a major alteration, the original if statement has been extended to ensure better reproducibility between CPU and GPU. This change accounts for the fact that HGCAL consists of an extremely regular set of cells, where the distances between cells are identical. The differences observed in Tracksters and derived quantities stem from the variations in layer clusters. I've reviewed the DQM bin-to-bin comparison histograms and didn't spot anything suspicious.

@mandrenguyen
Copy link
Contributor

mandrenguyen commented Jun 26, 2024

+1
Changes in comparisons are expected, as per comment: #45178 (comment)

@mmusich
Copy link
Contributor

mmusich commented Jun 26, 2024

@rovere just to be sure, should we wait for a commit to solve https://github.com/cms-sw/cmssw/pull/45178/files#r1652101107 (and following)?

@fwyzard
Copy link
Contributor

fwyzard commented Jun 26, 2024

@rovere just to be sure, should we wait for a commit to solve https://github.com/cms-sw/cmssw/pull/45178/files#r1652101107 (and following)?

I think it looks fixed in the latest commit. Can you confirm ?

@mmusich
Copy link
Contributor

mmusich commented Jun 26, 2024

I think it looks fixed in the latest commit. Can you confirm ?

admittedly I missed the last iteration, so I am not sure what was there before. It would be good to resolve the comment if it was solved already.
At any rate looks good to me.

@mmusich
Copy link
Contributor

mmusich commented Jun 26, 2024

+hlt

  • for rovere@d996d3d
  • the branch could use a squash, but I am not insisting for that to avoid re-triggering signatures

@mmusich
Copy link
Contributor

mmusich commented Jun 26, 2024

@cms-sw/upgrade-l2 I think we'd like to have this in time for the next pre-release, please take a look.

@rovere
Copy link
Contributor Author

rovere commented Jun 28, 2024

A gentle ping to @cms-sw/upgrade-l2

@srimanob
Copy link
Contributor

+Upgrade

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
Notice This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-sw/cmsdist#9231

@rappoccio
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants