Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calotower phase-out from PF code #43612

Merged
merged 2 commits into from
Jan 8, 2024

Conversation

swagata87
Copy link
Contributor

PR description:

To address #43264.
In SuperClusterImporter.cc, the value of H/E from old method (calotower-based) and new method (rechit-based) were same. (This is checked for a few events).

Offline reco config and Phase2 HLT config were accordingly modified. Note that for Run3 HLT config, no change is needed, because Run3 HLT does not use SuperClusterImporter.

PR validation:

Checked with 12434.0_TTbar_14TeV+2023

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 19, 2023

cms-bot internal usage

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43612/38262

  • This PR adds an extra 40KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @swagata87 (Swagata Mukherjee) for master.

It involves the following packages:

  • HLTrigger/Configuration (hlt)
  • RecoHI/Configuration (reconstruction)
  • RecoParticleFlow/PFProducer (reconstruction)

@Martin-Grunewald, @mandrenguyen, @mmusich, @jfernan2, @cmsbuild can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @mandrenguyen, @yenjie, @missirol, @rovere, @dgulhan, @mmarionncern, @yetkinyilmaz, @lgray, @kurtejung, @MiheeJo, @seemasharmafnal, @hatakeyamak, @silviodonato, @jazzitup, @SohamBhattacharya this is something you requested to watch as well.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@Martin-Grunewald
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8bc232/36596/summary.html
COMMIT: a32e105
CMSSW: CMSSW_14_0_X_2023-12-19-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43612/36596/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@swagata87 swagata87 marked this pull request as ready for review December 20, 2023 11:18
@Martin-Grunewald
Copy link
Contributor

Hi @swagata87
I see there are differences (eg, reco comparisons), while according to the PR description, you seem to assume there should not be any differences. Could you please comment?

@mmusich
Copy link
Contributor

mmusich commented Jan 2, 2024

I see there are differences (eg, reco comparisons), while according to the PR description, you seem to assume there should not be any differences. Could you please comment?

hello @swagata87 (happy new year!). Do you have any new input on the question above from Martin?
Thank you.

@swagata87
Copy link
Contributor Author

hey Marco, Martin, all,
Happy New year!
My new year was going very happily until you reminded me of calotowers :-D :-D
I will get back to this soon, hopefully by the end of this week. Maybe a re-test of the PR will be useful to be sure that the differences we see are not transient. In any case, I will check these soon.
Cheers,
Swagata

@Martin-Grunewald
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 3, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8bc232/36701/summary.html
COMMIT: a32e105
CMSSW: CMSSW_14_0_X_2024-01-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43612/36701/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 156 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3247277
  • DQMHistoTests: Total failures: 16
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3247239
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 200 log files, 161 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 4, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@swagata87
Copy link
Contributor Author

type pf

@cmsbuild cmsbuild added the pf label Jan 4, 2024
@swagata87
Copy link
Contributor Author

FYI @bellan

@rappoccio
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit dc368ad into cms-sw:master Jan 8, 2024
11 checks passed
@mmusich
Copy link
Contributor

mmusich commented Jan 9, 2024

@swagata87,
after merging this PR (in CMSSW_14_0_X_2024-01-08-2300) there are widespread failures in IB relvals, with exceptions of the type:

---- Begin Fatal Exception 09-Jan-2024 03:22:17 CET-----------------------
An exception of category 'Conditions mismatch' occurred while
   [0] Processing  Event run: 301998 lumi: 76 event: 77970383 stream: 0
   [1] Running path 'dqmoffline_9_step'
   [2] Prefetching for module SMPDQM/'SMPDQM'
   [3] Prefetching for module MuonProducer/'muons'
   [4] Prefetching for module PFProducer/'particleFlowTmp'
   [5] Calling method for module PFBlockProducer/'particleFlowBlock'
Exception Message:
Requested conditions of type HcalPFCuts for cell (0x4538403f) (HE 16,63,3) got conditions for cell (0x0) 
----- End Fatal Exception ------------------------------------------------- 

See e.g. here
curiously, as far as I can tell all affected workflows are for 2017, perhaps there is a condition problem in the relevant IoV or the run2_data GT? (tagging @cms-sw/alca-l2).

@swagata87
Copy link
Contributor Author

ok I will check shortly. Probably we can use it only for Run3 and Phase2 by using a era, and keep things as before for Run2/Run1.

@swagata87
Copy link
Contributor Author

it's not clear to me how to reproduce the crash.
if someone can share a recipe to reproduce it, then it will be easier to look into the problem.
thanks!

@mmusich
Copy link
Contributor

mmusich commented Jan 9, 2024

it's not clear to me how to reproduce the crash.

cmsrel CMSSW_14_0_X_2024-01-08-2300
cd CMSSW_14_0_X_2024-01-08-2300/src
cmsenv
runTheMatrix.py -l 136.8 -t 4 -j 8 --ibeos

reproduced for me.

@swagata87
Copy link
Contributor Author

ok I think I have understood the issue. Let me try to explain and then we can discuss how to solve it.

errors are like this

Requested conditions of type HcalPFCuts for cell (0x4538403f) (HE 16,63,3) got conditions for cell (0x0)

now, in run3 if we look at the 16th ieta tower , the HE is only depth 4.
So in run3 GTs, (HE 16, whatever, 3) does not exist, and that's fine.
But, run2 GTs need to contain the (HE 16, whatever, 3) entry, because if we look at the 2017 photo, its clear that at the 16th ieta tower , the HE is only depth 3.
hcal_16

So my question is, is it possible to include this missing depth in GT? tagging @abdoulline

Otherwise we have to change the code and make sure that run1 / run2 don't use the HCalCutsFromDB feature.
But since the feature exists also in Run2 GTs, maybe adding missing entries is doable?
Let me know!

@abdoulline
Copy link

Let me try to understand what's happening and to look back specifically at 2017 PFCuts - if there is any omission. 2017 had a special HE module with Phase1 (like 2018), but the bulk HE was Phase0.
Actually I'm wondering why there was no issues with 2017 wfs previously... 🤔

@swagata87
Copy link
Contributor Author

Previously we always used this for Run3 and Phase2 only. So the issue was not uncovered.
This time I did not use the Era.

@abdoulline
Copy link

abdoulline commented Jan 9, 2024

Previously we always used this for Run3 and Phase2 only. So the issue was not uncovered. This time I did not use the Era.

OK, thanks. At the moment I cannot exclude that Run1,2 may need to avoid using the HCalCutsFromDB feature.
I mean if there is a deficiency for 2017 IOVs in all the earlier produced conditions, it would mean quite massive GTs reshuffling (making new HCAL multi-IOV tags and then producing new GTs and implementing their usage everywhere)...

At the moment I can only say that in 2017 Geometry (IOV in question 287446 - 309054)
the readout/cell (HE 16,63,3) doesn't exist in any HCAL conditions, only (HE 16,63,2) and (HE 16,63,4)
In 2017 there was a special HE module (with Phase1 prototype SiPMs in 7 depths, re-merged into Phase0 RecHits at the end of RECO ) like this with iphi=63-66.

@mmusich
Copy link
Contributor

mmusich commented Jan 9, 2024

I mean if there is a deficiency for 2017 IOVs in all the earlier produced conditions, it would mean quite massive GTs reshuffling (remaking HCAL tags and then producing new GTs)...

if it's confirmed there's a deficiency in the GTs, it's IMHO better to remove bugged conditions in the long run. Sweeping the dust under the carpet will certainly lead to a time bomb.

@abdoulline
Copy link

@mmusich
Don't get me wrong, I meant "short-term" removal of Run1/2 from conditions usage as a simplest option, so that in longer-term GTs could be "repaired" and then reconsidered once again.

I've added to the previous comment that I don't see conditions problem at the moment...

@mmusich
Copy link
Contributor

mmusich commented Jan 9, 2024

I've added to the previous comment that I don't see conditions problem at the moment...

At the moment I can only say that in 2017 Geometry (IOV in question 287446 - 309054)
the readout/cell (HE 16,63,3) doesn't exist in any HCAL conditions, only (HE 16,63,2) and (HE 16,63,4)

@abdoulline, if it helps, there are also failures in MC (wf 25215.17: GluGluHToGG_M125_Pow_MINLO_NNLOPS_py8_13UP17) in which the missing cell is not 16,63,3:

---- Begin Fatal Exception 09-Jan-2024 05:37:03 CET-----------------------
An exception of category 'Conditions mismatch' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 3 stream: 1
   [1] Running path 'dqmofflineOnPAT_1_step'
   [2] Prefetching for module SingleTopTChannelLeptonDQM_miniAOD/'singleTopElectronMediumDQM_miniAOD'
   [3] Prefetching for module PATMuonSlimmer/'slimmedMuons'
   [4] Prefetching for module PATMuonSelector/'selectedPatMuons'
   [5] Prefetching for module PATMuonProducer/'patMuons'
   [6] Prefetching for module MuonProducer/'muons'
   [7] Prefetching for module PFProducer/'particleFlowTmp'
   [8] Calling method for module PFBlockProducer/'particleFlowBlock'
Exception Message:
Requested conditions of type HcalPFCuts for cell (0x45184440) (HE 17,64,1) got conditions for cell (0x0) 
----- End Fatal Exception -------------------------------------------------

see e.g. here

@abdoulline
Copy link

abdoulline commented Jan 9, 2024

Thanks, Marco.
A special HEP17 module included all the ietas from 16 to 29 (and iphi = 63-66), so it's not only ieta=16 which is involved, indeed...

The good news is that I've understood what's the culprit...
All other HCAL conditions for this HEP17 Phase1 module contain "primary" (Phase1) ieta/iphi/depth, but after "collapsing" all the Phase1 cells of this special module to make them correspond to the bulk of 2017 Phase0 HE, there is a reassignment of the HEP17 depths back to Phase0.
It's OK for all HCAL conditions, applied to "primary" Digis (Phase1 for HEP17), but PFCuts, which are applied on the resulting/"collapsed" HcalRecHits...

So, I suppose that 2016 PFCuts would work for 2017. I'll try to check it next days (by the end of the week), but unfortunately after Xmas break there appeared several urgent issues requiring my attention...

@mmusich
Copy link
Contributor

mmusich commented Jan 9, 2024

@abdoulline thank you for the detailed analysis.

So, I suppose that 2016 PFCuts would work for 2017. I'll try to check it next days (by the end of the week), but unfortunately after Xmas break there appeared several urgent issues requiring my attention...

do you mean using the 2016 PFCuts tout-court ?
I tried to do that, by adding the following snipped:

process.GlobalTag.toGet = cms.VPSet(
    cms.PSet(record = cms.string("HcalPFCutsRcd"),
             tag = cms.string("HcalPFCuts_2016_v1.0_mc"),
             connect = cms.string("frontier://FrontierProd/CMS_CONDITIONS")
             )
)

(this should make the job consume the payload with hash fae877e66505ad86ecd01daa103b79a97c144b16, which also corresponds to the [IOV 1-287446] of the data tag HcalPFCuts_v1.0_offline)

to the configuration resulting from running the step3 of wf 136.8 (as in #43612 (comment)), but that immediately fails with:

----- Begin Fatal Exception 09-Jan-2024 12:39:21 CET-----------------------
An exception of category 'Conditions mismatch' occurred while
   [0] Processing  Event run: 301998 lumi: 76 event: 78030333 stream: 1
   [1] Running path 'dqmoffline_9_step'
   [2] Prefetching for module SMPDQM/'SMPDQM'
   [3] Prefetching for module MuonProducer/'muons'
   [4] Prefetching for module PFProducer/'particleFlowTmp'
   [5] Calling method for module PFBlockProducer/'particleFlowBlock'
Exception Message:
Requested conditions of type HcalPFCuts for cell (0x43183815) (HB 14,21,1) got conditions for cell (0x43101423) (HB -5,35,1)
----- End Fatal Exception -------------------------------------------------

@abdoulline
Copy link

Thanks for a quick check, Marco !
Yes, I thought the way you've tried it would work, but I forgot about the "dense index"(for given DetId) stored in the DB. And which is different for the same physics (HB, HE) channel ieta/iphi/depth in 2016 and 2017.
Now it seems to be more complicated, I don't have an immediate solution...

@abdoulline
Copy link

abdoulline commented Jan 9, 2024

At the end there seems to be no simple/elegant solution for 2017 DB PFCuts issue...
The only workaround I could imagine is to use
usePFThresholdsFromDB = False
specifically for 2017 using existing modifier like in some other configs:
https://cmssdt.cern.ch/lxr/source/SimCalorimetry/HcalSimProducers/python/hcalSimParameters_cfi.py#0137

Sorry for not having foreseen this issue earlier...

@mmusich
Copy link
Contributor

mmusich commented Jan 9, 2024

At the end there seems to be no simple/elegant solution for 2017 DB PFCuts issue...

maybe I am missing something trivial, but could it not be populated using the existing 2017 payload, but adding more "fake" cells for the special HEP17 module included all the ietas from 16 to 29 (and iphi = 63-66) ?

@abdoulline
Copy link

abdoulline commented Jan 9, 2024

maybe I am missing something trivial, but could it not be populated using the existing 2017 payload, but adding more "fake" cells for the special HEP17 module included all the ietas from 16 to 29 (and iphi = 63-66) ?

DB array is populated sequentially using dense index (sequential one calculated by Geometry methods for given detector topology with given number of channels), so
for 2017 Geometry the cell in question (HE 16,63,3) doesn't exist ("illegal") and is not included in the dense index...
So added "fake" cells will not be properly parsed to store them into DB.

@swagata87
Copy link
Contributor Author

tried to put in a fix here: #43681

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants