Use akCs4PFJets for candidate-based tagInfos for HI workflows #31674

Merged: 2 commits, Oct 6, 2020

@mandrenguyen mandrenguyen commented Oct 5, 2020

PR description:

Update candidate-based tagInfos to use the HI jet collection (akCs4PFJets) in HI workflows.
This fixes an issue created in #30898, which is causing crashes in several HI workflows (#31670).
The issue is that the b-tagging validation clones the 'patJets', but switches from the track-based tagInfos that we use to the candidate-based ones now used in pp.
The solution is to update the candidate-based tagInfos to use akCs4PFJets, which was on the to-do list for Run 3 anyway, when we plan to update to those in the HI b-tagging sequence. This issue was spotted running wfs like 158.1.

There was also a trivial change to the name of the cleaned heavy ion genJet module, which had a conflict (affected wf 150, for example).

PR validation:

Tested on wf 150, 158.1 and 158.01. Please test on the other wfs identified in #31670

cmsbuild commented Oct 5, 2020

The code-checks are being triggered in jenkins.

cmsbuild commented Oct 5, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31674/18802

  • This PR adds an extra 20KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

cmsbuild commented Oct 5, 2020

A new Pull Request was created by @mandrenguyen (Matthew Nguyen) for master.

It involves the following packages:

DQMOffline/RecoB
PhysicsTools/PatAlgos
RecoBTag/ImpactParameter
RecoBTag/SoftLepton

@perrotta, @andrius-k, @kmaeshima, @ErnestaP, @cmsbuild, @jfernan2, @fioriNTU, @slava77, @jpata, @santocch can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @hatakeyamak, @emilbols, @peruzzim, @seemasharmafnal, @mmarionncern, @ahinzmann, @smoortga, @jdolen, @ferencek, @rociovilar, @jdamgov, @nhanvtran, @gkasieczka, @schoef, @andrzejnovak, @clelange, @riga, @JyothsnaKomaragiri, @gpetruc, @mariadalfonso this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

slava77 commented Oct 5, 2020

@cmsbuild please test workflow 158.1, 158.2, 158.3, 159.1, 159.3, 159.4, 300.0, 301.0, 302.0, 130.0, 311.0, 312.0

cmsbuild commented Oct 5, 2020

The tests are being triggered in jenkins.
Test Parameters:

cmsbuild commented Oct 5, 2020

-1

Tested at: 8474eb7

CMSSW: CMSSW_11_2_X_2020-10-05-1200
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e08935/9741/summary.html

I found the following errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test TestDQMOfflineConfiguration100 had ERRORS
---> test TestDQMOfflineConfiguration50 had ERRORS

cmsbuild commented Oct 5, 2020

Comparison job queued.

slava77 commented Oct 5, 2020

---> test TestDQMOfflineConfiguration100 had ERRORS

Exception: 'Phase2C11M9' is not a valid option for '--era'

looks unrelated to this PR

---> test TestDQMOfflineConfiguration50 had ERRORS

has the same issue.

cmsbuild commented Oct 5, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e08935/9741/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/130.0_SinglePiPt10+SinglePiPt10+DIGI+RECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/158.1_QCD_Pt_80_120_13_HI+QCD_Pt_80_120_13_HI+DIGIHI2018PPRECO+RECOHI2018PPRECO+HARVESTHI2018PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/158.2_PhotonJets_Pt_10_13_HI+PhotonJets_Pt_10_13_HI+DIGIHI2018PPRECO+RECOHI2018PPRECO+HARVESTHI2018PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/158.3_ZEEMM_13_HI+ZEEMM_13_HI+DIGIHI2018PPRECO+RECOHI2018PPRECO+HARVESTHI2018PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/159.1_QCD_Pt_80_120_14_HI_2021+QCD_Pt_80_120_14_HI_2021+DIGIHI2021PPRECO+RECOHI2021PPRECO+HARVESTHI2021PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/159.3_ZMM_14_HI_2021+ZMM_14_HI_2021+DIGIHI2021PPRECO+RECOHI2021PPRECO+HARVESTHI2021PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/159.4_ZEE_14_HI_2021+ZEE_14_HI_2021+DIGIHI2021PPRECO+RECOHI2021PPRECO+HARVESTHI2021PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/300.0_Pyquen_GammaJet_pt20_2760GeV+Pyquen_GammaJet_pt20_2760GeV+DIGIHIMIX+RECOHIMIX+HARVESTHI2018PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/301.0_Pyquen_DiJet_pt80to120_2760GeV+Pyquen_DiJet_pt80to120_2760GeV+DIGIHIMIX+RECOHIMIX+HARVESTHI2018PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/311.0_Pyquen_DiJet_pt80to120_2760GeV_2021+Pyquen_DiJet_pt80to120_2760GeV_2021+DIGIHI2021MIX+RECOHI2021MIX+HARVESTHI2021PPRECO
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-e08935/312.0_Pyquen_ZeemumuJets_pt10_2760GeV_2021+Pyquen_ZeemumuJets_pt10_2760GeV_2021+DIGIHI2021MIX+RECOHI2021MIX+HARVESTHI2021PPRECO

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 146 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2542225
  • DQMHistoTests: Total failures: 1616
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2540587
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 149 log files, 22 edm output root files, 35 DQM output files
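
As a quick sanity check on the summary above, the per-category counts can be added up and compared with the total. The sketch below hard-codes the numbers reported by the bot; that the four categories are meant to partition the total is an assumption inferred from these numbers, not documented behavior of the comparison tool.

```python
# Counts copied from the DQMHistoTests lines of the comparison summary.
summary = {
    "compared": 2542225,
    "failures": 1616,
    "nulls": 0,
    "successes": 2540587,
    "skipped": 22,
}

# Assumption: failures, nulls, successes and skipped partition the total.
accounted = sum(
    summary[key] for key in ("failures", "nulls", "successes", "skipped")
)
failure_fraction = summary["failures"] / summary["compared"]

print(f"accounted for: {accounted} of {summary['compared']}")
print(f"failure fraction: {failure_fraction:.4%}")
```

For these numbers the categories add up exactly, and failures are well below 0.1% of the compared histograms.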

slava77 commented Oct 6, 2020

The solution is to update the candidate-based tagInfos to use akCs4PFJets, which was anyway on the to-do list for Run 3, when we plan update to those in the HI b-tagging sequence.

So, this is not really a bugfix; it's a feature change for AOD default b-tagging.
Please update the PR title to reflect this.

I'm still assuming that we stay in 11_X for HI miniAOD.
A more appropriate/targeted bugfix will be needed if there is still intent for a backport.

@slava77 slava77 left a comment

it looks like a limited bugfix would be to modify patJetsBDHadron in Validation/RecoB/python/BDHadronTrackValidation_cff.py

Updates in soft lepton taginfos are apparently not related to the crashes.
Are the updates made here going to produce usable results or is it likely that everything will need to be reconfigured in a significant way?


from Configuration.Eras.Modifier_pp_on_AA_2018_cff import pp_on_AA_2018
from Configuration.Eras.Modifier_pp_on_PbPb_run3_cff import pp_on_PbPb_run3
(pp_on_AA_2018 | pp_on_PbPb_run3).toModify(pfImpactParameterTagInfos, jets = "akCs4PFJets")

is this enough to correctly run pfImpactParameterTagInfos for HI?
(similar question applies to the rest).

@mandrenguyen mandrenguyen Oct 6, 2020

No, it was not sufficient to modify pfImpactParameterTagInfos. The soft lepton tag infos had to use a consistent jet collection to avoid crashes. I first modified the input to pfImpactParameterTagInfos, but the code was still crashing until I updated the other tag infos. All of the modifications made are necessary to avoid crashes.
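
The consistency requirement described here can be illustrated with a toy model in plain Python. This is a sketch only: the dictionary stands in for the real configuration, the soft lepton module labels and the default collection name `ak4PFJetsCHS` are assumptions rather than values taken from the PR diff, and `switch_jets` and `inconsistent_modules` are hypothetical helpers.

```python
HI_JETS = "akCs4PFJets"        # HI jet collection named in the PR
DEFAULT_JETS = "ak4PFJetsCHS"  # assumed pp default, for illustration only


def make_tag_info_inputs():
    """Hypothetical module labels mapped to their input jet collection."""
    return {
        "pfImpactParameterTagInfos": DEFAULT_JETS,
        "softPFMuonsTagInfos": DEFAULT_JETS,
        "softPFElectronsTagInfos": DEFAULT_JETS,
    }


def switch_jets(inputs, labels, jets=HI_JETS):
    """Return a copy of inputs with the listed modules pointed at `jets`."""
    updated = dict(inputs)
    for label in labels:
        updated[label] = jets
    return updated


def inconsistent_modules(inputs, expected=HI_JETS):
    """List modules whose jet collection differs from the expected one."""
    return sorted(label for label, jets in inputs.items() if jets != expected)


# Switching only the impact-parameter tagInfos leaves the others behind.
partial = switch_jets(make_tag_info_inputs(), ["pfImpactParameterTagInfos"])
# Switching every module gives a consistent setup.
full = switch_jets(make_tag_info_inputs(), list(make_tag_info_inputs()))

print(inconsistent_modules(partial))
print(inconsistent_modules(full))
```

In this toy setup, the partial switch still reports the two soft lepton modules as inconsistent, while the full switch reports none.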

I didn't mean the soft lepton taggers.
My question was whether the switch of the jet collection for pfImpactParameterTagInfos is expected to give pfImpactParameterTagInfos performance acceptable for physics in HI, or whether more significant changes are needed.

The difference between impactParameterTagInfos and pfImpactParameterTagInfos is basically technical. There should not be any significant difference in performance. I verified this some time ago. We will check this again before actually calculating b-tag discriminators based on the candidate tagInfos. At the moment, they are not really used for anything in the HI workflow besides this validation package, AFAIK.

cmsbuild commented Oct 6, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa7725/9758/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 142 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2542113
  • DQMHistoTests: Total failures: 1625
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2540466
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -773.364 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 10024.0,... ): -48.369 KiB L1T/L1TStage2CaloLayer1
  • DQMHistoSizes: changed ( 136.731 ): -47.829 KiB L1T/L1TStage2CaloLayer1
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@silviodonato

merge
@cms-sw/analysis-l2 @cms-sw/dqm-l2 please let us know if you have comments

@Martin-Grunewald

It seems there is still a problem in an HIon workflow (HLT validation tests suite):

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_11_2_X_2020-10-07-2300/slc7_amd64_gcc820/RelVal_RECO_HIon_MC.log

----- Begin Fatal Exception 08-Oct-2020 07:51:28 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 12 event: 3302 stream: 3
   [1] Running path 'validation_step'
   [2] Calling method for module B2GHadronicHLTValidation/'b2gDiJetHLTValidatio\
n'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for a container with elements of type: reco::Jet
Looking for module label: ak8PFJetsPuppi
Looking for productInstanceName:

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exc\
eption,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSe\
t in the configuration.

----- End Fatal Exception -------------------------------------------------
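
For reference, the workaround suggested in the exception text would look roughly like the fragment below in a cmsRun configuration. This is a sketch: the process name is a placeholder, and the setting only lets the job continue past the missing product; it does not make `ak8PFJetsPuppi` available.

```python
import FWCore.ParameterSet.Config as cms

# Placeholder process; in practice this would be the existing process object
# in the job configuration (e.g. RelVal_RECO_HIon_MC.py).
process = cms.Process("SKETCH")

# Setting quoted in the exception message: skip events that raise
# ProductNotFound instead of aborting the job.
process.options = cms.untracked.PSet(
    SkipEvent = cms.untracked.vstring('ProductNotFound')
)
```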

@mandrenguyen

Hi @Martin-Grunewald
So this is different from the addOnTest I checked, right?
addOnTests.py -t hlt_mc_HIon
Please give me a command to reproduce this error, and I'll have a look.

@Martin-Grunewald

Yes, it runs some more steps after reco: --step=RAW2DIGI,L1Reco,RECO,EI,PAT,VALIDATION,DQM

In a recent developer area:

...
cd src
cmsenv
git cms-addpkg HLTrigger/Configuration
scram b
cd HLTrigger/Configuration/test
# create job files (takes a few minutes)
./cmsDriver.csh HIon
# prepare input file using cmsRun:
cmsRun RelVal_DigiL1RawHLT_HIon_MC.py >& RelVal_DigiL1RawHLT_HIon_MC.log
# run problematic cmsRun job:
cmsRun RelVal_RECO_HIon_MC.py >& RelVal_RECO_HIon_MC.log

@mandrenguyen

This error was not picked up in the standard relval wf 159 because the validation there is limited to VALIDATION:@standardValidationNoHLT+@miniAODValidation.
I verified that no other crashes are present when simply removing b2gHLTriggerValidation from HLTriggerOffline.Common.HLTValidation_cff.
I'm not immediately sure why an empty version of ak8PFJetsPuppi is not present. I will see if I can straighten that out before resorting to removing the validation sequence (although I can't imagine it's useful in heavy ions).

slava77 commented Oct 8, 2020

I think that eventually the goal should be to clean up the DQM/Validation for the HI setup.
It may be more practical to patch this up by inserting empty/dummy collections from the reco side for now.

@mandrenguyen

@slava77 Agreed. Is this jet collection dropped from the following commit you made?
d225fa1
It looks like it disables applySubstructure, which is where this jet collection is produced, right?

santocch commented Oct 8, 2020

+1

slava77 commented Oct 8, 2020

It looks like it disables applySubstructure, which is where this jet collection is produced, right?

right, the creation of all modules is short-circuited

jfernan2 commented Oct 8, 2020

+1
The DQM Btag folder in wf 140.56 has as many as 1617 histograms modified by this PR; I understand they are produced by the change of PF candidates.
E.g. https://tinyurl.com/yyu62zpm

cmsbuild commented Oct 8, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.
