Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Tau ID data format #28417

Conversation

swozniewski
Copy link
Contributor

@swozniewski swozniewski commented Nov 18, 2019

PR description:

Features of this PR:

  • new data format for tau discriminators introduced (details see below). Changes AOD format. miniAOD and nanoAOD format remain unchanged, only changes in the processing at this level.
  • removes deprecated cut based IDs
  • negligible changes in output of combinedIsolationDeltaBetaCorr3Hits with dR=0.5 due to different input setting blessed by Tau Conveners.

New data format

  • defines struct of std::vector and std::vector to store multiple raw values and WP flags per tau, called SingleTauDiscriminatorContainer. Actual AOD data format TauDiscriminatorContainer (to replace PFTauDiscriminator in many places) defined as ValueMap of SingleTauDiscriminatorContainer. (PFTauDiscriminator is association vector of floats). This format is also used at PAT level to replace PATTauDiscriminator.
    see DataFormats/TauReco
    class definitions are updated

  • New instances of the TauIDProducerBase template compatible with the new data format are created

  • RECO producers for MVAID, cut based IDs, antiElectron/Muon ID, CutMultiplexer produce new data format and only run once for all WPs. In case of MVA IDs, the raw producer stages out raw value (and category value if applicable) in one object read by the cut multiplexer but not stored in AOD since Cut Multiplexer puts these values together with WP flags into a new instance of TauDiscriminatorContainer which is the stored in MiniAOD.
    Lists of objects to keep in AOD are updated and reduces drastically.

  • PATTauProducer is changed so that it can read both new and old data format, creating PATTaus in the same format as before. During initialization IDs are sorted so that IDs from one container will be read together.

  • ID producers running on MiniAOD create new data format for PAT Taus which is only transient but used to avoid unnecessary repetitive calls of the producers. CutMultiplexer is merged with its version for PFTaus into a templated version because of parallel structure and to apply major changes due to the new data format only once. PATTauIDEmbedder is adapted in a similar way as the PATTauProducer to read the new data format as well.

  • Deep Tau ID producer is changed such that it produces one instance of the new data format per e,mu,jet branch instead of many product instances. This required to change the internal dictionary structure for the working points by a vector structure in order keep the order of working points inserted into the new format safe.

  • Configurations of PATTauProducer and PATTauIDEmbedder uses helper functions defined in the respective python configurations in order to get the right indexes from the configuration of the input module. PATTauProducer in contrast is finding the right indexes via provenance which is a little more overhead but needed as this producer is also used to read stored data from disk (which can be produced with different configuration) while PATTauIDEmbedder only takes input that is created within the same workflow.

  • Configurations at NanoAOD level for IDs run on MiniAOD are updated

  • Kept producers with format where needed for HLT. HLT so far unchanged, apart from a minor formal config update that does not cause pyhsical changes. New data format could be introduced to HLT sequences in a separate step.

PR validation:

passed code checks
passed unit tests apart from some failing file accesses
passed matrix tests
central tau production lines run before/after and comparing resulting ID quantities:
reco + pat step run on a test AOD sample: comparison plots in /afs/cern.ch/user/s/swozniew/public/TauPOG/Dataformat/recoAndPATstep
nanoAOD step run on a test miniAOD sample: comparison plots in /afs/cern.ch/user/s/swozniew/public/TauPOG/Dataformat/nanoAOD
no differences in both cases apart from the small known difference mentioned above in the overview

no backport intended

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28417/12801

  • This PR adds an extra 252KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@swozniewski
Copy link
Contributor Author

Sorry, I only ran scram build code-checks, not code-format, will apply the formatting

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28417/12802

  • This PR adds an extra 276KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @swozniewski for master.

It involves the following packages:

CommonTools/ParticleFlow
Configuration/Skimming
DPGAnalysis/Skims
DQMOffline/L1Trigger
DQMOffline/Trigger
DataFormats/PatCandidates
DataFormats/TauReco
HLTrigger/Configuration
HLTriggerOffline/Tau
PhysicsTools/NanoAOD
PhysicsTools/PatAlgos
RecoTauTag/Configuration
RecoTauTag/RecoTau
Validation/RecoTau

@perrotta, @kmaeshima, @benkrikler, @cmsbuild, @chayanit, @zhenhu, @fwyzard, @schneiml, @andrius-k, @pgunnell, @rekovic, @jfernan2, @fgolf, @fioriNTU, @Martin-Grunewald, @slava77, @santocch, @peruzzim can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @hatakeyamak, @emilbols, @Martin-Grunewald, @ahinzmann, @geoff-smith, @seemasharmafnal, @mmarionncern, @kreczko, @JyothsnaKomaragiri, @makortel, @smoortga, @acaudron, @jhgoh, @peruzzim, @jdolen, @HuguesBrun, @cericeci, @ferencek, @trocino, @rociovilar, @rovere, @jdamgov, @missirol, @nhanvtran, @gkasieczka, @schoef, @mtosi, @andrzejnovak, @clelange, @HeinerTholen, @riga, @battibass, @Fedespring, @calderona, @mverzett, @cbernet, @gpetruc, @mariadalfonso, @pvmulder, @folguera this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@perrotta
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 18, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3523/console Started: 2019/11/18 17:56

@geoff-smith
Copy link
Contributor

@cmsbuild (or whoever): Could I please be removed from whatever list automatically includes me as a watcher in cmssw PRs? I have left the field and would like to stop receiving these emails.

@rekovic
Copy link
Contributor

rekovic commented Apr 6, 2020

+1

@silviodonato
Copy link
Contributor

@peruzzim ?

@peruzzim
Copy link
Contributor

peruzzim commented Apr 9, 2020

+xpog

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 9, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

@silviodonato
Copy link
Contributor

I created two issues related to this PR, #29447 is very urgent because is breaking 11 workflows

cmsbuild added a commit that referenced this pull request Apr 14, 2020
Update PR #28417, TauSkimPFTausSelectedForMuTau reading anti-muon ID from new data format
cmsbuild added a commit that referenced this pull request Apr 14, 2020
Update PR #28417, add era modifier to TauRefProducer to read old data format from existing files
@swozniewski swozniewski deleted the CMSSW_11_0_X_tau-pog_restructureTauDiscriminators branch June 19, 2020 13:21
@silviodonato
Copy link
Contributor

Please see #30324. There is a typo in DQMOffline/L1Trigger/src/L1TTauOffline.cc workingsPoints vs workingPoints

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.