-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Tau ID data format #28417
New Tau ID data format #28417
Conversation
The code-checks are being triggered in jenkins. |
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28417/12801
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
Sorry, I only ran scram build code-checks, not code-format, will apply the formatting |
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28417/12802
|
A new Pull Request was created by @swozniewski for master. It involves the following packages: CommonTools/ParticleFlow @perrotta, @kmaeshima, @benkrikler, @cmsbuild, @chayanit, @zhenhu, @fwyzard, @schneiml, @andrius-k, @pgunnell, @rekovic, @jfernan2, @fgolf, @fioriNTU, @Martin-Grunewald, @slava77, @santocch, @peruzzim can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
The tests are being triggered in jenkins. |
@cmsbuild (or whoever): Could I please be removed from whatever list automatically includes me as a watcher in cmssw PRs? I have left the field and would like to stop receiving these emails. |
+1 |
+xpog |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
I created two issues related to this PR, #29447 is very urgent because is breaking 11 workflows |
Update PR #28417, TauSkimPFTausSelectedForMuTau reading anti-muon ID from new data format
Update PR #28417, add era modifier to TauRefProducer to read old data format from existing files
Please see #30324. There is a typo in |
PR description:
Features of this PR:
New data format
defines struct of std::vector and std::vector to store multiple raw values and WP flags per tau, called SingleTauDiscriminatorContainer. Actual AOD data format TauDiscriminatorContainer (to replace PFTauDiscriminator in many places) defined as ValueMap of SingleTauDiscriminatorContainer. (PFTauDiscriminator is association vector of floats). This format is also used at PAT level to replace PATTauDiscriminator.
see DataFormats/TauReco
class definitions are updated
New instances of the TauIDProducerBase template compatible with the new data format are created
RECO producers for MVAID, cut based IDs, antiElectron/Muon ID, CutMultiplexer produce new data format and only run once for all WPs. In case of MVA IDs, the raw producer stages out raw value (and category value if applicable) in one object read by the cut multiplexer but not stored in AOD since Cut Multiplexer puts these values together with WP flags into a new instance of TauDiscriminatorContainer which is the stored in MiniAOD.
Lists of objects to keep in AOD are updated and reduces drastically.
PATTauProducer is changed so that it can read both new and old data format, creating PATTaus in the same format as before. During initialization IDs are sorted so that IDs from one container will be read together.
ID producers running on MiniAOD create new data format for PAT Taus which is only transient but used to avoid unnecessary repetitive calls of the producers. CutMultiplexer is merged with its version for PFTaus into a templated version because of parallel structure and to apply major changes due to the new data format only once. PATTauIDEmbedder is adapted in a similar way as the PATTauProducer to read the new data format as well.
Deep Tau ID producer is changed such that it produces one instance of the new data format per e,mu,jet branch instead of many product instances. This required to change the internal dictionary structure for the working points by a vector structure in order keep the order of working points inserted into the new format safe.
Configurations of PATTauProducer and PATTauIDEmbedder uses helper functions defined in the respective python configurations in order to get the right indexes from the configuration of the input module. PATTauProducer in contrast is finding the right indexes via provenance which is a little more overhead but needed as this producer is also used to read stored data from disk (which can be produced with different configuration) while PATTauIDEmbedder only takes input that is created within the same workflow.
Configurations at NanoAOD level for IDs run on MiniAOD are updated
Kept producers with format where needed for HLT. HLT so far unchanged, apart from a minor formal config update that does not cause pyhsical changes. New data format could be introduced to HLT sequences in a separate step.
PR validation:
passed code checks
passed unit tests apart from some failing file accesses
passed matrix tests
central tau production lines run before/after and comparing resulting ID quantities:
reco + pat step run on a test AOD sample: comparison plots in
/afs/cern.ch/user/s/swozniew/public/TauPOG/Dataformat/recoAndPATstep
nanoAOD step run on a test miniAOD sample: comparison plots in
/afs/cern.ch/user/s/swozniew/public/TauPOG/Dataformat/nanoAOD
no differences in both cases apart from the small known difference mentioned above in the overview
no backport intended