-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add boosted taus to NanoAOD #150
Add boosted taus to NanoAOD #150
Conversation
…_tauIDtoolsDev Updates to tauID python tool
…eliger/cmssw into BoostedTauNanoAOD_CMSSW_11_3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aloeliger, thanks again for this development! I don't have further general comments in addition to the previous email. Please have a look at the detailed comments of this review. I think it's fine if you approach XPOG as soon as you feel that the remaining things are under control.
|
||
finalBoostedTaus = cms.EDFilter("PATTauRefSelector", | ||
src = cms.InputTag("slimmedTausBoostedNewID"), | ||
cut = cms.string("pt > 40 && tauID('decayModeFindingNewDMs') && (tauID('byLooseCombinedIsolationDeltaBetaCorr3Hits') || tauID('byVLooseIsolationMVArun2v1DBnewDMwLT') || tauID('byVLooseIsolationMVArun2v1DBdR03oldDMwLT') || tauID('byVVLooseIsolationMVArun2v1DBoldDMwLT') || tauID('byVVLooseIsolationMVArun2017v2DBoldDMwLT2017') || tauID('byVVLooseIsolationMVArun2017v2DBoldDMdR0p3wLT2017') || tauID('byVVLooseIsolationMVArun2017v2DBnewDMwLT2017'))") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be aware that BDT based IDs will be dropped for standard taus ( cms-sw/cmssw@master...cms-tau-pog:CMSSW_11_3_X_tau-pog_nanoAODcleanUp ). In case of boosted taus where you don't have deepTauID (at the moment) you may want to keep it for now. However, please rethink which MVA ID versions you really want to keep for boosted taus. E.g. I imagine that 2017v1 could be dropped, maybe also newDM-version. In any case, keep in sync with tau tables below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed all ID working points except for 2017v2 MVA working points
idDecayMode = Var("tauID('decayModeFinding')", bool), | ||
idDecayModeNewDMs = Var("tauID('decayModeFindingNewDMs')", bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to stay in sync with clean up linked in previous comment. This, however, is not necessarily settled yet (what concerns the DMs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New DM finding is required as one of the cuts. for the moment, since it is not taking up too many branches, I think it may be good to keep these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That it is already in the selection is actually the reason why we removed the quantity in the clean up. It's anyway always true. Therefore, I think you can remove it. (The selection does not depend on this definition here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't forget this issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have now been removed.
dxy = Var("leadChargedHadrCand().dxy()",float, doc="d_{xy} of lead track with respect to PV, in cm (with sign)",precision=10), | ||
dz = Var("leadChargedHadrCand().dz()",float, doc="d_{z} of lead track with respect to PV, in cm (with sign)",precision=14), | ||
|
||
# these are too many, we may have to suppress some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been removed
_mvaIsoVars2015 = cms.PSet( | ||
rawMVAnewDM = Var( "tauID('byIsolationMVArun2v1DBnewDMwLTraw')",float, doc="byIsolationMVArun2v1DBoldDMwLT raw output discriminator (2015)",precision=10), | ||
rawMVAoldDM = Var( "tauID('byIsolationMVArun2v1DBoldDMwLTraw')",float, doc="byIsolationMVArun2v1DBoldDMwLT raw output discriminator (2015)",precision=10), | ||
idMVAnewDM = _tauId6WPMask( "by%sIsolationMVArun2v1DBnewDMwLT", doc="IsolationMVArun2v1DBnewDMwLT ID working point (2015)"), | ||
idMVAoldDM = _tauId6WPMask( "by%sIsolationMVArun2v1DBoldDMwLT", doc="IsolationMVArun2v1DBoldDMwLT ID working point (2015)"), | ||
) | ||
_mvaIsoVars2017v1 = cms.PSet( | ||
rawMVAoldDM2017v1 = Var( "tauID('byIsolationMVArun2v1DBoldDMwLTraw')",float, doc="byIsolationMVArun2v1DBoldDMwLT raw output discriminator (2017v1)",precision=10), | ||
idMVAoldDM2017v1 = _tauId7WPMask( "by%sIsolationMVArun2v1DBoldDMwLT", doc="IsolationMVArun2v1DBoldDMwLT ID working point (2017v1)") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, these IDs are not needed. See above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done a complete cleaning of all non 2017MVAv2 ID working points.
_mvaIsoVars2017v2 = cms.PSet( | ||
rawMVAoldDM2017v2=Var("tauID('byIsolationMVArun2017v2DBoldDMwLTraw2017')",float, doc="byIsolationMVArun2017v2DBoldDMwLT raw output discriminator (2017v2)",precision=10), | ||
rawMVAnewDM2017v2 = Var("tauID('byIsolationMVArun2017v2DBnewDMwLTraw2017')",float,doc='byIsolationMVArun2017v2DBnewDMwLT raw output discriminator (2017v2)',precision=10), | ||
rawMVAoldDMdR032017v2 = Var("tauID('byIsolationMVArun2017v2DBoldDMdR0p3wLTraw2017')",float,doc='byIsolationMVArun2017v2DBoldDMdR0p3wLT raw output discriminator (2017v2)'), | ||
idMVAnewDM2017v2 = _tauId7WPMask("by%sIsolationMVArun2017v2DBnewDMwLT2017", doc="IsolationMVArun2017v2DBnewDMwLT ID working point (2017v2)"), | ||
idMVAoldDM2017v2=_tauId7WPMask("by%sIsolationMVArun2017v2DBoldDMwLT2017",doc="IsolationMVArun2017v2DBoldDMwLT ID working point (2017v2)"), | ||
idMVAoldDMdR032017v2 = _tauId7WPMask("by%sIsolationMVArun2017v2DBoldDMdR0p3wLT2017",doc="IsolationMVArun2017v2DBoldDMdR0p3wLT ID working point (2017v2)"), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the full set needed?
isoTrackSequence + jetLepSequence + # must be after all the leptons | ||
linkedObjects + | ||
jetTables + muonTables + tauTables + electronTables + photonTables + globalTables +vertexTables+ metTables+simpleCleanerTable + isoTrackTables | ||
jetTables + muonTables + tauTables + boostedTauTables + electronTables + photonTables + globalTables +vertexTables+ metTables+simpleCleanerTable + isoTrackTables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
) | ||
nanoSequenceOnlyFullSim = cms.Sequence(triggerObjectTables + l1bits) | ||
nanoSequenceOnlyData = cms.Sequence(protonTables + lhcInfoTable) | ||
|
||
nanoSequence = cms.Sequence(nanoSequenceCommon + nanoSequenceOnlyData + nanoSequenceOnlyFullSim) | ||
|
||
nanoSequenceFS = cms.Sequence(genParticleSequence + genVertexTables + particleLevelSequence + nanoSequenceCommon + jetMC + muonMC + electronMC + photonMC + tauMC + metMC + ttbarCatMCProducers + globalTablesMC + btagWeightTable + genWeightsTable + genVertexTable + genParticleTables + particleLevelTables + lheInfoTable + ttbarCategoryTable ) | ||
nanoSequenceFS = cms.Sequence(genParticleSequence + genVertexTables + particleLevelSequence + nanoSequenceCommon + jetMC + muonMC + electronMC + photonMC + tauMC + boostedTauMC + metMC + ttbarCatMCProducers + globalTablesMC + btagWeightTable + genWeightsTable + genVertexTable + genParticleTables + particleLevelTables + lheInfoTable + ttbarCategoryTable ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. In the lines below, this is done for something else and you can do it the same way.
# "deepTau2017v1", | ||
#"deepTau2017v2p1", | ||
# "DPFTau_2016_v0", | ||
# "DPFTau_2016_v1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this, even if deepTau will be reintroduced for boosted taus some day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been removed as of #ddf4e8c as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like I worked on an outdated session. Sorry for that!
@@ -371,6 +397,7 @@ def nanoAOD_customizeCommon(process): | |||
addParticleNet=nanoAOD_addDeepInfoAK8_switch.nanoAOD_addParticleNet_switch, | |||
jecPayload=nanoAOD_addDeepInfoAK8_switch.jecPayload) | |||
(run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1).toModify(process, lambda p : nanoAOD_addTauIds(p)) | |||
(run2_nanoAOD_94XMiniAODv1 | run2_nanoAOD_94X2016 | run2_nanoAOD_94XMiniAODv2 | run2_nanoAOD_102Xv1 | run2_nanoAOD_106Xv1 | run2_nanoAOD_106Xv2).toModify(process, lambda p : nanoAOD_addBoostedTauIds(p)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way era modifiers are used here looks unconventional to me (also in the line above which I guess you copied). To be clarified with XPOG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my message in the main thread. In case of standard taus, the reproduction of IDs is not the default, but necessery for several eras, which explains the line above. In your case, producing boosted tau IDs will be the default for now and you need to deactivate it for past eras, i.e. run2_nanoAOD_106Xv1 and earlier.
|
||
|
||
finalBoostedTaus = cms.EDFilter("PATTauRefSelector", | ||
src = cms.InputTag("slimmedTausBoostedNewID"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my test sequence this crashes because there are only slimmedTausBoosted. Is this maybe a remainder from the miniAOD fix? Please check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What era are you performing tests under? This sounds like the IDs are not being added and therefore the new slimmedTausBoostedNewID collection does not get made.
My command for reference:
cmsDriver.py step1 --filein "/store/relval/CMSSW_11_3_0_pre1/RelValTTbar_14TeV/MINIAODSIM/PU_113X_mcRun3_2021_realistic_v1-v1/10000/e41c73d6-dc6b-405a-8aa4-2f79d974a1ab.root" --mc --eventcontent NANOAODSIM --datatier NANOAODSIM --conditions auto:run2_mc --step NANO --nThreads 8 --era Run2_2016,run2_nanoAOD_106Xv2 -n 1000 --fileout file:testRelval.root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a 2016 sample but probably missed the run2_nanoAOD_106Xv2 modifier. Will try again. Thanks for the reference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, with this modifier, it runs successfully. I think what is currently problematic is that the boosted taus are always added but the IDs only rerun with the given modifiers. But after rearranging the modifiers as stated earlier, all cases should be runnable (independently from the question whether boosted taus are produced in the respective cases)
@swozniewski I have done a complete cleaning of all ID working points except the 2017v2 ones, and some of the commenting. I am still unsure which eras boosted taus should be enabled or disabled for. Which era would be appropriate to ensure that this is integrated successfully into nanoAODv9? |
@aloeliger thanks for the quick follow up! I will go through it and leave further comments if applicable. I just got confirmation from XPOG that we should introduce boosted taus by default (simply add it without using modifiers) and deactivate it for all past modifiers. This way, it works for all future campaigns. I hope that the following list of past modifiers is rather complete. If you are aware of other ones, feel free to add them. Otherwise XPOG can comment during final review. |
@swozniewski I am having some trouble figuring out the correct procedure for excluding boosted taus from previous eras properly, This is what I have for excluding the main sequence/tables
Is this correct? Later on in the configuration where the Tau ID functions are called I have:
which adds in the ID sequences by default for everything, but I am unsure of how to safely exclude previous eras. My first thought was some sort of |
Hi @aloeliger I see your point. The first part looks fine to me. Don't forget to treat the nanoSequenceFS similarly. |
Thanks @swozniewski, I think I have something that works now to remove the sequences properly. Are there further changes that need to be made? |
process.boostedTauSequence.insert(process.boostedTauSequence.index(getattr(process, "finalBoostedTaus")), | ||
process.rerunMvaIsolationSequenceBoosted) | ||
|
||
process.boostedTauSequence.insert(process.boostedTauSequence.index(getattr(process, "finalBoostedTaus")), | ||
getattr(process, updatedBoostedTauName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be done as for standard taus above? Btw. in the current case you don't need the getattr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean with only inserting the rerunMvaIsolationSequenceBoosted
? I tried that originally, but I believe this set-up is required. The above uses the tool to add to an already existing Tau ID task/sequence, but for boosted taus this does not exist, so it has to be added explicitly by the second. Or at least, that is my recollection of how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see. Thanks for clarification. Indeed there is an existing ID sequence for std taus.
I added another small comment/question already. I will have a closer look ~tomorrow and also run some checks. @mbluj in case you'd like to have a look as well and are not aware of this PR location. |
@swozniewski I looked at doing this, but the problem is that the the python registered type of
is
but the type of
is
So I would guess someone somewhere explicitly overwrote/defined the |
@aloeliger I checked again and noticed that the |
some polishing
@swozniewski I have looked at, tested and added your modifications. |
Ok, thanks! I've launched the standard integration test and if successful, will continue with integration to central cmssw. |
|
||
boostedTauTable.variables = _boostedTauVarsBase | ||
|
||
tauGenJets.GenParticles = cms.InputTag("prunedGenParticles") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This redefines tauGenJets
producer which is also used in configuration for standard taus. This cannot be done like this even if in both cases configurations of tauGenJets
are identical. So this should be removed from here or if necessary (i.e. collection from tau_cff cannot be used) a clone should be created.
Then, is new, boosted specific, table of genVisTaus needed or one created for standard taus can be reused (see further comments)?
Finally: I think prunedGenParticles
should be replaced by finalGenParticles
here for correct mother indexing (it will happen also for standard taus)
|
||
genVisBoostedTaus = cms.EDProducer("GenVisTauProducer", | ||
src = cms.InputTag("tauGenJetsSelectorAllHadrons"), | ||
srcGenParticles = cms.InputTag("prunedGenParticles") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not needed as one can use similar guys for standard taus (genVisTaus
).
If kept, I think prunedGenParticles
should be replaced by finalGenParticles
here for correct mother indexing (it will happen also for standard taus), consistently with L76.
srcGenParticles = cms.InputTag("prunedGenParticles") | ||
) | ||
|
||
genVisBoostedTauTable = cms.EDProducer("SimpleCandidateFlatTableProducer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this table needed? It basically does not differ from such table for standard taus called genVisTauTable
|
||
boostedTausMCMatchHadTauForTable = cms.EDProducer("MCMatcher", # cut on deltaR, deltaPt/Pt; pick best by deltaR | ||
src = boostedTauTable.src, # final reco collection | ||
matched = cms.InputTag("genVisBoostedTaus"), # generator level hadronic tau decays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above - probably genVisTaus for standard taus can be used instead.
|
||
boostedTauSequence = cms.Sequence(finalBoostedTaus) | ||
boostedTauTables = cms.Sequence(boostedTauTable) | ||
boostedTauMC = cms.Sequence(tauGenJets + tauGenJetsSelectorAllHadrons + genVisBoostedTaus + genVisBoostedTauTable + boostedTausMCMatchLepTauForTable + boostedTausMCMatchHadTauForTable + boostedTauMCTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjust MC sequence to removals proposed in earlier comments.
Hello, Finally, I think that nanoAOD DQM configuration (PhysicsTools/NanoAOD/python/nanoDQM_cfi.py) should be also updated such that a set of new histgrams is added corresponding with new braches for boosted taus. I think that there is a tool to add them automatically (basing on a nanoAOD root file), but as histogram axis ranges should be anyway adjusted by hand it could be simpler just to copy counterpart entries of standard taus and then modify their names. |
@mbluj, I think if I understand correctly, this removes all MC based boosted tau sequences, because the gen-information is handled in the standard tau configuration? This also removes the need for the Also, I am not familiar with nanoAOD DQM. What is the best way to test my changes and additions for boosted tau code? |
I think you should do the following:
What concerns DQM: I have never tested it myself - there was a central test in nanoAOD gitlab testing suite. But adding histograms to the DQM is quite strightforward. If you want to check an automated way, please produce a nanoAOD root file with boosted taus and then run |
@swozniewski, @mbluj, I have made the requested changes and added boosted taus into the nanoAOD DQM |
The changes are fine with me - thanks! I assume that you tested that gen-info behave as expected, i.e. genVisTau jets are as in the previous version (modulo pT threshold). +1 |
I still need to check this, but it shouldn't take too long. |
Looks fine to me, too! I'm running the integration tests in parallel... |
@mbluj The GenVisTau stuff seems to behave the same way as before, and I removed any changes I made to any pt thresholds, I think it is fine? But I can check out a version without my changes and do individual comparisons if necessary. |
I believe it is fine, thank you. |
Tests are ok. Will proceed with PR to master. |
* Merge pull request cms-sw#33150 from cms-tau-pog/CMSSW_11_3_X_tau-pog_tauIDtoolsDev Updates to tauID python tool * Initial working commit of boosted taus in CMSSW_11_3 NanoAOD * Remove 2015 anti-E for 10_6v2 era compatibility in boosted taus * Initial working commit of boosted taus in CMSSW_11_3 NanoAOD * Remove 2015 anti-E for 10_6v2 era compatibility in boosted taus * Remove commented boosted tau nanoAOD code * Remove main nanoAOD config comments * Remove leading charged hadronic candidate dxy and dz * Update boosted tau configuration to remove excess ID variables * Fix removal of boosted tau vars base * Remove boosted tau sequences from previous eras * Remove redundant decay mode information * some polishing * change Gen information for boosted taus * Update nanoDQM for boosted taus Co-authored-by: cmsbuild <cmsbuild@cern.ch> Co-authored-by: Andrew Loeliger <aloelige@cern.ch> Co-authored-by: Andrew David Loeliger <andrew.loeliger@cern.ch>
No description provided.