Add ParticleNet mass regression #33483
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33483/22207
A new Pull Request was created by @hqucms (Huilin Qu) for master. It involves the following packages: PhysicsTools/NanoAOD. @perrotta, @gouskos, @cmsbuild, @fgolf, @slava77, @jpata, @mariadalfonso can you please review it and eventually sign? Thanks. cms-bot commands are listed here.
please test
@mariadalfonso This needs to be tested together with cms-data/RecoBTag-Combined#44
please abort
test parameters: pull_request = cms-data/RecoBTag-Combined#44
please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1c4177/14377/summary.html Comparison Summary
How does this part scale with the number of threads? How does this compare to the other taggers? Was there some review in the ML group(s) done to confirm that this cannot be somewhat safely reduced?
we have now
@slava77 The memory usage does not scale with the number of threads, as the ONNXRuntime session is shared by all threads.
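As an illustration of that sharing pattern, here is a minimal standalone Python sketch using the onnxruntime API (not the CMSSW C++ code path); the model file name and input shape are placeholders:

import numpy as np
import onnxruntime as ort
from concurrent.futures import ThreadPoolExecutor

# Create the session once; InferenceSession.run() is thread-safe, so every
# worker thread reuses the same session and the model weights are held in
# memory only once, independent of the number of threads.
session = ort.InferenceSession("particlenet_mass.onnx")  # placeholder model file
input_name = session.get_inputs()[0].name

def infer(batch):
    # All threads call into the shared session instead of loading their own copy.
    return session.run(None, {input_name: batch})[0]

# Placeholder batches with an assumed (batch, features) layout.
batches = [np.random.rand(1, 64).astype(np.float32) for _ in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(infer, batches))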
I see, thanks for clarifying that this is semantically different. On a remotely related subject, is it eventually possible for a single model to compute the mass as well as the probabilities?
Yes, that's an interesting direction to go and we plan to look into that for the next iteration.
Any comments, @mariadalfonso @gouskos?
Per discussions with Slava, to prevent issues with relvals from externals, I removed the reco signature until cms-sw/cmsdist#6844 is merged.
+reconstruction
kind reminder @cms-sw/xpog-l2
@@ -341,6 +344,7 @@ def nanoAOD_customizeCommon(process):
     nanoAOD_addDeepDoubleX_switch = cms.untracked.bool(False),
     nanoAOD_addDeepDoubleXV2_switch = cms.untracked.bool(False),
     nanoAOD_addParticleNet_switch = cms.untracked.bool(False),
+    nanoAOD_addParticleNetMass_switch = cms.untracked.bool(True),
So nanoAOD_addParticleNetMass_switch being always True here means that we will always re-run the regression, and it is expensive (about 10% of the whole nanoAOD production).
Line 347 should be set to False as the default, then activated for the past inputs (i.e. nanoAOD_addParticleNetMass_switch = cms.untracked.bool(True) in lines 360 and 367), and finally a block should be added:
run2_nanoAOD_106Xv2.toModify(
    nanoAOD_addDeepInfoAK8_switch,
    nanoAOD_addParticleNetMass_switch = True
)
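For readers less familiar with the NanoAOD configuration, a minimal self-contained sketch of the suggested pattern, assuming the switches live in the cms.PSet named nanoAOD_addDeepInfoAK8_switch and that the run2_nanoAOD_106Xv2 modifier follows the usual Configuration.Eras import convention (both names taken from the snippets in this thread):

import FWCore.ParameterSet.Config as cms
from Configuration.Eras.Modifier_run2_nanoAOD_106Xv2_cff import run2_nanoAOD_106Xv2

# Default: do not re-run the (expensive) mass regression in NanoAOD production.
nanoAOD_addDeepInfoAK8_switch = cms.PSet(
    nanoAOD_addParticleNet_switch = cms.untracked.bool(False),
    nanoAOD_addParticleNetMass_switch = cms.untracked.bool(False),
)

# Re-enable it only when the input is 106X MiniAODv2, which does not yet
# contain the regressed mass.
run2_nanoAOD_106Xv2.toModify(
    nanoAOD_addDeepInfoAK8_switch,
    nanoAOD_addParticleNetMass_switch = True,
)

With that layout, Nano jobs reading the existing 106X MiniAODv2 recompute the mass, while jobs reading MiniAOD produced from this master (which already stores it) skip the extra cost.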
@mariadalfonso I believe we need to re-run it until it becomes available in MiniAOD, so setting False as the default does not seem a future-proof way to go.
Will the MiniAOD that comes out of this master (either Run 2 or Run 3) contain the particleNetMass or not?
Yes, it will contain it. I guess I am a bit confused by the many modifiers in NanoAOD. Specifically, does the modifier run2_nanoAOD_106Xv2 correspond specifically to NanoAOD v9, or generally to any Nano (v10, v11, ...) as long as it runs over 106X MiniAODv2? If the former, it means we will need to modify the code to enable re-calculating the mass regression every time a new NanoAOD version is introduced (as long as we are still running on 106X MiniAODv2), right? If the latter, then I agree that False is a better default.
Nano modifiers are used to identify the Mini given as input. So for the UL we have in production:
MiniAODv1 has been used as input for NanoAODv8, so all recipes reading it need to be called with run2_nanoAOD_106Xv1; the Nano produced this way is called v8.
MiniAODv2 production was started in December 2020; Nano recipes reading it need to be called with run2_nanoAOD_106Xv2, and that Nano will also be called v9.
Whether NanoAODv10 will read MiniAODv2 or a newer Mini we do not know yet.
Master anyway makes fresh Mini with the particleNetMass included, so we can avoid recalculating it there. I hope this is clearer.
That said, we will need to do a major cleanup soon to prepare the Run 3 config, so we can restore False as the default at that time. I leave that to you.
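Restating that mapping compactly (as understood from this thread; anything beyond v9 is still undecided):

# Which Nano era modifier a recipe needs, keyed by the UL MiniAOD it reads,
# together with the Nano version name used for that campaign.
NANO_MODIFIER_FOR_MINI = {
    "106X MiniAODv1": ("run2_nanoAOD_106Xv1", "NanoAODv8"),
    "106X MiniAODv2": ("run2_nanoAOD_106Xv2", "NanoAODv9"),
}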
Thanks a lot for the clarification, @mariadalfonso! If you are fine with it, I would prefer to postpone the change to False until the future cleanup.
ok, let's do it like this
+xpog
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1
Requires: cms-data/RecoBTag-Combined#44 (DNN model)
PR description:
The ParticleNet mass regression is a new algorithm that uses PF candidates and the ParticleNet architecture to improve the AK8 jet mass resolution, particularly for 2-prong jets (X->bb/X->cc/X->qq). The model is trained with the flat-m(H) HH4Q samples and QCD samples (same as used in the ParticleNet-MD tagger), and uses the same ParticleNet architecture. The training target is the generated particle mass for the Higgs jets, and gen-jet soft drop mass for the QCD jets. The resulting regressed mass shows a significant improvement in the mass resolution compared to the soft drop mass and also reduces the tails at low/high SD mass values. This new algorithm is being used by the full Run2 VHcc merged-jet topology analysis and the VBF HH4b boosted analysis and brings significant gain in sensitivity. More details of the algorithm can be found in the talks at ML Forum and JMAR.
This algorithm is requested to be included in NanoAODv9 by a few analyses. Therefore, a new training using the UL samples was performed and is used in this PR. The performance of the UL training is summarized in this presentation.
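As a usage illustration only (the file name and branch names below are assumptions, e.g. that the regressed mass is exposed as FatJet_particleNet_mass next to FatJet_msoftdrop; the actual names are defined by the PR code, not by this description), a short uproot sketch for reading the regressed mass alongside the soft-drop mass from a NanoAOD file:

import uproot

# Open a NanoAOD file (placeholder name) and read the two AK8 mass definitions.
events = uproot.open("nanoAOD.root")["Events"]
jets = events.arrays(["FatJet_pt", "FatJet_msoftdrop", "FatJet_particleNet_mass"])

# Compare regressed and soft-drop masses for high-pT AK8 jets.
high_pt = jets["FatJet_pt"] > 300
print(jets["FatJet_msoftdrop"][high_pt])
print(jets["FatJet_particleNet_mass"][high_pt])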
PR validation:
Validated with a UL NanoAOD workflow.
Memory/time increase should be the same as for pfMassDecorrelatedParticleNetJetTags, as the network architecture is the same, i.e. ~20M in memory and ~30 ms per high-pT AK8 jet.
@selvaggi @gouskos @pmaksim1