Add ParticleNet mass regression #33483

hqucms · 2021-04-20T23:35:05Z

Requires: cms-data/RecoBTag-Combined#44 (DNN model)

PR description:

The ParticleNet mass regression is a new algorithm that uses PF candidates and the ParticleNet architecture to improve the AK8 jet mass resolution, particularly for 2-prong jets (X->bb/X->cc/X->qq). The model is trained with the flat-m(H) HH4Q samples and QCD samples (same as used in the ParticleNet-MD tagger), and uses the same ParticleNet architecture. The training target is the generated particle mass for the Higgs jets, and gen-jet soft drop mass for the QCD jets. The resulting regressed mass shows a significant improvement in the mass resolution compared to the soft drop mass and also reduces the tails at low/high SD mass values. This new algorithm is being used by the full Run2 VHcc merged-jet topology analysis and the VBF HH4b boosted analysis and brings significant gain in sensitivity. More details of the algorithm can be found in the talks at ML Forum and JMAR.

This algorithm is requested to be included in NanoAODv9 by a few analyses. Therefore, a new training using the UL samples is performed and used in this PR. The performance of the UL training is summarized in this presentation.

PR validation:

Validated with a UL NanoAOD workflow.

Memory/time increase should be the same as pfMassDecorrelatedParticleNetJetTags as the network architecture is the same, i.e., ~20M and ~30ms per high pT AK8 jet.

@selvaggi @gouskos @pmaksim1

cmsbuild · 2021-04-20T23:41:08Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33483/22207

This PR adds an extra 52KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File PhysicsTools/NanoAOD/python/nano_cff.py modified in PR(s): Refactoring gen weight storage in EDM + Nano integration #32167

cmsbuild · 2021-04-20T23:41:30Z

A new Pull Request was created by @hqucms (Huilin Qu) for master.

It involves the following packages:

PhysicsTools/NanoAOD
PhysicsTools/PatAlgos
RecoBTag/ONNXRuntime

@perrotta, @gouskos, @cmsbuild, @fgolf, @slava77, @jpata, @mariadalfonso can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @jdolen, @swertz, @JyothsnaKomaragiri, @ahinzmann, @schoef, @emilbols, @swozniewski, @jdamgov, @mbluj, @nhanvtran, @gkasieczka, @clelange, @hatakeyamak, @ferencek, @gpetruc, @andrzejnovak, @mariadalfonso, @seemasharmafnal, @mmarionncern this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

mariadalfonso · 2021-04-21T08:14:41Z

please test

hqucms · 2021-04-21T08:18:04Z

please test

@mariadalfonso This needs to be tested together with cms-data/RecoBTag-Combined#44

mariadalfonso · 2021-04-21T08:39:23Z

please abort

mariadalfonso · 2021-04-21T08:42:12Z

test parameters:

pull_request = cms-data/RecoBTag-Combined#44

mariadalfonso · 2021-04-21T08:42:47Z

please test

cmsbuild · 2021-04-21T11:53:16Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1c4177/14377/summary.html
COMMIT: 52973d8
CMSSW: CMSSW_12_0_X_2021-04-20-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33483/14377/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 3632 differences found in the comparisons
DQMHistoTests: Total files compared: 38
DQMHistoTests: Total histograms compared: 2877046
DQMHistoTests: Total failures: 1
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 2877023
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.292 KiB( 37 files compared)
DQMHistoSizes: changed ( 1325.81,... ): 0.146 KiB Physics/NanoAODDQM
Checked 160 log files, 37 edm output root files, 38 DQM output files
TriggerResults: no differences found

slava77 · 2021-04-27T13:13:06Z

For the memory, it's actually ~20M:

How does this part scale with the number of threads?
It looks like 15MB are per thread.

How does this compare to the other taggers?

was there some review in the ML group(s) done to confirm that this can not be somewhat safely reduced?

slava77 · 2021-04-27T13:18:42Z

we have now pfMassDecorrelatedParticleNetJetTags and pfParticleNetMassRegressionJetTags
as "modifiers" of pfParticleNetJetTags.
There is clearly some inconsistency in naming (prefix vs postfix); sticking to the existing pattern is often a way to go.
This better be resolved, unless there is a good justification.

hqucms · 2021-04-27T17:01:16Z

For the memory, it's actually ~20M:

How does this part scale with the number of threads?
It looks like 15MB are per thread.

How does this compare to the other taggers?

was there some review in the ML group(s) done to confirm that this can not be somewhat safely reduced?

@slava77 The memory usage does not scale with the number of threads as the ONNXRuntime session is shared by all threads.

hqucms · 2021-04-27T17:03:58Z

we have now pfMassDecorrelatedParticleNetJetTags and pfParticleNetMassRegressionJetTags
as "modifiers" of pfParticleNetJetTags.
There is clearly some inconsistency in naming (prefix vs postfix); sticking to the existing pattern is often a way to go.
This better be resolved, unless there is a good justification.

pfMassDecorrelatedParticleNetJetTags is a jet tagger (i.e., a classifier) while pfParticleNetMassRegressionJetTags is a mass regression algorithm, so I think it's better to make a distinction in the names. In fact, if it's not that the btagging code assumes the name ends with JetTags I would remove that in pfParticleNetMassRegressionJetTags to be more clear.

slava77 · 2021-04-27T18:11:31Z

we have now pfMassDecorrelatedParticleNetJetTags and pfParticleNetMassRegressionJetTags
as "modifiers" of pfParticleNetJetTags.
There is clearly some inconsistency in naming (prefix vs postfix); sticking to the existing pattern is often a way to go.
This better be resolved, unless there is a good justification.

pfMassDecorrelatedParticleNetJetTags is a jet tagger (i.e., a classifier) while pfParticleNetMassRegressionJetTags is a mass regression algorithm, so I think it's better to make a distinction in the names. In fact, if it's not that the btagging code assumes the name ends with JetTags I would remove that in pfParticleNetMassRegressionJetTags to be more clear.

I see, thanks for clarifying that this is semantically different.
Please keep JetTags at the end, as this still follows the TagInfos->JetTags sequence.
So, no action necessary in the code.

On a remotely related subject, is it eventually possible for a single model to compute the mass as well as the probabilities?

hqucms · 2021-04-27T19:23:19Z

On a remotely related subject, is it eventually possible for a single model to compute the mass as well as the probabilities?

Yes that's an interesting direction to go and we plan to look into that for the next iteration.

hqucms · 2021-04-28T09:39:31Z

Any comments @mariadalfonso @gouskos ?

jpata · 2021-04-29T12:31:58Z

Per discussions with Slava, to prevent issues with relvals from externals, I removed the reco signature until cms-sw/cmsdist#6844 is merged.

jpata · 2021-04-30T15:20:23Z

+reconstruction

new ParticleNet mass regression in MiniAOD and NanoAOD
no unexpected reco differences, we observe a new discriminator in the list
computing performance is under control: Add ParticleNet mass regression #33483 (comment)
requires Add AK8 ParticleNet mass regression model cms-data/RecoBTag-Combined#44 and Update tag for RecoBTag-Combined to V01-11-00 cmsdist#6844 (both are merged)

silviodonato · 2021-05-03T07:34:32Z

kind reminder @cms-sw/xpog-l2

gouskos · 2021-05-04T14:59:18Z

kind reminder @cms-sw/xpog-l2
@silviodonato Soon we will be done with the nanoAOD tests

mariadalfonso · 2021-05-06T15:57:46Z

PhysicsTools/NanoAOD/python/nano_cff.py

@@ -341,6 +344,7 @@ def nanoAOD_customizeCommon(process):
        nanoAOD_addDeepDoubleX_switch = cms.untracked.bool(False),
        nanoAOD_addDeepDoubleXV2_switch = cms.untracked.bool(False),
        nanoAOD_addParticleNet_switch = cms.untracked.bool(False),
+        nanoAOD_addParticleNetMass_switch = cms.untracked.bool(True),


so this nanoAOD_addParticleNetMass_switch always True here means that we will re-run always the regression and it's expensive (about 10% of the whole nanoAOD production).

line 347 should be set to False as default
then activate for the past (
i.e. nanoAOD_addParticleNetMass_switch = cms.untracked.bool(True), in line 360 and 367)

and then finally add a block
run2_nanoAOD_106Xv2.toModify(
nanoAOD_addDeepInfoAK8_switch,
nanoAOD_addParticleNetMass_switch = True
)

@mariadalfonso I believe we need to re-run it until it becomes available in MiniAOD, so setting False as the default does not seem a future-proof way to go.

the miniAOD as the come out from this master (either Run2 or Run3): will contain or not the particleNetMass ?

Yes it will contain. I guess maybe I am a bit confused with the many modifiers in NanoAOD. Specifically, does the modifier run2_nanoAOD_106Xv2 correspond specifically to NanoAOD V9, or generally any Nano (V10, V11...) as long as it runs over 106X MiniAODv2? If the former, then it means we will need to modify the code to enable re-calculating the mass regression every time a new NanoAOD version is introduced (as long as we are still running on 106X MiniAODv2), right? If the latter then I agree False is a better default.

Nano modifiers are used to identify the mini in input.

So for the UL we have in production:
MiniAODv1 --> these have been used as input for the nanov8 and so all recipes need to be called with run2_nanoAOD_106Xv1 --> when nano is being produced one call it v8
MiniAODv2 were started in December '20 -->nano recipes will need to be called with the run2_nanoAOD_106Xv2 --> these nano will be also called v9

if NanoV10 will read MiniAODv2 or a newer mini we do not know.

Master anyway make fresh mini with the particleNetMass included, so we can avoid to recalculate.

I hope it's clearer.

Said that, we will need to do a major cleanup soon to prepare Run3 config, so we can restore the False as default at that time. I leave to you.

Thanks a lot for the clarification @mariadalfonso !

If you are fine then I would prefer to postpone the change to False for the future cleanup.

ok, let's do like this

mariadalfonso · 2021-05-07T08:36:01Z

+xpog

add one the mass regression variable to complement the ParticleNet classifiers.
1 training is used all the nano eras and years

cmsbuild · 2021-05-07T08:36:25Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

qliphy · 2021-05-07T12:11:01Z

+1

hqucms added 3 commits March 17, 2021 14:41

Add ParticleNet AK8 jet mass regression.

407f0a0

Add ParticleNet mass regression to MiniAOD.

67c1aef

Add ParticleNet mass regression to NanoAOD.

52973d8

cmsbuild added this to the CMSSW_12_0_X milestone Apr 20, 2021

cmsbuild added code-checks-pending orp-pending pending-signatures reconstruction-pending tests-pending xpog-pending labels Apr 20, 2021

cmsbuild added code-checks-approved and removed code-checks-pending labels Apr 20, 2021

cmsbuild added tests-started and removed tests-pending labels Apr 21, 2021

mariadalfonso mentioned this pull request Apr 21, 2021

features for V9 nanoAOD cms-nanoAOD/cmssw#555

Closed

55 tasks

cmsbuild added tests-pending and removed tests-started labels Apr 21, 2021

cmsbuild added the requires-external label Apr 21, 2021

cmsbuild added tests-started and removed tests-pending labels Apr 21, 2021

mariadalfonso mentioned this pull request Apr 21, 2021

energy regression for H(Z)-->bb/cc cms-nanoAOD/cmssw#563

Closed

2 tasks

cmsbuild removed the tests-started label Apr 21, 2021

jpata mentioned this pull request Apr 27, 2021

Add AK8 ParticleNet mass regression model cms-data/RecoBTag-Combined#44

Merged

mrodozov mentioned this pull request Apr 27, 2021

Update tag for RecoBTag-Combined to V01-11-00 cms-sw/cmsdist#6844

Merged

cmsbuild added reconstruction-pending and removed reconstruction-approved labels Apr 29, 2021

hqucms mentioned this pull request Apr 30, 2021

[106X] Add ParticleNet mass regression #33588

Merged

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Apr 30, 2021

cmsbuild mentioned this pull request May 3, 2021

Clean up tau related nanoAOD content #33513

Merged

mariadalfonso reviewed May 6, 2021

View reviewed changes

cmsbuild added fully-signed xpog-approved and removed pending-signatures xpog-pending labels May 7, 2021

cmsbuild added orp-approved and removed orp-pending labels May 7, 2021

cmsbuild merged commit 074f0c6 into cms-sw:master May 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ParticleNet mass regression #33483

Add ParticleNet mass regression #33483

hqucms commented Apr 20, 2021 •

edited

Loading

cmsbuild commented Apr 20, 2021

cmsbuild commented Apr 20, 2021

mariadalfonso commented Apr 21, 2021

hqucms commented Apr 21, 2021

mariadalfonso commented Apr 21, 2021

mariadalfonso commented Apr 21, 2021

mariadalfonso commented Apr 21, 2021

cmsbuild commented Apr 21, 2021

slava77 commented Apr 27, 2021

slava77 commented Apr 27, 2021

hqucms commented Apr 27, 2021

hqucms commented Apr 27, 2021

slava77 commented Apr 27, 2021 •

edited

Loading

hqucms commented Apr 27, 2021

hqucms commented Apr 28, 2021

jpata commented Apr 29, 2021

jpata commented Apr 30, 2021

silviodonato commented May 3, 2021

gouskos commented May 4, 2021

mariadalfonso May 6, 2021

hqucms May 6, 2021

mariadalfonso May 6, 2021 •

edited

Loading

hqucms May 6, 2021 •

edited

Loading

mariadalfonso May 6, 2021

hqucms May 6, 2021

mariadalfonso May 7, 2021

mariadalfonso commented May 7, 2021

cmsbuild commented May 7, 2021

qliphy commented May 7, 2021

Add ParticleNet mass regression #33483

Add ParticleNet mass regression #33483

Conversation

hqucms commented Apr 20, 2021 • edited Loading

PR description:

PR validation:

cmsbuild commented Apr 20, 2021

cmsbuild commented Apr 20, 2021

mariadalfonso commented Apr 21, 2021

hqucms commented Apr 21, 2021

mariadalfonso commented Apr 21, 2021

mariadalfonso commented Apr 21, 2021

mariadalfonso commented Apr 21, 2021

cmsbuild commented Apr 21, 2021

Comparison Summary

slava77 commented Apr 27, 2021

slava77 commented Apr 27, 2021

hqucms commented Apr 27, 2021

hqucms commented Apr 27, 2021

slava77 commented Apr 27, 2021 • edited Loading

hqucms commented Apr 27, 2021

hqucms commented Apr 28, 2021

jpata commented Apr 29, 2021

jpata commented Apr 30, 2021

silviodonato commented May 3, 2021

gouskos commented May 4, 2021

mariadalfonso May 6, 2021

Choose a reason for hiding this comment

hqucms May 6, 2021

Choose a reason for hiding this comment

mariadalfonso May 6, 2021 • edited Loading

Choose a reason for hiding this comment

hqucms May 6, 2021 • edited Loading

Choose a reason for hiding this comment

mariadalfonso May 6, 2021

Choose a reason for hiding this comment

hqucms May 6, 2021

Choose a reason for hiding this comment

mariadalfonso May 7, 2021

Choose a reason for hiding this comment

mariadalfonso commented May 7, 2021

cmsbuild commented May 7, 2021

qliphy commented May 7, 2021

hqucms commented Apr 20, 2021 •

edited

Loading

slava77 commented Apr 27, 2021 •

edited

Loading

mariadalfonso May 6, 2021 •

edited

Loading

hqucms May 6, 2021 •

edited

Loading