Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes to variables for lepton MVA #45754

Closed
wants to merge 4 commits into from

Conversation

namapane
Copy link
Contributor

@namapane namapane commented Aug 20, 2024

PR description:

We recently realized that the inputs for the muonPROMPTMVA (and likewise for electronPROMPTMVA) are almost, but not fully recoverable from nanoAODs. This means that it is not possible to check data/MC agreement for input variables from central productions, nor to test new trainings.
This can be fixed easily and cheaply with two small changes (using muons for illustration, same applies to electrons):

  • add 1 float variable Muon_jetDF, corresponding to the LepGood_jetDF MVA input
  • fix an inconsistency in the definition of Muon_jetRelIso with respect to the variable that is intended to correspond to, LepGood_jetPtRatio. With the current definition, jetRelIso = (1/ptRatio-1) if a jet is present, pfRelIso04_all otherwise. The (1/x-1) transformation was intended to save space. Unfortunately the actual MVA input variable is defined in a slightly different way, with a max(ptRatio, 1.5) applied in the case a jet is associated. Since it is not possible to figure out unambiguously if this was the case, recovering the exact definition of LepGood_jetPtRatio that was used for the MVA is tricky.

Our proposal is to replace Muon_jetRelIso with the plain ptRatio, without a transformation, and with a default of -1 if no jet is matched. This has some advantages:

  • much easier to recover the MVA input variable correctly
  • Marks the case of no jet is found unambiguously (-1)
  • Does not mix ptRatio with pfRelIso04_all, which is already available in its own variable. This improves clarity and also saves some disk space as -1 gets compressed better
  • if anybody needs Muon_jetRelso as defined now, it can be computed easily as well (but I doubt there is much use of it apart for studies related to the MVA, for which having the plain ptRatio is more handy anyhow)
  • changing the name of the variable ensures that the change will not go unnoticed (in case anybody is using Muon_jetPtRatio explicitly)
  • Same considerations for the corresponding Electron variables. In this case, Electron_pfRelIso04_all is also added since this variable, used in Electron_jetRelso, was not present.

PR validation:

  • tested processing a DYJets sample and checking distributions, and sizes with inspectNanoFile.py [size report]
  • To ensure that removing the (1/x-1) transformation does not increase the rounding error, compared the rounding error of deriving ptRatio from the curent Muon_jetPtRatio with the one of storing ptRatio directly. We found that precision=10 gives a smaller rounding error while still reducing the space from 2.4 b/muon to 2.2 b/muon (this is because we fill -1 instead of pfRelIso04_all when no jet is found).
  • Likewise, Electron_jetPtRatio takes 2.3 b/item instead of 2.4 b/item of Electron_jetRelIso.
  • Electron_pfRelIso04_all costs 2.5 b/item
  • The addition of Muon_jetDF and Electron_jetDF cost 2.2 b/item each
  • The overall effect of this proposed fix is +2.0 b/muon and +4.6 b/electron.

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 20, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45754/41460

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @namapane for master.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)

@cmsbuild, @ftorrresd, @hqucms, @vlimant can you please review it and eventually sign? Thanks.
@AnnikaStein, @gpetruc this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45754/41494

@cmsbuild
Copy link
Contributor

Pull request #45754 was updated. @cmsbuild, @ftorrresd, @hqucms, @vlimant can you please check and sign again.

@namapane namapane changed the title Minor fixes to variables for muon MVA Minor fixes to variables for lepton MVA Aug 22, 2024
@@ -341,8 +341,9 @@ def _get_bitmapVIDForEle_docstring(modules,WorkingPoints):
miniPFRelIso_all = Var("userFloat('miniIsoAll')/pt",float,doc="mini PF relative isolation, total (with scaled rho*EA PU Winter22V1 corrections)"),
pfRelIso03_chg = Var("userFloat('PFIsoChg')/pt",float,doc="PF relative isolation dR=0.3, charged component"),
pfRelIso03_all = Var("userFloat('PFIsoAll')/pt",float,doc="PF relative isolation dR=0.3, total (with rho*EA PU Winter22V1 corrections)"),
jetRelIso = Var("?userCand('jetForLepJetVar').isNonnull()?(1./userFloat('ptRatio'))-1.:userFloat('PFIsoAll04')/pt",float,doc="Relative isolation in matched jet (1/ptRatio-1, pfRelIso04_all if no matched jet)",precision=8),
jetPtRatio = Var("?userCand('jetForLepJetVar').isNonnull()?userFloat('ptRatio'):-1.",float,doc="ratio of electron pt to the associated jet pt (-1. if none)",precision=10),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that PFIsoAll04 is not stored in NanoAOD, so if we make this change we will need to add it.

Copy link
Contributor Author

@namapane namapane Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, @hqucms. I assumed that was included in the same way as it it for muons.

But actually, there is somewhat an inconsistency as of now. Electron_jetRelIso takes PFIsoAll04 for Run3 and
PFIsoAll04_Fall17V2 for Run2, but what is used for LepGood_jetPtRatio in the electronPROMPTMVA is PFIsoAll04_Fall17V2 in all cases. I suppose that this is because the MVA training was done witth the Fall17V2 version, but then Electron_jetRelIso is inconsistent with the MVA input for Run3.
I'm unsure on how to proceed then: it depends if the goal is to be able to recover the variable used as input in the electronPROMPTMVA (then we need to add PFIsoAll04_Fall17V2), or to be able to recover the current Electron_jetRelIso (then we have to add PFIsoAll04 for Run3 or PFIsoAll04_Fall17V2 for Run2, but then I miss what's the point of having this variable).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code suggests that one should use PFIsoAll04 for Run3 and PFIsoAll04_Fall17V2 for Run2, so I suppose when the electronPROMPTMVA is re-trained for Run3 it should move to using PFIsoAll04?

Maybe @cms-sw/egamma-pog-l2 could comment and clarify on the plans for the update of electronPROMPTMVA for Run3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added pfRelIso04_all.

@namapane
Copy link
Contributor Author

namapane commented Aug 23, 2024 via email

@RSalvatico
Copy link
Contributor

Hi,

Thanks @namapane for looking into this and for the proposal. I was indeed looking at the fact that with the current solution we'd essentially drop PFIsoAll04 from the collection. The current version of the electron PromptMVA still uses the Run2 training but, to be completely fair, the Run2-Run3 comparisons were made using the EGM isolated electron ID as input for the Run3 one, because at the time the Run3 noIso ID (which as you can see is one of the input variables) was bugged. As far as I know, it was never checked again using the noIso ID.

In any case it would seems strange to me to have PFIsoAll04_Fall17V2 as the only Run2 electron variable in Run3 NanoAOD... But on the other hand the two PFIsoAll04 versions differ only in terms of the effective areas used - so shouldn't people actually use PFIsoAll04 and not the Fall17V2 version with updated EAs to fetch the MVA value in Run3, despite what the MVA was trained over?

@namapane
Copy link
Contributor Author

Thanks @namapane for looking into this and for the proposal. I was indeed looking at the fact that with the current solution we'd essentially drop PFIsoAll04 from the collection. The current version of the electron PromptMVA still uses the Run2 training but, to be completely fair, the Run2-Run3 comparisons were made using the EGM isolated electron ID as input for the Run3 one, because at the time the Run3 noIso ID (which as you can see is one of the input variables) was bugged. As far as I know, it was never checked again using the noIso ID.

Thanks @RSalvatico for this clarification.

In any case it would seems strange to me to have PFIsoAll04_Fall17V2 as the only Run2 electron variable in Run3 NanoAOD... But on the other hand the two PFIsoAll04 versions differ only in terms of the effective areas used - so shouldn't people actually use PFIsoAll04 and not the Fall17V2 version with updated EAs to fetch the MVA value in Run3, despite what the MVA was trained over?

Thinking a bit more on this, I would say that storing the "Run3" version of input variables makes sense as it will allow to retrain the MVA and recompute it over the new training, if aybody ever wants to. In the meanwhile, the existing promptMVA variable can be used and is fully self-consistent. What it would miss by storing PFIsoAll04 only is the ability to perform data/MC comparisons on the exact inputs used for the current training - but this is not the case even without the modifications proposed in this PR.

So I would stick to @hqucms 's initial suggestion and add only PFIsoAll04. I will update the draft PR tomorrow.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45754/41539

@cmsbuild
Copy link
Contributor

Pull request #45754 was updated. @cmsbuild, @ftorrresd, @hqucms, @vlimant can you please check and sign again.

@namapane
Copy link
Contributor Author

The latest commit introduces pfRelIso04_all, with the Run2/Run3 EAs. The PR description is updated with details and size checks.
Note that I set precision=10 for this, even if the former Electron_jetRelIso had precision=8 - I choose to be a bit conservative to align with Electron_jetPtRatio, but also because the other existing pfRelIso variables have no precision set, which would have been probably a bit too much for this one.

@namapane
Copy link
Contributor Author

I think @RSalvatico and/or other egamma experts should check carefully all electron-related diffs; then, if everybody is OK with the proposal, I can turn the draft PR into a regular one.

@hqucms
Copy link
Contributor

hqucms commented Aug 26, 2024

enable nano

@hqucms
Copy link
Contributor

hqucms commented Aug 26, 2024

please test

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fc82df/41133/summary.html
COMMIT: 2b99b7d
CMSSW: CMSSW_14_1_X_2024-08-26-1100/el8_amd64_gcc12
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45754/41133/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 198 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3328085
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3328065
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9.244 KiB( 43 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): 0.715 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 13234.0,... ): 0.471 KiB Physics/NanoAODDQM
  • Checked 191 log files, 161 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

NANO Comparison Summary

Summary:

  • You potentially added 1961 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 40 differences found in the comparisons
  • DQMHistoTests: Total files compared: 21
  • DQMHistoTests: Total histograms compared: 54849
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 54849
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 8.882 KiB( 20 files compared)
  • DQMHistoSizes: changed ( 2500.001,... ): 1.119 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.011,... ): 0.737 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.101,... ): 0.715 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.111,... ): 0.471 KiB Physics/NanoAODDQM
  • Checked 102 log files, 58 edm output root files, 21 DQM output files
  • TriggerResults: no differences found

Nano size comparison Summary:

Sample kb/ev ref kb/ev diff kb/ev ev/s/thd ref ev/s/thd diff rate mem/thd ref mem/thd
2500.001 2.798 2.788 0.010 ( +0.3% ) 3.30 3.30 -0.2% 6.669 6.029
2500.002 2.909 2.901 0.008 ( +0.3% ) 2.95 2.96 -0.2% 7.026 6.406
2500.003 2.856 2.848 0.008 ( +0.3% ) 3.07 3.07 +0.1% 7.141 6.354
2500.011 1.459 1.450 0.009 ( +0.6% ) 5.66 5.72 -1.1% 2.400 2.400
2500.012 1.916 1.909 0.007 ( +0.4% ) 3.10 3.14 -1.3% 2.594 2.592
2500.013 1.774 1.765 0.009 ( +0.5% ) 4.50 4.54 -0.8% 2.504 2.497
2500.021 0.022 0.022 0.000 ( +0.0% ) 0.97 0.97 +0.3% 2.377 2.374
2500.022 0.022 0.022 0.000 ( +0.0% ) 0.92 0.93 -2.0% 2.371 2.371
2500.023 0.022 0.022 0.000 ( +0.0% ) 0.91 0.92 -0.6% 2.238 2.239
2500.024 0.022 0.022 0.000 ( +0.0% ) 0.68 0.71 -3.8% 2.476 2.456
2500.031 0.035 0.035 0.000 ( +0.0% ) 0.86 0.88 -2.1% 2.446 2.443
2500.032 0.036 0.036 0.000 ( +0.0% ) 0.88 0.90 -1.6% 2.404 2.404
2500.033 0.037 0.037 0.000 ( +0.0% ) 0.79 0.78 +0.5% 2.485 2.481
2500.034 0.036 0.036 0.000 ( +0.0% ) 0.79 0.82 -2.7% 2.463 2.456
2500.101 2.654 2.646 0.009 ( +0.3% ) 8.66 8.80 -1.6% 6.940 6.267
2500.111 1.337 1.330 0.007 ( +0.5% ) 18.83 19.63 -4.1% 2.215 2.213
2500.112 1.743 1.735 0.008 ( +0.5% ) 14.69 14.73 -0.2% 2.291 2.172
2500.131 5.194 5.194 0.000 ( +0.0% ) 15.30 15.68 -2.4% 1.544 1.545
2500.201 2.485 2.478 0.007 ( +0.3% ) 7.47 7.49 -0.3% 6.219 5.588
2500.211 1.599 1.592 0.007 ( +0.5% ) 17.22 17.30 -0.4% 2.292 2.132
2500.212 2.041 2.033 0.008 ( +0.4% ) 13.37 13.81 -3.2% 2.373 2.183
2500.221 2.003 2.006 -0.002 ( -0.1% ) 7.59 7.70 -1.5% 2.433 2.434
2500.222 3.080 3.072 0.008 ( +0.3% ) 7.44 7.54 -1.3% 2.525 2.518
2500.223 8.896 8.888 0.008 ( +0.1% ) 2.72 2.77 -1.7% 2.541 2.534
2500.224 5.523 5.515 0.008 ( +0.1% ) 1.08 1.10 -2.0% 2.579 2.256
2500.225 5.542 5.533 0.008 ( +0.1% ) 1.00 1.03 -2.8% 2.605 2.263
2500.226 2.981 2.973 0.008 ( +0.3% ) 7.41 7.36 +0.7% 2.518 2.326
2500.227 8.972 8.972 0.000 ( +0.0% ) 9.87 10.11 -2.4% 1.499 1.491
2500.231 1.403 1.407 -0.004 ( -0.3% ) 13.54 13.40 +1.0% 2.197 1.936
2500.232 2.153 2.145 0.008 ( +0.4% ) 13.54 13.38 +1.2% 2.285 2.051
2500.233 4.679 4.670 0.008 ( +0.2% ) 4.80 4.81 -0.0% 2.304 1.813
2500.234 3.315 3.307 0.008 ( +0.3% ) 1.46 1.47 -0.7% 2.330 1.843
2500.235 3.326 3.317 0.009 ( +0.3% ) 1.37 1.39 -1.5% 2.347 1.535
2500.236 2.093 2.085 0.008 ( +0.4% ) 13.44 13.80 -2.7% 2.280 2.072
2500.237 7.977 7.977 0.000 ( +0.0% ) 14.32 14.43 -0.8% 1.577 1.440
2500.241 9.405 9.405 0.000 ( +0.0% ) 3.78 3.63 +4.3% 1.962 1.793
2500.242 10.331 10.331 0.000 ( +0.0% ) 0.85 0.88 -2.5% 1.725 1.728
2500.243 2.712 2.712 0.000 ( +0.0% ) 7.76 7.66 +1.3% 1.066 1.059
2500.244 485.976 485.976 0.000 ( +0.0% ) 0.55 0.54 +0.5% 1.673 1.671
2500.245 823.224 823.224 0.000 ( +0.0% ) 0.73 0.74 -2.5% 1.663 1.656
2500.901 1.749 1.749 0.000 ( +0.0% ) 18.38 21.31 -13.7% 1.830 1.832
2500.902 1.598 1.598 0.000 ( +0.0% ) 21.44 20.92 +2.5% 1.763 1.370
2500.911 13.931 13.931 0.000 ( +0.0% ) 2.92 3.03 -3.7% 1.078 1.080
2500.912 0.171 0.171 0.001 ( +0.4% ) 1.35 1.37 -1.0% 0.970 0.971
2500.913 0.110 0.110 0.000 ( +0.0% ) 1.17 1.21 -2.7% 0.969 0.965

@cmsbuild
Copy link
Contributor

Milestone for this pull request has been moved to CMSSW_14_2_X. Please open a backport if it should also go in to CMSSW_14_1_X.

@namapane
Copy link
Contributor Author

namapane commented Sep 2, 2024

Milestone for this pull request has been moved to CMSSW_14_2_X. Please open a backport if it should also go in to CMSSW_14_1_X.

I opened #45860 as requested and will close this one.
Note: upon request of Sergio Sanchez Cruz, we decided to keep storing jetRelIso=1/ptRatio-1 (as it was before) instead of storing ptRatio as in this draft PR. The two variables transform into each other trivially.

@namapane namapane closed this Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants