Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Photon MVA-based ID for Run3 from EGamma POG #39927

Merged

Conversation

Prasant1993
Copy link
Contributor

PR description:

This PR is to add Photon MVA-based ID for Run3 from EGamma POG:

  • This will add photon MVA-based ID decisions for a new optimized ID in Run3.
  • The new optimized ID is trained using G+Jets simulation sample produced in CMSSW_12_2_X Winter22 campaign.
  • The ID optimization and the comparison of results can be found in the below presentations made in EGamma POG:
  1. https://indico.cern.ch/event/1182858/#2-update-on-run3-photon-mva-id
  2. https://indico.cern.ch/event/1188686/#2-update-on-run3-photon-mva-id
  3. https://indico.cern.ch/event/1192144/#2-update-on-run3-photon-mva-id
  4. https://indico.cern.ch/event/1192146/#2-update-on-run3-photon-mva-id
  5. https://indico.cern.ch/event/1192145/#2-update-on-run3-photon-mva-id
  6. https://indico.cern.ch/event/1204272/#2-update-on-run3-photon-mva-id

PR validation:

runTheMatrix tests have been successfully run for the following workflows :

  • runTheMatrix.py -l 12434.0
  • runTheMatrix.py -l 11834.0
  • runTheMatrix.py -l 10024.0
  • runTheMatrix.py -l 136.874

Validation has also been performed to cross-check the Run3 MVA ID efficiencies from the MINIAOD datasets and compared with Run2 ID decisions using G+Jets, Relval H125, Relval ZEE and Relval QCD MC samples as shown in the below plots:

The photon MVA training weight files for Run3 have been already added here : https://github.com/cms-data/RecoEgamma-PhotonIdentification/tree/master/MVA
A PR is submitted and not merged yet : cms-data/RecoEgamma-PhotonIdentification#13
This current PR for photon MVA ID will take the input weight files from the above PR to work with.

Backport:

This PR does not need any backportation.

We would like to keep both RunIIFall17V2 photon MVA ID decision and this new Run3Winter22V1 photon MVA ID decision in the upcoming nano because several early analysis are using Run2 ID for Run3, and it is working quite well, so they might or might not want to switch. The necessary changes for adding the Run3 MVA photon ID in NanoAOD has already been made in this PR.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39927/32847

  • This PR adds an extra 3976KB to repository

  • Found files with invalid states:

    • RecoEgamma/PhotonIdentification/data/RunIII_Winter22_phoMVA_ID_weights/PhoID_endcap_Train_GJetMC_EgammaID_RunIII_Winter22MiniAOD_V11_BDTG.weights.xml:
    • RecoEgamma/PhotonIdentification/data/RunIII_Winter22_phoMVA_ID_weights/PhoID_barrel_Train_GJetMC_EgammaID_RunIII_Winter22MiniAOD_V13_BDTG.weights.xml:
    • RecoEgamma/PhotonIdentification/data/RunIII_Winter22_phoMVA_ID_weights/PhoMVA_ID_EB_V1.weights.root:
    • RecoEgamma/PhotonIdentification/data/RunIII_Winter22_phoMVA_ID_weights/PhoMVA_ID_EE_V1.weights.root:
  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Prasant1993 (Prasant Kumar Rout) for master.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)
  • PhysicsTools/PatAlgos (xpog, reconstruction)
  • RecoEgamma/EgammaTools (reconstruction)
  • RecoEgamma/PhotonIdentification (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo, @swertz, @vlimant can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @jainshilpi, @schoef, @emilbols, @mbluj, @demuller, @varuns23, @seemasharmafnal, @mmarionncern, @jdolen, @ahinzmann, @lgray, @missirol, @azotz, @Sam-Harper, @jdamgov, @nhanvtran, @gkasieczka, @hatakeyamak, @andrzejnovak, @AlexDeMoor, @AnnikaStein, @valsdav, @JyothsnaKomaragiri, @sobhatta, @afiqaize, @gpetruc, @wrtabb, @mariadalfonso, @sameasy, @ram1123 this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@swagata87
Copy link
Contributor

type egamma

@swagata87
Copy link
Contributor

please test with cms-data/RecoEgamma-PhotonIdentification#13

@swagata87
Copy link
Contributor

EGM POG maintains 4 types of IDs (electron cutbased+MVA; photon cutbased+MVA). All of them are being reoptimised for Run3.
1/4 #39839 -> Electron cutbased
2/4 this PR -> Photon MVA
And the other 2 PRs are coming shortly.
All of them aims for the last open pre-release of 12_6_0, so that new IDs can be present in reMini and reNano which will be made with 12_6.

Tagging EGM nanoAOD contact @rgoldouz

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aaed28/28645/summary.html
COMMIT: 3e75cdf
CMSSW: CMSSW_12_6_X_2022-10-30-0000/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/39927/28645/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 446 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3416356
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3416322
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 4.14 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): 0.556 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 13234.0,... ): 0.268 KiB Physics/NanoAODDQM
  • Checked 206 log files, 48 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@vlimant
Copy link
Contributor

vlimant commented Oct 31, 2022

enable nano

@swertz
Copy link
Contributor

swertz commented Oct 31, 2022

Do we want to disable the Run3 IDs (also in #39839) for Run2 samples?

@swagata87
Copy link
Contributor

Do we want to disable the Run3 IDs (also in #39839) for Run2 samples?

Run3 IDs are not specifically needed in Run2 samples. But we are anyway not going to reNano any of the Run2 data or MC, right? So maybe we can leave it out for simplicity for now? We can come back to cleanups later on..?

@swertz
Copy link
Contributor

swertz commented Oct 31, 2022

But we are anyway not going to reNano any of the Run2 data or MC, right? So maybe we can leave it out for simplicity for now? We can come back to cleanups later on..?

The idea is to keep Run2 and Run3 samples as much as possible in sync for combined analyses, there will for sure be a re-nano of Run2 at some point (if not in 126X then in 130X), so we should keep things clean in master.

This is also valid for MINI, btw (not running the Run3 IDs for Run2 MINI).

@swagata87
Copy link
Contributor

yes we are just trying to make sure that Run3 IDs get integrated now, as we do not want to miss the deadline of last open pre-release of 12_6_0. The cleanups can be done afterwards too, if that's not a problem.

@swertz
Copy link
Contributor

swertz commented Nov 7, 2022

+xpog

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

perrotta commented Nov 8, 2022

@Prasant1993 @swagata87 are the many differences in the Run3 workflows (tracking, etc.) explained somehow? I doubt they come from this PR, not even from the technical PR tested together with this one, but it would be nice to pinpoint a possible origin of them before merging

@vlimant
Copy link
Contributor

vlimant commented Nov 8, 2022

@perrotta
Copy link
Contributor

perrotta commented Nov 8, 2022

@perrotta can you please point out one of the relmon directory that makes you worried ?
I'm looking at some of them and it looks like a glitch in the comparison. The changes in this PR have no way to influence the tracks "etaSim" (whatever this means)

@vlimant I see you were able to find them yourself, and you also reached my conclusion that it cannot be originated by this PR.
Let try to see whether the "glitch" disappears if we retest it once more. If not, I would ask the authors to investigate where the possible correlation comes from

@perrotta
Copy link
Contributor

perrotta commented Nov 8, 2022

please test with cms-data/RecoEgamma-PhotonIdentification#13

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 8, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aaed28/28875/summary.html
COMMIT: 3e75cdf
CMSSW: CMSSW_12_6_X_2022-11-07-2300/el8_amd64_gcc10
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39927/28875/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aaed28/28875/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aaed28/28875/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 445 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3416402
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3416368
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 4.14 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): 0.556 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 13234.0,... ): 0.268 KiB Physics/NanoAODDQM
  • Checked 206 log files, 48 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

NANO Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.311_mc106Xul17v2+TTbarMINIAOD10.6_UL17v2+NANO_mc10.6ul17v2+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.312_mc106Xul17v2+TTbarMINIAOD10.6_UL18v2+NANO_mc10.6ul18v2+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.31_mc106Xul17v2+TTbarMINIAOD10.6_UL16v2+NANO_mc10.6ul16v2+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.331_data106Xul17v2+MuonEG2017MINIAOD10.6v2+NANO_data10.6ul17v2+HRV_NANO_data
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.332_data106Xul18v2+MuonEG2018MINIAOD10.6v2+NANO_data10.6ul18v2+HRV_NANO_data
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.33_data106Xul16v2+MuonEG2016MINIAOD10.6v2+NANO_data10.6ul16v2+HRV_NANO_data
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.401_mc122Xrun3+TTbarMINIAOD12.2+NANO_mc12.2+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.4_mc122Xrun3_v10+TTbarMINIAOD12.2+NANO_mc12.2_v10+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.501_mc124Xrun3+TTbarMINIAOD12.4+NANO_mc12.4+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.511_data124Xrun3+MuonEG2022MINIAOD12.4+NANO_data12.4+HRV_NANO_data
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.51_data124Xrun3_v10+MuonEG2022MINIAOD12.4+NANO_data12.4_v10+HRV_NANO_data
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.5_mc124Xrun3_v10+TTbarMINIAOD12.4+NANO_mc12.4_v10+HRV_NANO_mc
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-aaed28/2500.6_mc126X_v10+TTBarMINIAOD12.6+NANO_mc12.6_v10+HRV_NANO_mc

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 15
  • DQMHistoTests: Total histograms compared: 14677
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 14677
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 5.768 KiB( 14 files compared)
  • DQMHistoSizes: changed ( 2500.311,... ): 0.556 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 2500.331,... ): 0.268 KiB Physics/NanoAODDQM
  • Checked 31 log files, 0 edm output root files, 15 DQM output files

Nano size comparison Summary:

Sample kb/ev ref kb/ev diff kb/ev ev/s/thd ref ev/s/thd diff rate mem/thd ref mem/thd
2500.31 2.210 2.206 0.004 ( +0.2% ) 9.25 8.61 +7.4% 1.449 1.476
2500.311 2.331 2.327 0.004 ( +0.2% ) 8.77 8.34 +5.1% 1.819 1.868
2500.312 2.282 2.278 0.004 ( +0.2% ) 8.95 8.45 +5.9% 1.809 1.852
2500.33 1.099 1.095 0.004 ( +0.4% ) 21.33 20.13 +6.0% 1.644 1.467
2500.331 1.396 1.391 0.004 ( +0.3% ) 15.81 14.92 +6.0% 1.792 1.625
2500.332 1.328 1.323 0.004 ( +0.3% ) 17.55 16.35 +7.3% 1.745 1.559
2500.4 2.115 2.115 0.000 ( +0.0% ) 9.85 9.62 +2.4% 1.364 1.302
2500.401 2.119 2.115 0.004 ( +0.2% ) 9.96 9.63 +3.4% 1.161 1.405
2500.5 0.302 0.302 0.000 ( +0.0% ) 49.61 49.58 +0.1% 1.112 1.177
2500.501 0.302 0.302 0.000 ( +0.0% ) 49.62 48.91 +1.5% 1.118 1.192
2500.51 1.100 1.100 0.000 ( +0.0% ) 27.78 27.73 +0.2% 1.375 1.307
2500.511 1.104 1.100 0.004 ( +0.4% ) 27.74 27.96 -0.8% 1.371 1.307
2500.6 1.418 1.418 0.000 ( +0.0% ) 23.49 22.85 +2.8% 1.082 1.023
2500.601 1.422 1.418 0.004 ( +0.3% ) 23.46 24.04 -2.4% 1.082 1.160

@rappoccio
Copy link
Contributor

+1

Expected new features into nano.

@Martin-Grunewald
Copy link
Contributor

Errors in IBs:

----- Begin Fatal Exception 09-Nov-2022 06:07:37 CET-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=PhotonMVAValueMapProducer label='photonMVAValueMapProducer'
Exception Message:
edm::FileInPath unable to find file RecoEgamma/PhotonIdentification/data/MVA/RunIII_Winter22/PhoMVA_ID_EB_V1.weights.root anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /data/cmsbld/jenkins/workspace/ib-run-relvals/CMSSW_12_6_X_2022-11-08-2300/poison:/data/cmsbld/jenkins/workspace/ib-run-relvals/CMSSW_12_6_X_2022-11-08-2300/src:/data/cmsbld/jenkins/workspace/ib-run-relvals/CMSSW_12_6_X_2022-11-08-2300/external/slc7_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02758/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-11-08-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02758/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-11-08-2300/external/slc7_amd64_gcc10/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-relvals/CMSSW_12_6_X_2022-11-08-2300/pyRelval/250402.0_TTbar_13+FS_TTbar_13_PRMXUP15_PU25+HARVESTUP15FS+MINIAODMCUP15FS
----- End Fatal Exception -------------------------------------------------

@smuzaffar
Copy link
Contributor

cms-data/RecoEgamma-PhotonIdentification#13 was needed but it did not go in the IB. I have merged it and started a new IB an hour ago

@Martin-Grunewald
Copy link
Contributor

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants