Custom Run 3 PFScouting NanoAOD #40438

Merged
merged 2 commits into from
Oct 11, 2023

@alintulu

alintulu commented Jan 6, 2023

PR description:

This PR adds a custom NanoAOD for Run 3 PFScouting. To achieve this we had to add a couple of new files and amend a few more. I will now try to describe the reason behind each addition/amendment:

  • The custom format is added with custom_run3scouting_cff.py, which was modeled on custom_jme_cff.py. All Run 3 PFScouting objects (with the exception of jets and PFCandidates) are added with the help of SimpleFlatTableProducer.
  • Before adding jets and PFCandidates, the latter are converted from Run3ScoutingParticles to recoPFCandidates. This is done in PhysicsTools/NanoAOD/plugins/Run3ScoutingParticleToRecoPFCandidateProducer.cc (I was not sure where to put this file, please let me know if it should be moved). The conversion allows us to cluster AK4 and AK8 jets from the recoPFCandidates. There is a list of Run3ScoutingParticle values which are not available in recoPFCandidates, and these are therefore tracked with ValueMaps.
  • We have trained AK4 flavour-tagging, AK8 b-tagging and AK8 mass regression with ParticleNet specifically for PFScouting. In order to perform inference, DeepBoostedJetTagInfoProducer.cc and BoostedJetONNXJetTagsProducer.cc had to be amended. The former was amended to allow for the specific PFScouting inputs and the latter to produce ValueMaps instead of JetTags. Since the PFScouting jets are RECO and not PAT, I did not find a way of associating the output score to the jet using a JetTag (but was able to with a ValueMap). I've already opened a PR with the ONNX models necessary for running inference with CMSSW (Add PartcileNet ONNX models for Run 3 PFScouting cms-data/RecoBTag-Combined#49).
  • To match PFScouting jets to GenJets, JetDeltaRValueMapProducer.cc was amended to return the index of the matched jet instead of its value.
  • Finally, the PFScouting stream is currently made up of two datasets, DST_Run3_PFScoutingPixelTracking_v and DST_HLTMuon_Run3_PFScoutingPixelTracking_v. I added a boolean (isPFScouting) by amending NanoAODOutputModule.cc, TriggerOutputBranches.cc and TriggerOutputBranches.h. Setting isPFScouting to True adds information on whether the event is part of one of these datasets.
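
To illustrate what "returning the index" means for the jet-to-GenJet matching, here is a minimal standalone sketch in plain Python. It is not the code in JetDeltaRValueMapProducer.cc; the function names, the (eta, phi) tuple layout and the 0.4 matching radius are assumptions made only for this example.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def match_indices(jets, gen_jets, max_dr=0.4):
    """For each jet, return the index of the closest gen jet within max_dr.

    Jets and gen jets are (eta, phi) tuples (hypothetical layout).
    Unmatched jets get -1, so the result can be stored as one integer
    per jet and used later to look up the matched gen jet.
    """
    matches = []
    for eta, phi in jets:
        best_idx, best_dr = -1, max_dr
        for i, (gen_eta, gen_phi) in enumerate(gen_jets):
            dr = delta_r(eta, phi, gen_eta, gen_phi)
            if dr < best_dr:
                best_idx, best_dr = i, dr
        matches.append(best_idx)
    return matches


jets = [(0.0, 3.1), (1.0, 0.0), (2.5, 1.0)]
gen_jets = [(1.05, 0.02), (0.1, -3.1)]
print(match_indices(jets, gen_jets))  # [1, 0, -1]
```

Storing an index rather than a matched value keeps the table flat: any GenJet property can then be read from the GenJet collection using that index, and -1 marks jets without a match.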

PR validation:

The PR passed the tests listed at https://cms-sw.github.io/PRWorkflow.html.

With everything included, the event size is 8.4 kB/event (when run over 5000 events from /store/data/Run2022F/ScoutingPFRun3/RAW/v1/000/361/303/00000/37f1ab1d-94f9-4177-91e5-db46490bc69a.root). However, the ScoutingParticles account for 5.3 kB/event. It is not clear how useful they are in the NanoAOD and could perhaps be completely or partly removed (to be discussed).
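
As a quick back-of-the-envelope check of what removing them would mean, the two sizes quoted above can be combined as follows. Only the 8.4 and 5.3 kB/event figures come from the measurement; the helper name is made up for this sketch.

```python
def size_breakdown(total_kb, component_kb):
    """Return (fraction of the event taken by the component, size left without it)."""
    fraction = component_kb / total_kb
    remaining_kb = total_kb - component_kb
    return fraction, remaining_kb


fraction, remaining_kb = size_breakdown(total_kb=8.4, component_kb=5.3)
print(f"ScoutingParticles share: {fraction:.0%}")
print(f"Size without them: {remaining_kb:.1f} kB/event")
```

In other words, the ScoutingParticles make up roughly 63% of the event, and dropping them entirely would leave about 3.1 kB/event.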

A pie-chart showing the most up to date event size distribution can be found here.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

N/A

@cmsbuild

cmsbuild commented Jan 6, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40438/33578

@cmsbuild

cmsbuild commented Jan 6, 2023

A new Pull Request was created by @alintulu (Adelina Lintuluoto) for master.

It involves the following packages:

  • CommonTools/RecoAlgos (reconstruction)
  • PhysicsTools/NanoAOD (xpog)
  • RecoBTag/FeatureTools (reconstruction)
  • RecoBTag/ONNXRuntime (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo, @swertz, @vlimant can you please review it and eventually sign? Thanks.
@AlexDeMoor, @rappoccio, @JyothsnaKomaragiri, @ahinzmann, @AnnikaStein, @abbiendi, @emilbols, @jhgoh, @jdolen, @gkasieczka, @hqucms, @hatakeyamak, @gpetruc, @andrzejnovak, @demuller, @missirol this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@clacaputo

Hi @alintulu , is there a workflow where we can test this code?

@alintulu

> Hi @alintulu , is there a workflow where we can test this code?

Hi @clacaputo! Since my PR to RecoBTag-Combined (cms-data/RecoBTag-Combined#49) has not been merged yet, I created another branch that is a clone of this PR except that it stores the ONNX model files locally. You can create a custom PFScouting NanoAOD from that branch using these commands. Is this enough?

My understanding is that a PFScouting NanoDQM would be great (see #39000 (comment)), but unfortunately I've failed to create one so far. @mariadalfonso would you happen to know how to best achieve this?

@clacaputo

> You can create a custom PFScouting NanoAOD from that branch using these commands. Is this enough?

Hi @alintulu , it would be better to test the code using runTheMatrix.py, defining a dedicated workflow for the PFScouting NanoAOD

@clelange

> Hi @alintulu , it would be better to test the code using runTheMatrix.py, defining a dedicated workflow for the PFScouting NanoAOD

Hi @clacaputo - that should be possible. Do you have an example that we could start from that's ideally close to what we need here? I don't have any experience with implementing runTheMatrix.py workflows. Thank you!

@clacaputo

> Hi @clacaputo - that should be possible. Do you have an example that we could start from that's ideally close to what we need here? I don't have any experience with implementing runTheMatrix.py workflows. Thank you!

Hi @clelange, you can find an example in #40553. Sorry for the late reply.

@cmsbuild cmsbuild mentioned this pull request Feb 1, 2023
@alintulu

alintulu commented Feb 1, 2023

> Hi @clelange , you can find an examples in #40553. Sorry for the late reply

Thank you @clacaputo! I was able to create a test, however it fails as the RecoBTag-Combined PR (cms-data/RecoBTag-Combined#49) has yet to be merged. I will ping them.

@smuzaffar

test parameters:

@smuzaffar

please test

@alintulu , we can test cms-data/RecoBTag-Combined#49 directly here :-)

@kpedro88

kpedro88 commented Feb 2, 2023

@alintulu you can also run your own local tests by cloning your cms-data PR into your working area:

cd $CMSSW_BASE/src
git cms-addpkg RecoBTag/Combined
git clone https://github.com/alintulu/RecoBTag-Combined -b scoutingNanoAOD RecoBTag/Combined/data
scram b

@cmsbuild

cmsbuild commented Feb 2, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/30340/summary.html
COMMIT: 324432c
CMSSW: CMSSW_13_0_X_2023-01-31-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40438/30340/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/30340/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/30340/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 31 lines from the logs
  • Reco comparison results: 35 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 15
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555458
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@vlimant

vlimant commented Oct 9, 2023

@cms-sw/orp-l2 is there something holding up merging this into master?

@rappoccio

unhold

@cmsbuild

cmsbuild commented Oct 9, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio

test parameters:

pull_request = cms-data/RecoBTag-Combined#49

@rappoccio

please test

  • Restarting stale tests then will merge when they are complete.

@cmsbuild

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/35101/summary.html
COMMIT: 5b034aa
CMSSW: CMSSW_13_3_X_2023-10-09-1100/el8_amd64_gcc11
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40438/35101/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/35101/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/35101/git-merge-result

Unit Tests

I found 4 errors in the following unit tests:

---> test TestDQMOnlineClient-ecalgpu_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-hcalgpu_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-pixelgpu_dqm_sourceclient had ERRORS
and more ...

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 3085 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3356920
  • DQMHistoTests: Total failures: 3032
  • DQMHistoTests: Total nulls: 35
  • DQMHistoTests: Total successes: 3353831
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 1 / 48 workflows

NANO Comparison Summary

Summary:

  • You potentially added 4 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 15
  • DQMHistoTests: Total histograms compared: 15925
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 15925
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 14 files compared)
  • Checked 31 log files, 14 edm output root files, 15 DQM output files

Nano size comparison Summary:

  • Nano ERROR: Missing ref/2500.5-size.json
    | Sample | kb/ev | ref kb/ev | diff kb/ev | ev/s/thd | ref ev/s/thd | diff rate | mem/thd | ref mem/thd |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 2500.0 | 2.473 | 2.469 | 0.004 ( +0.2% ) | 5.33 | 5.27 | +1.2% | 2.108 | 2.061 |
    | 2500.001 | 2.616 | 2.611 | 0.005 ( +0.2% ) | 4.73 | 4.77 | -0.9% | 2.527 | 2.453 |
    | 2500.002 | 2.526 | 2.522 | 0.004 ( +0.2% ) | 4.92 | 4.82 | +2.2% | 2.527 | 2.440 |
    | 2500.01 | 1.265 | 1.264 | 0.001 ( +0.1% ) | 9.65 | 9.85 | -2.1% | 2.189 | 2.174 |
    | 2500.011 | 1.636 | 1.634 | 0.001 ( +0.1% ) | 5.21 | 5.30 | -1.6% | 2.354 | 2.359 |
    | 2500.012 | 1.519 | 1.517 | 0.002 ( +0.2% ) | 7.43 | 7.44 | -0.1% | 2.248 | 2.255 |
    | 2500.1 | 2.127 | 2.126 | 0.001 ( +0.0% ) | 5.40 | 5.35 | +1.0% | 2.018 | 1.920 |
    | 2500.2 | 2.238 | 2.237 | 0.001 ( +0.0% ) | 6.19 | 6.21 | -0.3% | 1.933 | 1.920 |
    | 2500.21 | 1.125 | 1.125 | 0.000 ( +0.0% ) | 4.45 | 4.40 | +1.1% | 2.211 | 2.199 |
    | 2500.211 | 1.480 | 1.479 | 0.001 ( +0.1% ) | 3.85 | 3.86 | -0.3% | 2.192 | 2.173 |
    | 2500.3 | 1.995 | 1.995 | 0.000 ( +0.0% ) | 12.93 | 12.98 | -0.4% | 1.918 | 1.825 |
    | 2500.31 | 1.202 | 1.201 | 0.001 ( +0.1% ) | 20.06 | 20.48 | -2.1% | 2.284 | 2.200 |
    | 2500.311 | 1.581 | 1.579 | 0.002 ( +0.1% ) | 14.26 | 14.14 | +0.9% | 2.373 | 2.233 |
    | 2500.4 | 1.995 | 1.995 | 0.000 ( +0.0% ) | 12.82 | 12.93 | -0.8% | 1.917 | 1.906 |

@mmusich

mmusich commented Oct 10, 2023

I think unit tests here failed because the external update necessary for #42953 wasn't captured yet in the IB used for tests.

@mmusich

mmusich commented Oct 10, 2023

@cmsbuild, please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/35113/summary.html
COMMIT: 5b034aa
CMSSW: CMSSW_13_3_X_2023-10-09-2300/el8_amd64_gcc11
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40438/35113/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/35113/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-023d2d/35113/git-merge-result

Comparison Summary

Summary:

  • You potentially added 131 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 19636 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3356920
  • DQMHistoTests: Total failures: 8809
  • DQMHistoTests: Total nulls: 6
  • DQMHistoTests: Total successes: 3348083
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.023 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 4.53 ): -0.023 KiB JetMET/SUSYDQM
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 5 / 48 workflows

NANO Comparison Summary

Summary:

  • You potentially added 4 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 15
  • DQMHistoTests: Total histograms compared: 15925
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 15925
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 14 files compared)
  • Checked 31 log files, 14 edm output root files, 15 DQM output files

Nano size comparison Summary:

  • Nano ERROR: Missing ref/2500.5-size.json
    | Sample | kb/ev | ref kb/ev | diff kb/ev | ev/s/thd | ref ev/s/thd | diff rate | mem/thd | ref mem/thd |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 2500.0 | 2.473 | 2.469 | 0.004 ( +0.2% ) | 5.30 | 5.35 | -0.8% | 2.100 | 2.130 |
    | 2500.001 | 2.616 | 2.611 | 0.005 ( +0.2% ) | 4.72 | 4.77 | -1.0% | 2.524 | 2.038 |
    | 2500.002 | 2.526 | 2.522 | 0.004 ( +0.2% ) | 4.93 | 4.97 | -0.9% | 2.516 | 2.061 |
    | 2500.01 | 1.265 | 1.264 | 0.001 ( +0.1% ) | 9.61 | 9.82 | -2.2% | 2.183 | 1.888 |
    | 2500.011 | 1.636 | 1.634 | 0.001 ( +0.1% ) | 5.19 | 5.29 | -1.9% | 2.339 | 1.910 |
    | 2500.012 | 1.519 | 1.517 | 0.002 ( +0.2% ) | 7.42 | 7.51 | -1.2% | 2.236 | 1.918 |
    | 2500.1 | 2.127 | 2.126 | 0.001 ( +0.0% ) | 5.37 | 5.37 | -0.0% | 2.015 | 1.992 |
    | 2500.2 | 2.238 | 2.237 | 0.001 ( +0.0% ) | 6.13 | 6.19 | -0.9% | 1.925 | 1.875 |
    | 2500.21 | 1.125 | 1.125 | 0.000 ( +0.0% ) | 4.44 | 4.43 | +0.2% | 2.208 | 1.900 |
    | 2500.211 | 1.480 | 1.479 | 0.001 ( +0.1% ) | 3.85 | 3.84 | +0.2% | 2.190 | 1.999 |
    | 2500.3 | 1.995 | 1.995 | 0.000 ( +0.0% ) | 12.84 | 12.90 | -0.5% | 1.913 | 1.903 |
    | 2500.31 | 1.202 | 1.201 | 0.001 ( +0.1% ) | 20.18 | 20.59 | -2.0% | 2.285 | 2.279 |
    | 2500.311 | 1.581 | 1.579 | 0.002 ( +0.1% ) | 13.90 | 14.11 | -1.5% | 2.297 | 2.358 |
    | 2500.4 | 1.995 | 1.995 | 0.000 ( +0.0% ) | 12.92 | 12.94 | -0.1% | 1.915 | 1.906 |

@rappoccio

+1

@vlimant

vlimant commented Jan 10, 2024

As a follow-up here, please open a PR to comply with the autoNano syntax (#42238). We should also figure out a way to modify the workflow to use a MINI input that already has the scouting content, instead of the two-file solution.

'--geometry' : 'DB:Extended',
'--datatier':'NANOAOD',
'--eventcontent':'NANOAOD',
'--filein':'/store/mc/Run3Summer22MiniAODv3/BulkGravitonToHH_MX1120_MH121_TuneCP5_13p6TeV_madgraph-pythia8/MINIAODSIM/124X_mcRun3_2022_realistic_v12-v3/2810000/f9cdd76c-faac-4f24-bf0c-2496c8fffe54.root',

I missed this in the first place; we should not use --filein and --secondfilein directly but use datasets instead, as in the data workflow.
