
Add HLT-Scouting collections to MINIAOD event content (follow-up of #42863) #43327

Merged: 1 commit, merged on Nov 20, 2023

@missirol (Contributor) commented Nov 18, 2023

PR description:

#42863 added the HLT-Scouting collections to the MINIAODSIM event content (i.e. the MINIAOD content of standard MC samples). This PR suggests two improvements on top of #42863.

  1. Add the HLT-Scouting collections to the MINIAOD event content (not only to MINIAODSIM), at the request of the Scouting group.

    • Pro: for Primary Datasets (PDs, real data) whose RAW event content includes HLT-Scouting objects, those objects will also be included in MINIAOD. Right now, only one such PD exists, named ScoutingPFMonitor. It is used for offline studies related to Scouting, and the Scouting group currently uses a "two-file solution" workflow to access offline objects from MINIAOD and HLT-Scouting objects from AOD. Having both in MINIAOD will simplify this workflow significantly. To my knowledge, this change (adding HLT-Scouting to MINIAOD) has no impact on any other PD, since those PDs do not retain HLT-Scouting objects in RAW in the first place.

    • Con 1: the size of the MINIAOD samples of the ScoutingPFMonitor PD will increase. This increase has not been quantified; based on the checks done in #42863 ("Adding the scouting event content to MINIAODSIM"), it is assumed to be at most 10%. Since this only applies to a single PD with a relatively low rate (below ~40 Hz during normal pp data-taking in 2023), I would argue that this cost is small. For reference, the total size of all Run2023 MINIAOD samples on DAS is 1.65 PB; restricting that to the ScoutingPFMonitor PDs gives 2.8 TB (0.16% of the total). These numbers were obtained with:

      rm -f tmp.txt
      for ddd in $(dasgoclient -query "dataset dataset=/*/*Run2023*/*MINIAOD* status=VALID"); do
        dasgoclient -query "file dataset=$ddd | sum(file.size)" >> tmp.txt
      done
      # printf keeps the total as a plain integer (some awk implementations
      # switch to exponent notation for large values with a bare print)
      awk '{sum += $2} END {printf "%.0f\n", sum}' tmp.txt

      rm -f tmp.txt
      for ddd in $(dasgoclient -query "dataset dataset=/*Scouting*/*Run2023*/*MINIAOD* status=VALID"); do
        dasgoclient -query "file dataset=$ddd | sum(file.size)" >> tmp.txt
      done
      awk '{sum += $2} END {printf "%.0f\n", sum}' tmp.txt
    • Con 2: the size of MINIAOD samples derived from data tiers such as FEVTDEBUGHLT will also increase (again by a rough estimate of ~10% or less). I am not familiar with these use cases in detail. This happens, for example, in workflows (wfs) such as 141.001, where a reHLT step is run on data with --eventcontent FEVTDEBUGHLT (followed by a second step with RECO, MINI, NANO, etc.). Here too, I would expect this use case to be limited, and the overall cost of this increase to be small.

  2. Integrate this change better with the way HLT currently provides collections to the 'central' event contents in CMSSW. This PR defines a PSet HLTriggerMINIAOD in HLTrigger/Configuration (in this PR, HLTriggerMINIAOD includes only the HLT-Scouting event content), in the same way as HLTriggerAOD and others are defined. This part of the PR is purely technical: it is only meant to homogenise how extra HLT-related collections are inserted into the different data tiers.
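The size estimate in Con 1 can also be reproduced in a single pass with a slightly restructured version of the DAS loops. This is a sketch only and is not part of this PR. The helper names `sum_dataset_sizes` and `share_percent` are hypothetical. The sketch streams the per-dataset sums straight into awk instead of writing them to `tmp.txt`. Like the original loops, it assumes that `dasgoclient` prints the aggregated size as the second whitespace-separated field.

```shell
#!/usr/bin/env bash
# Sketch (not part of the PR): total size in bytes of all VALID datasets
# matching a DAS dataset pattern, and the share of one total in another.
# Assumes, as the loops in Con 1 do, that the output of
#   dasgoclient -query "file dataset=... | sum(file.size)"
# carries the summed size in its second whitespace-separated field.

# sum_dataset_sizes PATTERN (hypothetical helper): print the summed file
# size, in bytes, of all VALID datasets matching PATTERN.
sum_dataset_sizes() {
  local pattern="$1"
  local ddd
  for ddd in $(dasgoclient -query "dataset dataset=${pattern} status=VALID"); do
    dasgoclient -query "file dataset=${ddd} | sum(file.size)"
  done | awk '{ sum += $2 } END { printf "%.0f\n", sum }'
}

# share_percent PART TOTAL (hypothetical helper): print 100*PART/TOTAL
# rounded to two decimals.
share_percent() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

# Example usage (requires dasgoclient in PATH and a valid grid proxy):
#   all=$(sum_dataset_sizes '/*/*Run2023*/*MINIAOD*')
#   scouting=$(sum_dataset_sizes '/*Scouting*/*Run2023*/*MINIAOD*')
#   echo "Scouting share: $(share_percent "$scouting" "$all")%"
```

With the rounded totals quoted in Con 1, `share_percent 2.8e12 1.65e15` prints 0.17. The exact share depends on the unrounded totals returned by DAS.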

HLTrigger_EventContent_cff.py was not edited by hand; it was regenerated by running an updated version of HLTrigger/Configuration/test/getEventContent.py.

If approved, I would suggest backporting this PR to CMSSW_13_3_X, to keep HLTrigger/Configuration as similar as possible between 13_3_X (currently used for HLT-menu development) and later cycles, and to cover the unlikely scenario of taking data relevant to Scouting in 2024 with 13_3_X.

(Since changes to HLTrigger/Configuration/test are normally done only by @cms-sw/hlt-l2, I could also close this PR and let this update be done by TSG/STORM in one of the next HLT PRs.)

Attn: @elfontan @kelmorab (TSG/Scouting conveners)

PR validation:

Ran a couple of runTheMatrix.py wfs for Run-3 data and MC, and checked that the HLT-Scouting collections are present in the MINIAOD(SIM) outputs.
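The check described above can be scripted roughly as follows. This is a sketch only and is not the procedure actually used for this PR. It assumes a CMSSW environment (after `cmsenv`), so that the CMSSW utility `edmDumpEventContent` is in PATH. It also assumes that HLT-Scouting products can be recognised by the case-insensitive substring "scouting" in the dumped product list, meaning in their type names or module labels. The helper name `check_scouting_products` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not the PR's validation script): report whether EDM files contain
# HLT-Scouting products. Assumes edmDumpEventContent is in PATH (cmsenv'd
# release area) and that such products match "scouting" (case-insensitive)
# somewhere in the dumped product list.

# check_scouting_products FILE... (hypothetical helper): prints one line per
# file and returns non-zero if any file has no matching product.
check_scouting_products() {
  local file n status=0
  for file in "$@"; do
    # grep -c prints 0 (and exits non-zero) when nothing matches; only the
    # printed count is used here.
    n=$(edmDumpEventContent "$file" | grep -ci 'scouting')
    if [ "$n" -gt 0 ]; then
      echo "OK      ${file}: ${n} HLT-Scouting product(s)"
    else
      echo "MISSING ${file}: no HLT-Scouting products"
      status=1
    fi
  done
  return "$status"
}
```

For example, after running a Run-3 data workflow with `runTheMatrix.py -l 141.001`, you would pass the MINIAOD output file(s) of that workflow to `check_scouting_products`. The output file names depend on the workflow's steps and are not assumed here.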

If this PR is a backport, please specify the original PR and why you need to backport that PR. If this PR will be backported, please specify to which release cycle the backport is meant for:

CMSSW_13_3_X

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43327/37777

  • This PR adds an extra 24KB to the repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild (Contributor)

A new Pull Request was created by @missirol (Marino Missiroli) for master.

It involves the following packages:

  • Configuration/EventContent (operations)
  • HLTrigger/Configuration (hlt)

@cmsbuild, @rappoccio, @mmusich, @fabiocos, @antoniovilela, @davidlange6, @Martin-Grunewald can you please review it and eventually sign? Thanks.
@fabiocos, @Martin-Grunewald, @silviodonato this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@mmusich (Contributor) commented Nov 18, 2023

@cmsbuild please test

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-29b123/35941/summary.html
COMMIT: 86ee6b5
CMSSW: CMSSW_14_0_X_2023-11-18-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43327/35941/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
141.001 step 2
141.008505 step 2
141.008521 step 2
141.112 step 2
141.11 step 2
The results for the comparisons for these workflows could be incomplete
This most likely means that the IB has errors in the relvals. The error does NOT come from this pull request

Summary:

  • You potentially removed 365 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 141 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363868
  • DQMHistoTests: Total failures: 2389
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3361457
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@Martin-Grunewald (Contributor)

@missirol
Hmm, on the one hand, I understand the solution adopted here, given the precedent. On the other hand, it looks like HLT is being (ab)used to solve offline problems. Since it is straightforward and I do not have a better solution: OK, let's go ahead.
Could you please make the 13_3 backport PR?
Thanks!

@mmusich (Contributor) commented Nov 20, 2023

+hlt

@rappoccio (Contributor)

+1

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild merged commit 3f852f7 into cms-sw:master on Nov 20, 2023
11 checks passed
@missirol deleted the devel_hltScoutingInMINIAOD branch on September 13, 2024 07:32