Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase-2 Global Trigger Emulator #41808

Merged
merged 1 commit into from
Jul 25, 2023

Conversation

jheikkil
Copy link
Contributor

@jheikkil jheikkil commented May 30, 2023

PR description:

This PR includes the first version of the Phase-2 Global Trigger emulator that implements a bit-wise compatible emulation of the GT firmware. The PR includes the HLT-TDR simplified L1 menu (16 seeds), documented in this twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PhaseIIL1TriggerMenuTools#How_to_use_the_GT_Emulator

As the Phase-2 Global Trigger emulator relies on various upstream emulators, this PR can be merged once:

PR validation:

The expected rates have been presented at the Annual Review (https://indico.cern.ch/event/1257943/#7-algorithms-and-physics-perfo).

-Jaana on behalf of the Phase-2 Global Trigger team

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41808/35693

ERROR: Build errors found during clang-tidy run.

L1Trigger/Phase2L1GT/plugins/L1GTProducer.cc:301:17: error: no viable conversion from 'const PackedJet' (aka 'const array<unsigned long, 2>') to 'l1gt::Jet' [clang-diagnostic-error]
      l1gt::Jet gtJet = collection[i].getHWJetGT();
                ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02787/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_2_X_2023-05-29-2300/src/DataFormats/L1TParticleFlow/interface/gt_datatypes.h:80:10: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const PackedJet' (aka 'const array<unsigned long, 2>') to 'const Jet &' for 1st argument
--
L1Trigger/Phase2L1GT/plugins/L1GTProducer.cc:370:39: error: no member named 'getHWTauGT' in 'l1t::PFTau' [clang-diagnostic-error]
      l1gt::Tau gtTau = collection[i].getHWTauGT();
                        ~~~~~~~~~~~~~ ^
L1Trigger/Phase2L1GT/plugins/L1GTProducer.cc:386:35: warning: std::move of the variable 'gtObj' of the trivially-copyable type 'P2GTCandidate' has no effect [performance-move-const-arg]
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41808/35695

ERROR: Build errors found during clang-tidy run.

L1Trigger/Phase2L1GT/plugins/L1GTBoardWriter.cc:60:31: note: 'channel' declared here
--
L1Trigger/Phase2L1GT/plugins/L1GTBoardWriter.cc:61:106: error: variable 'channel' cannot be implicitly captured in a lambda with no capture-default specified [clang-diagnostic-error]
              channelMap.insert({l1t::demo::LinkId{"Algos", channel}, {l1t::demo::ChannelSpec{1, 0, 0}, {channel}}});
                                                                                                         ^
L1Trigger/Phase2L1GT/plugins/L1GTBoardWriter.cc:60:31: note: 'channel' declared here
--
L1Trigger/Phase2L1GT/plugins/L1GTBoardWriter.cc:63:20: error: variable 'channelMap' cannot be implicitly captured in a lambda with no capture-default specified [clang-diagnostic-error]
            return channelMap;
                   ^
L1Trigger/Phase2L1GT/plugins/L1GTBoardWriter.cc:59:54: note: 'channelMap' declared here
--
L1Trigger/Phase2L1GT/plugins/L1GTEvaluationProducer.cc:111:53: error: no member named 'EMP' in 'l1t::demo::FileFormat' [clang-diagnostic-error]
                           : l1t::demo::FileFormat::EMP,
                             ~~~~~~~~~~~~~~~~~~~~~~~^
Suppressed 1235 warnings (1235 in non-user code).
--
L1Trigger/Phase2L1GT/plugins/L1GTProducer.cc:370:39: error: no member named 'getHWTauGT' in 'l1t::PFTau' [clang-diagnostic-error]
      l1gt::Tau gtTau = collection[i].getHWTauGT();
                        ~~~~~~~~~~~~~ ^
Suppressed 1141 warnings (1141 in non-user code).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41808/35696

ERROR: Build errors found during clang-tidy run.

L1Trigger/Phase2L1GT/plugins/L1GTProducer.cc:370:39: error: no member named 'getHWTauGT' in 'l1t::PFTau' [clang-diagnostic-error]
      l1gt::Tau gtTau = collection[i].getHWTauGT();
                        ~~~~~~~~~~~~~ ^
Suppressed 1141 warnings (1141 in non-user code).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 2, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41808/35753

ERROR: Build errors found during clang-tidy run.

L1Trigger/Phase2L1GT/plugins/L1GTProducer.cc:370:39: error: no member named 'getHWTauGT' in 'l1t::PFTau' [clang-diagnostic-error]
      l1gt::Tau gtTau = collection[i].getHWTauGT();
                        ~~~~~~~~~~~~~ ^
Suppressed 1141 warnings (1141 in non-user code).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@aloeliger
Copy link
Contributor

This requires #41492

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-59f559/33830/summary.html
COMMIT: 2c267b2
CMSSW: CMSSW_13_3_X_2023-07-21-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41808/33830/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 15 lines from the logs
  • Reco comparison results: 52 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3149863
  • DQMHistoTests: Total failures: 225
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3149616
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@aloeliger
Copy link
Contributor

+l1

  • @cms-sw/upgrade-l2 At this stage I think HLT and core software are okay with the idiosyncrasies of this emulator and we can plan to upgrade it as we go along to patch it into phase 2. Is there anything else you would liked addressed from your side?

@srimanob
Copy link
Contributor

+Upgrade

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Copy link
Contributor

+1

@aloeliger
Copy link
Contributor

@makortel @fwyzard At this point I want to ask both of you if you have any advice on the question of a menu parser. I talked about this a bit here: #41808 (comment) and here: #41808 (comment).

The complication is that for phase-2, whenever one wants to run the GT now, one has to load a series of modules corresponding to different paths or conditions, create (or load) paths for each of them, and have them attached to the process alongside two other GT components. I.e. to run the GT in a configuration, one must simultaneously specify what the menu is, there is no separation anymore on the questions of "Is the GT emulator running?" and "What is the GT emulator menu?". There is no XML file it reads from. It's all python modules now.

So there are two questions I want to try and sort out:

  1. This is onerous to write by hand every time for someone writing a config by hand (i.e. you don't want to write out 512 paths and modules attached to a process every time). Presumably the L1 menu will pull from a lot of pre-defined places, so is there a good or best tool to use for defining a file to load from, loading/creating the modules and paths and attaching them to process, and then making sure they are in the schedule? The GT team has provided a tool that functions in this capacity, collectAlgorithmPaths, here: https://github.com/cms-sw/cmssw/pull/41808/files#diff-e31eafd3c6618dda2a3823af5957890f581204bf99538a0ef3dc42597b0a9b01R11-R23, where each file that defines a menu element, also adds it to a variable algorithms, and then by loading each file you fill this variable with desired modules, and it gives you a series of paths to be run. Is this an acceptable solution?
  2. A primary use case is of course going to involve the cmsDriver. The same question exists as to how we want to specify a menu to be loaded (i.e. getting and loading the modules, attaching to the process, etc.) in a default way, and how one would do it for a non-default case (i.e. testing out a new menu or running a different L1 menu) when using the cmsDriver. I had to tackle this for some of our prototype use, and I came up with this: https://github.com/cms-l1t-offline/cmssw/blob/315e0b6cbcc55bec5b5eecbe955f25f68db9daef/Configuration/Applications/python/ConfigBuilder.py#L1523-L1579, which I didn't think looked like a long term solution (not the least of which because it just hard-codes the menu). Is there a better idea for using the new menu functionality at the cmsDriver? Could or should this sort of solution be adapted for cmsDriver?

(@qvyz Just tagging you on this, because we had discussed it a little)

@srimanob
Copy link
Contributor

Hi @aloeliger
Do you think it is better to collect points and open the git issue to discuss? So we can keep issue as opening until solved.
Thx.

@aloeliger
Copy link
Contributor

Hi @aloeliger Do you think it is better to collect points and open the git issue to discuss? So we can keep issue as opening until solved. Thx.

Probably. I'll open an issue.

Comment on lines +11 to +23
def collectAlgorithmPaths(process) -> "tuple[cms.Path]":
str_paths = set()
for algorithm in algorithms:
algo_paths = re.sub(r'[()]'," " , algorithm.expression.value()).split()
for algo in algo_paths:
if algo in process.pathNames() :
str_paths.add(algo)
paths = set()

for str_path in str_paths:
paths.add(getattr(process, str_path))

return tuple(paths)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this approach can potentially return the paths in a different order on each invocation.
This does happen in production, and results in jobs with a different process hash.
The framework then considers them different configurations and does not fully merge the resulting files; for example, they appear to belong to different runs (even though the run number is the same).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fwyzard I see, so in principle if the order is deterministic there shouldn't be a problem, right? So wouldn't a simple sorted(str_paths) then do the trick?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should fix the reproducibility problem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fwyzard Would you mind either checking whether this indeed solves the issue or pointing me to how I can reproduce it myself before submitting a PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure.

You could try running the cmsDriver command used in production a few times, potentially on different machines (like lxplus8 and lxplus9) and check

  • that sometimes you get different ordering with the current code
  • that you always get the same order with the fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be feasible to insert them in the process using the same order ?
Or it's not meaningful anyway ?

Copy link
Contributor

@aloeliger aloeliger Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it could be ordered by seed number in the menu, but I don't think that is meaningful enough in this context that we truly need to order them that way as paths as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be feasible to insert them in the process using the same order ? Or it's not meaningful anyway ?

Given that an algorithm can be a combination of multiple paths and in principle a path can be used in multiple algorithms (although it's unclear whether we should deem the latter case bad practice), I'm not entirely sure how meaningful any path order actually is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened #46303. @fwyzard or @rovere would it be possible to request that one of you retests the issues with this PR pulled in? I'm not quite sure what you have done to recreate this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rovere simply used the samples from the official Spring24 production; a test would be to re-produce the datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.