on running unscheduled HLT modules together with other "steps" #36938

Open
missirol opened this issue Feb 11, 2022 · 44 comments

@missirol
Contributor

missirol commented Feb 11, 2022

This issue is related to #36138 (as suggested by @silviodonato, I am opening a separate issue here, in an attempt to make things clearer).

TSG plans to introduce Tasks in the HLT menus for Run 3, as a necessary step to support configurations with both CPU and GPU modules handled via SwitchProducers.

#36138 showed that the unscheduled execution of HLT modules could become problematic if triggered prematurely by OutputModules ("prematurely" according to HLT's needs); this was addressed with the introduction of FinalPath by Core-Sw experts.

It was recently verified that FinalPath does solve the issue originally discussed in #36138 when running a standalone HLT configuration.

However, there might still be a problem in workflows where HLT needs to run together with other parts of the reconstruction (other "steps").

For example, there are TSG tests where HLT runs together with L1T-reco (L1 step) and/or Offline-reco (RECO step), and these tests currently fail if HLT uses unscheduled modules (L1 and RECO are just used as examples here; this might apply to other "steps" as well).

The technical reason is that, in such cases, cmsDriver adds OutputModules on EndPaths. This clashes with the needs of an HLT-with-Tasks (the original problem with using EndPaths). On the other hand, simply converting these EndPaths to FinalPaths could hinder the execution of L1-or-RECO modules (on the assumption that L1-or-RECO expect the execution of certain modules to happen only via the request of an OutputModule).

To reproduce the issue one can do the following:

cmsrel CMSSW_12_3_0_pre4
cd CMSSW_12_3_0_pre4/src
cmsenv
git cms-merge-topic missirol:issue_HLTWithTasksAndOtherSteps
scram b -j 4
voms-proxy-init --voms cms --rfc --valid 168:00

# 1. HLT standalone config (full config via ConfDB, not cmsDriver): this works now thanks to FinalPath
cmsRun HLTrigger/Configuration/test/OnLine_HLT_GRun.py &> OnLine_HLT_GRun_MC.log

# 2. HLT step only, via cmsDriver: this will fail as cmsDriver adds an OutputModule on an EndPath,
#    but it could be fixed by just using FinalPath instead (HLT is the only step)
cmsRun HLTrigger/Configuration/test/RelVal_HLT_GRun_MC.py &> RelVal_HLT_GRun_MC.log

# 3. HLT+L1T, via cmsDriver: this will fail like 2.,
#    but here using FinalPath might conflict with the needs of other steps
cmsRun HLTrigger/Configuration/test/RelVal_DigiL1RawHLT_GRun_MC.py &> RelVal_DigiL1RawHLT_GRun_MC.log

This issue was first discussed in #36878 (comment), where some details on the error messages were also given.

The question is: if HLT is to use Tasks, what are the possible solutions to the issue above, when the configuration contains OutputModules and runs HLT together with other "steps"?

FYI: @silviodonato @Martin-Grunewald @Sam-Harper @fwyzard

@cmsbuild
Contributor

A new Issue was created by @missirol Marino Missiroli.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign core,hlt

@cmsbuild
Contributor

New categories assigned: core,hlt

@missirol,@Dr15Jones,@smuzaffar,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

Pretty much everything in runTheMatrix.py relies nowadays on unscheduled execution, with the DAG of producers initiated by the output module (and the modules whose execution is controlled by a filter properly placed in a Path behind that filter).

I'm curious whether the issue is limited to output modules. For example, suppose one adds (for whatever reason) an EDAnalyzer in the same job as the HLT, and the EDAnalyzer reads a data product produced by some intermediate EDProducer whose execution is currently dictated by an EDFilter. Would the expected behavior be that the EDAnalyzer causes that EDProducer to run, or not?
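As an illustration of this question (a hedged sketch; all module labels and types here are hypothetical, not taken from any actual HLT menu), the producer sits behind a filter on a Path, while a consumer outside that Path reads its product:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process('TEST')

# Hypothetical labels/types. 'intermediateProducer' currently runs only
# when 'someFilter' passes, because it sits behind it on the Path.
process.someFilter = cms.EDFilter('SomeFilter')
process.intermediateProducer = cms.EDProducer('SomeProducer')
process.p = cms.Path(process.someFilter + process.intermediateProducer)

# An EDAnalyzer elsewhere in the job consumes the product: should this
# consumption cause 'intermediateProducer' to run even on events where
# 'someFilter' would have rejected?
process.analyzer = cms.EDAnalyzer('SomeAnalyzer',
    src = cms.untracked.InputTag('intermediateProducer'))
process.e = cms.EndPath(process.analyzer)
```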

@Martin-Grunewald
Contributor

Martin-Grunewald commented Feb 14, 2022

I think with HLT using Tasks (soon, to support GPUs!), the EDAnalyzer should trigger execution of the producer, but no OutputModule should trigger execution of any producer. (HLT does not yet use Tasks; for now the EDAnalyzer should be on an HLT path which runs the required producers beforehand in that path.)

@makortel
Contributor

We would like to understand the use case better. Maybe we could discuss this topic in e.g. a core software meeting next week? (in addition to possibly continuing the discussion here)

Currently an "unscheduled" module can be executed once all its input data products are either available or known to never become available, with the exception that an unscheduled module whose output data products are not consumed is ignored (actually destructed early on). "Scheduled" execution (Path) makes a module execute even in the absence of consumers, and with EDFilters it adds the constraint that a module can run only if all preceding EDFilters have accepted the Event.

We are wondering: what are the benefits of moving EDProducers into Tasks (unscheduled) that you consider important for HLT?

(SwitchProducer by itself does not require the use of Task, but in the chains of modules providing its input, the modules that should run for only one of the cpu/cuda cases need to be in a Task so they get executed only on demand)
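A minimal sketch of that pattern (hypothetical module labels; assuming the SwitchProducerCUDA helper used at HLT): the per-case input chains go into a Task, so that only the chain backing the chosen case runs on demand.

```python
import FWCore.ParameterSet.Config as cms
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

process = cms.Process('HLT')

# Hypothetical labels. The cpu and cuda input chains sit in a Task:
# only the one feeding the chosen case gets executed, on demand.
process.hitsCPU = cms.EDProducer('HitProducerCPU')
process.hitsCUDA = cms.EDProducer('HitProducerCUDA')
process.hits = SwitchProducerCUDA(
    cpu  = cms.EDAlias(hitsCPU  = cms.VPSet(cms.PSet(type = cms.string('*')))),
    cuda = cms.EDAlias(hitsCUDA = cms.VPSet(cms.PSet(type = cms.string('*')))),
)
process.hitsTask = cms.Task(process.hitsCPU, process.hitsCUDA, process.hits)

# A filter on the Path consumes 'hits'; the Switch resolves which chain runs.
process.hitsFilter = cms.EDFilter('SomeHitFilter', src = cms.InputTag('hits'))
process.p = cms.Path(process.hitsFilter, process.hitsTask)
```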

The high-level question we are thinking about is: which modules lead to an unscheduled module being executed, and what is the relationship among those modules?

@Martin-Grunewald
Contributor

@fwyzard @Sam-Harper - please see this issue!

@missirol
Contributor Author

@makortel Thanks for your explanations. I don't grasp all the details here, but I should have made one point clearer in the issue's description.

In the test in question, all Sequences/Paths in the HLT config were converted to use Tasks, not just those related to GPU modules or SwitchProducers. So, in my understanding, in this test we "over-taskified" (with respect to what HLT strictly needs to support GPU modules), and maybe this is part of the problem. I haven't yet run the same type of tests on a version of the HLT menu where Tasks are used only where strictly needed (i.e. paths/sequences with CPU and GPU modules).

@fwyzard
Contributor

fwyzard commented Feb 15, 2022

We are wondering what are the benefits for moving EDProducers into Tasks (unscheduled) for HLT that you consider important?

Apart from the interaction with SwitchProducers, ideally the use of Tasks can help simplify the structure of the HLT configuration.

Here's an example based on the HLT muon paths (I didn't check why we use cms.ignore on those filters, I just left them as they are in the recent HLT menu):

HLT_IsoMu20_v15 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu18 +
    hltPreIsoMu20 +
    hltL1fL1sMu18L1Filtered0 +
    HLTL2muonrecoSequence +
    cms.ignore(hltL2fL1sMu18L1f0L2Filtered10Q) +
    HLTL3muonrecoSequence +
    cms.ignore(hltL1fForIterL3L1fL1sMu18L1Filtered0) +
    hltL3fL1sMu18L1f0L2f10QL3Filtered20Q +
    HLTL3muonEcalPFisorecoSequenceNoBoolsForMuons +
    hltL3fL1sMu18L1f0L2f10QL3Filtered20QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    HLTL3muonHcalPFisorecoSequenceNoBoolsForMuons +
    hltL3fL1sMu18L1f0L2f10QL3Filtered20QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    HLTTrackReconstructionForIsoL3MuonIter02 +
    hltMuonTkRelIsolationCut0p07Map +
    hltL3crIsoL1sMu18L1f0L2f10QL3f20QL3trkIsoFiltered0p07 +
    HLTEndSequence )

HLT_IsoMu24_v13 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22 +
    hltPreIsoMu24 +
    hltL1fL1sMu22L1Filtered0 +
    HLTL2muonrecoSequence +
    cms.ignore(hltL2fL1sSingleMu22L1f0L2Filtered10Q) +
    HLTL3muonrecoSequence +
    cms.ignore(hltL1fForIterL3L1fL1sMu22L1Filtered0) +
    hltL3fL1sSingleMu22L1f0L2f10QL3Filtered24Q +
    HLTL3muonEcalPFisorecoSequenceNoBoolsForMuons +
    hltL3crIsoL1sSingleMu22L1f0L2f10QL3f24QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    HLTL3muonHcalPFisorecoSequenceNoBoolsForMuons +
    hltL3crIsoL1sSingleMu22L1f0L2f10QL3f24QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    HLTTrackReconstructionForIsoL3MuonIter02 +
    hltMuonTkRelIsolationCut0p07Map +
    hltL3crIsoL1sSingleMu22L1f0L2f10QL3f24QL3trkIsoFiltered0p07 +
    HLTEndSequence )

HLT_IsoMu24_eta2p1_v15 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22 +
    hltPreIsoMu24eta2p1 +
    hltL1fL1sMu22erL1Filtered0 +
    HLTL2muonrecoSequence +
    cms.ignore(hltL2fL1sSingleMu22erL1f0L2Filtered10Q) +
    HLTL3muonrecoSequence +
    cms.ignore(hltL1fForIterL3L1fL1sMu22erL1Filtered0) +
    hltL3fL1sSingleMu22erL1f0L2f10QL3Filtered24Q +
    HLTL3muonEcalPFisorecoSequenceNoBoolsForMuons +
    hltL3crIsoL1sSingleMu22erL1f0L2f10QL3f24QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    HLTL3muonHcalPFisorecoSequenceNoBoolsForMuons +
    hltL3crIsoL1sSingleMu22erL1f0L2f10QL3f24QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    HLTTrackReconstructionForIsoL3MuonIter02 +
    hltMuonTkRelIsolationCut0p07Map +
    hltL3crIsoL1sSingleMu22erL1f0L2f10QL3f24QL3trkIsoFiltered0p07 +
    HLTEndSequence )

HLT_IsoMu27_v16 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22or25 +
    hltPreIsoMu27 +
    hltL1fL1sMu22or25L1Filtered0 +
    HLTL2muonrecoSequence +
    cms.ignore(hltL2fL1sMu22or25L1f0L2Filtered10Q) +
    HLTL3muonrecoSequence +
    cms.ignore(hltL1fForIterL3L1fL1sMu22or25L1Filtered0) +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered27Q +
    HLTL3muonEcalPFisorecoSequenceNoBoolsForMuons +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered27QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    HLTL3muonHcalPFisorecoSequenceNoBoolsForMuons +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered27QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    HLTTrackReconstructionForIsoL3MuonIter02 +
    hltMuonTkRelIsolationCut0p07Map +
    hltL3crIsoL1sMu22Or25L1f0L2f10QL3f27QL3trkIsoFiltered0p07 +
    HLTEndSequence )

HLT_IsoMu30_v4 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22or25 +
    hltPreIsoMu30 +
    hltL1fL1sMu22or25L1Filtered0 +
    HLTL2muonrecoSequence +
    cms.ignore(hltL2fL1sMu22or25L1f0L2Filtered10Q) +
    HLTL3muonrecoSequence +
    cms.ignore(hltL1fForIterL3L1fL1sMu22or25L1Filtered0) +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered30Q +
    HLTL3muonEcalPFisorecoSequenceNoBoolsForMuons +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered30QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    HLTL3muonHcalPFisorecoSequenceNoBoolsForMuons +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered30QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    HLTTrackReconstructionForIsoL3MuonIter02 +
    hltMuonTkRelIsolationCut0p07Map +
    hltL3crIsoL1sMu22Or25L1f0L2f10QL3f30QL3trkIsoFiltered0p07 +
    HLTEndSequence )

can be simplified to

HLTIsoMuonTask = cms.Task(
    HLTL2muonrecoTask,
    HLTL3muonrecoTask,
    HLTL3muonEcalPFisorecoTask,
    HLTL3muonHcalPFisorecoTask,
    HLTTrackReconstructionForIsoL3MuonIter02Task,
    hltMuonTkRelIsolationCut0p07Map)

HLT_IsoMu20_v15 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu18 +
    hltPreIsoMu20 +
    hltL1fL1sMu18L1Filtered0 +
    cms.ignore(hltL2fL1sMu18L1f0L2Filtered10Q) +
    cms.ignore(hltL1fForIterL3L1fL1sMu18L1Filtered0) +
    hltL3fL1sMu18L1f0L2f10QL3Filtered20Q +
    hltL3fL1sMu18L1f0L2f10QL3Filtered20QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    hltL3fL1sMu18L1f0L2f10QL3Filtered20QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    hltL3crIsoL1sMu18L1f0L2f10QL3f20QL3trkIsoFiltered0p07 +
    HLTEndSequence,
    HLTIsoMuonTask)

HLT_IsoMu24_v13 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22 +
    hltPreIsoMu24 +
    hltL1fL1sMu22L1Filtered0 +
    cms.ignore(hltL2fL1sSingleMu22L1f0L2Filtered10Q) +
    cms.ignore(hltL1fForIterL3L1fL1sMu22L1Filtered0) +
    hltL3fL1sSingleMu22L1f0L2f10QL3Filtered24Q +
    hltL3crIsoL1sSingleMu22L1f0L2f10QL3f24QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    hltL3crIsoL1sSingleMu22L1f0L2f10QL3f24QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    hltL3crIsoL1sSingleMu22L1f0L2f10QL3f24QL3trkIsoFiltered0p07 +
    HLTEndSequence,
    HLTIsoMuonTask)

HLT_IsoMu24_eta2p1_v15 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22 +
    hltPreIsoMu24eta2p1 +
    hltL1fL1sMu22erL1Filtered0 +
    cms.ignore(hltL2fL1sSingleMu22erL1f0L2Filtered10Q) +
    cms.ignore(hltL1fForIterL3L1fL1sMu22erL1Filtered0) +
    hltL3fL1sSingleMu22erL1f0L2f10QL3Filtered24Q +
    hltL3crIsoL1sSingleMu22erL1f0L2f10QL3f24QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    hltL3crIsoL1sSingleMu22erL1f0L2f10QL3f24QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    hltL3crIsoL1sSingleMu22erL1f0L2f10QL3f24QL3trkIsoFiltered0p07 +
    HLTEndSequence,
    HLTIsoMuonTask)

HLT_IsoMu27_v16 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22or25 +
    hltPreIsoMu27 +
    hltL1fL1sMu22or25L1Filtered0 +
    cms.ignore(hltL2fL1sMu22or25L1f0L2Filtered10Q) +
    cms.ignore(hltL1fForIterL3L1fL1sMu22or25L1Filtered0) +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered27Q +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered27QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered27QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    hltL3crIsoL1sMu22Or25L1f0L2f10QL3f27QL3trkIsoFiltered0p07 +
    HLTEndSequence,
    HLTIsoMuonTask)

HLT_IsoMu30_v4 = cms.Path(
    HLTBeginSequence +
    hltL1sSingleMu22or25 +
    hltPreIsoMu30 +
    hltL1fL1sMu22or25L1Filtered0 +
    cms.ignore(hltL2fL1sMu22or25L1f0L2Filtered10Q) +
    cms.ignore(hltL1fForIterL3L1fL1sMu22or25L1Filtered0) +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered30Q +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered30QL3pfecalIsoRhoFilteredEB0p14EE0p10 +
    hltL3fL1sMu22Or25L1f0L2f10QL3Filtered30QL3pfhcalIsoRhoFilteredHB0p16HE0p20 +
    hltL3crIsoL1sMu22Or25L1f0L2f10QL3f30QL3trkIsoFiltered0p07 +
    HLTEndSequence,
    HLTIsoMuonTask)

Would the expected behavior be that the EDAnalyzer causes that EDProducer to run or not?

That... depends :-(

[yes, I know DQM modules are technically EDProducers, not EDAnalyzers - but IIRC they only produce tokens in the EndLumi/EndRun transitions, and should behave like EDAnalyzers during the event loop; otherwise, just replace "DQM module" with a generic EDAnalyzer in the discussion that follows]

In this example, if we add a DQM module that consumes the tracks associated to the L3 muons and their isolation cone, produced by some modules in the HLTIsoMuonTask, most likely we would like to monitor only the tracks that have "already" been produced while running the HLT_IsoMu* paths - if any; we would not want this DQM module to cause the muon reconstruction to run on every event !

In this case, adding filters in front of the DQM module might be enough: if we require that any of the HLT_IsoMu* paths have fired before running the DQM module, the framework should be able to schedule it accordingly, and not cause the muon reconstruction to run unnecessarily.

However, as the information to be monitored in an EDAnalyzer (or "kept" in an OutputModule) gets more complex, using this kind of explicit scheduling becomes impossible.
Consider the case where we monitor in a single module the isolation of both muons and electrons: such module needs to consume the muon-related tracks, and the electron-related tracks.
If we put in front of it a filter that says "run only if the electron paths OR muon paths have fired", the module will cause the electron reconstruction to be run if the muon paths fire - and vice versa.
If we put in front of it a filter that says "run only if the electron paths AND muon paths have fired", the module will not cause any unnecessary reconstruction, but will monitor only a very small subset of the events.
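For reference, the "run only if any of the HLT_IsoMu* paths have fired" gate described above can be expressed with the framework's PathStatusFilter (a sketch; the DQM-module label is hypothetical, and the path names are taken from the example earlier in this comment):

```python
# Run the DQM module only when at least one of the muon paths accepted
# the event; the paths named in the expression must exist in the same
# configuration.
process.hltIsoMuPathsPassed = cms.EDFilter('PathStatusFilter',
    logicalExpression = cms.string('HLT_IsoMu20_v15 or HLT_IsoMu24_v13'))
process.dqmMuonPath = cms.Path(
    process.hltIsoMuPathsPassed + process.muonIsoDQMModule)
```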

Another case I'm not sure about (again EDProducers, though) is how we handle the TriggerSummaryProducerAOD and TriggerSummaryProducerRAW ?
Won't they cause additional "unwanted" parts of the HLT reconstruction to run ?

So, for HLT-related collections, my impression is that we do not want an EDAnalyzer or OutputModule to cause additional modules to be run... do we ?

However, as soon as we mix the HLT configuration with the rest of the CMSSW configurations, those parts likely need the current logic that

@makortel
Contributor

Thanks @fwyzard for the detailed reply. I used an EDAnalyzer just as an example; indeed any other module outside of the trigger paths would serve the same example purpose as well.

It seems to me that the fundamental scheduling issue is not really specific to OutputModules, and that with Tasks there would still be a strong relationship between the modules directly in the Path and the modules in the Task associated to the Path (in contrast to adding the modules in the Task into a "global pool" of modules that could be run "on demand").

How about a Task-like configuration construct that would bind the contained modules strongly to the Path? The modules would be specified as a set (i.e. without order), and the framework would figure out their execution order within the Path according to their data dependencies (and if nothing in the Path depends on some module, that module would be ignored/deleted). Any consumption from a module outside of the Path would not cause those modules to be executed.
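This is not CMSSW code, but the dependency resolution such a construct implies can be sketched in plain Python (using graphlib from the standard library): task modules are given as an unordered set with their declared inputs, modules that nothing on the Path (transitively) consumes are pruned, and the rest are ordered by their data dependencies.

```python
# Illustration only, NOT CMSSW code: a toy model of the proposed
# Path-bound Task-like scheduling.
from graphlib import TopologicalSorter

def schedule(path_consumers, task_modules):
    """path_consumers: {module: set(inputs)} for modules already on the Path.
    task_modules:      {module: set(inputs)} for modules in the Task-like set.
    Returns the task modules that must run, in a valid execution order."""
    # Keep only task modules that are (transitively) consumed from the Path.
    needed = set()
    frontier = set().union(*path_consumers.values()) if path_consumers else set()
    while frontier:
        hit = {m for m in frontier if m in task_modules and m not in needed}
        needed |= hit
        frontier = set().union(*(task_modules[m] for m in hit)) if hit else set()
    # Order the needed modules by their mutual data dependencies
    # (inputs coming from outside the task set are dropped from the graph).
    graph = {m: task_modules[m] & needed for m in needed}
    return list(TopologicalSorter(graph).static_order())

# Hypothetical labels: the Path filter consumes 'iso', which depends on
# 'tracks'; 'unused' is in the task but consumed by nothing, so it is pruned.
order = schedule(
    path_consumers={'isoFilter': {'iso'}},
    task_modules={'tracks': set(), 'iso': {'tracks'}, 'unused': set()},
)
print(order)  # prints ['tracks', 'iso']
```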

Would anything in the HLT care if the modules in this Task-like construct would, on the C++ side, end up looking as if they were in the Path? This would be visible at least in PlaceInPathContext in

PlaceInPathContext const* placeInPathContext() const { return parent_.placeInPathContext(); }

@fwyzard
Contributor

fwyzard commented Feb 16, 2022

How about a Task-like configuration construct that would bind the contained modules strongly to the Path? The modules would be specified as a set (i.e. without order), and framework would figure out their execution order within the Path according to their data dependencies (and if nothing in the Path depends on some module, that module would be ignored/deleted). Any consumption from a module outside of the Path would not cause those modules to be executed.

Seems like a reasonable approach -- but I'll need some time to think about the possible implications, especially when we need to "mix" the HLT with other steps like the L1 emulation, use of DQM modules, etc.

@Martin-Grunewald @missirol what do you think ?

@missirol
Contributor Author

I think it sounds like a good approach (thanks, Matti). If there is a prototype, or a way to test it, I'm happy to give it a try.

I'm not the best person to comment on ConfDB, but ConfDB supports Tasks, and if this approach results in just having to use cms.[TaskNewName] instead of cms.Task, I imagine it would not be a problem.

@Martin-Grunewald
Contributor

This seems ok. Indeed I need to correct my earlier statement: adding an EDAnalyzer should not trigger execution of producers; the EDAnalyzer should fail gracefully if the collection is not there.

In fact, for the Scouting triggers, we have a set of 'packer modules' which pack up event info in a condensed data format. These producers, too, currently run in an EndPath just in front of an OutputModule, and should NOT trigger execution of other producers, just pack up what is already in the Event; see the discussion with more details here: https://its.cern.ch/jira/browse/CMSHLT-2231

@makortel
Contributor

Thanks for the feedback! It sounds to me like we could proceed to a prototype (without specific discussion in core sw meeting, unless anyone feels such would be useful)

@Martin-Grunewald
Contributor

Any news on this?

@Dr15Jones
Contributor

I have added all the python code needed to add a new ConditionalTask class to our configuration system. Every python unit test that tested Tasks has now been replicated to also test ConditionalTasks, and those new tests all pass.

I'm in the process of figuring out exactly how I want to pass the python info into the C++. Once I'm happy with that, the last step will be to get the C++ to inject the modules into the correct parts of our scheduling system.

@Martin-Grunewald
Contributor

Conditional Task's PR: #37305

@makortel
Contributor

makortel commented Apr 4, 2022

The ConditionalTask in #37305 comes with two caveats:

  • modules in ConditionalTask that are depended on only by SwitchProducer case-EDModules get treated as unscheduled modules
  • modules in ConditionalTask but not consumed by any module in the Path or the ConditionalTasks associated with the Path get treated as unscheduled modules

We are going to work on these (with the SwitchProducer one at higher priority), but would you (@cms-sw/hlt-l2) be able to try the ConditionalTask out and see if it works for you (within the caveats above; of course after it gets merged etc)? We'd greatly appreciate early feedback. I believe that in the absence of SwitchProducers, replacing all Tasks with ConditionalTasks should work properly.
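In the absence of SwitchProducers, the drop-in replacement is literally just the container type. A sketch, with hypothetical module labels and types (mirroring the structure of the HLT paths shown earlier in this thread):

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process('HLT')

# Hypothetical labels/types: a filter on the Path consumes the products,
# and the producers feeding it live in a ConditionalTask instead of a Task.
process.hltTracks = cms.EDProducer('TrackProducer')
process.hltMuonIso = cms.EDProducer('IsoProducer', src = cms.InputTag('hltTracks'))
process.hltMuonIsoFilter = cms.EDFilter('IsoFilter', src = cms.InputTag('hltMuonIso'))

# Was: cms.Task(process.hltTracks, process.hltMuonIso)
process.HLTMuonIsoTask = cms.ConditionalTask(process.hltTracks, process.hltMuonIso)
process.HLT_IsoMu_v1 = cms.Path(process.hltMuonIsoFilter, process.HLTMuonIsoTask)
```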

@missirol
Contributor Author

missirol commented Apr 6, 2022

@makortel @Dr15Jones (cc: @fwyzard)

Continuing from the discussion started in #37305 (comment), I'm trying to do one check related to ConditionalTasks, and I see something I don't understand. I share it here in case others can have a look.

In this check, I took the HLT menu available in CMSSW, customised it with the addition of GPU modules (this is done with customiseHLTforPatatrack) but using ConditionalTasks instead of Tasks (I literally just changed Tasks to ConditionalTasks in customiseHLTforPatatrack). This type of setup would be the simplest to have a full HLT config with GPU offloading, using (Conditional)Tasks only where strictly needed (pixel/ECAL/HCAL reco).

The test can be reproduced as follows:

cmsrel CMSSW_12_4_X_2022-04-05-1100
cd CMSSW_12_4_X_2022-04-05-1100/src
cmsenv
# Add ConditionalTask
git cms-merge-topic cms-sw:37305
# Changes to use customiseHLTforPatatrack with ConditionalTasks
git cms-merge-topic missirol:hltTest_conditionalTasksInCustomPatatrack
# Build: takes a while, many pkgs are checked out
scram b -j 8
# Run test on a file on lxplus
cmsRun HLTrigger/Configuration/test/OnLine_HLT_GRun.py realData=False globalTag=@ \
  inputFiles=file:/afs/cern.ch/work/m/missirol/public/cmssw36938/RelVal_Raw_GRun_MC.root \
  &> OnLine_HLT_GRun_withConditionalTasks.log

What I see is that:

  • the test described above fails with a seg-fault (see this log for the stack trace); this ran with 4 threads (and 4 streams);

  • if I run the same job with 1 thread (and 1 stream), the job works;

  • if I revert to using Tasks instead of ConditionalTasks, the job works with 4 threads (this makes sense, as HLT+customiseHLTforPatatrack, with Tasks, already runs okay in GPU PR tests).

I hope I'm not missing something obvious. I'd be grateful for any insight.

@makortel
Contributor

makortel commented Apr 6, 2022

Thanks @missirol for the test. The stack trace looks strange (not hinting at anything obvious). I'm taking a look.

@makortel
Contributor

makortel commented Apr 6, 2022

I was able to reproduce the crash, and to craft a simple test case that also crashes when run with >= 2 streams (i.e. the 1-thread, 2-stream case crashes). It turns out there is a deeper issue in how ConditionalTask and SwitchProducer interact that leads to the crash. In short, the two don't really work together yet (the cases that don't crash do so more by accident), but it seems that this aspect should get automatically fixed when addressing the SwitchProducer caveat I mentioned in #36938 (comment).

@missirol
Contributor Author

Hi @makortel @Dr15Jones ,

I wanted to ask for feedback on one example config involving ConditionalTask, SwitchProducer, and EDAlias.

This is the config: https://gist.github.com/missirol/34a7ff84d801c9f006fc6cfc9b7a0a27
(It heavily borrows from a test in FWCore/Integration.)

It is adapted to mimic actual Paths in the latest HLT menu. These Paths (DQM_*Reco*) are meant to run reconstruction only on GPU machines (there is a statusOnGPUFilter at the beginning of the Path), and they are meant to run both CPU and GPU producers on the same events. The aim is to then write both CPU and GPU products in the output file.

The example runs in two ways, and we see the following.

  • Case 1 ("GPU"): cmsRun testCondTaskWithSwitchProducer_cfg.py enableGPU # use branch "GPU".

    • The filter passes, and all the products are in the output (as intended).
  • Case 2 ("CPU"): cmsRun testCondTaskWithSwitchProducer_cfg.py enableCPU # use branch "CPU".

    • The filter fails, the CPU product does not appear in the output (as intended), but the GPU one does (unexpected).
    • If we un-comment L54 ("intProducerGPU"), the GPU product does not appear in the output (I guess because the module is then seen as consumed, and thus becomes scheduled).

@fwyzard tested all cases in this example, incl. changing Task<->ConditionalTask, and also EndPath<->FinalPath.

task | consume | output | CPU collection ? | GPU collection ?
---- | ------- | ------ | ---------------- | ----------------
Task | none | EndPath | ✔️ | ✔️
Task | products | EndPath | ✔️ | ✔️
Task | aliases | EndPath | ✔️ | ✔️
Task | both | EndPath | ✔️ | ✔️
ConditionalTask | none | EndPath | ✔️ | ✔️
ConditionalTask | products | EndPath | ❌ | ❌
ConditionalTask | aliases | EndPath | ❌ | ✔️
ConditionalTask | both | EndPath | ❌ | ❌
Task | none | FinalPath | ❌ | ❌
Task | products | FinalPath | ❌ | ❌
Task | aliases | FinalPath | ❌ | ❌
Task | both | FinalPath | ❌ | ❌

Do you think "Case 2" could be fixed to give the expected outcome, i.e. GPU module not executed and not written to output (row 7 in the table)?

@fwyzard
Contributor

fwyzard commented May 17, 2022

Note that in the table, ❌ is the good behaviour :-)

@fwyzard
Contributor

fwyzard commented May 17, 2022

IMHO the current behaviour is difficult to reason about, making it hard to use a ConditionalTask in practice.

I think the behaviour should be the same whether a product is consumed directly (e.g. intProducerCPU, intProducerGPU), or using an explicit branch in a SwitchProducer (e.g. intProducer@cpu, intProducer@gpu).

The behaviour should also not depend on whether the consuming module is actually run, or is prevented from running by a failing filter.
But maybe this is already the case once the previous issue is fixed ?

@makortel
Contributor

  • The filter fails, the CPU product does not appear in the output (as intended), but the GPU one does (unexpected).

Maybe we're missing a check for the explicit consumption of SwitchProducer case-branches in the code that resolves the module dependencies for ConditionalTask. I'll take a look (thanks for the test configuration!).

@makortel
Contributor

  • The filter fails, the CPU product does not appear in the output (as intended), but the GPU one does (unexpected).

Maybe we're missing a check for the explicit consumption of SwitchProducer case-branches in the code that resolves the module dependencies for ConditionalTask. I'll take a look (thanks for the test configuration!).

The culprit wasn't exactly that, but a logic error (or oversight) in how the non-chosen EDAlias cases were treated in conjunction with ConditionalTask. #38006 provides a fix (and I'll backport it to 12_4_X).

@missirol
Contributor Author

Thanks @makortel !

@missirol
Contributor Author

missirol commented Jun 9, 2022

Hi @Dr15Jones , Matti ,

while testing recent HLT menus with ConditionalTasks, @Sam-Harper found a problem which we are trying to understand.

I think I managed to translate the issue into a minimal example, which you find below [*].

Running cmsRun cfg.py enableGPU, one gets the following error message

----- Begin Fatal Exception 09-Jun-2022 10:29:47 CEST-----------------------
An exception of category 'ScheduleExecutionFailure' occurred while
   [0] Calling beginJob
Exception Message:
Unrunnable schedule
The Path/EndPath configuration could cause the job to deadlock
  module 'a1' is on path 'thePath' and depends on module 'intProducerCPU'
  module 'intProducerCPU' is on path 'thePath' and follows module 'a1' on the path
----- End Fatal Exception -------------------------------------------------

The error is related to the branch of the SwitchProducer which is not enabled (because the test uses "enableGPU").

One finds that any one of the following changes will make the configuration work:

  • if a1 is removed, it works
  • if a1 is moved to the end of the Path, it works
  • if t is changed from ConditionalTask to Task, it works
  • if p2 is changed from an EDProducer to an EDAnalyzer, it works

Could you please have a look?

[*]

# cfg.py
import FWCore.ParameterSet.Config as cms

import sys
enableGPU = (sys.argv[-1] == 'enableGPU')
class SwitchProducerTest(cms.SwitchProducer):
  def __init__(self, **kargs):
    super(SwitchProducerTest,self).__init__(
      dict(
        cpu = lambda accelerators: (True, -10),
        gpu = lambda accelerators: (enableGPU, -9)
      ), **kargs)

process = cms.Process('TEST')

process.maxEvents.input = 10
process.options.numberOfThreads = 1
process.options.numberOfStreams = 1
process.options.numberOfConcurrentRuns = 1
process.options.numberOfConcurrentLuminosityBlocks = 1

process.source = cms.Source('EmptySource')

process.intProducerCPU = cms.EDProducer('ManyIntProducer', ivalue = cms.int32(1))
process.intProducerGPU = cms.EDProducer('ManyIntProducer', ivalue = cms.int32(2))
process.intProducer = SwitchProducerTest(
  cpu = cms.EDAlias(intProducerCPU = cms.VPSet(cms.PSet(type = cms.string('*')))),
  gpu = cms.EDAlias(intProducerGPU = cms.VPSet(cms.PSet(type = cms.string('*')))),
)

process.f = cms.EDFilter('HLTBool',
  result = cms.bool(enableGPU)
)

process.t = cms.ConditionalTask(
  process.intProducerCPU,
  process.intProducerGPU,
  process.intProducer,
)

process.a1 = cms.EDAnalyzer('GenericConsumer',
  eventProducts = cms.untracked.vstring(
    'intProducer@cpu',
    'intProducer@gpu',
  ),
  lumiProducts = cms.untracked.vstring(),
  runProducts = cms.untracked.vstring(),
)

process.s1 = cms.Sequence(process.t)

process.p2 = cms.EDProducer("AddIntsProducer", labels = cms.VInputTag("intProducer@cpu","intProducer@gpu"))
process.s2 = cms.Sequence(process.p2)

process.thePath = cms.Path(
    process.f
  + process.s1
  + process.a1
  + process.s2
)

@missirol
Contributor Author

missirol commented Jun 9, 2022

I forgot to add that the test in #36938 (comment) was done in 12_4_0_pre4.

@missirol
Contributor Author

missirol commented Jun 9, 2022

I forgot to add that the test in #36938 (comment) was done in 12_4_0_pre4.

False alarm. The problem was that I was testing in CMSSW_12_4_0_pre4.

@fwyzard clarified to me that the issue is not present in CMSSW_12_5_0_pre2, as well as recent CMSSW_12_4_X IBs (e.g. CMSSW_12_4_X_2022-06-08-2300).

This is most likely because #38006 (#38015) not only fixes #36938 (comment) , but also #36938 (comment).

I also checked that the non-minimal example (i.e. the test with the actual HLT menu) works in CMSSW_12_5_0_pre2.

The conclusion would be that, for HLT's purposes, ConditionalTasks are fully working starting from 12_4_0 and 12_5_0_pre2.

Sorry for the noise!

@Dr15Jones
Contributor

@missirol glad you were able to solve the problem! Also good to hear that the latest version of ConditionalTask is working for you.

@missirol
Contributor Author

(I continue in this issue, since much about ConditionalTasks has been discussed here so far)

Hi @makortel @Dr15Jones ,

during HLT-menu development (CMSHLT-2454), we came across a runtime error that seems related to ConditionalTasks.

Error:

----- Begin Fatal Exception 14-Sep-2022 17:14:40 CEST-----------------------
An exception of category 'ScheduleExecutionFailure' occurred while
   [0] Calling beginJob
Exception Message:
Unrunnable schedule
Paths are non consistent
  module 'hltEcalDigis' depends on 'hltEcalDigisLegacy' which appears on paths
  DQM_EcalReconstruction_v3 
but is missing from
  HLT_PFJet40_v22 HLT_PFJet400_v21 
----- End Fatal Exception -------------------------------------------------

The config is attached; the test was done in CMSSW_12_4_8.

The error occurs when the cuda branch of the HLT SwitchProducers is enabled (so, running on GPUs).

A recent change we made is that one SwitchProducer now has a branch with 2 VPSets, and one producer is aliased by both branches:

hltEcalDigis = SwitchProducerCUDA(
   cpu = cms.EDAlias(
       hltEcalDigisLegacy = cms.VPSet( 
         [..]
       )
   ),
   cuda = cms.EDAlias(
       hltEcalDigisFromGPU = cms.VPSet( 
         [..]
       ),
       hltEcalDigisLegacy = cms.VPSet( 
         [..]
       )
   ),
 )

With any one of the following 3 changes, the runtime error does not occur.

  1. Change cms.ConditionalTask to cms.Task.
  2. Remove hltEcalDigis@cpu from hltEcalConsumerCPU.eventProducts (just replacing hltEcalDigis@cpu with hltEcalDigisLegacy was not enough).
  3. Using options.accelerators = ['cpu'].

Questions:

  • Could you please have a look?
  • We plan to use workaround-2 until a better solution is available. Do you see any side-effects with this? With workaround-2, it seems like hltEcalDigisLegacy is executed as intended, even if not called by hltEcalConsumerCPU, since it is now called as part of hltEcalDigis@cuda.

@makortel
Contributor

Thanks @missirol for the report, we are taking a look. I was able to reproduce the problem; from a first look around, the cause of the exception is not immediately clear.

I see hltEcalDigis and hltEcalDigisLegacy are in the same ConditionalTask (HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask), so I'd expect that wherever hltEcalDigis gets used, hltEcalDigisLegacy would be available as well (to the code which inserts the modules from a ConditionalTask into the Path).

@makortel
Contributor

makortel commented Sep 14, 2022

I think I have identified the problem. It is related to the hltEcalDigisFromGPU part of the alias matching anything produced by the module

        hltEcalDigisFromGPU = cms.VPSet(cms.PSet(
            type = cms.string('*')
        )),

and that a certain std::multimap happened to store the entry for hltEcalDigisFromGPU before hltEcalDigisLegacy.

Assuming there is nothing else going on, specifying the products aliased from hltEcalDigisFromGPU explicitly, rather than using the * pattern, should make the configuration functional. I'll look into a more proper fix as well.
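For illustration, "specifying the products explicitly" means replacing the wildcard with one PSet per aliased type; the concrete product types below are assumptions on my part, not taken from the actual menu:

```python
# Instead of matching anything produced by the module:
#   hltEcalDigisFromGPU = cms.VPSet(cms.PSet(type = cms.string('*')))
# list the aliased product types explicitly, e.g.:
hltEcalDigisFromGPU = cms.VPSet(
    cms.PSet(type = cms.string('EBDigiCollection')),
    cms.PSet(type = cms.string('EEDigiCollection')),
)
```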

@missirol
Contributor Author

Thanks, Matti. This works and gives us a good quick solution, I think.

@makortel
Contributor

#39409 has a fix. I'll make backports to 12_5_X and 12_4_X after we settle on the content.

@makortel
Contributor

#39409 has a fix. I'll make backports to 12_5_X and 12_4_X after we settle on the content.

#39409 was merged, backports are in #39529 and #39530

@makortel
Contributor

#43296 implements the printout of ConditionalTask modules that are not consumed by any module in any of the Paths the ConditionalTask (in question) is associated with, as discussed with some of the HLT folks. Per that discussion this PR is likely the last development for ConditionalTask (unless some new use case pops up).

@missirol
Contributor Author

as discussed with some of the HLT folks

Just for the record, this discussion happened here (details in the minutes).
