Move CaloL1 online DQM to Global #32004
Conversation
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32004/19521
A new Pull Request was created by @aloeliger for master. It involves the following packages: DQM/L1TMonitor. @andrius-k, @kmaeshima, @ErnestaP, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
The tests are being triggered in jenkins.
+1
Comparison job queued.
if (theMap->find(lumiBlock) != theMap->end()) {
  //this is not the first event with errors per lumi-block, insert it only if the number of errors is larger than before
  if (nErrors > theMap->find(lumiBlock)->second) {
    (*theMap)[lumiBlock] = nErrors;
This approach is not thread safe (I'm saying it only here and not repeating it everywhere). The compiler would also complain if the map were taken as const std::map<std::string, int>& (unfortunately a "const pointer" is not the same as a "(const) pointer to const", which is what would really be needed here).
If I understood correctly, the aim is to find a maximum of some quantity over a LuminosityBlock, and eventually put that number into a bin of a histogram that corresponds to that lumi. I believe the most efficient way to do that (in general) would be to (following the example https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#example_usage_calculating_per_lu)
- Use edm::StreamCache with an int for each quantity to be recorded over a lumi
  - In dqmAnalyze() just do something along the lines of streamCache(event.streamID).var = std::max(streamCache(event.streamID).var, localVar) (*)
- Use edm::LuminosityBlockSummaryCache with an int for each quantity to do a reduction (with a std::max() operation) over the edm streams
  - streamEndLuminosityBlockSummary() stores the maximum of the StreamCache and the LumiSummaryCache to the LumiSummaryCache and resets the StreamCache value(s) to zero
  - globalEndLuminosityBlockSummary() stores the reduced maximum into a container (see next bullet) along with the lumi id
- Use tbb::concurrent_vector<std::pair<edm::LuminosityBlockNumber_t, int>> in the RunCache (i.e. in CaloL1Information::monitoringDataHolder) to hold the values per lumi for that run
  - dqmEndRun() copies the values from the container into the histogram

(alternatively an edm::LuminosityBlockCache could be combined with std::atomic<int>s)

(*) for @cms-sw/dqm-l2, the DQMGlobalEDAnalyzer API would be more consistent with the edm::global interfaces if the edm::StreamID would be passed from accumulate() to dqmAnalyze()
@makortel Thanks for the comments. I was afraid this might be the case.
If I understood correctly, the aim is to find a maximum of some quantity over a LuminosityBlock, and eventually put that number into a bin of a histogram that corresponds to that lumi
This is correct. Unfortunately, I think I've tried to use LuminosityBlockCaches to gain access to the luminosity block transition functions and cache information in this manner (the old implementation of this module was designed to fill things on luminosity block transitions, and this seemed like the quickest way to regain all functionality). My attempt was something similar to:
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
as the main implementation of the module, but this doesn't work due to the way DQM global modules are implemented, as run-level caching is assumed at the DQM global analyzer level here:
cmssw/DQMServices/Core/interface/DQMGlobalEDAnalyzer.h
Lines 13 to 19 in 59d2873
class DQMGlobalEDAnalyzer
    : public edm::global::EDProducer<edm::RunCache<H>,
                                     // DQMGlobalEDAnalyzer are fundamentally unable to produce histograms for any
                                     // other scope than MonitorElement::Scope::RUN.
                                     edm::EndRunProducer,
                                     edm::Accumulator,
                                     Args...> {
So I think this rules out every solution you mentioned but the last. However, I am unfamiliar with tbb::concurrent_vector<>'s. Is there documentation I should be aware of for using these? I assume these objects are just a vector with some sort of inherent locking implemented?
I had also considered an even less verbose solution, comparing and setting in the monitoring element directly for every event. I assume this would be no more thread safe than what I have implemented right now?
Thanks again for your help,
Andrew Loeliger
My attempt was something similar to:
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
as the main implementation of the module, but this doesn't work due to the way DQM global modules are implemented, as run-level caching is assumed at the DQM global analyzer level here:
https://github.com/cms-sw/cmssw/blob/master/DQMServices/Core/interface/DQMGlobalEDAnalyzer.h#L13-L19
The DQMGlobalEDAnalyzer indeed requires/uses a RunCache for the MonitorElements, but (as far as I can tell) that does not preclude the use of other caches for other information (especially when the functions specific to those caches do not use any DQM APIs).
So I think this rules out every solution you mentioned but the last.
I meant that all three bullet points need to be used together.
The alternative I suggested in the parentheses under the bullet points would replace the use of edm::StreamCache and edm::LuminosityBlockSummaryCache with edm::LuminosityBlockCache and explicit use of std::atomic<int> (which would require a somewhat-low-level loop using std::atomic<int>::compare_exchange_weak() to achieve the max() operation).
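For illustration, a minimal sketch of the kind of compare_exchange_weak() loop meant here (the helper name is hypothetical, assuming the per-lumi cache holds a std::atomic<int>):

#include <atomic>

// hypothetical helper: raise maxErrors to at least nErrors without locking
void updateMax(std::atomic<int>& maxErrors, int nErrors) {
  int current = maxErrors.load();
  // if another thread wins the exchange, current is refreshed with the stored value and we retry;
  // the loop ends once the stored value is already >= nErrors or our store succeeds
  while (current < nErrors && !maxErrors.compare_exchange_weak(current, nErrors)) {
  }
}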
However, I am unfamiliar with tbb::concurrent_vector<>'s. Is there documentation I should be aware of for using these? I assume these objects are just a vector with some sort of inherent locking implemented?
It provides fine-grained synchronization (and thus might be a bit overkill for just concurrent push_back()). The interface is very std::vector-like (including push_back() and iteration over elements). For more information please see
https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/concurrent_vector_ug.html
https://www.threadingbuildingblocks.org/docs/doxygen/a00046.html
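For reference, a minimal usage sketch (illustrative names, not code from this PR) of the std::vector-like interface:

#include <tbb/concurrent_vector.h>
#include <utility>

// one entry per lumi: (lumi number, maximum number of errors seen in that lumi)
tbb::concurrent_vector<std::pair<unsigned int, int>> perLumiMaxima;

// safe to call concurrently, e.g. once per lumi from globalEndLuminosityBlockSummary()
void record(unsigned int lumi, int maxErrors) { perLumiMaxima.emplace_back(lumi, maxErrors); }

// single-threaded read-back, e.g. in dqmEndRun(): iterate just like a std::vector
// for (const auto& entry : perLumiMaxima) { /* fill the histogram bin for entry.first with entry.second */ }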
I had also considered an even less verbose solution, comparing and setting in the monitoring element directly for every event. I assume this would be no more thread safe than what I have implemented right now?
Right, not without per-histogram locking (which the MonitorElement actually has, but it is not exposed outside, for good reasons).
The DQMGlobalEDAnalyzer indeed requires/uses a RunCache for the MonitorElements, but (as far as I can tell) that does not preclude the use of other caches for other information (especially when the functions specific to those caches do not use any DQM APIs).
Then, can I ask what is the appropriate syntax to use when trying to mix and use the different cache scopes at various levels? As I mentioned, I tried the above syntax:
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
but later, when trying to retrieve this luminosity block cache doing something like:
if (nEcalLinkErrors > luminosityBlockCache(event.getLuminosityBlock().index())->maxEvtLinkErrorsECALCurrentLumi_)
this would always fail to compile, as the compiler did not recognize the luminosityBlockCache() function for retrieving the luminosity block cache (it is also perhaps worth noting that it would not recognize overrides of the global luminosity block transition functions, stating that the override was not overriding anything previously defined).
This error was unique to the DQMGlobalEDAnalyzer implementation; when changing from DQMGlobalEDAnalyzer to DQMOneEDAnalyzer these lines passed the compiler without complaint.
Is there a method I am unaware of to use these two different cache scopes together?
Thanks,
Andrew Loeliger
This one
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
declares to use edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder> as the RunCache (which I fully believe can lead to compilation errors). In order to use multiple caches, each of them needs to be declared separately, e.g.
// note: the CMS naming convention is to start namespaces in lower case, and classes/structs in upper case
namespace caloL1Information {
  struct MonitoringDataHolder {
    // MonitorElements;
    tbb::concurrent_vector<std::pair<edm::LuminosityBlockID, int>> maxEvtLinkErrorsMapECAL;
    ...
  };
  struct PerStreamMax {
    int maxEvtLinkErrorsMapECAL = 0;
    ...
  };
  struct PerLumiMax {
    int maxEvtLinkErrorsMapECAL = 0;
    ...
  };
}

class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<
                                       caloL1Information::MonitoringDataHolder,  // RunCache, not decorated with edm::RunCache<...> because the base class does that
                                       edm::StreamCache<caloL1Information::PerStreamMax>,
                                       edm::LuminosityBlockSummaryCache<caloL1Information::PerLumiMax>> {
public:
  std::shared_ptr<caloL1Information::PerStreamMax> beginStream(edm::StreamID) const {
    return std::make_shared<caloL1Information::PerStreamMax>();
  }

  std::shared_ptr<caloL1Information::PerLumiMax> globalBeginLuminosityBlockSummary(edm::LuminosityBlock const&, edm::EventSetup const&) const {
    return std::make_shared<caloL1Information::PerLumiMax>();
  }

  // reduction over events within one stream (and by definition within one lumi)
  // there may be concurrent calls to this function, but the framework guarantees that there is at most one call for any single stream
  // (and therefore StreamCache can be used safely without explicit synchronization)
  void dqmAnalyze(edm::Event const& iEvent, edm::EventSetup const&, caloL1Information::MonitoringDataHolder const&) const {
    ...
    auto cache = streamCache(iEvent.streamID());
    cache->maxEvtLinkErrorsMapECAL = std::max(cache->maxEvtLinkErrorsMapECAL, nEcalLinkErrors);
    ...
  }

  // reduction over streams
  // the framework guarantees that this function is called one at a time
  // (and therefore StreamCache and LuminosityBlockSummaryCache can be used without explicit synchronization)
  void streamEndLuminosityBlockSummary(edm::StreamID streamID, edm::LuminosityBlock const&, edm::EventSetup const&, caloL1Information::PerLumiMax* lumiSummaryCache) const {
    auto cache = streamCache(streamID);
    lumiSummaryCache->maxEvtLinkErrorsMapECAL = std::max(lumiSummaryCache->maxEvtLinkErrorsMapECAL, cache->maxEvtLinkErrorsMapECAL);
    ...
  }

  // store the reduced maximum to the tbb::concurrent_vector
  void globalEndLuminosityBlockSummary(edm::LuminosityBlock const& iLumi, edm::EventSetup const&, caloL1Information::PerLumiMax* lumiSummaryCache) const {
    auto cache = runCache(iLumi.getRun().index());
    cache->maxEvtLinkErrorsMapECAL.emplace_back(iLumi.id(), lumiSummaryCache->maxEvtLinkErrorsMapECAL);
    ...
  }
};
(please invent better names for the cache types :) )
//these maps store the maximum number of evt and link mismatches per lumi section.
//they are read back at the end of runs
//Made std::unique_ptr's for better memory management
std::unique_ptr<std::map<std::string, int>> maxEvtLinkErrorsMapECAL{std::make_unique<std::map<std::string, int>>()};
How is unique_ptr better here than
std::map<std::string, int> maxEvtLinkErrorsMapECAL;
?
I believe the reasoning behind this was, as you were noting above, my way to dodge the compiler errors for a const function modifying class members, by making the class member an unedited pointer to some mutable object that I was working with (which, I grant you, seems like a textbook way to write terminally buggy code).
Continuing from the above, what this module needs in order to maintain its previous functionality is a thread-safe, dynamically sized vector/map type of object whose contents are mutable in the analysis functions, for storing luminosity block information at the run-cache level. Will tbb::concurrent_vector<>'s provide this? I assume there is also probably a considerable speed cost associated with an object like this?
Thanks,
Andrew Loeliger
Comparison is ready Comparison Summary:
@aloeliger the last test shows this change in one of the MEs of L1T / L1TStage2CaloLayer1 / MismatchDetail:
Hi all, I just wanted to give a quick check in of where this is right now. I am looking at changing the code using the strategy suggested by @makortel using the mixed stream and summary caches to maintain stream safety, and I have all of the "Max" style histograms changed over to use this style, however I am running into technical difficulties trying to change
My initial thought for how to go about this was by using a vector of tuples per stream that would maintain a strict size of 20, and then using a
I seem to end up getting unresolvable compilation errors. From what I understand from the error:
and the linked cmssw/DQMServices/Core/interface/DQMGlobalEDAnalyzer.h Lines 58 to 63 in 74517b2
It seems that including a
My knowledge of C++ inheritance language features is a little rusty. I really don't need this function for any functionality of the module, so is there a way to define this in the L1TStage2CaloLayer1 module explicitly, but refer it to the DQM core class definition of this function in a way that will satisfy both the
Hello all, I just wanted to check in and see if there was any guidance in how the
(apologies, this fell a bit below the radar)
At the moment, I don't think it can, but there could be a relatively simple way to extend
+1
Comparison job queued.
Comparison is ready Comparison Summary:
@aloeliger the difference in L1T / L1TStage2CaloLayer1 / MismatchDetail / last20Mismatches is still there in some WFs, I am not sure if your last changes were meant to fix it
Hi @jfernan2, these last few commits were designed to take care of the stream safety issues pointed out by @makortel, but they were not intended to change anything significant in terms of how
I have taken a look at the 136.731 workflow to understand what was going on. These are the non-empty mismatch histograms reported back by the old version of the DQM code: and these are the histograms reported back by this current PR code: Note that these histograms are identical, except that the old code reports no mismatches in the
I suspect one of two things is happening here:
I have not discussed this particular change with the CaloL1 team yet, so I may need to do that, provided you are okay with this explanation.
Thanks @aloeliger, I also vote for option 1, but I'll let L1T decide
@jfernan2 I have discussed with @asavincms and I think there are some more upgrades planned, but this should be okay from our side for now if it is okay for the DQM migration.
OK @aloeliger, but let's note down that the plot may be fired, since now it is likely being correctly filled (by the end of the run)
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
@jfernan2 please open an issue to keep track of it
+1
I am not sure if this is actually an issue @silviodonato. My understanding from @aloeliger's findings is that the plot is now correctly filled, since it collects mismatches from the whole run, while before it collected them just from the last LS. I'll let the L1T experts confirm, since now the plot may be red in Offline DQM, while previously it was hiding mismatches not produced in the last LS of the run. I am referring to this comment:
@jfernan2 @silviodonato This is correct, and I believe it reports more accurately what one would expect the behavior to be. However, I think it is important to note that this plot is also not a part of the summary tests for the overall CaloL1 system, as seen here:
The only tests in the summary are from maximum mismatch/link error plots, which are exactly identical between the old code and the new code the way the changes have been implemented. Unless one is doing incredibly careful inspection of the CaloL1 system while processing runs offline, this change will likely not be noticed unless it is already clear that the run has produced a lot of mismatches. The monitoring element is mostly intended as an online monitoring tool for being able to understand what issues are happening in what order in the trigger, which may help with diagnosis.
I am more concerned, however, about the performance of this monitoring element in a multithreaded environment offline. I have tested it for stability online on actual data with multiple lumisections (seen here with a 2018 commissioning run, 310150): and offline using the current
So you are aware, after discussion with Alexander Savin (@asavincms) we may attempt to further patch this monitoring element in the future by not allowing duplicate events to show up in multiple y-axis positions.
@aloeliger see #32172 and #32173
PR description:
This PR migrates the CaloL1 online DQM module to the "Global" format to comply with the concurrent LS processing introduced for offline DQM as part of #25090. The module preserves all monitoring elements (and therefore summary tests) by using maps containing the relevant luminosity block information inside a CaloL1 namespace structure that serves as the run cache. This PR should make PR #31678 obsolete.
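For orientation only, a rough sketch (illustrative member names, not the PR's actual code) of the kind of per-lumi bookkeeping described above:

#include <map>
#include <string>

namespace CaloL1Information {
  // hypothetical sketch of the run-cache holder; the real struct also owns the MonitorElements
  struct monitoringDataHolder {
    // per-lumi bookkeeping, keyed by a lumi label and copied into the histograms at the end of the run
    std::map<std::string, int> maxEvtLinkErrorsMapECAL;
    std::map<std::string, int> maxEvtLinkErrorsMapHCAL;  // illustrative second quantity
  };
}  // namespace CaloL1Information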
PR validation:
All code compiles, and I have checked that a min bias data workflow reproduces the same histograms between the old online module and the new online module. I have also verified that error plots between the old and new versions of the online module yield the same error information in the same way, by comparing on Run 315273 (a 2018 HI run with HCAL mismatches at CaloL1). Online monitoring should remain unchanged. The PR also seems to pass the basic test procedure suggested in the CMSSW PR instructions.
if this PR is a backport please specify the original PR and why you need to backport that PR:
This PR is not a backport