Move CaloL1 online DQM to Global #32004
Conversation
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32004/19521
A new Pull Request was created by @aloeliger for master. It involves the following packages: DQM/L1TMonitor. @andrius-k, @kmaeshima, @ErnestaP, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
The tests are being triggered in jenkins.
+1
Comparison job queued.
if (theMap->find(lumiBlock) != theMap->end()) {
  //this is not the first event with errors per lumi-block, insert it only if the number of errors is larger than before
  if (nErrors > theMap->find(lumiBlock)->second) {
    (*theMap)[lumiBlock] = nErrors;
This approach is not thread safe (I'm saying it only here and not repeating it everywhere). The compiler would also complain if the map were taken as const std::map<std::string, int>& (unfortunately a "const pointer" is not the same as a "(const) pointer to const", which is what would really be needed here).
If I understood correctly, the aim is to find a maximum of some quantity over a LuminosityBlock, and eventually put that number into a bin of a histogram that corresponds to that lumi. I believe the most efficient way to do that (in general) would be to (following the example https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#example_usage_calculating_per_lu)
- Use edm::StreamCache with an int for each quantity to be recorded over a lumi
  - In dqmAnalyze() just do something along the lines of streamCache(event.streamID).var = std::max(streamCache(event.streamID).var, localVar) (*)
- Use edm::LuminosityBlockSummaryCache with an int for each quantity to do a reduction (with a std::max() operation) over the edm streams
  - streamEndLuminosityBlockSummary() stores the maximum of the StreamCache and the LumiSummaryCache to the LumiSummaryCache and resets the StreamCache value(s) to zero
  - globalEndLuminosityBlockSummary() stores the reduced maximum into a container (see next bullet) along with the lumi id
- Use tbb::concurrent_vector<std::pair<edm::LuminosityBlockNumber_t, int>> in the RunCache (i.e. in CaloL1Information::monitoringDataHolder) to hold the values per lumi for that run
  - dqmEndRun() copies the values from the container into the histogram

(alternatively an edm::LuminosityBlockCache could be combined with std::atomic<int>s)

(*) for @cms-sw/dqm-l2, the DQMGlobalEDAnalyzer API would be more consistent with the edm::global interfaces if the edm::StreamID would be passed from accumulate() to dqmAnalyze()
@makortel Thanks for the comments. I was afraid this might be the case.
If I understood correctly, the aim is to find a maximum of some quantity over a LuminosityBlock, and eventually put that number into a bin of a histogram that corresponds to that lumi
This is correct. Unfortunately, I think I've tried to use LuminosityBlockCaches to gain access to the luminosity block transition functions and cache information in this manner (the old implementation of this module was designed to fill things on luminosity block transitions, and this seemed like the quickest way to regain all functionality). My attempt was something similar to:
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
as the main implementation of the module, but this doesn't work due to the way DQM global modules are implemented, as run-level caching is assumed at the DQM global analyzer level here:
cmssw/DQMServices/Core/interface/DQMGlobalEDAnalyzer.h
Lines 13 to 19 in 59d2873
class DQMGlobalEDAnalyzer
    : public edm::global::EDProducer<edm::RunCache<H>,
                                     // DQMGlobalEDAnalyzer are fundamentally unable to produce histograms for any
                                     // other scope than MonitorElement::Scope::RUN.
                                     edm::EndRunProducer,
                                     edm::Accumulator,
                                     Args...> {
So I think this rules out every solution you mentioned but the last. However, I am unfamiliar with tbb::concurrent_vector<>'s. Is there documentation I should be aware of for using these? I assume these objects are just a vector with some sort of inherent locking implemented?
I had also considered an even less verbose solution, comparing and setting in the monitoring element directly for every event. I assume this would be no more thread safe than what I have implemented right now?
Thanks again for your help,
Andrew Loeliger
My attempt was something similar to:
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
as the main implementation of the module, but this doesn't work due to the way DQM global modules are implemented, as run-level caching is assumed at the DQM global analyzer level here:
https://github.com/cms-sw/cmssw/blob/master/DQMServices/Core/interface/DQMGlobalEDAnalyzer.h#L13-L19
The DQMGlobalEDAnalyzer indeed requires/uses a RunCache for the MonitorElements, but (as far as I can tell) that does not preclude the use of other caches for other information (especially when the functions specific to those caches do not use any DQM APIs).
So I think this rules out every solution you mentioned but the last.
I meant that all three bullet points need to be used together.
The alternative I suggested in the parentheses under the bullet points would replace the use of edm::StreamCache and edm::LuminosityBlockSummaryCache with edm::LuminosityBlockCache and explicit use of std::atomic<int> (which would require a somewhat-low-level loop using std::atomic<int>::compare_exchange_weak() to achieve the max() operation).
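For illustration, a minimal sketch of the kind of compare_exchange_weak() loop meant here (the helper name is hypothetical, assuming the per-lumi cache holds a std::atomic<int>):

#include <atomic>

// hypothetical helper: raise maxErrors to at least nErrors without locking
void updateMax(std::atomic<int>& maxErrors, int nErrors) {
  int current = maxErrors.load();
  // if another thread wins the exchange, current is refreshed with the stored value and we retry;
  // the loop ends once the stored value is already >= nErrors or our store succeeds
  while (current < nErrors && !maxErrors.compare_exchange_weak(current, nErrors)) {
  }
}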
However, I am unfamiliar with tbb::concurrent_vector<>'s. Is there documentation I should be aware of for using these? I assume these objects are just a vector with some sort of inherent locking implemented?
It provides fine-grained synchronization (and thus might be a bit overkill for just concurrent push_back()). The interface is very std::vector-like (including push_back() and iteration over elements). For more information please see
https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/concurrent_vector_ug.html
https://www.threadingbuildingblocks.org/docs/doxygen/a00046.html
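For reference, a minimal usage sketch (illustrative names, not code from this PR) of the std::vector-like interface:

#include <tbb/concurrent_vector.h>
#include <utility>

// one entry per lumi: (lumi number, maximum number of errors seen in that lumi)
tbb::concurrent_vector<std::pair<unsigned int, int>> perLumiMaxima;

// safe to call concurrently, e.g. once per lumi from globalEndLuminosityBlockSummary()
void record(unsigned int lumi, int maxErrors) { perLumiMaxima.emplace_back(lumi, maxErrors); }

// single-threaded read-back, e.g. in dqmEndRun(): iterate just like a std::vector
// for (const auto& entry : perLumiMaxima) { /* fill the histogram bin for entry.first with entry.second */ }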
I had also considered an even less verbose solution, comparing and setting in the monitoring element directly for every event. I assume this would be no more thread safe than what I have implemented right now?
Right, not without per-histogram locking (which the MonitorElement actually has, but it is not exposed outside, for good reasons).
The DQMGlobalEDAnalyzer indeed requires/uses a RunCache for the MonitorElements, but (as far as I can tell) that does not preclude the use of other caches for other information (especially when the functions specific to those caches do not use any DQM APIs).
Then, can I ask what is the appropriate syntax to use when trying to mix and use the different cache scopes at various levels? As I mentioned, I tried the above syntax:
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
but later, when trying to retrieve this luminosity block cache doing something like:
if (nEcalLinkErrors > luminosityBlockCache(event.getLuminosityBlock().index())->maxEvtLinkErrorsECALCurrentLumi_)
this would always fail to compile, as the compiler did not recognize the luminosityBlockCache() function for retrieving the luminosity block cache (it is also perhaps worth noting that it would not recognize overrides of the global luminosity block transition functions, stating that the override was not overriding anything previously defined).
This error was unique to the DQMGlobalEDAnalyzer implementation; when changing from DQMGlobalEDAnalyzer to DQMOneEDAnalyzer these lines passed the compiler without complaint.
Is there a method I am unaware of to use these two different cache scopes together?
Thanks,
Andrew Loeliger
This one
class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder>>
declares to use edm::LuminosityBlockCache<CaloL1Information::monitoringDataHolder> as the RunCache (which I fully believe can lead to compilation errors). In order to use multiple caches, each of them needs to be declared separately, e.g.
// note: the CMS naming convention is to start namespaces in lower case, and classes/structs in upper case
namespace caloL1Information {
  struct MonitoringDataHolder {
    // MonitorElements;
    tbb::concurrent_vector<std::pair<edm::LuminosityBlockID, int>> maxEvtLinkErrorsMapECAL;
    ...
  };
  struct PerStreamMax {
    int maxEvtLinkErrorsMapECAL = 0;
    ...
  };
  struct PerLumiMax {
    int maxEvtLinkErrorsMapECAL = 0;
    ...
  };
}

class L1TStage2CaloLayer1Offline : public DQMGlobalEDAnalyzer<
                                       caloL1Information::MonitoringDataHolder,  // RunCache, not decorated with edm::RunCache<...> because the base class does that
                                       edm::StreamCache<caloL1Information::PerStreamMax>,
                                       edm::LuminosityBlockSummaryCache<caloL1Information::PerLumiMax>> {
public:
  std::shared_ptr<caloL1Information::PerStreamMax> beginStream(edm::StreamID) const {
    return std::make_shared<caloL1Information::PerStreamMax>();
  }

  std::shared_ptr<caloL1Information::PerLumiMax> globalBeginLuminosityBlockSummary(edm::LuminosityBlock const&, edm::EventSetup const&) const {
    return std::make_shared<caloL1Information::PerLumiMax>();
  }

  // reduction over events within one stream (and by definition within one lumi)
  // there may be concurrent calls to this function, but the framework guarantees that there is at most one call for any single stream
  // (and therefore StreamCache can be used safely without explicit synchronization)
  void dqmAnalyze(edm::Event const& iEvent, edm::EventSetup const&, caloL1Information::MonitoringDataHolder const&) const {
    ...
    auto cache = streamCache(iEvent.streamID());
    cache->maxEvtLinkErrorsMapECAL = std::max(cache->maxEvtLinkErrorsMapECAL, nEcalLinkErrors);
    ...
  }

  // reduction over streams
  // the framework guarantees that this function is called one at a time
  // (and therefore StreamCache and LuminosityBlockSummaryCache can be used without explicit synchronization)
  void streamEndLuminosityBlockSummary(edm::StreamID streamID, edm::LuminosityBlock const&, edm::EventSetup const&, caloL1Information::PerLumiMax* lumiSummaryCache) const {
    auto cache = streamCache(streamID);
    lumiSummaryCache->maxEvtLinkErrorsMapECAL = std::max(lumiSummaryCache->maxEvtLinkErrorsMapECAL, cache->maxEvtLinkErrorsMapECAL);
    ...
  }

  // store the reduced maximum to the tbb::concurrent_vector
  void globalEndLuminosityBlockSummary(edm::LuminosityBlock const& iLumi, edm::EventSetup const&, caloL1Information::PerLumiMax* lumiSummaryCache) const {
    auto cache = runCache(iLumi.getRun().index());
    cache->maxEvtLinkErrorsMapECAL.emplace_back(iLumi.id(), lumiSummaryCache->maxEvtLinkErrorsMapECAL);
    ...
  }
};
(please invent better names for the cache types :) )
//these maps store the maximum number of evt and link mismatches per lumi section.
//they are read back at the end of runs
//Made std::unique_ptr's for better memory management
std::unique_ptr<std::map<std::string, int>> maxEvtLinkErrorsMapECAL{std::make_unique<std::map<std::string, int>>()};
How is unique_ptr better here than
std::map<std::string, int> maxEvtLinkErrorsMapECAL;
?
I believe the reasoning behind this was, as you were noting above, my way to dodge the compiler errors for a const function modifying class members, by making the class member an unedited pointer to some mutable object that I was working with (which, I grant you, seems like a textbook way to write terminally buggy code).
Continuing from the above, what this module needs in order to maintain its previous functionality is a thread-safe, dynamically sized vector/map type of object whose contents are mutable in the analysis functions, for storing luminosity block information at the run-cache level. Will tbb::concurrent_vector<>'s provide this? I assume there is also probably a considerable speed cost associated with an object like this?
Thanks,
Andrew Loeliger
Comparison is ready Comparison Summary:
@aloeliger the last test shows this change in one of the MEs of L1T / L1TStage2CaloLayer1 / MismatchDetail:
Hi all, I just wanted to give a quick check in of where this is right now. I am looking at changing the code using the strategy suggested by @makortel using the mixed stream and summary caches to maintain stream safety, and I have all of the "Max" style histograms changed over to use this style, however I am running into technical difficulties trying to change
My initial thought for how to go about this was by using a vector of tuples per stream that would maintain a strict size of 20, and then using a
I seem to end up getting unresolvable compilation errors. From what I understand from the error:
and the linked cmssw/DQMServices/Core/interface/DQMGlobalEDAnalyzer.h Lines 58 to 63 in 74517b2
It seems that including a
My knowledge of C++ inheritance language features is a little rusty. I really don't need this function for any functionality of the module, so is there a way to define this in the L1TStage2CaloLayer1 module explicitly, but refer it to the DQM core class definition of this function in a way that will satisfy both the
Hello all, I just wanted to check in and see if there was any guidance in how the
(apologies, this fell a bit below the radar)
At the moment, I don't think it can, but there could be a relatively simple way to extend
+1
Comparison job queued.
Comparison is ready Comparison Summary:
@aloeliger the difference in L1T / L1TStage2CaloLayer1 / MismatchDetail / last20Mismatches is still there in some WFs, I am not sure if your last changes were meant to fix it
Hi @jfernan2, these last few commits were designed to take care of the stream safety issues pointed out by @makortel, but they were not intended to change anything significant in terms of how
I have taken a look at the 136.731 workflow to understand what was going on. These are the non-empty mismatch histograms reported back by the old version of the DQM code: and these are the histograms reported back by this current PR code: Note that these histograms are identical, except that the old code reports no mismatches in the
I suspect one of two things is happening here:
I have not discussed this particular change with the CaloL1 team yet, so I may need to do that, provided you are okay with this explanation.
Thanks @aloeliger, I also vote for option 1, but I'll let L1T decide
@jfernan2 I have discussed with @asavincms and I think there are some more upgrades planned, but this should be okay from our side for now if it is okay for the DQM migration.
OK @aloeliger, but let's note down that the plot may be fired, since now it is likely being correctly filled (by the end of the run)
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
@jfernan2 please open an issue to keep track of it
+1
I am not sure if this is actually an issue @silviodonato. My understanding from @aloeliger's findings is that the plot is now correctly filled, since it collects mismatches from the whole run, while before it collected them just from the last LS. I'll let the L1T experts confirm, since now the plot may be red in Offline DQM, while previously it was hiding mismatches not produced in the last LS of the run. I am referring to this comment:
@jfernan2 @silviodonato This is correct, and I believe it reports more accurately what one would expect the behavior to be. However, I think it is important to note that this plot is also not a part of the summary tests for the overall CaloL1 system, as seen here:
The only tests in the summary are from maximum mismatch/link error plots, which are exactly identical between the old code and the new code the way the changes have been implemented. Unless one is doing incredibly careful inspection of the CaloL1 system while processing runs offline, this change will likely not be noticed unless it is already clear that the run has produced a lot of mismatches. The monitoring element is mostly intended as an online monitoring tool for being able to understand what issues are happening in what order in the trigger, which may help with diagnosis.
I am more concerned, however, about the performance of this monitoring element in a multithreaded environment offline. I have tested it for stability online on actual data with multiple lumisections (seen here with a 2018 commissioning run, 310150): and offline using the current
So you are aware, after discussion with Alexander Savin (@asavincms) we may attempt to further patch this monitoring element in the future by not allowing duplicate events to show up in multiple y-axis positions.
@aloeliger see #32172 and #32173
PR description:
This PR migrates the CaloL1 online DQM module to the "Global" format to comply with the concurrent LS processing introduced for offline DQM as part of #25090. The module preserves all monitoring elements (and therefore summary tests) by using maps containing the relevant luminosity block information inside a CaloL1 namespace structure that serves as the run cache. This PR should make PR #31678 obsolete.
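For orientation only, a rough sketch (illustrative member names, not the PR's actual code) of the kind of per-lumi bookkeeping described above:

#include <map>
#include <string>

namespace CaloL1Information {
  // hypothetical sketch of the run-cache holder; the real struct also owns the MonitorElements
  struct monitoringDataHolder {
    // per-lumi bookkeeping, keyed by a lumi label and copied into the histograms at the end of the run
    std::map<std::string, int> maxEvtLinkErrorsMapECAL;
    std::map<std::string, int> maxEvtLinkErrorsMapHCAL;  // illustrative second quantity
  };
}  // namespace CaloL1Information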
PR validation:
All code compiles, and I have checked that a min bias data workflow reproduces the same histograms between the old online module and the new online module. I have also verified that error plots between the old and new versions of the online module yield the same error information in the same way, by comparing on Run 315273 (a 2018 HI run with HCAL mismatches at CaloL1). Online monitoring should remain unchanged. The PR also seems to pass the basic test procedure suggested in the CMSSW PR instructions.
if this PR is a backport please specify the original PR and why you need to backport that PR:
This PR is not a backport