-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
edm::Service callback for lumi/run cleanup #28521
Comments
A new Issue was created by @schneiml Marcel Schneider. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
@schneiml I would think that the natural way to handle such re-use would be to use a shared reference count. If the count is 0 by the time you need to 'rebook' you can just re-use it. |
@schneiml I probably missed something. You are trying to make the new interface work with the old system involving the DQMStore service? Does that mean you would not be putting data products into the Run and LuminosityBlock for this? |
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@Dr15Jones The current plan for 11_1 is to have This way, we should be able to still support all the legacy modules and workflows like online DQM while running without concurrent-anything (potentially even multithreaded, though that is not really required), while also allowing concurrent-anything multi-threaded for non-legacy modules (and also be a bit more memory-safe, even if incorrect, for legacy modules that end up in multi-thread concurrent situations). That seems like a good deal to me: we get a lot of the cleanup, but don't break anybodies workflow (yet), get safety from undefined behavior and make the implementation more simple. Moving to products should become much easier in the future, but needs more redesign around the non-standard workflows. Re the ref-counting solution, the problem is that I can't rely on getting (non-edm) callbacks from the modules. For the non-legacy modules, these callbacks drive everything (like in the old |
BTW, another idea I had was to trigger the ref-count/cleanup using |
My thought was to share a |
@Dr15Jones hmm, that way the dependency could be avoided. I still think this sounds rather fragile, esp. since the correctness for legacy modules depends on the cleanup happening at exactly the right time (if a module holds a ME/TH1 pointer and expects it to stay valid across run/lumi boundaries, we need to be really sure to never allocate a copy). Since you provide the signal in #28562, I guess I'll just go with that. Thanks a lot! |
My latest development branch master...schneiml:dqm-prepare-for-stream-v3 now uses the new callbacks, and that seems to work well. Closing this issue. |
Dear framework experts,
while porting the "new DQMStore" work back to work with a
edm::Service
basedDQMStore
, I face a problem related to which callbacks are available to the service.In the
DQMStore
, we now essentially emulate the product life cycle in edm: histograms are created as a module enters a lumisection/run, kept while that lumi/run has not ended, saved to one or more output files in output modules or EDAnalyzers, and then removed and recycled once the run/lumi is completely processed. One important detail here is that we whenever possible reuse theMonitorElement
/TH1
objects: this is to make sure that when there are no concurrent runs/lumis, there is exactly one object per histogram and the legacy modules can keep pointers to that, and continue working correctly.Now, the problem is that the
DQMStore
has to delete (or rather "mark for recycling") the histograms for a lumisection after all output modules have written the their output. As far as I can see, there is no callback for that.My current interim implementation cleans up at the beginning of the next lumisection, but that can't work with concurrent lumis. Also it does not work with the
DQMRootSource
, since it wants to read all histograms before triggering the start of the run/lumi. In the past, the cleanup was triggered by the output modules themselves, which is deeply flawed since there might be more than one of them (even in practice, usuallyMEtoEDM
andDQMRootOutputModule
were loaded at the same time).Find the current WIP here:
read_file1_file2_cfg.py
fails because the second lumi is read before the first one is cleaned up, therefore new objects need to be allocated; after the first lumi is cleaned up, the old pointers are invalid andDummyReadDQMStore
crashes.. This could easily be avoided by fixingDummyReadDQMStore
, but I believe we should support this (full legacy behavior as long as there are no concurrent runs/lumis). The flawedDQMStore
logic isDQMServices/Core/src/DQMStore.cc:506-520
.Is there an
edm::Service
callback that could be used (after a lumi is fully processed, but before the next one starts reading)? Could theDQMRootSource
be modified to only start reading histograms after the global begin lumi/run was triggered? Would it be acceptable to count the output modules and countwatchPostModuleWrite*
events?The text was updated successfully, but these errors were encountered: