-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DQM: Remove more DQMStore features #28247
DQM: Remove more DQMStore features #28247
Conversation
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28247/12393
|
A new Pull Request was created by @schneiml (Marcel Schneider) for master. It involves the following packages: DQM/EcalCommon @andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test I don't expect the PR tests to be very useful for validation here, but let's see if they notice anything. |
The tests are being triggered in jenkins. |
Comparison job queued. |
Comparison is ready Comparison Summary:
|
No significant changes in the comparison. Still requires confirmation from ECAL and AlCa. |
please test workflow 1010.0,1020.0,1030.0,1040.0 |
The tests are being triggered in jenkins. |
@schneiml in my private tests based on CMSSW_11_0_X_2019-11-19-2300 I see that 1040.0 is now crashing... |
@fabiocos 1040.0 runs for me, after clean rebase on I have not checked the output, though. |
Comparison job queued. |
Comparison is ready @slava77 comparisons for the following workflows were not done due to missing matrix map:
Comparison Summary:
|
apparently there was something wrong in my setup, repeating the test |
@@ -98,7 +98,6 @@ | |||
# 2 provide more detailed output | |||
Frequency = cms.untracked.int32(50), | |||
MEPathToSave = cms.untracked.string('AlCaReco/SiStripGainsAAG'), | |||
deleteAfterCopy = cms.untracked.bool(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tocheng @christopheralanwest @mmusich please suggest how to test this ALCAPROMPT, using --dasquery=file dataset=/ZeroBias/Run2016C-SiStripCalMinBiasAfterAbortGap-18Apr2017-v1/ALCARECO run=276097
in a similar way as done for SiStrpGains does not work:
Begin processing the 1st record. Run 276092, Event 32587086, LumiSection 74 on stream 0 at 22-Nov-2019 10:43:57.009 CET
----- Begin Fatal Exception 22-Nov-2019 10:43:57 CET-----------------------
An exception of category 'Configuration' occurred while
[0] Processing Event run: 276092 lumi: 74 event: 32587086 stream: 0
[1] Running path 'pathALCARECOPromptCalibProdSiStripGainsAAG'
[2] Calling method for module HLTHighLevel/'ALCARECOCalMinBiasFilterForSiStripGainsAAG'
Exception Message:
requested HLT path "pathALCARECOSiStripCalMinBiasAAG" does not exist
----- End Fatal Exception -------------------------------------------------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you change the ALCA producer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmsDriver.py step3 --datatier ALCARECO --conditions auto:run2_data -s ALCA:PromptCalibProdSiStripGainsAAG --eventcontent ALCARECO -n 1000 --dasquery=file dataset=/ZeroBias/Run2016C-SiStripCalMinBiasAfterAbortGap-18Apr2017-v1/ALCARECO run=276097
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabiocos, the command is correct but the input dataset is not.
In 2017 we changed SiStripCalMinBiasAfterAbortGap
to SiStripCalMinBiasAAG
due to limitations in the output dataset name length in dataset storage system.
Because of this the PD you chose is not processable tout-court in recent releases, but needs to have this line https://github.com/cms-sw/cmssw/blob/master/Calibration/TkAlCaRecoProducers/python/ALCARECOPromptCalibProdSiStripGainsAAG_cff.py#L9 adjusted.
To make it work out of the box just switch to a more recent dataset like
/StreamExpress/Run2018C-SiStripCalMinBiasAAG-Express-v1/ALCARECO
(is at T2_CH_CERN
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmusich with step3 --processName ALCA2 --datatier ALCARECO --conditions auto:run2_data -s ALCA:PromptCalibProdSiStripGainsAAG --eventcontent ALCARECO -n 1000 --dasquery=file dataset=/StreamExpress/Run2018C-SiStripCalMinBiasAAG-Express-v1/ALCARECO
I get
which looks to me comparable to your test at the file crossing boundaries
@@ -27,7 +27,6 @@ | |||
# 2 provide more detailed output | |||
Frequency=cms.untracked.int32(50), | |||
MEPathToSave=cms.untracked.string('AlCaReco/EcalPedestalsPCL'), | |||
deleteAfterCopy=cms.untracked.bool(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tocheng @christopheralanwest @schneiml this can be tested using wf 1010.0 . In its step3 the log file is flooded with
Begin processing the 100th record. Run 299149, Event 579095302, LumiSection 408 on stream 3 at 20-Nov-2019 17:35:47.296 CET
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_0' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_1' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_10' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_11' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_12' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_13' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_14' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_15' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_16' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_17' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_18' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_19' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_2' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_20' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_21' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_22' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_23' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_24' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_25' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_26' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_27' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_28' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_29' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_3' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_30' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_31' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_32' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_33' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_34' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_35' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_36' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_37' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_38' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_39' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_4' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_40' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_41' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_42' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_43' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_44' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_45' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_46' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_47' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_48' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_49' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_5' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_50' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_51' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_52' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_53' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_54' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_55' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_56' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_57' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_58' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_59' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_6' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_60' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_61' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_62' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_63' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_64' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_65' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_66' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_67' in 'AlCaReco/EcalPedestalsPCL/EB/0'
DQMStore: WARNING: attempt to remove non-existent monitor element 'eb_68' in 'AlCaReco/EcalPedestalsPCL/EB/0'
(...)
Please comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re the message see the comment I made in another location: the code that used to produce them is deleted in this PR.
Re the RSS graph: We see the usual MEtoEDM endJob spike, before as after the PR, doubling the memory use as all histograms are copied into a EDM product. Removing the delete-calls makes it an exact doubling, before it might have been a bit less, as some histos where deleted before saving.
This spike is why "normal" reco jobs switched to DQMIO saving in LS1 (which removes this spike alltogether), but for the ALCA workflows memory use was not considered a problem and they where never migrated. (Edit: This is just for context, this PR does not affect anything about the MEtoEDM procedure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schneiml the message is in the default version, this PR removes it, sorry for the possible confusion.
Concerning the RSS plot I do not see anything worrying, the behaviour old vs new looks quite similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabiocos re the message: ok, that is expected, there is nothing deleted any more and ofc there is no reason to warn about deleting non-existing things then.
This might explain why the memory use is effectively unaffected: The delete calls were not doing anything to begin with.
if (pos != data_.end()) | ||
data_.erase(pos); | ||
else if (warning) { | ||
std::cout << "DQMStore: WARNING: attempt to remove non-existent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabiocos I don't fully understand your comment in a parallel thread, were you testing the old or the new code?
The code that makes the messages you show is deleted here.
@@ -9,7 +9,6 @@ | |||
# 2 provide more detailed output | |||
Frequency = cms.untracked.int32(50), | |||
MEPathToSave = cms.untracked.string('AlCaReco/SiStrip'), | |||
deleteAfterCopy = cms.untracked.bool(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabiocos, I am sorry to be picky, but I don't think @allForExpress
resolves into the PromptCalibProd...
ALCAPROMPT producer changed here, so I am afraid your plot proves nothing.
A more interesting test IMHO is to monitor the RSS of step3 of wf 1001.0:
cmsDriver.py step3 --datatier ALCARECO --triggerResultsProcess RECO --conditions
auto:run1_data -s ALCAOUTPUT:SiStripCalZeroBias+TkAlMinBias+DtCalib+Hotline+LumiPix
elsMinBias+AlCaPCCZeroBiasFromRECO+AlCaPCCRandomFromRECO,ALCA:PromptCalibProd+Promp
tCalibProdSiStrip+PromptCalibProdSiStripGains+PromptCalibProdSiStripGainsAAG+Prompt
CalibProdSiPixelAli+PromptCalibProdSiPixel --eventcontent ALCARECO -n 100 --filein
file:step2.root --fileout file:step3.root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmusich as far as I can see in also the Tracer dump it do runs both MEtoEDMConvertSiStrip
and pathALCARECOSiStripPCLHistos
that are also modified in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about PromptCalibProdSiStripGains...
and PromptCalibProdSiPixelAli
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or your test was limited to this particular file change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmusich I believe that we are clearly missing a test that is realistically mimicking what happens nowadays at the T0 , not bits and pieces. My understanding from a discussion with @tocheng is that @allForExpress
should cover all the ALCAPRODUCER available, and I looked at a few ALCAPROMPT touched by this PR separately.
It would be important to get from ALCA a realistic workflow on recent data. Anyway I will make a final check following your suggestion (but is this test on very old data representative of what happens in today's situation?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional tests. I agree about the lack of a suitable relval test testing all the AlCa aspects of Express.
For tracker at least 1001.0 is a reasonable proxy. For the changes proposed here I don't think occupancy or n. of channel differences matter to measure a comparison of the memory consumption (it would be nice to have 1001.0 updated with Run2 input data though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 |
Thanks everybody for the help with getting this integrated! -- now lets move on to #28379, which I am currently rebasing on top of this. |
PR description:
This PR removes more rarely used functionality from the DQM infrastructure, namely
scaleElements
: not sure what this was ever used for, online maybe?softReset
: this feature is rather useful, but the few usages don't really justify the complexity. Used mainly in ECAL DQM.setAccumulate
flag: basically unused, not sure what it was supposed to signal.remove*
: Used in a bunch of places mostly for non-standard workflows. Remove is very hard to support safely (even today, it might randomly invalidate pointers that other modules are holding), so we have to drop it. The most common pattern is "check if ME exists, if yes remove, then book": this probably was not required in most cases, but it might be required to re-bin plots. We might change semantics ofbook*
to replace an existing ME if merge is not possible, without invalidating the exisiting MEs (only replace theTH1
). [update]QStatisticalTests
: completely unused. [update]PR validation:
Requires feedback from subsystem users, mainly ECAL (@tanmaymudholkar ?) regarding
softReset
. I suspect this PR will change behavior for ECAL online DQM.The other user of
softReset
isHLXMonitor
, which as far as I understand is history. Not sure if anybody feels responsible.scaleElements
was called in theMEtoEDM
workflow, so it might affect AlCa/PCL. I suspect it does not (I can't see any configuration for it, and I don't see what it would be good for), but it might be worth another check.Remove was used in a lot of places, I suspect in most cases this will only make additional MEs show up in the output. However, there might be cases where re-booking now behaves incorrectly. I typically removed the entire
cleanup
functionality incl. config options, unless that seemed to complicated, then I left a comment in the now-useless function stubs.