Add EcalPhiSymByRun for ECAL ALCANANO test #4664

Closed
wants to merge 13 commits into from


tvami
Contributor

@tvami tvami commented Apr 19, 2022

Replay Request

Requestor

AlCaDB

Describe the configuration

  • Release: CMSSW_12_3_0, CMSSW_12_4_0_pre3, CMSSW_12_3_2, CMSSW_12_4_3
  • Run: 346512
  • GTs:
    • expressGlobalTag: 123X_dataRun3_Express_v5, 124X_dataRun3_Express_v3
    • promptrecoGlobalTag: 123X_dataRun3_Prompt_v6, 124X_dataRun3_Prompt_v3
    • alcap0GlobalTag: 123X_dataRun3_Prompt_v6, 124X_dataRun3_Prompt_v3
  • Additional changes:

Purpose of the test

To test the ECAL ALCANANO. The replay is stripped down to run only the ECAL stream.

T0 Operations cmsTalk thread

Will provide soon
Tier0 Operations cmsTalk Forum

@tvami tvami marked this pull request as draft April 19, 2022 18:53
@tvami
Contributor Author

tvami commented Apr 19, 2022

In the relvals we have these special commands:
https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/relval_steps.py#L2263-L2270

From chatting with Marco, I understand that this means we need a new scenario to be able to run this at T0 :(

@tvami
Contributor Author

tvami commented Apr 19, 2022

I made a new scenario
cms-sw/cmssw#37627

@tvami tvami marked this pull request as ready for review April 21, 2022 15:18
@tvami
Contributor Author

tvami commented Apr 21, 2022

CMSSW_12_4_0_pre3 is out:
https://cms-talk.web.cern.ch/t/development-release-cmssw-12-4-0-pre3-now-available/9562/1
so this can be launched any time now

@tvami
Contributor Author

tvami commented Apr 21, 2022

@tvami
Contributor Author

tvami commented Apr 28, 2022

Since CMSSW_12_3_2 will be out soon, I'll make a commit to switch to it.

@tvami
Contributor Author

tvami commented Apr 29, 2022

I see it is available: ls /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_2/

@tvami
Contributor Author

tvami commented Apr 29, 2022

run replay please

@cmsdmwmbot

A replay is requested by tvami. Waiting for available job slot.

@tvami
Contributor Author

tvami commented May 3, 2022

The replay fails with this message:

(Exit Code: 8006)
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 346512 lumi: 1 event: 387150 stream: 5
   [1] Running path 'pathALCARECOEcalPhiSymByRun'
   [2] Calling method for module EcalPhiSymRecHitProducerRun/'ALCARECOEcalPhiSymRecHitProducerRun'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: edm::SortedCollection<EcalRecHit,edm::StrictWeakOrdering<EcalRecHit> >
Looking for module label: ecalRecHit
Looking for productInstanceName: EcalRecHitsEB

which seems to be due to the customization not taking effect; investigating.
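The exception lists the criteria the lookup used: product type, module label and product instance name. As a toy model (not CMSSW's actual Principal::getByToken implementation), a lookup succeeds only if some stored product matches every criterion, so a collection stored under a different module label, as could happen if the customization never rewired the producer, yields zero matches. The stored product and its labels below are hypothetical:

```python
from typing import List, NamedTuple, Optional


class ProductKey(NamedTuple):
    """Toy identifier for an event product: type, module label, instance, process."""
    type_name: str
    module_label: str
    instance: str
    process: str


def find_products(registry: List[ProductKey], type_name: str, module_label: str,
                  instance: str, process: Optional[str] = None) -> List[ProductKey]:
    """Return the products matching every criterion; process=None means any process."""
    return [
        key for key in registry
        if key.type_name == type_name
        and key.module_label == module_label
        and key.instance == instance
        and (process is None or key.process == process)
    ]


# Hypothetical event content: EB rechits exist, but under a different module label.
registry = [ProductKey("EcalRecHitCollection", "someOtherRecHitProducer", "EcalRecHitsEB", "HLT")]

matches = find_products(registry, "EcalRecHitCollection", "ecalRecHit", "EcalRecHitsEB")
if not matches:
    print("Found zero products matching all criteria")
```

Matching on the full tuple is what makes a near miss (right type and instance, wrong label) fail just as hard as a completely absent product.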

@tvami
Contributor Author

tvami commented May 5, 2022

@tvami
Contributor Author

tvami commented May 7, 2022

Outcome

Fatal Exception (Exit Code: 8006)
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 346512 lumi: 1 event: 683762 stream: 6
   [1] Running path 'dqmoffline_step'
   [2] Calling method for module HBHENoiseFilterResultProducer/'HBHENoiseFilterResultProducer'
Exception Message:
 could not find HcalNoiseSummary.
   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

So I was wrong: the T0 config doesn't have more control over the DQM step than the ConfigBuilder packages. A PR to CMSSW will follow soon.

@tvami
Contributor Author

tvami commented May 16, 2022

CMSSW_12_4_0_pre4 is out, so let's test with that
https://cms-talk.web.cern.ch/t/development-release-cmssw-12-4-0-pre4-now-available/10458/1

@tvami
Contributor Author

tvami commented May 16, 2022

run replay please

@cmsdmwmbot

A replay is requested by tvami. Waiting for available job slot.

@germanfgv
Contributor

@germanfgv
Contributor

@tvami We have the following error on every PromptReco job:

cmsRun1
Fatal Exception (Exit Code: 8006)
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 346512 lumi: 1 event: 649218 stream: 0
   [1] Running path 'pathALCARECOEcalPhiSymByRun'
   [2] Prefetching for module EcalPhiSymRecHitProducerRun/'ALCARECOEcalPhiSymRecHitProducerRun'
   [3] Prefetching for module EcalRecHitProducer/'ecalRecHit@cpu'
   [4] Calling method for module EcalUncalibRecHitProducer/'ecalMultiFitUncalibRecHit@cpu'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: EBDigiCollection
Looking for module label: hltEcalPhiSymFilter
Looking for productInstanceName: phiSymEcalDigisEB

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

@tvami
Contributor Author

tvami commented May 16, 2022

That is surprising. Doing

voms-proxy-init -rfc -voms cms -valid 192:00

cmsrel CMSSW_12_4_0_pre4
cd CMSSW_12_4_0_pre4/src
cmsenv
git cms-addpkg Configuration/DataProcessing
python3 Configuration/DataProcessing/test/RunPromptReco.py --scenario AlCaPhiSymEcal_Nano --reco --global-tag 123X_dataRun3_Prompt_v6 --lfn=file:/eos/cms/tier0/store/backfill/1/data/Tier0_REPLAY_2022/AlCaPhiSym/RAW/v91/000/346/512/00000/4887980a-dac3-48e0-be08-d99284f75c5b.root --alcareco EcalPhiSymByRun
cmsRun -e RunPromptRecoCfg.py

used to work.

Now I can't do it because the RAW file is no longer available. The repack output of this replay is supposed to be in

 ls /eos/cms/tier0/store/backfill/1/data/Tier0_REPLAY_2022/AlCaPhiSym/RAW/v214/000/346/512/00000/

no? Why is this empty?

@tvami
Contributor Author

tvami commented Jun 30, 2022

Hi @germanfgv, I see we have a paused job... :( What's the error message?

@germanfgv
Contributor

The error is this:

2022-06-30 12:07:53,154:ERROR:ExecuteMaster:Exception is 'ConfigSection' object has no attribute 'nanoout'
2022-06-30 12:07:53,154:ERROR:ExecuteMaster:Traceback: 
2022-06-30 12:07:53,155:ERROR:ExecuteMaster:Traceback (most recent call last):
  File "/srv/job/WMCore.zip/WMCore/WMSpec/Steps/ExecuteMaster.py", line 141, in doExecution
    executionObject.execute()
  File "/srv/job/WMCore.zip/WMCore/WMSpec/Steps/Executors/CMSSW.py", line 346, in execute
    self.report.addInfoToOutputFilesForStep(stepName=self.stepName, step=self.step)
  File "/srv/job/WMCore.zip/WMCore/FwkJobReport/Report.py", line 1173, in addInfoToOutputFilesForStep
    fileInfo(fileReport=aFile, step=step, outputModule=module)
  File "/srv/job/WMCore.zip/WMCore/FwkJobReport/FileInfo.py", line 67, in __call__
    return self.processFile(filename = pfn,
  File "/srv/job/WMCore.zip/WMCore/FwkJobReport/FileInfo.py", line 85, in processFile
    output = getattr(step.output.modules, outputModule)
AttributeError: 'ConfigSection' object has no attribute 'nanoout'

I think we have seen this before, is that right @tvami ?
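The traceback shows WMCore resolving each output file's module label with getattr(step.output.modules, outputModule), so any output module that appears in the job report but is not declared in the step spec fails this way. A minimal sketch of that failure mode, using a hypothetical Section stand-in rather than WMCore's real ConfigSection, and a hypothetical ALCARECO output module label:

```python
class Section:
    """Hypothetical stand-in for WMCore's ConfigSection: a bag of dynamically added attributes."""

    def __init__(self, name):
        self._name = name


def undeclared_output_modules(spec_modules, reported_modules):
    """Return output module labels reported by the job but not declared in the step spec.

    Any label returned here would make getattr(spec_modules, label) raise AttributeError.
    """
    return [label for label in reported_modules if not hasattr(spec_modules, label)]


modules = Section("modules")
# Hypothetical: the spec only declares the ALCARECO output module.
setattr(modules, "ALCARECOStreamEcalPhiSymByRun", Section("ALCARECOStreamEcalPhiSymByRun"))

# The cmsRun job, however, also wrote a file from a NANO output module.
reported = ["ALCARECOStreamEcalPhiSymByRun", "nanoout"]
print(undeclared_output_modules(modules, reported))

try:
    getattr(modules, "nanoout")
except AttributeError as err:
    print(err)
```

Checking the reported labels against the spec up front turns a mid-execution AttributeError into an explicit list of the output modules the spec is missing.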

@tvami
Contributor Author

tvami commented Jul 1, 2022

We did, and I actually fixed it, but it seems the fix never made it into CMSSW_12_3_6: https://github.com/cms-sw/cmssw/releases/CMSSW_12_3_6

Let's redo it with 12_4_1

@germanfgv
Contributor

Ok, I'll retry

@tvami
Contributor Author

tvami commented Jul 1, 2022

Ok, I'll retry

You'll also need to change the GT when moving to another CMSSW release, so let me make a commit in a minute.

@tvami
Contributor Author

tvami commented Jul 1, 2022

@germanfgv done in 409c744, please go ahead with the test.

@germanfgv
Contributor

Here:
https://monit-grafana.cern.ch/d/t_jr45h7k/cms-tier0-replayid-monitoring?orgId=11&var-Bin=5m&var-ReplayID=220701093300&var-JobType=All&var-WorkflowType=All

@jhonatanamado
Contributor

Hi @tvami , the last configuration shows a single paused job with the following error:

2022-07-01 12:50:30,930:INFO:Scram:Creating a subprocess to run the PSet setup.
2022-07-01 12:50:30,930:INFO:Scram:Also recording SCRAM command-line related output.
2022-07-01 12:50:30,933:INFO:Scram:    Invoking command: cmssw_wm_create_process.py --output_pkl /srv/job/WMTaskSpace/cmsRun1/PSet.pkl --funcname alcaSkim --funcargs /srv/job/WMTaskSpace/cmsRun1/process_funcArgs.json
2022-07-01 12:50:35,209:INFO:Scram:Subprocess stdout was:\nb\'Failed to load process from Scenario AlCaPhiSymEcal_Nano (<Configuration.DataProcessing.Impl.AlCaPhiSymEcal_Nano.AlCaPhiSymEcal_Nano object at 0x2b4a9a4c9d30>).
  "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 144,
  "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 135,
    process=create_process(args, func_args)
  File "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 97, in create_process
    raise ex
  File "/cvmfs/cms.cern.ch/share/overrides/bin/cmssw_wm_create_process.py", line 93, in create_process
    process = my_func(*call_func_args, **func_args)
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_1/python/Configuration/DataProcessing/Scenario.py", line 104, in alcaSkim
    raise NotImplementedError(msg)
NotImplementedError: Scenario Implementation AlCaPhiSymEcal_Nano\\nDoes not contain an implementation for alcaSkim
2022-07-01 12:50:35,210:INFO:Scram:Subprocess stderr was: None 
2022-07-01 12:50:35,210:ERROR:SetupCMSSWPset:Error running scram process.

Logs and tarballs are in /afs/cern.ch/user/c/cmst0/public/Tarballs_Replays/PR4664/job_628
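The NotImplementedError above is the pattern where a scenario base class provides default methods that raise, and each concrete scenario overrides only the steps it supports; calling a step the scenario did not override (here alcaSkim) fails at PSet creation. A simplified, hypothetical sketch of that pattern, not the real Configuration.DataProcessing code (the real methods build a cms.Process; the dict returned here is a placeholder):

```python
class Scenario:
    """Simplified, hypothetical stand-in for a scenario base class with default methods."""

    def _not_implemented(self, method):
        # Mirrors the shape of the message seen in the job log.
        msg = "Scenario Implementation %s\n" % self.__class__.__name__
        msg += "Does not contain an implementation for %s" % method
        raise NotImplementedError(msg)

    def promptReco(self, globalTag, **options):
        self._not_implemented("promptReco")

    def alcaSkim(self, skims, **options):
        self._not_implemented("alcaSkim")


class AlCaPhiSymEcal_Nano(Scenario):
    """Hypothetical scenario that overrides promptReco only."""

    def promptReco(self, globalTag, **options):
        # Placeholder for building the real processing configuration.
        return {"step": "promptReco", "globalTag": globalTag}


scenario = AlCaPhiSymEcal_Nano()
print(scenario.promptReco("124X_dataRun3_Prompt_v3"))
try:
    scenario.alcaSkim(["EcalPhiSymByRun"])
except NotImplementedError as err:
    print(err)
```

In this sketch, the fix is for the scenario to override alcaSkim as well (or to inherit from a class that does) so that the Tier0 ALCARECO skim step can build its configuration.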

@jhonatanamado
Contributor

The replay finished without issues. The output datasets are:

/AlCaPhiSym/Tier0_REPLAY_2022-EcalPhiSymByRun-PromptReco-v213/ALCARECO#7e1efd1c-b165-41f4-84dc-8d3e56c83825
/AlCaPhiSym/Tier0_REPLAY_2022-PromptReco-v213/AOD#ebcc04a1-7c5d-423b-958d-1a41cc6f8aa3
/AlCaPhiSym/Tier0_REPLAY_2022-PromptReco-v213/MINIAOD#dd676ad1-c8e7-443f-a30a-867dba0de21f
/AlCaPhiSym/Tier0_REPLAY_2022-v213/RAW#e6863153-553b-40bc-9f97-323a3e63f915

@tvami
Contributor Author

tvami commented Jul 18, 2022

Oh yeah! That's new! @simonepigazzini please have a look at the output, thanks!

@simonepigazzini

mmm, all 4 datasets are empty.

@tvami
Contributor Author

tvami commented Jul 19, 2022

@jhonatanamado please post the tarballs, thanks!

@jhonatanamado
Contributor

Output data for the latest changes (49d8f53) are:

/AlCaPhiSym/Tier0_REPLAY_2022-EcalPhiSymByRun-PromptReco-v1907/ALCARECO#16a8f798-65bd-4541-8b3c-126d84a3d920
/AlCaPhiSym/Tier0_REPLAY_2022-PromptReco-v1907/AOD#93ad1b20-76b6-4706-af3e-986970b15e9c
/AlCaPhiSym/Tier0_REPLAY_2022-PromptReco-v1907/MINIAOD#a51bc6dd-9fe0-4d57-abcd-f9f3b67e5377
/AlCaPhiSym/Tier0_REPLAY_2022-v1907/RAW#1819c59d-1506-4849-a294-5db7b2bb863d

@tvami
Contributor Author

tvami commented Jul 20, 2022

@simonepigazzini

Dataset: /AlCaPhiSym/Tier0_REPLAY_2022-EcalPhiSymByRun-PromptReco-v1907/ALCARECO
Dataset size: 67224426 (67.2MB)
Number of blocks: 1
Number of events: 2575197
Number of files: 1
Creation time: 2022-07-20 03:27:08
Cross section: 0
Physics group: NoGroup
Status: VALID
Type: data

This time, it doesn't seem to be empty.

@simonepigazzini

Hi,

the files are not empty, but they do not contain the collection we are supposed to save there. All the trees (Event, LS, Run) are empty, in the sense that only each tree's trivial branches are stored.

simone

@tvami
Contributor Author

tvami commented Jul 29, 2022

test syntax please

@tvami
Contributor Author

tvami commented Jul 29, 2022

The replay is failing with:

2022-07-29 22:15:20,845:INFO:Report:addOutputFile method fileRef: , whole tree: {}
2022-07-29 22:15:20,883:ERROR:ExecuteMaster:Exception occured when executing step
2022-07-29 22:15:20,883:ERROR:ExecuteMaster:Exception is 'ConfigSection' object has no attribute 'nanooutedm'
2022-07-29 22:15:20,883:ERROR:ExecuteMaster:Traceback: 
2022-07-29 22:15:20,886:ERROR:ExecuteMaster:Traceback (most recent call last):
  File "/srv/job/WMCore.zip/WMCore/WMSpec/Steps/ExecuteMaster.py", line 141, in doExecution
    executionObject.execute()
  File "/srv/job/WMCore.zip/WMCore/WMSpec/Steps/Executors/CMSSW.py", line 335, in execute
    self.report.addInfoToOutputFilesForStep(stepName=self.stepName, step=self.step)
  File "/srv/job/WMCore.zip/WMCore/FwkJobReport/Report.py", line 1170, in addInfoToOutputFilesForStep
    fileInfo(fileReport=aFile, step=step, outputModule=module)
  File "/srv/job/WMCore.zip/WMCore/FwkJobReport/FileInfo.py", line 67, in __call__
    return self.processFile(filename = pfn,
  File "/srv/job/WMCore.zip/WMCore/FwkJobReport/FileInfo.py", line 85, in processFile
    output = getattr(step.output.modules, outputModule)
AttributeError: 'ConfigSection' object has no attribute 'nanooutedm'

@tvami
Contributor Author

tvami commented Aug 1, 2022

replay is failing with

Just to be clear, I have no idea what to do with the failure above

@tvami
Contributor Author

tvami commented Aug 10, 2022

I created this JIRA ticket: https://its.cern.ch/jira/browse/CMSTZ-1002

@tvami
Contributor Author

tvami commented Aug 10, 2022

I'm also closing this PR, as at this point more serious development is needed

@tvami tvami closed this Aug 10, 2022
7 participants