Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new ProcessBlock feature to the Framework #30117

Merged
merged 25 commits into from
Jul 10, 2020

Conversation

wddgit
Copy link
Contributor

@wddgit wddgit commented Jun 4, 2020

This introduces the new transitions beginProcessBlock, endProcessBlock, beginProcessBlockProduce, and endProcessBlockProduce. These can be used by producers, filters, and analyzers in a way that is analogous to beginRun, endRun, beginRunProduce, and endRunProduce. These abilities become available when the types ProcessBlockCache, BeginProcessBlockProducer, or EndProcessBlockProducer are passed to the base class of the module as template parameters. These abilities are not available to legacy module types.

The order modules are executed respects product dependencies established with calls to consumes and produces. One module can put products into a ProcessBlock and another module get them.

beginProcessBlock occurs after beginJob and after beginStream, but before any run, lumi or event transitions. endProcessBlock occurs after all run, lumi, and event transitions, but before endStream and endJob. Each occurs only once per job, except in the very unusual case where all output files are closed and new output files opened. In that case only, there will be a pair of end and begin transitions in between the output file close and open.

This is only the first of a series of PRs implementing this feature. Future PRs will add persistency and the ability to access this information from Events, Runs, and Lumis through a cache. With this PR, these abilities cannot be used by OutputModules. That will be added later. (In this PR, the OutputModule part of this is partially implemented. In the code from the EventProcessor to the functions in the OutputModuleBase classes, the implementation is complete. The functions in OutputModuleBase in this PR do nothing. The PoolOutputModule only has the minimal changes required to prevent this PR from breaking it. The next PR will complete the implementation for output.)

Similarly, the implementation is not done in the InputSource. The future transition accessInputProcessBlock is also only partially implemented. (The part in the InputSource is not implemented although the part of the code from EventProcessor down to the modules is complete. Those functions will never get called with the code in this PR.)

There should be no changes in output. After this PR, these new abilities are only used by modules in Core tests. Nothing extra should be running in production jobs and nothing should be modified. You can see the new transitions in the output from the Tracer. There are new signals ActivityRegistry. The MessageLogger is modified to properly handle these new contexts. I expect any performance impact to be negligible until modules start doing nontrivial work in these transitions.

In comments to this PR requesting changes, it would be useful to identify issues that can be handled in future PRs. This PR is already large.

PR validation:

There are several new Framework unit tests and extensions to existing Framework unit tests to cover the new abilities. Output from tests in runTheMatrix and relVals should be identical.

I expect there will be conflicts between this PR and #29553 in at least EDConsumerBase. They should not be difficult to resolve though.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2020

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-30117/15881

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2020

A new Pull Request was created by @wddgit (W. David Dagenhart) for master.

It involves the following packages:

DataFormats/Provenance
FWCore/Framework
FWCore/Integration
FWCore/MessageService
FWCore/ServiceRegistry
FWCore/Services
FWCore/TestProcessor
FWCore/Utilities
IOPool/Input
IOPool/Output

@makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please review it and eventually sign? Thanks.
@makortel, @felicepantaleo, @rovere, @fwyzard this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release manager for this.

cms-bot commands are listed here

@wddgit
Copy link
Contributor Author

wddgit commented Jun 4, 2020

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/6828/console Started: 2020/06/04 22:03

@fwyzard
Copy link
Contributor

fwyzard commented Jun 4, 2020

@wddgit could you elaborate on what is the purpose of a ProcessBlock ?

@wddgit
Copy link
Contributor Author

wddgit commented Jun 4, 2020

Hi Andrea, Issue #28354 contains the primary discussion that I am aware of that lead to this pull request. The primary purpose is the DQM use case. The feature is general enough that it might be useful for other use cases as well. For example there are some generator products where it might be useful to use this instead of Run products. Possibly there are uses we have not thought of as well. Chris and Matti might want to add to my reply as they requested I start working on this. FYI @schneiml

@makortel makortel mentioned this pull request Jun 4, 2020
@makortel
Copy link
Contributor

makortel commented Jun 4, 2020

The most immediate motivation comes indeed from DQM (multi-run) harvesting (#28354), which this PR should be sufficient to solve. The full ProcessBlock feature will be more generic to accommodate other use cases as well. The ProcessBlock` is likely the closest thing to a "file-level storage" that we can get.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2020

-1

Tested at: 35ad468

CMSSW: CMSSW_11_2_X_2020-06-04-1100
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63ad0a/6828/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test EcalDCS_O2O_test had ERRORS

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2020

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63ad0a/6828/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 36
  • DQMHistoTests: Total histograms compared: 2783960
  • DQMHistoTests: Total failures: 30
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2783880
  • DQMHistoTests: Total skipped: 50
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 35 files compared)
  • Checked 152 log files, 16 edm output root files, 36 DQM output files

@wddgit
Copy link
Contributor Author

wddgit commented Jun 5, 2020

The unit test failure is not related to this PR. It looks unrelated from the content of the log. I can see the same error in other PRs. Probably I see the same unit test failure in the IBs as well (there is one in the same package, although I can't see to see any error at all in the IB log file, not sure why)

@fwyzard
Copy link
Contributor

fwyzard commented Jun 5, 2020

@wddgit @makortel thanks for the pointer to #28354 .

So, a "ProcessBlock" spans all the work done within a cmsRun job, or at least all the work done for a set of output files (this last part is not very clear).

It has the usual begin/end transitions - which IIUC are global, not per-stream.

Can it have data attached to it ?
Can such data be produced at the beginning, at the end, or both, of the ProcessBlock ?
Does that mean that one can also save per-ProcessBlock data in an output file ?

@wddgit
Copy link
Contributor Author

wddgit commented Jun 5, 2020

Yes. A "ProcessBlock" spans all the work done within a cmsRun job.

There is an exceptional case though. There is a maxSize parameter to OutputModule's. When the output file size exceeds this limit it tells the Framework to look for an opportunity to close all output files of all output modules and open new ones. There are a lot of details related to this that I am not going to try to describe in this comment (ask if you are interested). As far as I know, no one actually uses this capability. Although, maybe someone does that I do not know about. I was just looking through our unit tests and so far I have not even found any Core unit tests of this. Maybe we should add one. Anyway, in this special case the Framework will add additional "ProcessBlock" transitions as if each set of output files was from a different process.

ProcessBlock transitions are all global. There are no stream ProcessBlock transitions.

Yes, data can be attached to it. One can put data products into a ProcessBlock and get them with getByToken, just like for a Run, LuminosityBlock, or Event. There are unit tests in the PR that test this ability.

Data can be produced at the beginning, end, or both.

This PR does not implement persistence. With only this PR you cannot save per-ProcessBlock to a an output file. But the plan is to implement persistence in a future PR. Then you will be able to save per-ProcessBlock data to an output file. (Although just with the features in this PR, you can do useful things. One alternative to our current plan would be to never implement persistence.)

@makortel
Copy link
Contributor

makortel commented Jul 8, 2020

@cmsbuild, please test

Let's refresh the tests though

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 8, 2020

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 8, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 8, 2020

+1
Tested at: 3c64a1b
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63ad0a/7783/summary.html
CMSSW: CMSSW_11_2_X_2020-07-07-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 8, 2020

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2020

@makortel
Copy link
Contributor

makortel commented Jul 9, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63ad0a/7783/summary.html

@smuzaffar What happened to the summary of the comparisons?

The validateJR says no differences for both DQM and reco comparisons.

@smuzaffar
Copy link
Contributor

@mrodozov , can you please look in to this?

@mrodozov
Copy link
Contributor

mrodozov commented Jul 9, 2020

@smuzaffar

unfortunately the file is empty.
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_2_X_2020-07-07-1100+63ad0a/37565/validateJR/qaResultsSummary.log

there was some failures in the compare-root-files-short-matrix job. I restarted the job (I don't remember if this is possible, to run it standalone)

10:06:19 Traceback (most recent call last):
10:06:19   File "/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/cms-bot/report-pull-request-results.py", line 6, in <module>
10:06:19     from github import Github
10:06:19 ImportError: cannot import name Github
10:09:07 Traceback (most recent call last):
10:09:07   File "/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/cms-bot/report-pull-request-results.py", line 6, in <module>
10:09:07     from github import Github
10:09:07 ImportError: cannot import name Github
10:10:30 you chose the action COMPARISON_READY
10:10:30 API Rate Limit: 4796/5000, Reset in 758 sec i.e. at 2020-07-09 09:23:01
10:10:30 Alt comparison directory:  /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/upload/alternative-comparisons
10:10:30 
10:10:30 JR comparison Summary:  /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/upload/validateJR/qaResultsSummary.log

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 9, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63ad0a/7783/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2787364
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2787313
  • DQMHistoTests: Total skipped: 50
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 154 log files, 17 edm output root files, 37 DQM output files

@silviodonato
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 1beb25d into cms-sw:master Jul 10, 2020
@wddgit wddgit deleted the processBlockFeature branch September 14, 2020 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants