Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow an InputSource to declare a run/lumi is the last to be merged #43457

Merged
merged 1 commit into from
Dec 19, 2023

Conversation

wddgit
Copy link
Contributor

@wddgit wddgit commented Nov 30, 2023

PR description:

Allow an InputSource to declare that a run entry or a lumi entry is the last one that needs to be merged. The next run entry should have a different run number or ProcessHistoryID. Or the next lumi entry should be associated with a different run or have a different luminosity block number. Or the next run or lumi entry could be in a different file.

It is OK for InputSources not to make this declaration. The Framework can and already does figure this out by examining the next entry returned by getNextItemType. Existing sources should work OK.

This might be a useful optimization in an online source when the getNextItemType function takes a long time to return. If the last item is declared, then the global run transition or the global lumi transition can begin immediately. Otherwise, the Framework has to wait for getNextItemType to return for the next entry in order to know that the previous entry was the last. And that delays global begin run or global begin lumi.

This probably will not help much for offline input sources because getNextItemType returns quickly.

The Framework will detect the case where a source falsely declares an item is the last run or lumi entry. It will throw an exception indicating a bug in the InputSource.

This delay was noticed when investigating issue #42931, but this PR does not address either of the problems listed there.

There are some minor changes in the InputSource interface. The getNextItemType function now returns an object of type ItemTypeInfo instead of the enum ItemType. Also, the ItemType enum is changed to be an enum class (more of a style modernization, I could back that out if requested).

PR validation:

Three new unit tests are added for this feature. Existing core unit tests pass. Limited runTheMatrix passes.

Manually, you can also see the delay vanishes in the log file of one of the unit tests, but it is difficult to turn that into a pass/fail criteria as the timing of execution can vary and we don't want tests that occasionally fail due to timing variations.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43457/38008

  • This PR adds an extra 152KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 30, 2023

A new Pull Request was created by @wddgit (W. David Dagenhart) for master.

It involves the following packages:

  • DQMServices/FwkIO (dqm)
  • DQMServices/StreamerIO (dqm)
  • FWCore/Framework (core)
  • FWCore/Integration (core)
  • FWCore/Modules (core)
  • FWCore/Sources (core)
  • IOPool/Input (core)

@rvenditti, @cmsbuild, @antoniovagnerini, @makortel, @syuvivida, @tjavaid, @smuzaffar, @nothingface0, @Dr15Jones can you please review it and eventually sign? Thanks.
@missirol, @barvic, @makortel this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@wddgit
Copy link
Contributor Author

wddgit commented Nov 30, 2023

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 1, 2023

-1

Failed Tests: AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-274562/36219/summary.html
COMMIT: 7b27eae
CMSSW: CMSSW_14_0_X_2023-11-30-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43457/36219/install.sh to create a dev area with all the needed externals and cmssw changes.

AddOn Tests

----- Begin Fatal Exception 01-Dec-2023 00:47:00 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
Exception Message:
A std::exception was thrown.
vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
   Additional Info:
      [a] Another exception was caught while trying to clean up runs after the primary fatal exception.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 01-Dec-2023 00:47:03 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
Exception Message:
A std::exception was thrown.
vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
   Additional Info:
      [a] Another exception was caught while trying to clean up runs after the primary fatal exception.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 01-Dec-2023 00:46:43 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
Exception Message:
A std::exception was thrown.
vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
   Additional Info:
      [a] Another exception was caught while trying to clean up runs after the primary fatal exception.
----- End Fatal Exception -------------------------------------------------

Comparison Summary

Summary:

@wddgit
Copy link
Contributor Author

wddgit commented Dec 1, 2023

please test

Try again. See if the failures are reproducible. The only thing I noticed so far is that the 3 failures occurred within 20 seconds of each other in real time. Maybe some system glitch...

@wddgit
Copy link
Contributor Author

wddgit commented Dec 1, 2023

abort test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 1, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43457/38041

  • This PR adds an extra 152KB to repository

@wddgit
Copy link
Contributor Author

wddgit commented Dec 1, 2023

please test

Try again. I missed adding ServiceRegistry::Operate operate(serviceToken_); in a spot where it was needed. I squashed and force pushed a fix.

@wddgit
Copy link
Contributor Author

wddgit commented Dec 11, 2023

I implemented all Chris's suggestions except for making the ItemTypeInfo constructor explicit (see comment above). Thanks, it is better. Also rebased. I'll rerun the tests when code checks passes.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43457/38140

@cmsbuild
Copy link
Contributor

Pull request #43457 was updated. @syuvivida, @nothingface0, @rvenditti, @antoniovagnerini, @tjavaid, @smuzaffar, @makortel, @Dr15Jones, @cmsbuild can you please check and sign again.

@wddgit
Copy link
Contributor Author

wddgit commented Dec 11, 2023

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-274562/36423/summary.html
COMMIT: 1b3d395
CMSSW: CMSSW_14_0_X_2023-12-11-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43457/36423/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@makortel
Copy link
Contributor

+core

@makortel
Copy link
Contributor

@cms-sw/dqm-l2 Could you review and sign? The changes in your area should be straightforward.

@tjavaid
Copy link

tjavaid commented Dec 14, 2023

+1

  • with spurious differences coming from WFs mentioned above by @makortel .

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@antoniovilela
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 71e2c07 into cms-sw:master Dec 19, 2023
11 checks passed
@wddgit wddgit deleted the lastRunOrLumiToBeMerged branch March 11, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants