Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use WaitingTaskHolder to signal doneWaiting() instead of WaitingTaskWithArenaHolder in framework #47029

Merged
merged 1 commit into from
Jan 10, 2025

Conversation

makortel
Copy link
Contributor

@makortel makortel commented Dec 27, 2024

PR description:

In the framework the doneWaiting() is always called from within the main arena in the TBB thread pool, and therefore using
WaitingTaskHolder is safe (WaitingTaskWithArenaHolder is needed only when doneWaiting() is called outside of the TBB arena).

Avoiding WaitingTaskWithArenaHolder allows to avoid enqueue() operation when the doneWaiting() calls in the framework are the ones that decrease the task reference count to 0.

Resolves cms-sw/framework-team#1125

PR validation:

Checked with gdb that a single-threaded test configuration with a module that uses ExternalWork (Alpaka-based module on the CPU serial backend, to be exact) where the acquire() does not leave any WaitingTaskWithArenaHolder alive (i.e. the acquire() does not really launch any asynchronous work) does not lead anymore to two TBB threads as a consequence of the task_arena::enqueue().

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 27, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47029/43145

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/Concurrency (core)
  • FWCore/Framework (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@missirol, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

And I forgot the EventSetup side

ModuleContextSentry moduleContextSentry(&moduleCallingContext_, parentContext);
try {
convertException::wrap([&]() { this->implDoAcquire(info, &moduleCallingContext_, holder); });
WaitingTaskWithArenaHolder holderWithArena{holder};
convertException::wrap([&]() { this->implDoAcquire(info, &moduleCallingContext_, holderWithArena); });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runAcquire() is called by runAcquireAfterAsyncPrefetch() that owns the holder beyond the scope of runAcquire() function. Therefore the destructor or doneWaiting() of this holderWithArena should not lead to the task reference count to drop to 0.

It could be cleaner to percolate the holder as WaitingTaskHolder through the implDoAcquire() functions, but that seemed a little bit tedious. I could do it nevertheless, but I wanted to get feedback on this PR in general first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this IF you add a comment explaining that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll percolate the WaitingTaskHolder through the call chain and create the WaitingTaskWithArenaHolder only when really needed. And then add a comment there why holder is copied instead of moved to holderWithArena constructor (reason being the lifetime of holder must be longer than that of holderWithArena).

Copy link
Contributor Author

@makortel makortel Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After realizing I'd have to add the same comment in 3 places, I wanted to try a different approach (as now pushed into this PR)

  • the runAcquire() here takes the holder by value, i.e. leading to the holder in the calling runAcquireAfterAsyncPrefetch() to be copied
  • the holder is percolated by moving down to the places where the modules' acquire() is called, and now moved to the WaitingTaskWithArenaHolder
    • I used rvalue references for the percolation. Strictly speaking the intermediate functions could take WaitingTaskHolder by value, but since the call chain is purely internal to the framework and not intended to be called by elsewhere, the rvalue reference enforcing the moves from the caller side felt nice.
  • runAcquireAfterAsyncPrefetch() has a comment that it is important to copy the holder to runAcquire()

This way there is one place where the copying of the holder is important, and that place is very near (only some lines above) the place where the other copy is used (that has the comment of it being important). Now also the behavior of holder within runAcquire() is decoupled from the requirements of runAcquireAfterAsyncPrefetch(), i.e. code called from runAcquire() can not mess it up (and the moves there are just an optimization).

@@ -42,7 +42,7 @@ namespace edm {

// Takes ownership of the underlying task and uses the current
// arena.
explicit WaitingTaskWithArenaHolder(WaitingTaskHolder&& iTask);
explicit WaitingTaskWithArenaHolder(WaitingTaskHolder iTask);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows a WaitingTaskHolder to be "implicitly copied" into WaitingTaskWithArenaHolder, i.e.

WaitingTaskHolder h;
WaitingTaskWithArenaHolder wta{h};
// instead of
WaitingTaskHolder h;
WaitingTaskWithArenaHolder wta{WaitingTaskHolder{h}};

The pattern of WaitingTaskHolder being moved should work as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment before the constructor should be changed as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the latest changes in the PR this particular change is not really needed. I find the by-value interface more clear than by-rvalue-reference (i.e. I'd keep it), but would be willing to revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the latest changes in the PR this particular change is not really needed

I was wrong, the TransformerBase::transformImpAsync() makes use of this constructor.

@makortel makortel force-pushed the waitingTaskWithArenaHolder branch from 1d9168d to 16a69d8 Compare December 27, 2024 22:00
@makortel
Copy link
Contributor Author

And I forgot the EventSetup side

Now the EventSetup side should be included as well

@makortel
Copy link
Contributor Author

enable gpu, threading

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47029/43146

@cmsbuild
Copy link
Contributor

Pull request #47029 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: GpuUnitTests
Size: This PR adds an extra 52KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-450e38/43604/summary.html
COMMIT: 16a69d8
CMSSW: CMSSW_15_0_X_2024-12-27-1100/el8_amd64_gcc12
Additional Tests: GPU,THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47029/43604/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Unit Tests

I found 2 errors in the following unit tests:

---> test testCudaDeviceAdditionWrapper had ERRORS
---> test testCudaDeviceAdditionKernel had ERRORS

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53071
  • DQMHistoTests: Total failures: 898
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52173
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor Author

---> test testCudaDeviceAdditionWrapper had ERRORS
---> test testCudaDeviceAdditionKernel had ERRORS

AFAICT these tests are independent of the changes in this PR

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 8, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47029/43222

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 8, 2025

Pull request #47029 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Copy link
Contributor Author

makortel commented Jan 8, 2025

@cmsbuild, please test

@makortel
Copy link
Contributor Author

makortel commented Jan 8, 2025

@Dr15Jones Could you take another look?

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 9, 2025

-1

Failed Tests: UnitTests GpuUnitTests
Size: This PR adds an extra 92KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-450e38/43685/summary.html
COMMIT: e77e887
CMSSW: CMSSW_15_0_X_2025-01-08-1100/el8_amd64_gcc12
Additional Tests: GPU,THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47029/43685/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 errors in the following unit tests:

---> test runtestPhysicsToolsPatAlgos had ERRORS

GPU Unit Tests

I found 2 errors in the following unit tests:

---> test testCudaDeviceAdditionWrapper had ERRORS
---> test testCudaDeviceAdditionKernel had ERRORS

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53071
  • DQMHistoTests: Total failures: 29
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 53042
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor Author

makortel commented Jan 9, 2025

Comparison differences are related to #39803, #46416, and #39803

Unit test failures are related to #47052 and #46864

@makortel
Copy link
Contributor Author

makortel commented Jan 9, 2025

ignore tests-rejected with ib-failure

@makortel
Copy link
Contributor Author

makortel commented Jan 9, 2025

+core

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 9, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 79e4b35 into cms-sw:master Jan 10, 2025
13 of 15 checks passed
@makortel makortel deleted the waitingTaskWithArenaHolder branch January 10, 2025 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use WaitingTaskHolder to signal doneWaiting() instead of WaitingTaskWithArenaHolder in framework
4 participants