-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use WaitingTaskHolder
to signal doneWaiting()
instead of WaitingTaskWithArenaHolder
in framework
#47029
Conversation
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47029/43145
|
A new Pull Request was created by @makortel for master. It involves the following packages:
@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
And I forgot the EventSetup side |
FWCore/Framework/src/Worker.cc
Outdated
ModuleContextSentry moduleContextSentry(&moduleCallingContext_, parentContext); | ||
try { | ||
convertException::wrap([&]() { this->implDoAcquire(info, &moduleCallingContext_, holder); }); | ||
WaitingTaskWithArenaHolder holderWithArena{holder}; | ||
convertException::wrap([&]() { this->implDoAcquire(info, &moduleCallingContext_, holderWithArena); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The runAcquire()
is called by runAcquireAfterAsyncPrefetch()
that owns the holder
beyond the scope of runAcquire()
function. Therefore the destructor or doneWaiting()
of this holderWithArena
should not lead to the task reference count to drop to 0.
It could be cleaner to percolate the holder
as WaitingTaskHolder
through the implDoAcquire()
functions, but that seemed a little bit tedious. I could do it nevertheless, but I wanted to get feedback on this PR in general first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this IF you add a comment explaining that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll percolate the WaitingTaskHolder
through the call chain and create the WaitingTaskWithArenaHolder
only when really needed. And then add a comment there why holder
is copied instead of moved to holderWithArena
constructor (reason being the lifetime of holder must be longer than that of holderWithArena
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After realizing I'd have to add the same comment in 3 places, I wanted to try a different approach (as now pushed into this PR)
- the
runAcquire()
here takes theholder
by value, i.e. leading to theholder
in the callingrunAcquireAfterAsyncPrefetch()
to be copied - the
holder
is percolated by moving down to the places where the modules'acquire()
is called, and now moved to theWaitingTaskWithArenaHolder
- I used rvalue references for the percolation. Strictly speaking the intermediate functions could take
WaitingTaskHolder
by value, but since the call chain is purely internal to the framework and not intended to be called by elsewhere, the rvalue reference enforcing the moves from the caller side felt nice.
- I used rvalue references for the percolation. Strictly speaking the intermediate functions could take
runAcquireAfterAsyncPrefetch()
has a comment that it is important to copy theholder
torunAcquire()
This way there is one place where the copying of the holder is important, and that place is very near (only some lines above) the place where the other copy is used (that has the comment of it being important). Now also the behavior of holder within runAcquire()
is decoupled from the requirements of runAcquireAfterAsyncPrefetch()
, i.e. code called from runAcquire()
can not mess it up (and the moves there are just an optimization).
@@ -42,7 +42,7 @@ namespace edm { | |||
|
|||
// Takes ownership of the underlying task and uses the current | |||
// arena. | |||
explicit WaitingTaskWithArenaHolder(WaitingTaskHolder&& iTask); | |||
explicit WaitingTaskWithArenaHolder(WaitingTaskHolder iTask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change allows a WaitingTaskHolder
to be "implicitly copied" into WaitingTaskWithArenaHolder
, i.e.
WaitingTaskHolder h;
WaitingTaskWithArenaHolder wta{h};
// instead of
WaitingTaskHolder h;
WaitingTaskWithArenaHolder wta{WaitingTaskHolder{h}};
The pattern of WaitingTaskHolder
being moved should work as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment before the constructor should be changed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the latest changes in the PR this particular change is not really needed. I find the by-value interface more clear than by-rvalue-reference (i.e. I'd keep it), but would be willing to revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the latest changes in the PR this particular change is not really needed
I was wrong, the TransformerBase::transformImpAsync()
makes use of this constructor.
1d9168d
to
16a69d8
Compare
Now the EventSetup side should be included as well |
enable gpu, threading |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47029/43146
|
Pull request #47029 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
@cmsbuild, please test |
-1 Failed Tests: GpuUnitTests GPU Unit TestsI found 2 errors in the following unit tests: ---> test testCudaDeviceAdditionWrapper had ERRORS ---> test testCudaDeviceAdditionKernel had ERRORS Comparison SummarySummary:
GPU Comparison SummarySummary:
|
AFAICT these tests are independent of the changes in this PR |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47029/43222
|
Pull request #47029 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
@cmsbuild, please test |
@Dr15Jones Could you take another look? |
-1 Failed Tests: UnitTests GpuUnitTests Unit TestsI found 1 errors in the following unit tests: ---> test runtestPhysicsToolsPatAlgos had ERRORS GPU Unit TestsI found 2 errors in the following unit tests: ---> test testCudaDeviceAdditionWrapper had ERRORS ---> test testCudaDeviceAdditionKernel had ERRORS Comparison SummarySummary:
GPU Comparison SummarySummary:
|
ignore tests-rejected with ib-failure |
+core |
This pull request is fully signed and it will be integrated in one of the next master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
In the framework the
doneWaiting()
is always called from within the main arena in the TBB thread pool, and therefore usingWaitingTaskHolder is safe (
WaitingTaskWithArenaHolder
is needed only whendoneWaiting()
is called outside of the TBB arena).Avoiding
WaitingTaskWithArenaHolder
allows to avoid enqueue() operation when thedoneWaiting()
calls in the framework are the ones that decrease the task reference count to 0.Resolves cms-sw/framework-team#1125
PR validation:
Checked with gdb that a single-threaded test configuration with a module that uses
ExternalWork
(Alpaka-based module on the CPU serial backend, to be exact) where theacquire()
does not leave anyWaitingTaskWithArenaHolder
alive (i.e. theacquire()
does not really launch any asynchronous work) does not lead anymore to two TBB threads as a consequence of thetask_arena::enqueue()
.