-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch FastTimerService to using a local thread observer #33261
Switch FastTimerService to using a local thread observer #33261
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33261/21747
|
A new Pull Request was created by @dan131riley (Dan Riley) for master. It involves the following packages: HLTrigger/Timer @Martin-Grunewald, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild, please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d1c5b/13747/summary.html Comparison SummarySummary:
|
// initialise the measurement point for a thread that has newly joining the TBB pool | ||
thread().measure(); | ||
FastTimerService::ThreadGuard::ThreadGuard() { | ||
auto err = ::pthread_key_create(&key_, retire_thread); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, retire_thread
is called when the worker thread exits.
Why not call it from on_scheduler_exit
instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Monitoring the primary arena, on_scheduler_entry
and on_scheduler_exit
get called many times as threads get moved between different arenas, and there's no way to tell when the call to on_scheduler_exit
is the final exit. This is the only reliable way I could think of to catch if a thread gets deleted before the end job sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so with this approach, the time (and resources) spent by a thread outside the main arena would be accounted as "overhead", right ?
Which is not a bad thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that with either the global observer or this PR, what gets measured as overhead is the time outside of any defined CMS module, irrespective of the arena. With the global observer you were getting one call to on_scheduler_enter
and on_scheduler_exit
per thread, and I'm trying to replicate that with the observer on the primary arena.
Measurement& thread(); | ||
void finalize(); | ||
|
||
tbb::concurrent_vector<std::unique_ptr<specific_t>> thread_resources_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a tbb::concurrent_vector<std::unique_ptr<specific_t>>
?
wouldn't tbb::concurrent_vector<specific_t>
also work, and avoid one indirection ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it needs to be std::shared_ptr
. Otherwise there's a potential use-after-free if the threads exit after the FastTimerService
gets destructed. We've observed that TBB often doesn't destroy threads until global destructors are called (I believe using the global observer you may actually have been missing some overhead from on_scheduler_exit
getting called after postEndJob
).
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33261/21760
|
Pull request #33261 was updated. @Martin-Grunewald, @cmsbuild, @fwyzard can you please check and sign again. |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d1c5b/13769/summary.html Comparison SummarySummary:
|
With this PR I no longer see those crashes, thanks! |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
what the ... |
did anyone actually check the performance of these changes, before merging ? |
@fwyzard Sorry if there is any misunderstanding. It was based on above PR validation and test results, also on comments/signature from hlt ( @Martin-Grunewald ).
Moreover, I don't see you have further comment... |
@fwyzard I did run some timing tests, and saw no significant change in the overall timing. There was a small increase in the "other" overhead category, but prior to this PR I believe "other" may have been undercounted. There's more investigation to do, particularly to see if a less defensive approach would be feasible, so I'd consider this an interim step, not necessarily final, but it's important for development to see that it fixes the crashes. |
@qliphy well, sorry for not being able to reply to every PR all the time. @dan131riley thanks, if you tell me that this will not necessarily be the final version, I'm OK with that. |
IB test CMSSW_11_3_X_2021-03-26-2300 (slc7_amd64_gcc900) shows no crash anymore. |
PR description:
With the new task_group interfaces, global thread observers cause crashes at the end of the job, and the interface for global thread observers is being removed in the latest versions of TBB. This PR modifies the FastTimerService to use a local thread observer on the primary framework arena to register threads, and then tracks individual threads using lower level pthreads interfaces instead of the TBB enumerable_thread_specific (which, under the hood, uses the same pthread mechanisms). In the normal case, this should reproduce the results of the FastTimerService with a global observer and account for any threads that are added to the arena.
PR validation:
Tested on one workflow (136.885501) that it produces results consistent with the results with the global thread observer, and doesn't crash on exit.