Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect WaitingTask*Holder::taskHasFailed from a data race #38765

Merged
merged 2 commits into from
Jul 18, 2022

Conversation

Dr15Jones
Copy link
Contributor

PR description:

WaitingTask*Holder::taskHasFailed could call exceptionPtr() while the underlying object is being set. This change makes calls to exceptionPtr() safe across threads.

Thanks to @wddgit for catching this case.

PR validation:

Code compiles and FWCore/Concurrency unit tests pass.

The call to WaitingTaskHolder::taskHasFailed could attempt to call
exceptionPtr while it is being set. The function is now protected
from a data race.
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38765/31092

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

  • FWCore/Concurrency (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel, @wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

///Called if waited for task failed
/**Allows transfer of the exception caused by the dependent task to be
* moved to another thread.
* This method should only be called by WaitingTaskList
*/
void dependentTaskFailed(std::exception_ptr iPtr) {
bool isSet = false;
if (iPtr and m_ptrSet.compare_exchange_strong(isSet, true)) {
unsigned char isSet = static_cast<unsigned int>(State::kUnset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general a nice solution to the overall issue. Are the static_cast's necessary? isSet and the enum are both already unsigned char.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wddgit it wouldn't compile without the static_casts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explicit cast is needed because of enum class (with plain enum cast should not be necessary). But why not cast via unsigned char as both source and destination are only that wide?

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-32d402/26278/summary.html
COMMIT: 7a502ff
CMSSW: CMSSW_12_5_X_2022-07-15-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38765/26278/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test TestSubProcess had ERRORS

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3654053
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3654017
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

The function now returns an std::exception_ptr by value not by
pointer anymore.
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38765/31093

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

Pull request #38765 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@Dr15Jones
Copy link
Contributor Author

Please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-32d402/26287/summary.html
COMMIT: 58e901e
CMSSW: CMSSW_12_5_X_2022-07-16-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38765/26287/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3654053
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3654023
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

};

template <typename F>
class FunctorWaitingTask : public WaitingTask {
public:
explicit FunctorWaitingTask(F f) : func_(std::move(f)) {}

void execute() final { func_(exceptionPtr() ? &exceptionPtr() : nullptr); };
void execute() final { func_(uncheckedExceptionPtr() ? &uncheckedExceptionPtr() : nullptr); };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly for future reference, the uncheckedExceptionPtr() can be used here because this function will be called only after the reference count has reached 0, and therefore nothing else can be writing into the m_ptr at the same time, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@makortel
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 88f3ad0 into cms-sw:master Jul 18, 2022
@Dr15Jones Dr15Jones deleted the protect_taskHasFailed branch August 1, 2022 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants