Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infrequent deadlock in edm::ThreadHandoff #34984

Merged
merged 1 commit into from
Aug 24, 2021

Conversation

Dr15Jones
Copy link
Contributor

PR description:

  • Need to hold the lock until call to wait to avoid the possiblility of a deadlock.
  • The deadlock has been showing up in the IBs.

PR validation:

Ran workflow 159.1 with 4 threads and everything ran fine.

Need to hold the lock until call to `wait` to avoid the possiblility
of a deadlock.
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34984/24828

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

  • SimG4Core/Application (simulation)

@cmsbuild, @civanch, @mdhildreth can you please review it and eventually sign? Thanks.
@makortel, @cvuosalo, @rovere, @bsunanda, @fabiocos, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Copy link
Contributor Author

please test

@Dr15Jones
Copy link
Contributor Author

type bug

@Dr15Jones
Copy link
Contributor Author

This fixes IB timeouts with the following tracebacks

Thread 5 (Thread 0x2af7dce00700 (LWP 28901) "cmsRun"):
#0  0x00002af7968b12fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002af7963f864c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /build/muz/g9/boot/BUILD/cc8_amd64_gcc9/external/gcc/9.3.0/gcc-9.3.0/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/x86_64-redhat-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00002af7c795a69b in omt::ThreadHandoff::threadLoop(void*) () from /cvmfs/cms-ib.cern.ch/week1/cc8_amd64_gcc9/cms/cmssw/CMSSW_12_1_X_2021-08-22-2300/biglib/cc8_amd64_gcc9/pluginSimulation.so
#4  0x00002af7968ab14a in start_thread () from /lib64/libpthread.so.0
#5  0x00002af796bbff23 in clone () from /lib64/libc.so.6
[cut]
Thread 1 (Thread 0x2af79790ff00 (LWP 28639) "cmsRun"):
[cut]
#4  <signal handler called>
#5  0x00002af7968b12fa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#6  0x00002af7963f864c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /build/muz/g9/boot/BUILD/cc8_amd64_gcc9/external/gcc/9.3.0/gcc-9.3.0/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/x86_64-redhat-linux-gnu/bits/gthr-default.h:865
#7  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#8  0x00002af7c796444b in OscarMTProducer::OscarMTProducer(edm::ParameterSet const&, OscarMTMasterThread const*) () from /cvmfs/cms-ib.cern.ch/week1/cc8_amd64_gcc9/cms/cmssw/CMSSW_12_1_X_2021-08-22-2300/biglib/cc8_amd64_gcc9/pluginSimulation.so

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef1e2c/17973/summary.html
COMMIT: 01f1c79
CMSSW: CMSSW_12_1_X_2021-08-23-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/34984/17973/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef1e2c/17973/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef1e2c/17973/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3000352
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3000330
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@civanch
Copy link
Contributor

civanch commented Aug 23, 2021

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented Aug 24, 2021

+1

@cmsbuild cmsbuild merged commit 130c6bc into cms-sw:master Aug 24, 2021
@Dr15Jones Dr15Jones deleted the fixDeadlockThreadHandoff branch August 24, 2021 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants