[12_3_X] Backport of 37617 + 37798 #37933

Merged on May 17, 2022 (2 commits)

AdrianoDee
Contributor

PR description:

In order to run properly, a backport to 12_3_X of #37860 would need backports of #37798 and #37617. This PR is the combination of those backports.

PR validation:

Running workflows 39434.501, 39434.502, and 11634.x, with x = 501, 502, and 503.

@cmsbuild
Contributor

A new Pull Request was created by @AdrianoDee for CMSSW_12_3_X.

It involves the following packages:

  • CUDADataFormats/TrackingRecHit (heterogeneous, reconstruction)
  • DQM/SiPixelPhase1Heterogeneous (dqm)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoPixelVertexing/Configuration (reconstruction)
  • RecoPixelVertexing/PixelTrackFitting (reconstruction)

@emanueleusai, @makortel, @fwyzard, @ahmad3213, @cmsbuild, @jfernan2, @clacaputo, @slava77, @jpata, @pmandrik, @micsucmed, @rvenditti can you please review it and eventually sign? Thanks.
@ferencek, @hdelanno, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @gpetruc, @OzAmram, @fioriNTU, @jandrea, @mtosi, @idebruyn, @mmusich, @dkotlins, @threus, @dgulhan, @tvami this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@AdrianoDee
Contributor Author

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 11634.503, 11634.502, 39434.502
  • workflows = 11634.501, 39434.501, 11634.502
  • relvals_opt= -w standard,highstats,pileup,generator,extendedgen,production,upgrade,cleanedupgrade,ged

@AdrianoDee
Contributor Author

type trk

@cmsbuild cmsbuild added the trk label May 12, 2022
@AdrianoDee
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d844a/24680/summary.html
COMMIT: 604f84e
CMSSW: CMSSW_12_3_X_2022-05-12-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37933/24680/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 954
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 18920
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-7d844a/39434.501_TTbar_14TeV+2026D88_Patatrack_PixelOnlyCPU+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 3744552
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3744516
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 51 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 216 log files, 54 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

@emanueleusai
Member

+1

@emanueleusai
Member

tested successfully at P5

@jpata
Contributor

jpata commented May 16, 2022

+reconstruction

  • technical, a backport of CPUvsGPU DQM rearrangement

@fwyzard
Contributor

fwyzard commented May 16, 2022

+heterogeneous

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_3_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_5_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Contributor

qliphy commented May 17, 2022

+1

@cmsbuild cmsbuild merged commit a6dbacc into cms-sw:CMSSW_12_3_X May 17, 2022
@missirol
Contributor

missirol commented May 31, 2022

I suspect this PR is related to a crash seen on HLT GPU nodes earlier today at P5. Preliminary checks suggest that there is a missing protection against nHits==0 in SiPixelRecHitSoAFromCUDA::produce.

Could experts please have a look?

@AdrianoDee

FYI: @fwyzard @zarucki @Sam-Harper @silviodonato @elfontan

@AdrianoDee
Contributor Author

Hi @missirol, is there a log for the crash?

@AdrianoDee
Contributor Author

I see what you mean, let me add the safeguard anyway.

@missirol
Contributor

Thanks, @AdrianoDee. This [*] avoids the crash in my local checks, but I don't know if it is the correct fix.

IIUC from FOG, the fix is relevant for a situation (like earlier today) where Pixel is out of global and the HLT runs on GPU nodes with triggers that use Pixel reco. Ideally, the fix would be in the upcoming 12_3_X release, but I can't judge how critical this is (FYI: @fabiocos, ORM).

[*]

diff --git a/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc b/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc
index fda418320e7..2af4588f92f 100644
--- a/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc
+++ b/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc
@@ -82,7 +82,10 @@ void SiPixelRecHitSoAFromCUDA::acquire(edm::Event const& iEvent,
 
 void SiPixelRecHitSoAFromCUDA::produce(edm::Event& iEvent, edm::EventSetup const& es) {
   auto hmsp = std::make_unique<uint32_t[]>(nMaxModules_ + 1);
-  std::copy(hitsModuleStart_.get(), hitsModuleStart_.get() + nMaxModules_ + 1, hmsp.get());
+
+  if(nHits_ > 0) {
+    std::copy(hitsModuleStart_.get(), hitsModuleStart_.get() + nMaxModules_ + 1, hmsp.get());
+  }
 
   iEvent.emplace(hostPutToken_, std::move(hmsp));
   iEvent.emplace(hitsPutTokenCPU_, store32_.get(), store16_.get(), hitsModuleStart_.get(), nHits_);
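
The guard in the diff above can be illustrated in isolation. What follows is a minimal sketch, not the CMSSW module: `copyModuleStart` and its parameters are hypothetical names introduced for illustration, and it assumes, as the preliminary checks in this thread suggest but do not confirm, that the source buffer may be unset when `nHits == 0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical helper (not part of CMSSW): returns a buffer of nModules + 1
// per-module hit offsets copied from src.
// Assumption: when nHits == 0 the source may never have been filled and may
// even be null, so the copy is skipped and the zeroed buffer is returned.
std::unique_ptr<std::uint32_t[]> copyModuleStart(const std::uint32_t* src,
                                                 std::uint32_t nModules,
                                                 std::uint32_t nHits) {
  // std::make_unique<T[]>(n) value-initialises the elements, so dst is all zeros.
  auto dst = std::make_unique<std::uint32_t[]>(nModules + 1);
  if (nHits > 0 && src != nullptr) {
    std::copy(src, src + nModules + 1, dst.get());
  }
  return dst;
}
```

With this shape, an event without pixel hits still yields a valid, all-zero offsets array for downstream consumers instead of reading from an unset pointer. Whether an all-zero array is the right product for the empty case is a design question for the module authors.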

@missirol
Contributor

> is there a log for the crash?

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Tue May 31 11:16:48 CEST 2022
Thread 12 (Thread 0x7fa5b63ff700 (LWP 292252) "cmsRun"):
#0 0x00007fa609468ddd in poll () from /lib64/libc.so.6
#1 0x00007fa5fd83a28f in full_read.constprop () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2 0x00007fa5fd83ac1c in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3 0x00007fa5fd83d56b in sig_dostack_then_abort () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007fa6094d1bc0 in __memmove_ssse3_back () from /lib64/libc.so.6
#6 0x00007fa5ab78af74 in SiPixelRecHitSoAFromCUDA::produce(edm::Event&, edm::EventSetup const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
#7 0x00007fa60bec5c63 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#8 0x00007fa60beaed8f in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#9 0x00007fa60be09fc5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule(edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*)::{lambda()#1}>(edm::Worker::runModule >(edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*)::{lambda()#1}) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#10 0x00007fa60be0a2bb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#11 0x00007fa60be0c8a5 in edm::Worker::RunModuleTask<edm::OccurrenceTraits::execute() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#12 0x00007fa60bd50b75 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#13 0x00007fa60a53e06c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fa54302a000, waiter=..., this=0x7fa606bdaa00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#14 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fa606bdaa00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#15 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/arena.cpp:138
#16 0x00007fa60a54a5b3 in tbb::detail::r1::market::process (j=..., this=0x7fa606bdb580) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/market.cpp:597
#17 tbb::detail::r1::rml::private_worker::run (this=0x7fa603b07000) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/private_server.cpp:267
#18 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fa603b07000) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/private_server.cpp:221
#19 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7fa5b75e9700 (LWP 292251) "cmsRun"):
#0 0x00007fa60943a9fd in nanosleep () from /lib64/libc.so.6
#1 0x00007fa60943a894 in sleep () from /lib64/libc.so.6
#2 0x00007fa5fd839e20 in sig_pause_for_stacktrace () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3 <signal handler called>
#4 0x00007fa60bf9d7c5 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#5 0x00007fa60bf9e09f in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#6 0x00007fa60bfa2dee in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#7 0x00007fa60bfaaaaa in _dl_runtime_resolve_xsavec () from /lib64/ld-linux-x86-64.so.2
#8 0x00007fa5ab606e9f in PixelTrackSoAFromCUDA::produce(edm::Event&, edm::EventSetup const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginRecoPixelVertexingPixelTrackFittingPlugins.so
#9 0x00007fa60bec5c63 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#10 0x00007fa60beaed8f in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#11 0x00007fa60be09fc5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule(edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*)::{lambda()#1}>(edm::Worker::runModule >(edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*)::{lambda()#1}) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#12 0x00007fa60be0a2bb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#13 0x00007fa60be0c8a5 in edm::Worker::RunModuleTask<edm::OccurrenceTraits::execute() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#14 0x00007fa60bd50b75 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#15 0x00007fa60a53e06c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fa542f54f00, waiter=..., this=0x7fa606bda900) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fa606bda900) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#17 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/arena.cpp:138
#18 0x00007fa60a54a5b3 in tbb::detail::r1::market::process (j=..., this=0x7fa606bdb580) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/market.cpp:597
#19 tbb::detail::r1::rml::private_worker::run (this=0x7fa603b07100) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/private_server.cpp:267
#20 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fa603b07100) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/private_server.cpp:221
#21 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7fa5b7fea700 (LWP 292250) "cmsRun"):
#0 0x00007fa60943a9fd in nanosleep () from /lib64/libc.so.6
#1 0x00007fa60943a894 in sleep () from /lib64/libc.so.6
#2 0x00007fa5fd839e20 in sig_pause_for_stacktrace () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3 <signal handler called>
#4 0x00007fa60975273d in __libc_sigaction () from /lib64/libpthread.so.0
#5 0x00007fa5fd83d977 in sig_dostack_then_abort () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#6 <signal handler called>
#7 0x00007fa6094d1bc0 in __memmove_ssse3_back () from /lib64/libc.so.6
#8 0x00007fa5ab78af74 in SiPixelRecHitSoAFromCUDA::produce(edm::Event&, edm::EventSetup const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
#9 0x00007fa60bec5c63 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#10 0x00007fa60beaed8f in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#11 0x00007fa60be09fc5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule(edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*)::{lambda()#1}>(edm::Worker::runModule >(edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*)::{lambda()#1}) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#12 0x00007fa60be0a2bb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits::Context const*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#13 0x00007fa60be0c8a5 in edm::Worker::RunModuleTask<edm::OccurrenceTraits::execute() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#14 0x00007fa60bd50b75 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#15 0x00007fa60a53e06c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fa542f8c100, waiter=..., this=0x7fa606bda980) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fa606bda980) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#17 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/arena.cpp:138
#18 0x00007fa60a54a5b3 in tbb::detail::r1::market::process (j=..., this=0x7fa606bdb580) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/market.cpp:597
#19 tbb::detail::r1::rml::private_worker::run (this=0x7fa603b07080) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/private_server.cpp:267
#20 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fa603b07080) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/private_server.cpp:221
#21 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7fa53c608700 (LWP 292249) "cmsRun"):
#0 0x00007fa60943a9fd in nanosleep () from /lib64/libc.so.6
#1 0x00007fa60943a894 in sleep () from /lib64/libc.so.6
#2 0x00007fa5ffae2612 in evf::FastMonitoringService::snapshotRunner() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libEventFilterUtilities.so
#3 0x00007fa609d4af90 in std::execute_native_thread_routine (__p=0x7fa525c30880) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7fa58c3ff700 (LWP 292132) "cmsRun"):
#0 0x00007fa60974ea35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa609d45a7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2 std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3 0x00007fa59f4e194b in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#4 0x00007fa59f4e2198 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#5 0x00007fa59f4dea88 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function::_M_invoke(std::_Any_data const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#6 0x00007fa59c31cb61 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_framework.so.2
#7 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fa58d56e700 (LWP 292131) "cmsRun"):
#0 0x00007fa60974ea35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa609d45a7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2 std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3 0x00007fa59f4e194b in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#4 0x00007fa59f4e2198 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#5 0x00007fa59f4dea88 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function::_M_invoke(std::_Any_data const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#6 0x00007fa59c31cb61 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_framework.so.2
#7 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fa58dd6f700 (LWP 292130) "cmsRun"):
#0 0x00007fa60974ea35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa609d45a7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2 std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3 0x00007fa59f4e194b in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#4 0x00007fa59f4e2198 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#5 0x00007fa59f4dea88 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function::_M_invoke(std::_Any_data const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#6 0x00007fa59c31cb61 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libtensorflow_framework.so.2
#7 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fa594101700 (LWP 292110) "cmsRun"):
#0 0x00007fa60974ea35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa609d45a7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2 std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3 0x00007fa5ffaf41ad in FedRawDataInputSource::readWorker(unsigned int) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libEventFilterUtilities.so
#4 0x00007fa609d4af90 in std::execute_native_thread_routine (__p=0x7fa5faa5fed0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#5 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fa5bffff700 (LWP 291820) "cuda-EvtHandlr"):
#0 0x00007fa609468ddd in poll () from /lib64/libc.so.6
#1 0x00007fa5ee5063c1 in ?? () from /lib64/libcuda.so.1
#2 0x00007fa5ee511d3a in ?? () from /lib64/libcuda.so.1
#3 0x00007fa5ee500f16 in ?? () from /lib64/libcuda.so.1
#4 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fa5d1529700 (LWP 291815) "cuda-EvtHandlr"):
#0 0x00007fa609468ddd in poll () from /lib64/libc.so.6
#1 0x00007fa5ee5063c1 in ?? () from /lib64/libcuda.so.1
#2 0x00007fa5ee511d3a in ?? () from /lib64/libcuda.so.1
#3 0x00007fa5ee500f16 in ?? () from /lib64/libcuda.so.1
#4 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fa5d7fff700 (LWP 291760) "cmsRun"):
#0 0x00007fa6097521d9 in waitpid () from /lib64/libpthread.so.0
#1 0x00007fa5fd839fd7 in edm::service::cmssw_stacktrace_fork() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2 0x00007fa5fd83ab4a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3 0x00007fa609d4af90 in std::execute_native_thread_routine (__p=0x7fa602ac5d20) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4 0x00007fa60974aea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fa609473b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fa6075c6540 (LWP 290336) "cmsRun"):
#0 0x00007fa60943a9fd in nanosleep () from /lib64/libc.so.6
#1 0x00007fa60943a894 in sleep () from /lib64/libc.so.6
#2 0x00007fa5fd839e20 in sig_pause_for_stacktrace () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3 <signal handler called>
#4 0x00007fa59ab374e0 in frontierMemData_append () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#5 0x00007fa59ab37a29 in frontierMemData_b64append () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#6 0x00007fa59ab37f8f in frontierPayload_append () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#7 0x00007fa59ab385f3 in xml_cdata () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#8 0x00007fa59aae77dc in doContent (parser=parser@entry=0x7fa527362c00, startTagLevel=startTagLevel@entry=0, enc=<optimized out>, s=, end=, nextPtr=0x7fa527362c30, haveMore=1 '\001') at lib/xmlparse.c:2653
#9 0x00007fa59aae853c in contentProcessor (parser=0x7fa527362c00, start=<optimized out>, end=, endPtr=) at lib/xmlparse.c:2105
#10 0x00007fa59aaeaa93 in XML_ParseBuffer (parser=0x7fa527362c00, len=<optimized out>, isFinal=0) at lib/xmlparse.c:1651
#11 0x00007fa59ab390be in FrontierResponse_append () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#12 0x00007fa59ab31d3e in write_data () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#13 0x00007fa59ab32f3e in get_data () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#14 0x00007fa59ab33632 in frontier_postRawData () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#15 0x00007fa59ab332a0 in frontier_getRawData () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#16 0x00007fa59ab2ab1b in frontier::Session::getData(std::vector<frontier::Request const*, std::allocator const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/libfrontier_client.so.2
#17 0x00007fa59ab9575b in coral::FrontierAccess::Statement::execute(coral::AttributeList const&, int) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/liblcg_FrontierAccess.so
#18 0x00007fa59abba2c2 in coral::FrontierAccess::Query::execute() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_4_patch2/external/slc7_amd64_gcc10/lib/liblcg_FrontierAccess.so
#19 0x00007fa5af6aeb44 in cond::persistency::PAYLOAD::Table::select(std::__cxx11::basic_string<char, std::char_traits > const&, std::__cxx11::basic_string, std::allocator >&, cond::Binary&, cond::Binary&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#20 0x00007fa59a4abf07 in std::unique_ptr<EcalCondObjectContainer > > cond::persistency::Session::fetchPayload >(std::__cxx11::basic_string, std::allocator > const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginCondCoreEcalPlugins.so
#21 0x00007fa59a4ac27d in cond::persistency::PayloadProxy<EcalCondObjectContainer::loadPayload() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginCondCoreEcalPlugins.so
#22 0x00007fa59a48167b in DataProxy<EcalLaserAlphasRcd, EcalCondObjectContainer > >::prefetch(edm::eventsetup::DataKey const&, edm::EventSetupRecordDetails) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/pluginCondCoreEcalPlugins.so
#23 0x00007fa60bd6567e in edm::eventsetup::ESSourceDataProxyBase::doPrefetchAndSignals(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, edm::ESParentContext const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#24 0x00007fa60bd65ac5 in edm::SerialTaskQueue::QueuedTask<edm::eventsetup::ESSourceDataProxyBase::prefetchAsyncImplTemplate::execute() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#25 0x00007fa60c003075 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so
#26 0x00007fa60a550b8c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7fa54305d900, this=0x7fa606bda880) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#27 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=, this=0x7fa606bda880) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#28 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/task_dispatcher.cpp:168
#29 0x00007fa60bd7a9d8 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#30 0x00007fa60bd8582b in edm::EventProcessor::runToCompletion() () from /opt/offline/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_4/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#31 0x000000000040a266 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#32 0x00007fa60a53f15b in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_1-slc7_amd64_gcc10/build/CMSSW_12_3_1-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-d3ee2fc4dbf589032bbf635c7b35f820/tbb-v2021.4.0/src/tbb/arena.cpp:698
#33 0x000000000040b094 in main::{lambda()#1}::operator()() const ()
#34 0x000000000040971c in main ()

Current Modules:

Module: SiPixelRecHitSoAFromCUDA:hltSiPixelRecHitsSoAFromGPU (crashed)
Module: SiPixelRecHitSoAFromCUDA:hltSiPixelRecHitsSoAFromGPU
Module: PixelTrackSoAFromCUDA:hltPixelTracksFromGPU
Module: none

@fwyzard
Contributor

fwyzard commented Jun 1, 2022

@missirol I think your patch would leave uninitialised data in hmsp, which later gets put in the event for the hostPutToken_ collection.

@fwyzard
Contributor

fwyzard commented Jun 1, 2022

the collection that goes to hitsPutTokenCPU_ has an internal protection for the case with 0 hits

@fwyzard
Contributor

fwyzard commented Jun 1, 2022

OK, I admit I'm lost about whether std::make_unique<uint32_t[]>(size) will initialise the elements (to 0) or not :-/

If it does, the proposed fix is fine.
If it doesn't, we can initialise it explicitly:

diff --git a/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc b/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc
index fda418320e7..2af4588f92f 100644
--- a/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc
+++ b/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromCUDA.cc
@@ -82,7 +82,11 @@ void SiPixelRecHitSoAFromCUDA::acquire(edm::Event const& iEvent,
 
 void SiPixelRecHitSoAFromCUDA::produce(edm::Event& iEvent, edm::EventSetup const& es) {
   auto hmsp = std::make_unique<uint32_t[]>(nMaxModules_ + 1);
-  std::copy(hitsModuleStart_.get(), hitsModuleStart_.get() + nMaxModules_ + 1, hmsp.get());
+  if (nHits_ > 0) {
+    std::copy(hitsModuleStart_.get(), hitsModuleStart_.get() + nMaxModules_ + 1, hmsp.get());
+  } else {
+    std::fill(hmsp.get(), hmsp.get() + nMaxModules_ + 1, 0u);
+  }
 
   iEvent.emplace(hostPutToken_, std::move(hmsp));
   iEvent.emplace(hitsPutTokenCPU_, store32_.get(), store16_.get(), hitsModuleStart_.get(), nHits_);
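
For illustration only, the guarded copy in the diff above can be sketched as a standalone function. This is not the CMSSW code: `makeModuleStartCopy` and its parameters are hypothetical names, and the sketch assumes the source buffer may not be valid (here, a null pointer) when there are no hits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the copy done in produce(): builds the host copy
// of the per-module start offsets. Assumption: `source` may be null (or
// otherwise unusable) when nHits == 0, so it is only read when nHits > 0.
std::unique_ptr<std::uint32_t[]> makeModuleStartCopy(const std::uint32_t* source,
                                                     std::size_t nMaxModules,
                                                     std::uint32_t nHits) {
  const std::size_t n = nMaxModules + 1;
  auto hmsp = std::make_unique<std::uint32_t[]>(n);
  if (nHits > 0) {
    // Normal case: copy all nMaxModules + 1 offsets from the source buffer.
    std::copy(source, source + n, hmsp.get());
  } else {
    // Empty event: never touch `source`; write zeros explicitly.
    std::fill(hmsp.get(), hmsp.get() + n, 0u);
  }
  return hmsp;
}
```

Calling `makeModuleStartCopy(nullptr, nMaxModules, 0)` would then return an all-zero array without dereferencing the source pointer.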

@AdrianoDee
Contributor Author

Seems to me it does initialize to 0, so I've opened #38159, #38160 and #38161 with @missirol's fix.

@missirol
Contributor

missirol commented Jun 1, 2022

Thanks for following up, @AdrianoDee !

@fwyzard , thanks for the comments.

> the collection that goes to hitsPutTokenCPU_ has an internal protection for the case with 0 hits

Interesting, I think I see it now here.

> OK, I admit I'm lost about whether std::make_unique<uint32_t[]>(size) will initialise the elements (to 0) or not :-/

I was not sure either; all I did was test and see no crash. :)

@makortel
Contributor

makortel commented Jun 1, 2022

> OK, I admit I'm lost about whether std::make_unique<uint32_t[]>(size) will initialise the elements (to 0) or not :-/

AFAICT it should.
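
A minimal sketch of the behaviour in question, with a hypothetical helper name (`freshArrayIsZeroed`) that is not part of CMSSW. The array form `std::make_unique<T[]>(n)` is specified as `new T[n]()`, which value-initialises the elements, so for an arithmetic type such as `uint32_t` they start at 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: reports whether every element of a freshly created
// array is zero.
bool freshArrayIsZeroed(std::size_t size) {
  // make_unique<T[]>(n) is equivalent to unique_ptr<T[]>(new T[n]()),
  // i.e. value-initialisation, which zero-initialises uint32_t elements.
  auto buffer = std::make_unique<std::uint32_t[]>(size);
  return std::all_of(buffer.get(), buffer.get() + size,
                     [](std::uint32_t value) { return value == 0u; });
}
```

The variant that leaves elements with indeterminate values is C++20's `std::make_unique_for_overwrite`, so an explicit `std::fill` would only be needed if the allocation were switched to that.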

8 participants