Probably thread related crashes in aarch64 IBs #31123

Closed
Dr15Jones opened this issue Aug 12, 2020 · 118 comments

Comments

@Dr15Jones
Contributor

After switching the IB RelVals to run with multiple threads, we are seeing 'random' crashes in the aarch64 builds.

@cmsbuild
Contributor

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Contributor Author

One such crash is
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/250207.18_NuGun_UP18+NuGun_UP18INPUT+DIGIPRMXUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25/step2_NuGun_UP18+NuGun_UP18INPUT+DIGIPRMXUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25.log#/

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sun Aug  9 16:23:54 CEST 2020
Thread 12 (Thread 0xffff344f8460 (LWP 119100)):
#2  0x0000ffff978643b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff99ccedd0 in free@plt () from /lib64/libc.so.6
#5  0x0000ffff99cf7a60 in vfprintf () from /lib64/libc.so.6
#6  0x0000ffff99d2431c in vsnprintf () from /lib64/libc.so.6
#7  0x0000ffff9a048014 in std::__convert_from_v (__cloc=@0xffff344f6e38: 0xffff99e2f7a0 <_nl_C_locobj>, __out=__out@entry=0xffff344f6d80 "0.00345574O4\377\377", __size=__size@entry=45, __fmt=__fmt@entry=0xffff344f6e40 "%.*g") at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/aarch64-unknown-linux-gnu/bits/c++locale.h:92
#8  0x0000ffff9a072498 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double> (this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>, __s=..., __io=..., __fill=32 ' ', __mod=<optimized out>, __v=0.0034557399339973927) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ios_base.h:622
#9  0x0000ffff9a07cb98 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put (__v=0.0034557399339973927, __fill=<optimized out>, __io=..., __s=..., this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_facets.h:2437
#10 std::ostream::_M_insert<double> (this=0xffff344f7568, __v=0.0034557399339973927) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:73
#11 0x0000ffff354848c4 in RPCSimSetUp::setRPCSetUp(std::vector<RPCStripNoises::NoiseItem, std::allocator<RPCStripNoises::NoiseItem> > const&, std::vector<RPCClusterSize::ClusterSizeItem, std::allocator<RPCClusterSize::ClusterSizeItem> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
#12 0x0000ffff35467d1c in RPCDigiProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
[cut]
Thread 11 (Thread 0xffff34f08460 (LWP 119099)):
#2  0x0000ffff978643b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff99cfd3a8 in __printf_fp_l () from /lib64/libc.so.6
#5  0x0000ffff99cfc998 in vfprintf () from /lib64/libc.so.6
#6  0x0000ffff99d2431c in vsnprintf () from /lib64/libc.so.6
#7  0x0000ffff9a048014 in std::__convert_from_v (__cloc=@0xffff34f06e38: 0xffff99e2f7a0 <_nl_C_locobj>, __out=__out@entry=0xffff34f06d80 "", __size=__size@entry=45, __fmt=__fmt@entry=0xffff34f06e40 "%.*g") at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/aarch64-unknown-linux-gnu/bits/c++locale.h:92
#8  0x0000ffff9a072498 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double> (this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>, __s=..., __io=..., __fill=32 ' ', __mod=<optimized out>, __v=0.15190799534320831) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ios_base.h:622
#9  0x0000ffff9a07cb98 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put (__v=0.15190799534320831, __fill=<optimized out>, __io=..., __s=..., this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_facets.h:2437
#10 std::ostream::_M_insert<double> (this=0xffff34f070d0, __v=0.15190799534320831) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:73
#11 0x0000ffff354846dc in RPCSimSetUp::setRPCSetUp(std::vector<RPCStripNoises::NoiseItem, std::allocator<RPCStripNoises::NoiseItem> > const&, std::vector<RPCClusterSize::ClusterSizeItem, std::allocator<RPCClusterSize::ClusterSizeItem> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
#12 0x0000ffff35467d1c in RPCDigiProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
[cut]
Thread 10 (Thread 0xffff3bc78460 (LWP 62581)):
#2  0x0000ffff978643b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff99ccedd0 in free@plt () from /lib64/libc.so.6
#5  0x0000ffff99cf7a60 in vfprintf () from /lib64/libc.so.6
#6  0x0000ffff99d2431c in vsnprintf () from /lib64/libc.so.6
#7  0x0000ffff9a048014 in std::__convert_from_v (__cloc=@0xffff3bc76e38: 0xffff99e2f7a0 <_nl_C_locobj>, __out=__out@entry=0xffff3bc76d80 "0.0249435q\307;\377\377", __size=__size@entry=45, __fmt=__fmt@entry=0xffff3bc76e40 "%.*g") at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/aarch64-unknown-linux-gnu/bits/c++locale.h:92
#8  0x0000ffff9a072498 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double> (this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>, __s=..., __io=..., __fill=32 ' ', __mod=<optimized out>, __v=0.024943500757217407) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ios_base.h:622
#9  0x0000ffff9a07cb98 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put (__v=0.024943500757217407, __fill=<optimized out>, __io=..., __s=..., this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_facets.h:2437
#10 std::ostream::_M_insert<double> (this=0xffff3bc770d0, __v=0.024943500757217407) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:73
#11 0x0000ffff354846dc in RPCSimSetUp::setRPCSetUp(std::vector<RPCStripNoises::NoiseItem, std::allocator<RPCStripNoises::NoiseItem> > const&, std::vector<RPCClusterSize::ClusterSizeItem, std::allocator<RPCClusterSize::ClusterSizeItem> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
#12 0x0000ffff35467d1c in RPCDigiProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
[cut]
Thread 1 (Thread 0xffff995d0000 (LWP 52173)):
#3  0x0000ffff978661cc in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000ffff99d2ac94 in _IO_str_init_static_internal () from /lib64/libc.so.6
#6  0x0000ffff99d242f8 in vsnprintf () from /lib64/libc.so.6
#7  0x0000ffff9a048014 in std::__convert_from_v (__cloc=@0xffffe9fc94b8: 0xffff99e2f7a0 <_nl_C_locobj>, __out=__out@entry=0xffffe9fc9400 "", __size=__size@entry=45, __fmt=__fmt@entry=0xffffe9fc94c0 "%.*g") at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/aarch64-unknown-linux-gnu/bits/c++locale.h:92
#8  0x0000ffff9a072498 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double> (this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>, __s=..., __io=..., __fill=32 ' ', __mod=<optimized out>, __v=0.032316599041223526) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ios_base.h:622
#9  0x0000ffff9a07cb98 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put (__v=0.032316599041223526, __fill=<optimized out>, __io=..., __s=..., this=0xffff9a0fa510 <(anonymous namespace)::num_put_c>) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_facets.h:2437
#10 std::ostream::_M_insert<double> (this=0xffffe9fc9be8, __v=0.032316599041223526) at /home/cmsbld/jenkins_a/workspace/auto-builds/CMSSW_11_1_0_pre7-slc7_aarch64_gcc820/build/CMSSW_11_1_0_pre7-build/BUILD/slc7_aarch64_gcc820/external/gcc/8.4.0/gcc-8.4.0/obj/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:73
#11 0x0000ffff3548450c in RPCSimSetUp::setRPCSetUp(std::vector<RPCStripNoises::NoiseItem, std::allocator<RPCStripNoises::NoiseItem> > const&, std::vector<RPCClusterSize::ClusterSizeItem, std::allocator<RPCClusterSize::ClusterSizeItem> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
#12 0x0000ffff35467d1c in RPCDigiProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonRPCDigitizer.so
[cut]
Current Modules:

Module: RPCDigiProducer:simMuonRPCDigis (crashed)
Module: RPCDigiProducer:simMuonRPCDigis
Module: none
Module: RPCDigiProducer:simMuonRPCDigis
Module: RPCDigiProducer:simMuonRPCDigis

A fatal system signal has occurred: segmentation violation

This one fails while writing a numeric value to an ostream. The standard library implementation calls the underlying vsnprintf, which is where the failure occurs.

@Dr15Jones
Contributor Author

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Contributor Author

The routine RPCSimSetUp::setRPCSetUp does a tremendous amount of output formatting that is never seen, because the resulting string is only passed to LogDebug. See

std::stringstream sslogclsitem;

@Dr15Jones
Contributor Author

Here is another
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/250202.3_TTbar_13+TTbar_13INPUT+PREMIXUP15_PU25+DIGIPRMXLOCALUP15APVSimu_PU25+RECOPRMXUP15_PU25+HARVESTUP15_PU25/step4_TTbar_13+TTbar_13INPUT+PREMIXUP15_PU25+DIGIPRMXLOCALUP15APVSimu_PU25+RECOPRMXUP15_PU25+HARVESTUP15_PU25.log

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sun Aug  9 17:42:11 CEST 2020
Thread 5 (Thread 0xffff35228460 (LWP 227755)):
#2  0x0000ffff93df43b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff206ba478 in MuonAssociatorByHitsHelper::getShared(std::vector<std::unique_ptr<std::pair<unsigned int, std::vector<std::pair<unsigned int, EncodedEventId>, std::allocator<std::pair<unsigned int, EncodedEventId> > > >, std::default_delete<std::pair<unsigned int, std::vector<std::pair<unsigned int, EncodedEventId>, std::allocator<std::pair<unsigned int, EncodedEventId> > > > > >, std::allocator<std::unique_ptr<std::pair<unsigned int, std::vector<std::pair<unsigned int, EncodedEventId>, std::allocator<std::pair<unsigned int, EncodedEventId> > > >, std::default_delete<std::pair<unsigned int, std::vector<std::pair<unsigned int, EncodedEventId>, std::allocator<std::pair<unsigned int, EncodedEventId> > > > > > > >&, __gnu_cxx::__normal_iterator<TrackingParticle const*, std::vector<TrackingParticle, std::allocator<TrackingParticle> > >) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimMuonMCTruth.so
#5  0x0000ffff206c0e78 in MuonAssociatorByHitsHelper::associateSimToRecoIndices(std::vector<std::pair<__gnu_cxx::__normal_iterator<TrackingRecHit* const*, std::vector<TrackingRecHit*, std::allocator<TrackingRecHit*> > >, __gnu_cxx::__normal_iterator<TrackingRecHit* const*, std::vector<TrackingRecHit*, std::allocator<TrackingRecHit*> > > >, std::allocator<std::pair<__gnu_cxx::__normal_iterator<TrackingRecHit* const*, std::vector<TrackingRecHit*, std::allocator<TrackingRecHit*> > >, __gnu_cxx::__normal_iterator<TrackingRecHit* const*, std::vector<TrackingRecHit*, std::allocator<TrackingRecHit*> > > > > > const&, edm::RefVector<std::vector<TrackingParticle, std::allocator<TrackingParticle> >, TrackingParticle, edm::refhelper::FindUsingAdvance<std::vector<TrackingParticle, std::allocator<TrackingParticle> >, TrackingParticle> > const&, MuonAssociatorByHitsHelper::Resources const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimMuonMCTruth.so
#6  0x0000ffff206af2e4 in MuonAssociatorByHits::associateSimToReco(edm::RefToBaseVector<reco::Track> const&, edm::RefVector<std::vector<TrackingParticle, std::allocator<TrackingParticle> >, TrackingParticle, edm::refhelper::FindUsingAdvance<std::vector<TrackingParticle, std::allocator<TrackingParticle> >, TrackingParticle> > const&, edm::Event const*, edm::EventSetup const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimMuonMCTruth.so
#7  0x0000ffff206b9a4c in MuonAssociatorByHits::associateSimToReco(edm::Handle<edm::View<reco::Track> >&, edm::Handle<std::vector<TrackingParticle, std::allocator<TrackingParticle> > >&, edm::Event const*, edm::EventSetup const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimMuonMCTruth.so
#8  0x0000ffff205840cc in MuonAssociatorEDProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimMuonMCTruthPlugins.so
[cut]
Thread 4 (Thread 0xffff35c38460 (LWP 227754)):
#0  0x0000ffff99264e24 in poll () from /lib64/libc.so.6
#1  0x0000ffff93df4a6c in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x0000ffff93df51cc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  0x0000ffff93df61cc in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000ffff200000b8 in ?? ()
#6  0x0000ffff35c37380 in ?? ()
#7  0x000c001200033160 in ?? ()
Thread 3 (Thread 0xffff36648460 (LWP 227753)):
#2  0x0000ffff93df43b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff99218e6c in _wordcopy_fwd_aligned () from /lib64/libc.so.6
#5  0x0000ffff99218d94 in memcpy () from /lib64/libc.so.6
#6  0x00000000004163e8 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) ()
#7  0x0000ffff9b2b6c30 in edm::Event::commit_aux(std::vector<edm::propagate_const<std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> > >, std::allocator<edm::propagate_const<std::unique_ptr<edm::WrapperBase, std::default_delete<edm::WrapperBase> > > > >&, edm::Hash<5>*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#8  0x0000ffff9b2b6f0c in edm::Event::commit_(std::vector<unsigned int, std::allocator<unsigned int> > const&, edm::Hash<5>*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#9  0x0000ffff9b3f8004 in edm::stream::EDFilterAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetupImpl const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
[cut]
Thread 1 (Thread 0xffff98ab0000 (LWP 226821)):
#2  0x0000ffff93df43b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff3de26ee0 in Chi2MeasurementEstimator::estimate(TrajectoryStateOnSurface const&, TrackingRecHit const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libTrackingToolsKalmanUpdators.so
#5  0x0000ffff3df1d9d4 in TkStripMeasurementDet::simpleRecHits(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, std::vector<SiStripRecHit2D, std::allocator<SiStripRecHit2D> >&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerMeasurementDetPlugins.so
#6  0x0000ffff3df0f048 in void TkGluedMeasurementDet::doubleMatch<TkGluedMeasurementDet::HitCollectorForFastMeasurements>(TrajectoryStateOnSurface const&, MeasurementTrackerEvent const&, TkGluedMeasurementDet::HitCollectorForFastMeasurements&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerMeasurementDetPlugins.so
#7  0x0000ffff3df0bd2c in TkGluedMeasurementDet::measurements(TrajectoryStateOnSurface const&, MeasurementEstimator const&, MeasurementTrackerEvent const&, tracking::TempMeasurements&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerMeasurementDetPlugins.so
#8  0x0000ffff3ddd6548 in LayerMeasurements::groupedMeasurements(DetLayer const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libTrackingToolsMeasurementDet.so
#9  0x0000fffdf15a1634 in TrajectorySegmentBuilder::segments(TrajectoryStateOnSurface) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerCkfPatternPlugins.so
#10 0x0000fffdf158ddd0 in GroupedCkfTrajectoryBuilder::advanceOneLayer(TrajectorySeed const&, TempTrajectory&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerCkfPatternPlugins.so
#11 0x0000fffdf158e8dc in GroupedCkfTrajectoryBuilder::groupedLimitedCandidates(TrajectorySeed const&, TempTrajectory const&, TrajectoryFilter const*, Propagator const*, bool, std::vector<TempTrajectory, std::allocator<TempTrajectory> >&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerCkfPatternPlugins.so
#12 0x0000fffdf158f128 in GroupedCkfTrajectoryBuilder::buildTrajectories(TrajectorySeed const&, std::vector<Trajectory, std::allocator<Trajectory> >&, unsigned int&, TrajectoryFilter const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginRecoTrackerCkfPatternPlugins.so
#13 0x0000fffdf2589464 in cms::CkfTrackCandidateMakerBase::produceBase(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libRecoTrackerCkfPattern.so
[cut]
Current Modules:

Module: Type0PFMETcorrInputProducer:patPFMetT0Corr (crashed)
Module: CkfTrackCandidateMaker:convTrackCandidates
Module: none
Module: none
Module: none

The crash happened on thread 4, which has a 'corrupted' stack trace. I saw other RelVals whose crashing thread also had a 'corrupted' stack trace.

@Dr15Jones
Contributor Author

Another RelVal with a corrupted stack is
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/250408.17_QCD_FlatPt_15_3000HS_13+FS_QCD_FlatPt_15_3000HS_13_PRMXUP17_PU50+HARVESTUP17FS+MINIAODMCUP17FS/step1_QCD_FlatPt_15_3000HS_13+FS_QCD_FlatPt_15_3000HS_13_PRMXUP17_PU50+HARVESTUP17FS+MINIAODMCUP17FS.log

which reports the following modules running at the time of the crash

Module: PFProducer:particleFlowTmp (crashed)
Module: GsfElectronProducer:gedGsfElectronsTmp
Module: PFBlockProducer:particleFlowBlock
Module: none
Module: none

with the stack being

Thread 8 (Thread 0xfffee4e48460 (LWP 171263)):
#0  0x0000ffff8a5f4e24 in poll () from /lib64/libc.so.6
#1  0x0000ffff857d4a6c in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x0000ffff857d51cc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  0x0000ffff857d61cc in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000fffee00000ac in ?? ()
#6  0x0000fffee4e46520 in ?? ()
#7  0x0000fffee4e465f0 in ?? ()
#8  0x4019780aea94a1a7 in ?? ()

@Dr15Jones
Contributor Author

Another corrupted stack is from
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/27434.0_TTbar_14TeV+2026D58+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal/step3_TTbar_14TeV+2026D58+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal.log#/158-158

with running modules

Module: PFProducer:particleFlowTmpBarrel (crashed)
Module: TrackstersProducer:ticlTrackstersTrk
Module: none
Module: TrackstersProducer:ticlTrackstersHFNoseMIP
Module: none

Notice that the problem happens again in PFProducer.

@Dr15Jones
Contributor Author

Another corrupted stack is
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/25400.0_ZEE_13+FS_ZEE_13_UP15_PU25+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+FS_ZEE_13_UP15_PU25+HARVESTUP15FS+MINIAODMCUP15FS.log#/

with running modules

Module: PFProducer:particleFlowTmp (crashed)
Module: none
Module: TrackAssociatorEDProducer:trackingParticleRecoTrackAsssociation
Module: TrackAssociatorEDProducer:trackingParticleRecoTrackAsssociation
Module: none

@Dr15Jones
Contributor Author

Here is a crash within ROOT
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/23434.99_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU/step2_TTbar_14TeV+2026D49PU_PMXS1S2+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14INPUT+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU.log

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sun Aug  9 16:35:29 CEST 2020
Thread 14 (Thread 0xffff2f868460 (LWP 82656)):
#2  0x0000ffff929543b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff94ded18c in __exp_finite () from /lib64/libm.so.6
#5  0x0000ffff94dfc5dc in exp () from /lib64/libm.so.6
#6  0x0000ffff92289628 in CLHEP::RandPoissonQ::poissonDeviateSmall (mean=1.5101454679374475, e=0xffff51def5d0) at /home/cmsbld/jenkins_b/workspace/auto-builds/CMSSW_11_2_0_pre1-slc7_aarch64_gcc820/build/CMSSW_11_2_0_pre1-build/BUILD/slc7_aarch64_gcc820/external/clhep/2.4.1.3-ghbfee/clhep-2.4.1.3/Random/src/RandPoissonQ.cc:299
#7  CLHEP::RandPoissonQ::poissonDeviateSmall (e=0xffff51def5d0, mean=1.5101454679374475) at /home/cmsbld/jenkins_b/workspace/auto-builds/CMSSW_11_2_0_pre1-slc7_aarch64_gcc820/build/CMSSW_11_2_0_pre1-build/BUILD/slc7_aarch64_gcc820/external/clhep/2.4.1.3-ghbfee/clhep-2.4.1.3/Random/src/RandPoissonQ.cc:257
#8  0x0000ffff319cd3ac in EcalHitResponse::analogSignalAmplitude(DetId const&, double, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimCalorimetryEcalSimAlgos.so
#9  0x0000ffff319cd514 in EcalHitResponse::putAnalogSignal(PCaloHit const&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimCalorimetryEcalSimAlgos.so
#10 0x0000ffff319cb4e0 in EcalTDigitizer<EBDigitizerTraits>::add(std::vector<PCaloHit, std::allocator<PCaloHit> > const&, int, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimCalorimetryEcalSimAlgos.so
#11 0x0000ffff31a0ef28 in EcalDigiProducer::accumulateCaloHits(edm::Handle<std::vector<PCaloHit, std::allocator<PCaloHit> > > const&, edm::Handle<std::vector<PCaloHit, std::allocator<PCaloHit> > > const&, edm::Handle<std::vector<PCaloHit, std::allocator<PCaloHit> > > const&, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimCalorimetryEcalSimProducers.so
#12 0x0000ffff31a1191c in EcalDigiProducer::accumulate(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimCalorimetryEcalSimProducers.so
#13 0x0000ffff363fa590 in edm::MixingModule::accumulateEvent(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#14 0x0000ffff363fa6dc in edm::MixingModule::pileAllWorkers(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#15 0x0000ffff36404798 in void edm::PileUp::readPileUp<std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)> >(edm::EventID const&, std::vector<edm::SecondaryEventIDAndFileInfo, std::allocator<edm::SecondaryEventIDAndFileInfo> >&, std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)>, int, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#16 0x0000ffff363fba18 in edm::MixingModule::doPileUp(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#17 0x0000ffff36341bc8 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libMixingBase.so
[cut]
Thread 10 (Thread 0xffff32ef8460 (LWP 61441)):
#2  0x0000ffff929543b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff3dcceac0 in void std::vector<HepMC::GenParticle*, std::allocator<HepMC::GenParticle*> >::_M_realloc_insert<HepMC::GenParticle* const&>(__gnu_cxx::__normal_iterator<HepMC::GenParticle**, std::vector<HepMC::GenParticle*, std::allocator<HepMC::GenParticle*> > >, HepMC::GenParticle* const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimDataFormatsGeneratorProducts.so
#5  0x0000ffff3dcceab8 in hepmc_rootio::add_to_particles_in(HepMC::GenVertex*, HepMC::GenParticle*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libSimDataFormatsGeneratorProducts.so
#6  0x0000ffff95f17004 in int TStreamerInfo::ReadBufferArtificial<char**>(TBuffer&, char** const&, TStreamerElement*, int, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libRIO.so
#7  0x0000ffff95fce9c0 in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libRIO.so
[cut]
#191 0x0000ffff964dd768 in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libTree.so
#192 0x0000ffff362cf1bc in edm::RootTree::getEntry(TBranch*, long long) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginIOPoolInput.so
#193 0x0000ffff362a7728 in edm::RootDelayedReader::getProduct_(edm::BranchID const&, edm::EDProductGetter const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginIOPoolInput.so
#194 0x0000ffff96cf4744 in edm::DelayedReader::getProduct(edm::BranchID const&, edm::EDProductGetter const*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#195 0x0000ffff96daea38 in edm::InputProductResolver::resolveProduct_(edm::Principal const&, bool, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#196 0x0000ffff96d9df2c in edm::Principal::findProductByLabel(edm::KindOfType, edm::TypeID const&, edm::InputTag const&, edm::EDConsumerBase const*, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#197 0x0000ffff96d9e148 in edm::Principal::getByLabel(edm::KindOfType, edm::TypeID const&, edm::InputTag const&, edm::EDConsumerBase const*, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#198 0x0000ffff3b17d490 in cms::PileupVertexAccumulator::accumulate(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralPileupInformationPlugins.so
#199 0x0000ffff363fa590 in edm::MixingModule::accumulateEvent(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#200 0x0000ffff363fa6dc in edm::MixingModule::pileAllWorkers(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#201 0x0000ffff36404798 in void edm::PileUp::readPileUp<std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)> >(edm::EventID const&, std::vector<edm::SecondaryEventIDAndFileInfo, std::allocator<edm::SecondaryEventIDAndFileInfo> >&, std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)>, int, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#202 0x0000ffff363fba18 in edm::MixingModule::doPileUp(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#203 0x0000ffff36341bc8 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libMixingBase.so
[cut]
Thread 9 (Thread 0xffff33908460 (LWP 61439)):
#0  0x0000ffff94c8a9c4 in nanosleep () from /lib64/libc.so.6
#1  0x0000ffff94c8a678 in sleep () from /lib64/libc.so.6
#2  0x0000ffff929543b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffff968bded4 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, unsigned long>, std::_Select1st<std::pair<unsigned int const, unsigned long> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned long> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned int const, unsigned long> >*)@plt () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreMessageLogger.so
#5  0x0000ffff968cc8e8 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, unsigned long>, std::_Select1st<std::pair<unsigned int const, unsigned long> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned long> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned int const, unsigned long> >*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreMessageLogger.so
[cut]
#18 0x0000ffff3175c1f4 in void TrackingTruthAccumulator::accumulateEvent<PileUpEventPrincipal>(PileUpEventPrincipal const&, edm::EventSetup const&, edm::Handle<edm::HepMCProduct> const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralTrackingAnalysisPlugins.so
#19 0x0000ffff31755a20 in TrackingTruthAccumulator::accumulate(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralTrackingAnalysisPlugins.so
#20 0x0000ffff363fa590 in edm::MixingModule::accumulateEvent(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#21 0x0000ffff363fa6dc in edm::MixingModule::pileAllWorkers(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#22 0x0000ffff36404798 in void edm::PileUp::readPileUp<std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)> >(edm::EventID const&, std::vector<edm::SecondaryEventIDAndFileInfo, std::allocator<edm::SecondaryEventIDAndFileInfo> >&, std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)>, int, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#23 0x0000ffff363fba18 in edm::MixingModule::doPileUp(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#24 0x0000ffff36341bc8 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libMixingBase.so
[cut]
Thread 1 (Thread 0xffff94500000 (LWP 202701)):
#0  0x0000ffff94cb4e24 in poll () from /lib64/libc.so.6
#1  0x0000ffff92954a6c in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x0000ffff929551cc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  0x0000ffff929561cc in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000ffff962d6720 in TThread::SelfId()@plt () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libThread.so
#6  0x0000ffff962ee7c0 in TThread::Tsd(void*, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libThread.so
#7  0x0000ffff95938d18 in TClass::GetCollectionProxy() const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libCore.so
#8  0x0000ffff95ed96c0 in int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libRIO.so
#9  0x0000ffff95da63e4 in TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libRIO.so
[cut]
/cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libTree.so
#138 0x0000ffff964dd658 in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libTree.so
#139 0x0000ffff964dd768 in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/external/slc7_aarch64_gcc820/lib/libTree.so
#140 0x0000ffff362cf1bc in edm::RootTree::getEntry(TBranch*, long long) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginIOPoolInput.so
#141 0x0000ffff362a7728 in edm::RootDelayedReader::getProduct_(edm::BranchID const&, edm::EDProductGetter const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginIOPoolInput.so
#142 0x0000ffff96cf4744 in edm::DelayedReader::getProduct(edm::BranchID const&, edm::EDProductGetter const*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#143 0x0000ffff96daea38 in edm::InputProductResolver::resolveProduct_(edm::Principal const&, bool, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#144 0x0000ffff96d9df2c in edm::Principal::findProductByLabel(edm::KindOfType, edm::TypeID const&, edm::InputTag const&, edm::EDConsumerBase const*, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#145 0x0000ffff96d9e148 in edm::Principal::getByLabel(edm::KindOfType, edm::TypeID const&, edm::InputTag const&, edm::EDConsumerBase const*, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#146 0x0000ffff3b17d490 in cms::PileupVertexAccumulator::accumulate(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralPileupInformationPlugins.so
#147 0x0000ffff363fa590 in edm::MixingModule::accumulateEvent(PileUpEventPrincipal const&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#148 0x0000ffff363fa6dc in edm::MixingModule::pileAllWorkers(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#149 0x0000ffff36404798 in void edm::PileUp::readPileUp<std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)> >(edm::EventID const&, std::vector<edm::SecondaryEventIDAndFileInfo, std::allocator<edm::SecondaryEventIDAndFileInfo> >&, std::_Bind<bool (edm::MixingModule::*(std::reference_wrapper<edm::MixingModule>, std::_Placeholder<1>, edm::ModuleCallingContext const*, int, std::_Placeholder<2>, int, std::reference_wrapper<edm::EventSetup const>, edm::StreamID))(edm::EventPrincipal const&, edm::ModuleCallingContext const*, int, int, int&, edm::EventSetup const&, edm::StreamID const&)>, int, edm::StreamID const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#150 0x0000ffff363fba18 in edm::MixingModule::doPileUp(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginSimGeneralMixingModulePlugins.so
#151 0x0000ffff36341bc8 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libMixingBase.so

Current Modules:

Module: MixingModule:mix (crashed)
Module: MixingModule:mix
Module: none
Module: none
Module: MixingModule:mix

Here we have some incredibly deep stacks (because of ROOT I/O), and the crash is in ROOT's thread-local handling.
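The trace does not show how ROOT implements its per-thread data (`TThread::Tsd` above), so the block below is only a generic C++ illustration of thread-local state, not a model of ROOT. It shows why this kind of code is sensitive to start-up: a block-scope `thread_local` object is constructed separately in each thread, the first time that thread reaches the declaration, so every new worker thread runs its own "first call" path. `ThreadState`, `threadState()` and `bump()` are hypothetical names used only for this sketch.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Total number of ThreadState objects constructed across all threads.
std::atomic<int> g_perThreadInits{0};

// Hypothetical per-thread state; each thread owns a separate instance.
struct ThreadState {
  long callCount = 0;
  ThreadState() { ++g_perThreadInits; }
};

// A block-scope thread_local is initialized the first time each thread
// passes through its declaration and destroyed when that thread exits.
// No lock is needed: no two threads ever touch the same instance.
ThreadState& threadState() {
  thread_local ThreadState state;
  return state;
}

// Increments and returns this thread's private counter.
long bump() { return ++threadState().callCount; }
```

Calling `bump()` 100 times in each of four threads leaves every thread's counter at 100 and constructs exactly four `ThreadState` objects, one per thread.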

@Dr15Jones
Contributor Author

This crash
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/10059.0_QCD_Pt_3000_3500_13+2017+QCD_Pt_3000_3500_13TeV_TuneCUETP8M1_GenSimINPUT+Digi+Reco+HARVEST+ALCA+Nano/step3_QCD_Pt_3000_3500_13+2017+QCD_Pt_3000_3500_13TeV_TuneCUETP8M1_GenSimINPUT+Digi+Reco+HARVEST+ALCA+Nano.log#/

did not generate a traceback, but the running modules were:

Module: Type0PFMETcorrInputProducer:patPFMetT0Corr (crashed)
Module: RecoTauProducer:combinatoricRecoTausBoosted
Module: none
Module: LowPtGsfElectronSeedProducer:lowPtGsfElectronSeeds
Module: none

Again, we see a crash in Type0PFMETcorrInputProducer.

@Dr15Jones
Contributor Author

Here in https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/2018.7_H125GGgluonfusion_13_UP18+H125GGgluonfusionFS_13_UP18+HARVESTUP18FS+MINIAODMCUP18FS/step1_H125GGgluonfusion_13_UP18+H125GGgluonfusionFS_13_UP18+HARVESTUP18FS+MINIAODMCUP18FS.log#/

we have another corrupted stack. This time the only module reported as running is:

Module: PFProducer:particleFlowTmp (crashed)

although the stack traces for the threads do show 3 other modules running.

@Dr15Jones
Contributor Author

Here in https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/136.898_RunParkingBPH2018B+RunParkingBPH2018B+HLTDR2_2018+RECODR2_2018reHLT_skimParkingBPH_Offline+HARVEST2018/step3_RunParkingBPH2018B+RunParkingBPH2018B+HLTDR2_2018+RECODR2_2018reHLT_skimParkingBPH_Offline+HARVEST2018.log#/

we have another corrupted stack with modules running:

Module: PFProducer:particleFlowTmp (crashed)
Module: TrackingRecoMaterialAnalyser:materialDumperAnalyzer
Module: TrackingMonitor:TrackerCollisionSelectedTrackMonCommonhighPurityPtRange0to1
Module: none
Module: TrackingMonitor:TrackerCollisionSelectedTrackMonCommonhighPurityPtRange0to1

@Dr15Jones
Contributor Author

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/136.852_RunJetHT2018A+RunJetHT2018A+HLTDR2_2018+RECODR2_2018reHLT_skimJetHT_Offline+HARVEST2018/step3_RunJetHT2018A+RunJetHT2018A+HLTDR2_2018+RECODR2_2018reHLT_skimJetHT_Offline+HARVEST2018.log#/

didn't produce a traceback (the log says it timed out) and shows the running modules as:

Module: ShiftedParticleProducer:shiftedPatTauEnUp (crashed)
Module: RecoTauProducer:pfTausCombiner
Module: none
Module: none
Module: none

@Dr15Jones
Contributor Author

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/136.826_RunMuOnia2017E+RunMuOnia2017E+HLTDR2_2017+RECODR2_2017reHLT_skimMuOnia_Prompt+HARVEST2017/step3_RunMuOnia2017E+RunMuOnia2017E+HLTDR2_2017+RECODR2_2017reHLT_skimMuOnia_Prompt+HARVEST2017.log#/ has a corrupted stack trace, with the crash in:

Module: Type0PFMETcorrInputProducer:patPFMetT0Corr (crashed)
Module: BTagPerformanceAnalyzerOnData:bTagAnalysis
Module: MuonIdProducer:muons1stStep
Module: none
Module: PATTauProducer:patTaus

@Dr15Jones
Contributor Author

The crashes almost invariably happen during the first 4 events, so they are most likely a problem related to code being called for the first time.
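A classic way for code to misbehave only on its first call, once several threads are involved, is unsynchronized lazy initialization. The sketch below is a generic illustration under that assumption; nothing in these logs identifies which module, if any, does this. It also hints at why such a bug could appear on aarch64 while staying hidden on x86_64: aarch64 has a weaker memory model, so without acquire/release ordering another thread may observe the published pointer before the object it points to is fully visible. `Table`, `getTable()` and `g_builds` are hypothetical names.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical expensive object built on first use (stands in for any
// lazily filled table or cache).
struct Table {
  std::vector<int> data;
  Table() : data(1000) {
    for (int i = 0; i < 1000; ++i) data[i] = i * i;
  }
};

// Counts how many times a Table was actually built.
std::atomic<int> g_builds{0};

// BROKEN pattern, shown only for contrast (do not use):
//   Table* g_raw = nullptr;
//   Table& getTableRacy() {
//     if (!g_raw) g_raw = new Table();  // two threads may both build, or one
//     return *g_raw;                    // may see the pointer before the data
//   }

// Double-checked locking with explicit acquire/release ordering.
std::atomic<Table*> g_table{nullptr};
std::mutex g_tableMutex;

Table& getTable() {
  // Fast path: acquire pairs with the release below, so a non-null pointer
  // guarantees the Table's contents are visible to this thread.
  Table* t = g_table.load(std::memory_order_acquire);
  if (t == nullptr) {
    std::lock_guard<std::mutex> lock(g_tableMutex);
    // Re-check under the lock; the mutex already orders this load.
    t = g_table.load(std::memory_order_relaxed);
    if (t == nullptr) {
      t = new Table();  // intentionally never deleted (process lifetime)
      ++g_builds;
      g_table.store(t, std::memory_order_release);
    }
  }
  return *t;
}
```

In new code, a function-local `static Table table;` gives the same guarantee more simply, since C++11 requires the initialization of such statics to be thread-safe.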

@Dr15Jones
Contributor Author

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/136.738_RunDoubleMuon2016C+RunDoubleMuon2016C+HLTDR2_2016+RECODR2_2016reHLT_HIPM+HARVESTDR2/step2_RunDoubleMuon2016C+RunDoubleMuon2016C+HLTDR2_2016+RECODR2_2016reHLT_HIPM+HARVESTDR2.log#/144-144

has a crash in TBB's internals

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sun Aug  9 16:33:01 CEST 2020
Thread 12 (Thread 0xffff594f8460 (LWP 94166)):
#2  0x0000ffffba0343b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  std::_Rb_tree_insert_and_rebalance (__insert_left=false, __x=0xfffcc0699930, __p=0xfffcc0699900, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:203
#5  0x0000ffffac4dfc70 in std::pair<std::_Rb_tree_iterator<std::pair<DDName const, DDI::rep_type<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >*> >, bool> std::_Rb_tree<DDName, std::pair<DDName const, DDI::rep_type<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >*>, std::_Select1st<std::pair<DDName const, DDI::rep_type<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >*> >, std::less<DDName>, std::allocator<std::pair<DDName const, DDI::rep_type<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >*> > >::_M_emplace_unique<DDName const&, DDI::rep_type<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >*&>(DDName const&, DDI::rep_type<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >*&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDetectorDescriptionCore.so
#6  0x0000ffffac4dfe98 in DDI::Store<DDName, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> >, std::unique_ptr<ROOT::Math::Rotation3D, std::default_delete<ROOT::Math::Rotation3D> > >::create(DDName const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDetectorDescriptionCore.so
#7  0x0000ffffac4de5f8 in DDRotation::DDRotation(DDName const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDetectorDescriptionCore.so
#8  0x0000ffffa455155c in DDLPosPart::processElement(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DDCompactView&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDetectorDescriptionParser.so
#9  0x0000ffffa4558424 in DDLSAX2FileHandler::endElement(unsigned short const*, unsigned short const*, unsigned short const*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDetectorDescriptionParser.so
#10 0x0000ffffab9d2718 in xercesc_3_1::SAX2XMLReaderImpl::endElement (this=0xfffcc05b0788, elemDecl=..., uriId=1, isRoot=false, elemPrefix=0xfffcc0762800) at xercesc/parsers/SAX2XMLReaderImpl.cpp:889
#11 0x0000ffffab97eb80 in xercesc_3_1::IGXMLScanner::scanEndTag (this=this@entry=0xfffcc0641e08, gotData=@0xffff594f6427: true) at ./xercesc/framework/XMLBuffer.hpp:171
#12 0x0000ffffab982e58 in xercesc_3_1::IGXMLScanner::scanContent (this=this@entry=0xfffcc0641e08) at xercesc/internal/IGXMLScanner.cpp:881
#13 0x0000ffffab982fa8 in xercesc_3_1::IGXMLScanner::scanDocument (this=0xfffcc0641e08, src=...) at xercesc/internal/IGXMLScanner.cpp:217
#14 0x0000ffffab9d344c in xercesc_3_1::SAX2XMLReaderImpl::parse (this=0xfffcc05b0788, source=...) at xercesc/parsers/SAX2XMLReaderImpl.cpp:409
#15 0x0000ffffa454cfb0 in DDLParser::parse(std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDetectorDescriptionParser.so
#16 0x0000ffffa472386c in magneticfield::VolumeBasedMagneticFieldESProducerFromDB::produce(IdealMagneticFieldRecord const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginMagneticFieldGeomBuilderPlugins.so
#17 0x0000ffffa472d16c in decltype ({parm#1}()) edm::convertException::wrap<edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const::{lambda()#1}>(edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginMagneticFieldGeomBuilderPlugins.so
#18 0x0000ffffa472d358 in edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginMagneticFieldGeomBuilderPlugins.so
#19 0x0000ffffa472e158 in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}&>(edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginMagneticFieldGeomBuilderPlugins.so
#20 0x0000ffffa472e1fc in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}>(edm::eventsetup::Callback<magneticfield::VolumeBasedMagneticFieldESProducerFromDB, std::unique_ptr<MagneticField, std::default_delete<MagneticField> >, IdealMagneticFieldRecord, edm::eventsetup::CallbackSimpleDecorator<IdealMagneticFieldRecord> >::runProducerAsync(std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginMagneticFieldGeomBuilderPlugins.so
#21 0x0000ffffbc93f648 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0xffffa038fe00, context_guard=..., t=t@entry=0xffffa033c340, isolation=isolation@entry=0) at ../../include/tbb/machine/gcc_generic.h:101
#22 0x0000ffffbc93f89c in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0xffffa038fe00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#23 0x0000ffffbc93ac58 in tbb::interface7::internal::task_arena_base::internal_execute (this=0xffffbe604e60 <edm::esTaskArena()::s_arena>, d=...) at ../../src/tbb/arena.cpp:1105
#24 0x0000ffffbe3e02f0 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*, edm::EventSetupImpl const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#25 0x0000ffffbe44d4bc in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool, edm::EventSetupImpl const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#26 0x0000ffff5c8d0e54 in L1TMuon::GeometryTranslator::checkAndUpdateGeometry(edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuon.so
#27 0x0000ffff5c951218 in EMTFSetup::reload(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonEndCap.so
#28 0x0000ffff5c990cc0 in TrackFinder::process(edm::Event const&, edm::EventSetup const&, std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >&, std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonEndCap.so
#29 0x0000ffff5a5c7330 in L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginL1TriggerL1TMuonEndCapPlugins.so
[cut]
Thread 11 (Thread 0xffff59f08460 (LWP 94165)):
#2  0x0000ffffba0343b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffffbcbc6374 in bitmap_sfu (binfo=<optimized out>, bitmap=<optimized out>) at include/jemalloc/internal/bit_util.h:22
#5  arena_slab_reg_alloc_batch (ptrs=<optimized out>, cnt=25, bin_info=<optimized out>, slab=0xfffca002b940) at src/arena.c:296
#6  je_arena_tcache_fill_small (tsdn=0xffff59f0df60, arena=0xfffca0000c80, tcache=<optimized out>, tbin=0xffff59f0e1c8, binind=3, prof_accumbytes=<optimized out>) at src/arena.c:1402
#7  0x0000ffffbcc077fc in je_tcache_alloc_small_hard (tsdn=tsdn@entry=0xffff59f0df60, arena=arena@entry=0xfffca0000c80, tcache=tcache@entry=0xffff59f0e170, tbin=tbin@entry=0xffff59f0e1c8, binind=<optimized out>, tcache_success=tcache_success@entry=0xffff59f06b28) at src/tcache.c:94
#8  0x0000ffffbcbbcca0 in tcache_alloc_small (slow_path=false, zero=false, binind=<optimized out>, size=<optimized out>, tcache=0xffff59f0e170, arena=0xfffca0000c80, tsd=0xffff59f0df60) at include/jemalloc/internal/tsd.h:228
#9  arena_malloc (slow_path=false, tcache=0xffff59f0e170, zero=false, ind=<optimized out>, size=<optimized out>, arena=0x0, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:165
#10 iallocztm (slow_path=false, arena=0x0, is_internal=false, tcache=0xffff59f0e170, zero=false, ind=<optimized out>, size=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#11 imalloc_no_sample (ind=<optimized out>, usize=48, size=<optimized out>, tsd=<optimized out>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:1949
#12 imalloc_body (tsd=<optimized out>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2149
#13 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2260
#14 je_malloc_default (size=<optimized out>) at src/jemalloc.c:2291
#15 0x0000ffffbcbbd254 in malloc (size=size@entry=40) at src/jemalloc.c:2390
#16 0x0000ffffbcc0be3c in newImpl<false> (size=40) at src/jemalloc_cpp.cpp:77
#17 operator new (size=40) at src/jemalloc_cpp.cpp:87
#18 0x0000ffffbddaddec in std::_Rb_tree_node<std::pair<short const, short> >* std::_Rb_tree<short, std::pair<short const, short>, std::_Select1st<std::pair<short const, short> >, std::less<short>, std::allocator<std::pair<short const, short> > >::_M_clone_node<std::_Rb_tree<short, std::pair<short const, short>, std::_Select1st<std::pair<short const, short> >, std::less<short>, std::allocator<std::pair<short const, short> > >::_Alloc_node>(std::_Rb_tree_node<std::pair<short const, short> > const*, std::_Rb_tree<short, std::pair<short const, short>, std::_Select1st<std::pair<short const, short> >, std::less<short>, std::allocator<std::pair<short const, short> > >::_Alloc_node&) [clone .isra.1496] () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libDataFormatsStdDictionaries.so
[cut]
#25 0x0000ffff5aac8724 in std::vector<std::map<short, short, std::less<short>, std::allocator<std::pair<short const, short> > >, std::allocator<std::map<short, short, std::less<short>, std::allocator<std::pair<short const, short> > > > >::operator=(std::vector<std::map<short, short, std::less<short>, std::allocator<std::pair<short const, short> > >, std::allocator<std::map<short, short, std::less<short>, std::allocator<std::pair<short const, short> > > > > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#26 0x0000ffff5aad46ec in L1MuBMLUTHandler::L1MuBMLUTHandler(L1TMuonBarrelParams const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#27 0x0000ffff5aaca000 in L1MuBMEUX::run(edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#28 0x0000ffff5aad6c10 in L1MuBMSEU::run(edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#29 0x0000ffff5aacf7ec in L1MuBMExtrapolationUnit::run(edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#30 0x0000ffff5aad7818 in L1MuBMSectorProcessor::run(int, edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#31 0x0000ffff5aae7c68 in L1MuBMTrackFinder::run(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonBarrel.so
#32 0x0000ffff5ab7bbbc in L1TMuonBarrelTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginL1TriggerL1TMuonBarrelPlugins.so
[cut]
Thread 10 (Thread 0xffff5e3c8460 (LWP 78072)):
#0  0x0000ffffbc3a4e24 in poll () from /lib64/libc.so.6
#1  0x0000ffffba034a6c in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x0000ffffba0351cc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  0x0000ffffba0361cc in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000ffffbc396908 in sched_yield () from /lib64/libc.so.6
#6  0x0000ffffbc93e9fc in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::receive_or_steal_task (this=0xffffbba3be00, completion_ref_count=@0xffffa03a3828: 2, isolation=0) at ../../src/tbb/mailbox.h:225
#7  0x0000ffffbc93fa20 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0xffffbba3be00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#8  0x0000ffffbc93ac58 in tbb::interface7::internal::task_arena_base::internal_execute (this=0xffffbe604e60 <edm::esTaskArena()::s_arena>, d=...) at ../../src/tbb/arena.cpp:1105
#9  0x0000ffffbe3e02f0 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*, edm::EventSetupImpl const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#10 0x0000ffffbe44d4bc in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool, edm::EventSetupImpl const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#11 0x0000ffff5c8d0e54 in L1TMuon::GeometryTranslator::checkAndUpdateGeometry(edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuon.so
#12 0x0000ffff5c951218 in EMTFSetup::reload(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonEndCap.so
#13 0x0000ffff5c990cc0 in TrackFinder::process(edm::Event const&, edm::EventSetup const&, std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >&, std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonEndCap.so
#14 0x0000ffff5a5c7330 in L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginL1TriggerL1TMuonEndCapPlugins.so
[cut]
Thread 9 (Thread 0xffff5edd8460 (LWP 78070)):
Thread 1 (Thread 0xffffbbbf0000 (LWP 232142)):
#2  0x0000ffffba0343b8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000ffffbc396908 in sched_yield () from /lib64/libc.so.6
#5  0x0000ffffbc93e8d4 in __TBB_Pause () at ../../include/tbb/tbb_machine.h:332
#6  tbb::internal::prolonged_pause () at ../../src/tbb/scheduler_common.h:322
#7  tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::receive_or_steal_task (this=0xffffbbae2600, completion_ref_count=@0xffffa03ce828: 2, isolation=0) at ../../src/tbb/custom_scheduler.h:305
#8  0x0000ffffbc93fa20 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0xffffbbae2600, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#9  0x0000ffffbc93ac58 in tbb::interface7::internal::task_arena_base::internal_execute (this=0xffffbe604e60 <edm::esTaskArena()::s_arena>, d=...) at ../../src/tbb/arena.cpp:1105
#10 0x0000ffffbe3e02f0 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*, edm::EventSetupImpl const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#11 0x0000ffffbe44d4bc in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool, edm::EventSetupImpl const*) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libFWCoreFramework.so
#12 0x0000ffff5c8d0e54 in L1TMuon::GeometryTranslator::checkAndUpdateGeometry(edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuon.so
#13 0x0000ffff5c951218 in EMTFSetup::reload(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonEndCap.so
#14 0x0000ffff5c990cc0 in TrackFinder::process(edm::Event const&, edm::EventSetup const&, std::vector<l1t::EMTFHit, std::allocator<l1t::EMTFHit> >&, std::vector<l1t::EMTFTrack, std::allocator<l1t::EMTFTrack> >&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/libL1TriggerL1TMuonEndCap.so
#15 0x0000ffff5a5c7330 in L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-09-0000/lib/slc7_aarch64_gcc820/pluginL1TriggerL1TMuonEndCapPlugins.so

Current Modules:

Module: L1TMuonEndCapTrackProducer:simEmtfDigis (crashed)
Module: L1TMuonEndCapTrackProducer:simEmtfDigis
Module: L1TMuonEndCapTrackProducer:simEmtfDigis
Module: none
Module: none

A fatal system signal has occurred: segmentation violation

@Dr15Jones
Contributor Author

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/136.722_RunDoubleEG2016B+RunDoubleEG2016B+HLTDR2_2016+RECODR2_2016reHLT_skimDoubleEG_HIPM+HARVESTDR2/step3_RunDoubleEG2016B+RunDoubleEG2016B+HLTDR2_2016+RECODR2_2016reHLT_skimDoubleEG_HIPM+HARVESTDR2.log

has no stack trace and shows the running modules as

Module: Type0PFMETcorrInputProducer:patPFMetT0Corr (crashed)
Module: BTagPerformanceAnalyzerOnData:bTagAnalysis
Module: none
Module: none
Module: PFCand_AssoMap:pfCandidateToVertexAssociation

@Dr15Jones
Contributor Author

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/134.706_RunMuonEG2015B+RunMuonEG2015B+HLTDR2_50ns+RECODR2_50nsreHLT_HIPM+HARVESTDR2/step3_RunMuonEG2015B+RunMuonEG2015B+HLTDR2_50ns+RECODR2_50nsreHLT_HIPM+HARVESTDR2.log#/

Has no stack trace and shows the running modules as

Module: PFProducer:particleFlowTmp (crashed)
Module: PFDisplacedVertexCandidateProducer:particleFlowDisplacedVertexCandidate
Module: none
Module: TrackingMonitor:TrackerCollisionSelectedTrackMonCommongeneralTracks
Module: none

@Dr15Jones
Contributor Author

Dr15Jones commented Aug 12, 2020

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/129.0_SinglePiPt1+SinglePiPt1+DIGI+RECO/step3_SinglePiPt1+SinglePiPt1+DIGI+RECO.log#/

This one seems to report multiple simultaneous crashes, with their output interleaved. No stack traces are given.

A fatal system signal has occurred: 

A fatal system signal has occurred: segmentation violationsegmentation violation
The following is the call stack containing the origin of the signal.


The following is the call stack containing the origin of the signal.

Mon Aug 10 01:23:11 CEST 2020

Current Modules:


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Mon Aug 10 01:23:12 CEST 2020

Current Modules:

Module: Type0PFMETcorrInputProducer:patPFMetT0Corr (crashed)
Module: Type0PFMETcorrInputProducer:patPFMetT0Corr (crashed)Mon Aug 10 01:23:12 CEST 2020

Module: GlobalRecHitsAnalyzer:globalrechitsanalyze
Module: Type0PFMETcorrInputProducer:patPFMetT0Corr
Module: Type0PFMETcorrInputProducer:patPFMetT0Corr
Module: none

A fatal system signal has occurred: segmentation violation

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/130.0_SinglePiPt10+SinglePiPt10+DIGI+RECO/step3_SinglePiPt10+SinglePiPt10+DIGI+RECO.log#/ seems to have the same sort of behavior.

@Dr15Jones
Contributor Author

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-08-09-0000/pyRelValMatrixLogs/run/43.0_ZpMM_2250_8TeV+ZpMM_2250_8TeVINPUT+DIGI+RECO+HARVEST/step3_ZpMM_2250_8TeV+ZpMM_2250_8TeVINPUT+DIGI+RECO+HARVEST.log#/

Doesn't have a traceback and shows the running modules as

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sun Aug  9 22:12:36 CEST 2020

Current Modules:

Module: DeepDoubleXONNXJetTagsProducer:pfMassIndependentDeepDoubleCvBJetTagsSlimmedAK8DeepTags (crashed)
Module: none
Module: DeepDoubleXONNXJetTagsProducer:pfMassIndependentDeepDoubleCvBJetTagsSlimmedAK8DeepTags
Module: DeepDoubleXONNXJetTagsProducer:pfMassIndependentDeepDoubleCvBJetTagsSlimmedAK8DeepTags
Module: none

hahnjo added a commit to hahnjo/root that referenced this issue Apr 8, 2021
Backport of D99607, commit 6415f424bc.

Original commit message:
---
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

[...]
---

See the discussion in cms-sw/cmssw#31123
for context on the observed crashes.
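For context on why `adrp + ldr` is fragile here: ADRP computes a 4 KiB page address from a signed 21-bit page offset relative to the PC, so it can only reach about ±4 GiB. The large code model exists precisely to drop that distance assumption, so if a JIT places a constant pool further away, the pair cannot encode the displacement. The following Python sketch checks ADRP reachability; the helper `adrp_can_reach` and the example addresses are hypothetical and not part of LLVM or ROOT.

```python
PAGE_SHIFT = 12   # ADRP addresses 4 KiB pages
IMM_BITS = 21     # signed page offset (immhi:immlo) encoded in the instruction

def adrp_can_reach(pc, target):
    """True if an ADRP at `pc` can form the page address of `target`."""
    delta_pages = (target >> PAGE_SHIFT) - (pc >> PAGE_SHIFT)
    lowest = -(1 << (IMM_BITS - 1))
    highest = (1 << (IMM_BITS - 1)) - 1
    return lowest <= delta_pages <= highest

code = 0x7F00_0000_0000  # hypothetical address of JIT-compiled code
print(adrp_can_reach(code, code + 0x1000))      # True: next page
print(adrp_can_reach(code, code + (1 << 32)))   # False: 4 GiB above is out of range
print(adrp_can_reach(code, code - (1 << 32)))   # True: the signed range reaches 4 GiB below
```

The asymmetry comes from the signed immediate: one page fewer is reachable above the PC than below it.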
@dan131riley

The 23h00 IB seems to be taking a while for aarch64, but so far there are no TFormula crashes; all the crashes are in onnxruntime.

@smuzaffar
Contributor

Yes, we have issues with one of the ARM nodes (disk full); that is why the relval jobs crashed. We have restarted the jobs, but as we only have one ARM node now, it will take some time.

hahnjo added a commit to root-project/root that referenced this issue Apr 9, 2021
Backport of D99607, commit 6415f424bc.

Original commit message:
---
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

[...]
---

See the discussion in cms-sw/cmssw#31123
for context on the observed crashes.
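Materializing the FP constant "in code" means building the double's bit pattern in a general-purpose register with immediate moves and then copying it into an FP register, so no PC-relative load from a possibly distant constant pool is needed. On AArch64 that is roughly a MOVZ/MOVK sequence followed by FMOV. The Python sketch below only splits a double into the 16-bit chunks such a sequence would use and prints an illustrative listing; the helpers are hypothetical, and the instructions LLVM actually selects may differ.

```python
import struct

def movz_movk_chunks(value):
    """Split the IEEE-754 bits of a double into (imm16, shift) pairs, low to high."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return [((bits >> shift) & 0xFFFF, shift) for shift in (0, 16, 32, 48)]

def sketch_sequence(value, gpr="x8", fpr="d0"):
    """Illustrative MOVZ/MOVK/FMOV listing; all-zero chunks are skipped."""
    chunks = [(imm, shift) for imm, shift in movz_movk_chunks(value) if imm] or [(0, 0)]
    first_imm, first_shift = chunks[0]
    # MOVZ zeroes the rest of the register; MOVK keeps the other bits intact.
    lines = [f"movz {gpr}, #0x{first_imm:x}, lsl #{first_shift}"]
    lines += [f"movk {gpr}, #0x{imm:x}, lsl #{shift}" for imm, shift in chunks[1:]]
    lines.append(f"fmov {fpr}, {gpr}")  # move the raw bits into the FP register
    return lines

for line in sketch_sequence(0.1):
    print(line)
```

For `1.0` (bit pattern `0x3ff0000000000000`) a single MOVZ is enough because the three low chunks are zero.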
hahnjo added a commit to hahnjo/root that referenced this issue Apr 9, 2021
Backport of D99607, commit 6415f424bc.

Original commit message:
---
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

[...]
---

See the discussion in cms-sw/cmssw#31123
for context on the observed crashes.

(cherry picked from commit 9e104ac)
hahnjo added a commit to root-project/root that referenced this issue Apr 9, 2021
Backport of D99607, commit 6415f424bc.

Original commit message:
---
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

[...]
---

See the discussion in cms-sw/cmssw#31123
for context on the observed crashes.

(cherry picked from commit 9e104ac)
@hahnjo
Contributor

hahnjo commented Apr 9, 2021

The fix is now merged upstream in LLVM and in ROOT master as well as in the branches for 6.24, 6.22, and 6.20.

> all the crashes are in onnxruntime.

@dan131riley, does this also involve Cling, or is this a separate issue?

@dan131riley

@hahnjo The aarch64 IBs are still running slow, but it looks like the CMSSW_11_3 2021-04-07-2300 slc7_aarch64_gcc9 IB has finished, and I don't see any Cling-related crashes. There are lots of onnxruntime crashes; those are unrelated to Cling and ROOT, and there's a separate issue for that at #32899.

The Cling crashes were common enough that one IB is enough to convince me that the problems have all been resolved and that we can close this much-too-long ticket. Thanks!

@makortel
Contributor

+1

The TCling issue seems to be resolved with the last fix, so let's close this issue (and open new ones for possible other crashes).

mrodozov pushed a commit to cms-sw/root that referenced this issue Apr 29, 2021
…project#7758)

Backport of D99607, commit 6415f424bc.

Original commit message:
---
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

[...]
---

See the discussion in cms-sw/cmssw#31123
for context on the observed crashes.

(cherry picked from commit 9e104ac)
@slava77
Contributor

slava77 commented Jul 22, 2021

+reconstruction

based on #31123 (comment)

let's close this issue

@civanch
Contributor

civanch commented Jul 27, 2021

+1

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

qliphy closed this as completed Jul 28, 2021
pcanal pushed a commit to pcanal/root that referenced this issue Oct 4, 2022
Backport of D27629, commit 18805ea951.

Original commit message:
---
Makes sure that the unwind info uses 64bits pcrel relocation if a large
code model is specified and handle the corresponding relocation in the
ExecutionEngine. This can happen with certain kernel configuration (the
same as the one in https://reviews.llvm.org/D27609, found at least on
the ArchLinux stock kernel and the one used on https://www.packet.net/)
using the builtin JIT memory manager.

Co-authored-by: Yichao Yu <yyc1992@gmail.com>
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
---

Note: The handling in ExecutionEngine was committed in a different
revision and is already part of LLVM 9. We need the part about emitting
relocations because eh_frame (allocated in a data section) may be more
than 4Gb away from the code section it references. See the discussion
in cms-sw/cmssw#31123 for context.

(cherry picked from commit f481e8f)
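The backport above is about the width of the PC-relative relocations emitted in the unwind info: a 32-bit field can only encode a limited displacement between `.eh_frame` and the code it describes, while a 64-bit field cannot overflow within a 64-bit address space. A minimal sketch of the overflow check, assuming the usual ELF `S + A - P` formula and a signed field (the exact accepted range differs per relocation type); the helper name and addresses are hypothetical.

```python
def pcrel_fits(symbol, addend, place, bits):
    """Check whether S + A - P fits in a signed `bits`-wide relocation field."""
    value = symbol + addend - place
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

code = 0x7F00_0000_0000            # hypothetical JIT code section
eh_frame_near = code + 0x10_0000   # 1 MiB away
eh_frame_far = code + (5 << 30)    # 5 GiB away

# .eh_frame entries point back at the code, so the "place" lies in .eh_frame.
print(pcrel_fits(code, 0, eh_frame_near, 32))  # True
print(pcrel_fits(code, 0, eh_frame_far, 32))   # False: 32 bits cannot span 5 GiB
print(pcrel_fits(code, 0, eh_frame_far, 64))   # True
```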
pcanal pushed a commit to pcanal/root that referenced this issue Oct 4, 2022
…project#7807)

Backport of D99607, commit 6415f424bc.

Original commit message:
---
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

[...]
---

See the discussion in cms-sw/cmssw#31123
for context on the observed crashes.

(cherry picked from commit 9e104ac)