SIGSEGV in LogErrorEventFilter::globalEndLuminosityBlock #44413
A new Issue was created by @iarspider. @smuzaffar, @rappoccio, @Dr15Jones, @sextonkennedy, @antoniovilela, @makortel can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core
The only PR that could've caused this failure is #43522 |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Interesting, the failure is specific to el9 |
#43522 should not have impacted |
On 140.009 the segfault is not easily reproducible, which, together with the problem appearing in only one of the IB flavors, suggests a threading problem of some kind. |
in the stack trace point to either cmssw/DPGAnalysis/Skims/src/LogErrorEventFilter.cc Lines 207 to 208 in 7827a07
or cmssw/DPGAnalysis/Skims/src/LogErrorEventFilter.cc Lines 376 to 384 in 7827a07
(via
) In all cases the modifications seem to be protected with spinlocks. The cmssw/DPGAnalysis/Skims/src/LogErrorEventFilter.cc Lines 30 to 41 in 7827a07
|
Occurred in workflow 140.006 step 3 CMSSW_14_1_ROOT6_X_2024-03-15-2300 on el8_amd64_gcc12
Also here the job seemed to process 0 events, |
Occurred in wf 141.009 step 3 in CMSSW_14_1_NONLTO_X_2024-03-15-2300
|
Occurred in wf 140.008 step 3 in CMSSW_14_1_CLANG_X_2024-03-15-2300
|
Wf 140.004 step 3 in CMSSW_14_1_CLANG_X_2024-03-15-2300 shows a likely related but different crash at job shutdown
Probably the result of memory corruption from earlier data races, with jemalloc being picky. |
Occurred in
In CMSSW_14_1_NONLTO_X_2024-03-16-1100 wf 140.009 step 3 showed a new kind of stack trace
|
The stack traces from CLANG and latest from NONLTO with cmssw/DPGAnalysis/Skims/src/LogErrorEventFilter.cc Lines 205 to 209 in 7827a07
|
I ran valgrind on the step3 of 140.009, that showed
and I found that
is not initialized (in C++17; in C++20 atomic<T> would be value-initialized...). I opened #44447 to initialize statsGuard_. With the PR the valgrind warning above is gone.
It is still strange that none of the crashes showed concurrent activity in |
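As an illustration of the initialization point above, here is a minimal sketch, not the actual LogErrorEventFilter code. The class `GuardedCounter` and its member `guard_` are hypothetical names. It shows a guard flag with an explicit initializer, which is well defined under both C++17 and C++20, whereas a default-initialized `std::atomic<bool>` member holds an indeterminate value in C++17.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a module that protects shared state with a flag.
class GuardedCounter {
public:
  // True when nobody currently holds the guard.
  bool guardIsFree() const { return !guard_.load(); }

private:
  // Explicit initializer. Writing just `std::atomic<bool> guard_;` would leave
  // the value indeterminate in C++17 (C++20 value-initializes it).
  mutable std::atomic<bool> guard_{false};
};

// Small self-check: a freshly constructed object starts with a free guard.
void demoInitializedGuard() {
  GuardedCounter counter;
  assert(counter.guardIsFree());
}
```

With an uninitialized flag, a spin loop waiting for `false` could start out seeing garbage, which is consistent with the kind of warning valgrind reported.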
I got 140.009 to crash in gdb with line numbers. The result is consistent, but still not terribly enlightening. Here are the only active threads, 11 is the one that crashed:
|
I believe this is an incorrectly coded spin lock. If b is false on the first try, it works as it should. But if it is not, then expected is set to true and the exchange succeeds on the second try even if b is still true.
|
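The failure mode described in the comment above can be reproduced deterministically in a single thread. The following is a sketch under assumptions: the function names `buggyAcquire` and `fixedTryAcquire` and the `maxSpins` bound are hypothetical and are not taken from CMSSW. It relies only on the documented behaviour of `compare_exchange_strong`, which writes the current value into `expected` when the exchange fails.

```cpp
#include <atomic>
#include <cassert>

// Buggy pattern: `expected` is never reset after a failed exchange.
// Returns the number of compare-exchange attempts made before "acquiring".
int buggyAcquire(std::atomic<bool>& guard) {
  bool expected = false;
  int attempts = 1;
  while (!guard.compare_exchange_strong(expected, true)) {
    // A failed exchange stores the current value of `guard` (true) into
    // `expected`, so the next attempt compares true with true and succeeds.
    ++attempts;
  }
  return attempts;
}

// Corrected pattern: reset `expected` to false before every attempt.
// Bounded by maxSpins (a hypothetical parameter) so the demo cannot hang.
bool fixedTryAcquire(std::atomic<bool>& guard, int maxSpins) {
  for (int i = 0; i < maxSpins; ++i) {
    bool expected = false;
    if (guard.compare_exchange_strong(expected, true)) {
      return true;  // lock taken
    }
  }
  return false;  // still held by someone else
}

// Single-threaded demonstration with a guard that is already held.
void demoSpinLockBug() {
  std::atomic<bool> held{true};
  int attempts = buggyAcquire(held);
  assert(attempts == 2);  // "acquired" a lock that was never released
  bool gotIt = fixedTryAcquire(held, 1000);
  assert(!gotIt);  // the corrected loop refuses while the guard is held
  held.store(false);  // release
  gotIt = fixedTryAcquire(held, 1000);
  assert(gotIt);  // and succeeds once the guard is free
}
```

In the buggy version two threads can end up inside the protected region at the same time, which would allow concurrent modification of the protected container.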
the really obvious question, which I decided not to ask earlier: why are they coding their own spinlock? |
It was done in #22329, so the question is more "why did we code our own spinlock there". There are also several other places where we have added an ad hoc spinlock as a similar loop over an atomic. |
Looking for a correct spin lock to quote, I stumbled on this one first. I was going to cite it as the correct way to do it, but looking closely I think it is also incorrect: it doesn't break out of the loop when it succeeds, it waits until the next iteration. https://cmssdt.cern.ch/lxr/source/FWCore/Framework/src/Path.cc#0136 |
This one looks correct. https://cmssdt.cern.ch/lxr/source/FWCore/Services/plugins/ConcurrentModuleTimer.cc#0206 |
Just skimming through looking for obvious spin locks with compare_exchange_strong, all the rest of them look OK. Just luck that the first other one I looked at was bad... There are not that many of them (I found 6 total including the 3 discussed above) that are obvious spin locks (I was grepping for compare_exchange_strong in a while loop without some more complicated logic involved). I am actually not aware of a standard library implementation of a spin lock. I think the design intent is that we only use them in cases where the lock is almost never taken and the time it is held is intentionally extremely short. |
Still wrong, atomic is not guaranteed to be lock-free. Better is to use std::atomic_flag, but even then, best is to just not do user-land spin locks. It’s just a bad idea. Unless you’re porting Doom to a CMSSW plugin, just trust the scheduler and take a mutex. |
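For comparison, here is a minimal sketch of the two alternatives mentioned in the comment above: a spin lock built on `std::atomic_flag`, and a plain `std::mutex`. The class `SpinLock` and the helper `incrementUnderLock` are hypothetical names introduced for illustration and are not CMSSW code. Because both types provide `lock()` and `unlock()`, the same `std::lock_guard` call site works with either.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Spin lock on std::atomic_flag, the one atomic type the standard guarantees
// to be lock-free. Provides lock/try_lock/unlock so std::lock_guard can use it.
class SpinLock {
public:
  void lock() {
    // test_and_set returns the previous value: true means someone holds it.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // spin until the holder calls unlock()
    }
  }
  bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }
  void unlock() { flag_.clear(std::memory_order_release); }

private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// The call site is identical for SpinLock and std::mutex.
template <typename Lock>
int incrementUnderLock(Lock& lock, int& counter) {
  std::lock_guard<Lock> guard(lock);
  return ++counter;
}

// Single-threaded self-check of both lock types.
void demoLocks() {
  SpinLock spin;
  std::mutex mtx;
  int counter = 0;
  int afterSpin = incrementUnderLock(spin, counter);
  int afterMutex = incrementUnderLock(mtx, counter);
  assert(afterSpin == 1 && afterMutex == 2);
  bool first = spin.try_lock();   // free, so this takes the lock
  bool second = spin.try_lock();  // already held, so this fails
  assert(first && !second);
  spin.unlock();
}
```

With `test_and_set` there is no `expected` variable to forget to reset, which is why this interface is harder to get wrong than a `compare_exchange_strong` loop.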
I'm not completely surprised as the stack traces have never shown that the routine was being called concurrently. Maybe we need helgrind or valgrind? |
My valgrind showed only #44413 (comment) as relevant to this issue. |
I could easily fix the two spin locks to work correctly. Ask and I'll do that. I suspect that will fix the problem... That is what I would suggest. We could fix it in some other way, either now or as a second PR later after we've thought about it more.
Some thoughts. On the one hand, I see that avoiding spin locks eliminates the possibility for this kind of mistake. A spin lock is only about 5 lines of code, but it has to be just right, and if it's wrong it might not be immediately obvious. The compiler is not going to find the issue, and if it rarely locks, problems might be hard to notice, reproduce and debug. I've seen advice to not use spin locks ever. I certainly would not recommend them to most of our users.
On the other hand, the multi-threaded Framework is not built on simple things that are hard to get wrong. The Framework is fast partially because we avoid mutexes as much as possible and are using low level, non-locking, and difficult approaches. The standard does not require atomics to be non-locking. They can be implemented using mutexes. And also locks can be made out of atomics. But my understanding is that on the platforms we use, atomics are in fact non-locking. I think that is the whole point of why the Framework uses them and why the Framework is fast.
I see nothing wrong in ConcurrentModuleTimer.cc. Are you worried about the OS interrupting the spin lock thread while it is spinning? Even that would resolve itself with some delay almost always and would only occur very rarely. I suppose it is technically possible for such situations to lead to a deadlock, although I think that is rare enough that it would probably not happen in the lifetime of CMS. Also there is plenty of other multi-threading code in our Framework susceptible to the same kinds of problems that is much more complicated than a spin lock.
Maybe I need to read about atomic_flag. I haven't used one of those yet... |
atomic_flag should also work. https://en.cppreference.com/w/cpp/atomic/atomic_flag If you don't use the C++20 extension, the interface looks simpler for a simple spin lock, maybe less prone to error. Guaranteed to be lock free is good, but I am not convinced the Or we could go with mutex or something else... |
I'm okay with the suggestion from @wddgit to fix the existing spinlocks and see if that fixes the crashes. I believe it is true that on all our current platforms, atomics of primitive types are lock free. (There is some code in the stack trace signal handler that actually cares about this, and I can be a bit pedantic about that.) I do believe that the difference between a std::mutex and a user-land spinlock is usually relatively immaterial compared to taking any kind of lock vs. avoiding blocking, and fine-grained vs. coarse-grained locking. We spend a lot of effort on minimizing the occurrence and scope of blocking locks, but I'm not convinced that spinlock vs. mutex makes much difference, except possibly in highly contested code paths. |
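The belief that atomics of primitive types are lock-free on a given platform can be checked rather than assumed. A small sketch, assuming C++17 or later; the names `kAtomicBoolAlwaysLockFree` and `atomicBoolLockFreeAtRuntime` are hypothetical and chosen only for this example:

```cpp
#include <atomic>
#include <cassert>

// Compile-time answer (C++17): true only if std::atomic<bool> is lock-free
// for every object on the target this translation unit is compiled for.
constexpr bool kAtomicBoolAlwaysLockFree = std::atomic<bool>::is_always_lock_free;

// Run-time answer for one particular object.
bool atomicBoolLockFreeAtRuntime() {
  std::atomic<bool> probe{false};
  return probe.is_lock_free();
}

// If the compile-time guarantee holds, the run-time query must agree.
void demoLockFreeCheck() {
  assert(!kAtomicBoolAlwaysLockFree || atomicBoolLockFreeAtRuntime());
}
```

Code that depends on the property, such as a signal handler, could turn the constant into a `static_assert(kAtomicBoolAlwaysLockFree)` so that a port to a platform without the guarantee fails at build time instead of at run time.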
@makortel I could probably implement and submit that tomorrow morning if you would like, just a minimal fix. We could follow that up with other changes later if we decide some other approach is better... I searched for spin locks with atomic_flag and found CMSSW already has two of those also. Both looked OK. I'm off on vacation tomorrow afternoon and all next week. |
@wddgit Please do the minimal fix. I think the other points raised by you and @dan131riley need more discussion. |
I just submitted #44517 which contains the minimal fixes. |
One other comment here. The failure probably requires the job be configured for more than 1 concurrent LuminosityBlocks and that there is actually more than 1 LuminosityBlock in the job. I suspect the probability that multiple global end lumi transitions are executing at the same time is much higher if there aren't any events. One of the comments above mentions the job was processing 0 events. |
Looks like the failure rate decreased after #44447 was merged in CMSSW_14_1_X_2024-03-22-2300. Since then the failures were
Earlier pretty much every IB had up to a few failures |
I believe a data race would still be a possible cause (especially given the remaining flaw fixed in #44517, and the observation in #44413 (comment)). The mechanism could be (e.g.) the data race itself occurring silently, but the tree data structure getting corrupted such that a subsequent |
Based on two IBs after #44517 was merged, it indeed seems the issue got fixed. |
+core |
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
In CMSSW_14_1_X_2024-03-14-2300 IB for el9_amd64_gcc12, two RelVals 140.009, 140.021:
Full stack trace
```
A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
Fri Mar 15 07:11:08 CET 2024
Thread 8 (Thread 0x149120fff640 (LWP 4137871) "cmsRun"):
#0 0x00001491b353639a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00001491b3538ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x0000149179a12cbe in Eigen::ThreadPoolTempltsl::thread::EigenEnvironment::WaitForWork(Eigen::EventCount::Waiter*, tsl::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#3 0x0000149179a13223 in Eigen::ThreadPoolTempltsl::thread::EigenEnvironment::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#4 0x0000149179a10a38 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#5 0x0000149168e4f422 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_framework.so.2
#6 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#7 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x1491217ff640 (LWP 4137870) "cmsRun"):
#0 0x00001491b353639a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00001491b3538ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x0000149179a12cbe in Eigen::ThreadPoolTempltsl::thread::EigenEnvironment::WaitForWork(Eigen::EventCount::Waiter*, tsl::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#3 0x0000149179a13223 in Eigen::ThreadPoolTempltsl::thread::EigenEnvironment::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#4 0x0000149179a10a38 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#5 0x0000149168e4f422 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_framework.so.2
#6 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#7 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x149121ed9640 (LWP 4137869) "cmsRun"):
#0 0x00001491b353639a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00001491b3538ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x0000149179a12cbe in Eigen::ThreadPoolTempltsl::thread::EigenEnvironment::WaitForWork(Eigen::EventCount::Waiter*, tsl::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#3 0x0000149179a13223 in Eigen::ThreadPoolTempltsl::thread::EigenEnvironment::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#4 0x0000149179a10a38 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#5 0x0000149168e4f422 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_framework.so.2
#6 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#7 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x14915e7ff640 (LWP 4137852) "cmsRun"):
#0 0x00001491b35ad975 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00001491b35b2527 in nanosleep () from /lib64/libc.so.6
#2 0x00001491b35b245e in sleep () from /lib64/libc.so.6
#3 0x00001491af7d3be0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00001491b34d8e5d in syscall () from /lib64/libc.so.6
#6 0x00001491b3aacdb2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x1491b04c4024) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/semaphore.h:100
#7 tbb::detail::r1::binary_semaphore::P (this=0x1491b04c4024) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/semaphore.h:253
#8 tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x1491b04c4020) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:235
#9 tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4000) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:273
#10 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4000) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#11 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#12 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x14915f708640 (LWP 4137851) "cmsRun"):
#0 0x00001491b35ad975 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00001491b35b2527 in nanosleep () from /lib64/libc.so.6
#2 0x00001491b35b245e in sleep () from /lib64/libc.so.6
#3 0x00001491af7d3be0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007ffc61d9cbba in clock_gettime ()
#6 0x00001491b35ad84d in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
#7 0x00001491b014f2c1 in boost::chrono::thread_clock::now() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libboost_chrono.so.1.80.0
#8 0x00001491ae9f8783 in FastTimerService::Measurement::measure_and_accumulate(FastTimerService::AtomicResources&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginHLTriggerTimerPlugins.so
#9 0x00001491b54a8297 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#10 0x00001491b54950cb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#11 0x00001491b545b7de in tbb::detail::d1::function_taskedm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#12 0x00001491b3aaa91b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x1490b72e1b00, waiter=..., this=0x1491b2589480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#13 tbb::detail::r1::task_dispatcher::local_wait_for_alltbb::detail::r1::outermost_worker_waiter (t=0x0, waiter=..., this=0x1491b2589480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#14 tbb::detail::r1::arena::process (tls=..., this=) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#15 tbb::detail::r1::market::process (this=, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/market.cpp:599
#16 0x00001491b3aacace in tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#17 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#18 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#19 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x149160109640 (LWP 4137850) "cmsRun"):
#0 0x00001491b35dc6ff in poll () from /lib64/libc.so.6
#1 0x00001491af821a9f in full_read.constprop () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00001491af7d60ac in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3 0x00001491af7d6230 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 std::local_Rb_tree_rotate_left (__root=@0x14915d749f78: 0x1490958006c0, __x=0x149092f17d00) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6 std::_Rb_tree_insert_and_rebalance (__insert_left=, __x=0x14908d1d80c0, __p=, _header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7 0x00001491574ca785 in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so
#8 0x00001491574ca9c2 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so
#9 0x00001491b555dbfb in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#10 0x00001491b5555270 in edm::WorkerTedm::global::EDFilterBase::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#11 0x00001491b54a825f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#12 0x00001491b54950cb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#13 0x00001491b545b7de in tbb::detail::d1::function_taskedm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#14 0x00001491b3aaa91b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x1490b7516400, waiter=..., this=0x1491b2589400) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#15 tbb::detail::r1::task_dispatcher::local_wait_for_alltbb::detail::r1::outermost_worker_waiter (t=0x0, waiter=..., this=0x1491b2589400) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#16 tbb::detail::r1::arena::process (tls=..., this=) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#17 tbb::detail::r1::market::process (this=, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/market.cpp:599
#18 0x00001491b3aacace in tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#19 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#20 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#21 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x149189dbe640 (LWP 4137784) "cmsRun"):
#0 0x00001491b35b230f in wait4 () from /lib64/libc.so.6
#1 0x00001491af7d3d37 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00001491af7d5fda in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3 0x00001491b38784d3 in std::execute_native_thread_routine (__p=0x1491a7b09590) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#4 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#5 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x1491b2e9d640 (LWP 4134141) "cmsRun"):
#0 0x00001491b35ad975 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00001491b35b2527 in nanosleep () from /lib64/libc.so.6
#2 0x00001491b35b245e in sleep () from /lib64/libc.so.6
#3 0x00001491af7d3be0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00001491b34d86bb in sched_yield () from /lib64/libc.so.6
#6 0x00001491b3ab1516 in __gthread_yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/x86_64-redhat-linux-gnu/bits/gthr-default.h:693
#7 std::this_thread::yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/std_thread.h:353
#8 tbb::detail::r1::stealing_loop_backoff::pause (this=0x7ffc61d72038) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/scheduler_common.h:266
#9 tbb::detail::r1::waiter_base::pause (this=0x7ffc61d72030) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/waiters.h:35
#10 tbb::detail::r1::external_waiter::pause (this=0x7ffc61d72030) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/waiters.h:138
#11 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=, tls=..., ed=..., waiter=..., isolation=, fifo_allowed=, critical_allowed=) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:231
#12 0x00001491b3ab3342 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x1491b2589380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:350
#13 tbb::detail::r1::task_dispatcher::local_wait_for_alltbb::detail::r1::external_waiter (waiter=..., t=, this=0x1491b2589380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#14 tbb::detail::r1::task_dispatcher::execute_and_wait (t=, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#15 0x00001491b546ba0b in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#16 0x00001491b547518a in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#17 0x00001491b54756e1 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#18 0x00000000004074f5 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#19 0x00001491b3a9f96d in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:688
#20 0x0000000000408ee2 in main::{lambda()#1}::operator()() const ()
#21 0x000000000040517c in main ()
Current Modules:
Module: LogErrorEventFilter:logErrorTooManyClusters (crashed)%MSG-w BeamFitter: AlcaBeamMonitor:AlcaBeamMonitor@endLumi 15-Mar-2024 07:11:48 CET Run: 353015 Lumi: 78
No event read! No Fitting!
%MSG
%MSG-w BeamFitter: AlcaBeamMonitor:AlcaBeamMonitor@endLumi 15-Mar-2024 07:11:48 CET Run: 353015 Lumi: 77
No event read! No Fitting!
%MSG
Module: L1TStage2MuonShowerComp:l1tStage2uGMTMuonShowerVsuGMTMuonShowerCopy1
Module: none
Module: none
```