[ROOT] Unit Tests testDataFormatsScoutingRunX failing in ROOT IBs #41222
A new Issue was created by @aandvalenzuela Andrea Valenzuela. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
gdb shows this
|
@vgvassilev, these tests are failing for ROOT 6.28 and ROOT master based builds but work for ROOT 6.26. The tests fail at the very end, once the python process (https://github.com/cms-sw/cmssw/blob/master/DataFormats/Scouting/test/scoutingCollectionsDumper.py) has finished. Any idea why this is failing for ROOT 6.28/master? |
This area has been quite problematic due to the way that ROOT is being shut down. @pcanal, do you know how we can work this issue with |
The destruction order is delicate and I remember we had some issue with the tear down triggering from python. Any chance to have a standalone reproducer (so that I can start by bisecting where/when it started failing)? |
Also could you run the failing example with |
A number of reports like this seem related:
==19874== Invalid free() / delete / delete[] / realloc()
==19874== at 0x403C4CD: operator delete(void*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/external/valgrind/3.17.0-7ca83817e7379e83453f913e11e14834/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==19874== by 0x4ECCA891: ???
==19874== by 0x9EDBF06: cling::IncrementalExecutor::runAndRemoveStaticDestructors(cling::Transaction*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/lcg/root/6.28.01-24fc933274de02f0e1bb17b71bafaf30/lib/libCling.so)
==19874== by 0x9E65DB8: cling::Interpreter::runAndRemoveStaticDestructors() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/lcg/root/6.28.01-24fc933274de02f0e1bb17b71bafaf30/lib/libCling.so)
==19874== by 0x9C38B8B: TCling::ResetGlobals() (TCling.cxx:3718)
==19874== by 0x8026FE9: TROOT::EndOfProcessCleanups() (TROOT.cxx:1215)
==19874== by 0x551E600F: ???
==19874== by 0x7D400FC: WrapperCall(long, unsigned long, void*, void*, void*) (clingwrapper.cxx:774)
==19874== by 0x7D403EA: Cppyy::CallV(long, void*, unsigned long, void*) (clingwrapper.cxx:825)
==19874== by 0x7C82290: GILCallV(long, void*, CPyCppyy::CallContext*) (Executors.cxx:68)
==19874== by 0x7C846EA: CPyCppyy::(anonymous namespace)::VoidExecutor::Execute(long, void*, CPyCppyy::CallContext*) (Executors.cxx:410)
==19874== by 0x7C6730B: CPyCppyy::CPPMethod::ExecuteFast(void*, long, CPyCppyy::CallContext*) (CPPMethod.cxx:74)
==19874== Address 0x495f7cc0 is 0 bytes inside a block of size 20 free'd
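For reading long memcheck reports like the one above, a small helper can pull out the stack frames and name the first frame that has symbols and does not belong to valgrind's own preload library. This is only a sketch: the regular expression assumes the `==PID== at|by 0xADDR: function (location)` layout seen in this report, and `parse_frames` and `innermost_known_frame` are hypothetical names, not part of valgrind or ROOT.

```python
import re

# Matches memcheck stack-frame lines such as
#   ==19874==    by 0x9C38B8B: TCling::ResetGlobals() (TCling.cxx:3718)
# Assumption: the location, when present, is the last parenthesised group.
FRAME = re.compile(
    r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: "
    r"(?P<func>.*?)(?: \((?P<where>[^()]*)\))?$"
)


def parse_frames(report):
    """Return (function, location) pairs for every stack-frame line."""
    frames = []
    for line in report.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("where")))
    return frames


def innermost_known_frame(report):
    """First frame that has a symbol and is not valgrind's preload library."""
    for func, where in parse_frames(report):
        if func != "???" and "vgpreload" not in (where or ""):
            return func
    return None
```

Applied to the report above, this would skip the `operator delete` frame and the unresolved `???` frame and point at the cling static-destructor call.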
… On Mar 31, 2023, at 5:49 AM, Malik Shahzad Muzaffar ***@***.***> wrote:
@pcanal, you can find the valgrind output here
|
@pcanal probably meant |
|
Actually you need both. |
Also, this seems eerily similar to cms-sw/root#136 |
Just providing the full context for @davidlange6 's (imo correct) issue diagnosis:
|
@pcanal any progress on this? |
It's a traditional double delete, here as a global static in the interpreter memory. We need to find out which object it is, and why it's registered twice as static global. @pcanal do you think you can try to reproduce this? If so please take it. If not please assign it to me and I will debug next week (I'm off now). |
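To illustrate the failure mode described here, the following toy model shows what happens when the same object ends up in a static-destructor registry twice. It is plain Python and does not touch cling or ROOT; `Block`, `run_static_destructors` and `registered_twice` are made-up names for illustration only, and the real registry inside the interpreter may well work differently.

```python
class Block:
    """Stand-in for one heap allocation owned by a static global."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def free(self):
        # A real allocator would corrupt memory or abort here.
        if self.freed:
            raise RuntimeError(f"double delete of {self.name}")
        self.freed = True


def run_static_destructors(registry):
    """Run destructors in reverse registration order, collecting failures."""
    failures = []
    for block in reversed(registry):
        try:
            block.free()
        except RuntimeError as exc:
            failures.append(str(exc))
    return failures


def registered_twice(registry):
    """Names of objects that appear more than once in the registry."""
    seen, dupes = set(), []
    for block in registry:
        if id(block) in seen and block.name not in dupes:
            dupes.append(block.name)
        seen.add(id(block))
    return dupes
```

With a registry such as `[a, b, a]`, the second visit to `a` is the "invalid free" that valgrind reports, and `registered_twice` corresponds to the question of which object was registered twice.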
@Dr15Jones Can you remind me of the reproducer? |
@makortel could you help Philippe to reproduce the problem? |
something like
|
Just for the record, I shared a simpler recipe to Philippe. |
Yes, this is not resolved. |
@smuzaffar Would it be feasible to have a longer-than-IB-lifetime build for ROOT628? Or do we keep the tests failing for now? |
I would say, let the test fail for now. |
So I did a small amount of testing and I see the failure using the following minimal Python:
import ROOT
f = ROOT.TFile.Open("testDataFormatsScoutingRun2_step1.root")
e = ROOT.fwlite.Event(f)
The job fails as it is attempting to shut down. |
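Since the failure only shows up during interpreter teardown, one way to check for it automatically is to run the snippet in a child process and look at the exit status after the script body has completed. The sketch below assumes nothing about ROOT itself; `run_snippet`, `fails_at_shutdown` and the `BODY_DONE` marker are hypothetical helpers, and the check only distinguishes "body finished, then non-zero exit" from other outcomes.

```python
import subprocess
import sys

MARKER = "BODY_DONE"


def run_snippet(code, python=sys.executable):
    """Run `code` in a fresh interpreter; print a marker once the body is done."""
    script = code + f"\nprint({MARKER!r}, flush=True)\n"
    proc = subprocess.run([python, "-c", script], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def fails_at_shutdown(code, python=sys.executable):
    """True if the body ran to completion but the process still exited non-zero."""
    returncode, stdout = run_snippet(code, python)
    return returncode != 0 and MARKER in stdout
```

In a CMSSW environment with the input file present, one could pass the three-line reproducer above as `code`; a crash in the end-of-process cleanup would then show up as `fails_at_shutdown(...)` returning `True`.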
I believe this may also be fixed via #42628, can you confirm? |
It seems so. The last failures were in CMSSW_13_3_ROOT6_X_2023-08-28-2300, and #42628 was merged in CMSSW_13_3_X_2023-08-29-1100. So far there have been 6 IBs (x2 to cover both 6.28 and master) without these failures. I also tested the simple reproducer of #41222 (comment) in CMSSW_13_3_ROOT628_X_2023-09-03-2300, and the job now succeeds. |
+core |
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
I'm now going to proceed with the deletion of these unit tests (as they were superseded earlier, and were kept only for the investigation of this issue). If anyone subscribed thinks there would still be value in keeping the tests, scream soon. |
Hello,
There are two Unit Tests (testDataFormatsScoutingRun2 and testDataFormatsScoutingRun3, from module DataFormats/Scouting) failing in both ROOT6 and ROOT628 IBs.
I have reproduced the failure using CMSSW_13_1_ROOT628_X_2023-03-27-2300:
They fail when running scoutingCollectionsDumper.py (https://github.com/cms-sw/cmssw/blob/master/DataFormats/Scouting/test/testDataFormatsScoutingRun2.sh#L23) and I would say after analyzing the first event (https://github.com/cms-sw/cmssw/blob/master/DataFormats/Scouting/test/scoutingCollectionsDumper.py#L174).
Find the full log here.
Let's follow up on the discussion started on #41189 (comment).
FYI, @Dr15Jones, @smuzaffar, @iarspider