JetVertexRefiner #19

Closed
bogdanmishchenko opened this issue Apr 6, 2017 · 24 comments · Fixed by #21

Comments

@bogdanmishchenko

Dear LCFIPlus developers,

I have encountered a memory allocation (malloc) error when running JetVertexRefiner (I used the ilcsoft release /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v01-19-01/). The steering file I used and the log file are attached in the zip archive.
Files.zip

@jstrube
Contributor

jstrube commented Apr 26, 2017

Thank you for reporting this, and for attaching the steering file.
I need a bit more info, though. What are the files you are running over?
What's the physics process, what's the detector model that was used?

@protopopescu
Contributor

The input file was simulated and reconstructed using SiD_o2_v02, from single_b_jets_200GeV.slcio.

@bogdanmishchenko
Author

We were able to run two steps:

1. Vertexing (I used vertex.xml for DST production); vertexing works fine (steps 1-2).

2. Jet clustering (I used jetclustering.xml for non-flavor-tag applications); jet clustering works fine (steps 3-4).

The malloc error occurred only after running JetVertexRefiner.

@jstrube
Contributor

jstrube commented Apr 26, 2017

Sorry, still stuck at the simulation stage:
Is this what you're doing?

source /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v01-19-02/init_ilcsoft.sh
source ../lcgeo/bin/thislcgeo.sh
ddsim -N=5 --compactFile=../lcgeo/SiD/compact/SiD_o2_v02/SiD_o2_v02.xml --runType=batch --inputFile=E250-TDR_ws.Pn2n2h_bb.Gwhizard-1_95.eL.pR.I109040.0001.slcio --outputFile=bb_sim_5_events.slcio

I am getting the error message

cling::DynamicLibraryManager::loadLibrary(): dlopen: cannot load any more object with static TLS
cling::DynamicLibraryManager::loadLibrary(): dlopen: cannot load any more object with static TLS
+--------------------------------------------------------------------------------------------------------+
|  Failed to load DDG4 library:                                                                          |
|  DDG4.py: Failed to load the DDG4 library libDDG4Plugins: No such file or directory                    |
+--------------------------------------------------------------------------------------------------------+

I've checked that LD_LIBRARY_PATH contains a path with the file libDDG4Plugins.so, so I'm not sure what the problem is here.

@bogdanmishchenko
Author

bogdanmishchenko commented Apr 26, 2017

Sourcing ilcsoft and thislcgeo seems fine. However, I am not sure that the ddsim command works with the options in that order. I usually use: ddsim --compactFile=.xml file --runType= --inputFile=* -N(number of events) --outputFile=*

@jstrube
Contributor

jstrube commented Apr 26, 2017

Thanks for the quick reply. I tried that, but I get the same error message. Are there other environment variables that I am missing?
I've also tried loading the lib from the ROOT prompt, and that works:

$ root
   ------------------------------------------------------------
  | Welcome to ROOT 6.08/02                http://root.cern.ch |
  |                               (c) 1995-2016, The ROOT Team |
  | Built for linuxx8664gcc                                    |
  | From tag v6-08-02, 2 December 2016                         |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] gSystem->Load("libDDG4Plugins")
(int) 0

However, from python:

python
Python 2.7.10 (default, Mar 10 2016, 14:55:16)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> ROOT.gSystem.Load("libDDG4Plugins")
cling::DynamicLibraryManager::loadLibrary(): dlopen: cannot load any more object with static TLS
-1

So it looks like this is a ROOT issue. Not sure why that's not a problem at CERN. Maybe they use a different dlopen?
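
One way to separate a ROOT problem from a system loader problem is to call dlopen directly, outside both ROOT and Python. The following is a minimal sketch, not part of ROOT or DD4hep: the helper name tryLoad is hypothetical, and it expects the full file name (for example libDDG4Plugins.so) because a bare dlopen does not add the "lib" prefix or ".so" suffix the way gSystem->Load does. It returns the dlerror() text, which is where a static TLS complaint would show up if the loader itself is the one refusing.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: try to dlopen a shared library by file name.
// Returns an empty string on success, otherwise the loader's error text.
std::string tryLoad(const std::string& fileName) {
  dlerror();  // clear any stale error state
  void* handle = dlopen(fileName.c_str(), RTLD_NOW | RTLD_GLOBAL);
  if (handle != nullptr) {
    dlclose(handle);
    return std::string();
  }
  const char* err = dlerror();
  return err != nullptr ? std::string(err)
                        : std::string("unknown dlopen error");
}
```

Calling tryLoad("libDDG4Plugins.so") from a small program run in the same environment shows whether the failure depends on what the host process (python versus root) has already loaded. On older glibc versions the program has to be linked with -ldl.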

@bogdanmishchenko
Author

bogdanmishchenko commented Apr 26, 2017

That might not be the case. However, I used the following cmake command:
cmake -DCMAKE_CXX_COMPILER=`which g++` -DCMAKE_C_COMPILER=`which gcc`
-DILCUTIL_DIR=/cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v01-17-19/ilcutil/v01-03/ -C $ILCSOFT/ILCSoft.cmake ..
It might also be due to the recent modifications to lcgeo on GitHub.

@jstrube
Contributor

jstrube commented Apr 27, 2017

Well, the library libDDG4Plugins.so lives in /cvmfs, so I didn't compile it. dlopen is a system library, so I'll try on a different machine...

@jstrube
Contributor

jstrube commented Apr 27, 2017

OK, I got further on a KEK machine, but ddsim expects an MCParticle list name "MCParticle". How do I tell it that my list has a different name?

@aidanrobson

Hi Jan, not sure offhand, but if you use for example the file you sent us single_b_jets_200GeV.stdhep and stdhepjob to convert it, then the collection should anyway be named MCParticle:

stdhepjob single_b_jets_200GeV.stdhep single_b_jets_200GeV.slcio -1

ddsim --compactFile=./lcgeo/SiD/compact/SiD_o2_v02/SiD_o2_v02.xml --runType=batch --inputFile=single_b_jets_50GeV.slcio -N=10 --outputFile=single_b_jets_50GeV_sim.slcio

@protopopescu
Contributor

Jan, here's my LCFIPlus testing sequence https://www.evernote.com/l/AJ0XEvoXDC9F45SB-bRI2pYDFKvdHcqDqVU

@jstrube
Contributor

jstrube commented Apr 27, 2017

Thank you. The instructions from @protopopescu are very helpful. I am now able to reproduce the crash. Looking into it...

@jstrube
Contributor

jstrube commented Apr 28, 2017

I re-compiled LCFIPlus with -g and ran gdb.

(gdb) where
#0  0x00007ffff53ed625 in raise () from /lib64/libc.so.6
#1  0x00007ffff53eee05 in abort () from /lib64/libc.so.6
#2  0x00007ffff542b537 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff5430f4e in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff5435528 in _int_malloc () from /lib64/libc.so.6
#5  0x00007ffff5435b1c in malloc () from /lib64/libc.so.6
#6  0x00007fffd5f8dbe1 in ROOT::Minuit2::Numerical2PGradientCalculator::operator()(ROOT::Minuit2::MinimumParameters const&, ROOT::Minuit2::FunctionGradient const&) const ()
    at /scratch/cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/build-6.08.02/include/Minuit2/StackAllocator.h:97
#7  0x00007fffd5f99977 in ROOT::Minuit2::VariableMetricBuilder::Minimum(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MinimumSeed const&, std::vector<ROOT::Minuit2::MinimumState, std::allocator<ROOT::Minuit2::MinimumState> >&, unsigned int, double) const () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/VariableMetricBuilder.cxx:350
#8  0x00007fffd5f9c402 in ROOT::Minuit2::VariableMetricBuilder::Minimum(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MinimumSeed const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/VariableMetricBuilder.cxx:124
#9  0x00007fffd5f8ac5c in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MinimumSeed const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/ModularFunctionMinimizer.cxx:166
#10 0x00007fffd5f89360 in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::FCNBase const&, ROOT::Minuit2::MnUserParameterState const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const ()
    at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/ModularFunctionMinimizer.cxx:120
#11 0x00007fffd5f4aecc in ROOT::Minuit2::Minuit2Minimizer::Minimize() () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/Minuit2Minimizer.cxx:504
#12 0x00007fffd628caf7 in lcfiplus::Helix::LogLikelihood(TVector3 const&, double&) const () at /home/ilc/jstrube/ILC/work/LCFIPlus/src/geometry.cc:328
#13 0x00007fffd629323e in lcfiplus::Helix::LogLikelihood(TVector3 const&) const () at /home/ilc/jstrube/ILC/work/LCFIPlus/./include/geometry.h:120
#14 0x00007fffd628ffbc in ROOT::Math::FunctorHandler<ROOT::Math::Functor, lcfiplus::GeometryHandler::PointFitFunctor>::DoEval(double const*) const () at /home/ilc/jstrube/ILC/work/LCFIPlus/./include/geometry.h:234
#15 0x00007fffd5f81e94 in ROOT::Minuit2::MnUserFcn::operator()(ROOT::Minuit2::LAVector const&) const () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/MnUserFcn.cxx:42
#16 0x00007fffd5f7a31b in ROOT::Minuit2::MnSeedGenerator::operator()(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MnUserParameterState const&, ROOT::Minuit2::MnStrategy const&) const
    () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/MnSeedGenerator.cxx:66
#17 0x00007fffd5f89332 in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::FCNBase const&, ROOT::Minuit2::MnUserParameterState const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const ()
    at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/ModularFunctionMinimizer.cxx:118
#18 0x00007fffd5f88ceb in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::FCNBase const&, ROOT::Minuit2::MnUserParameters const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const ()
    at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/root/6.08.02/math/minuit2/src/ModularFunctionMinimizer.cxx:80
#19 0x00007fffd62898d7 in lcfiplus::GeometryHandler::PointFit(std::vector<lcfiplus::PointBase*, std::allocator<lcfiplus::PointBase*> > const&, TVector3 const&, lcfiplus::Point*) ()
    at /home/ilc/jstrube/ILC/work/LCFIPlus/src/geometry.cc:1316
#20 0x00007fffd6307e24 in lcfiplus::VertexFitterSimple<std::_List_iterator<lcfiplus::Track const*> >::operator()(std::_List_iterator<lcfiplus::Track const*>, std::_List_iterator<lcfiplus::Track const*>, lcfiplus::Vertex*, bool) () at /home/ilc/jstrube/ILC/work/LCFIPlus/./include/VertexFitterSimple.h:35
#21 0x00007fffd6306c88 in lcfiplus::findPrimaryVertex(std::vector<lcfiplus::Track const*, std::allocator<lcfiplus::Track const*> > const&, double, bool, bool) ()
    at /home/ilc/jstrube/ILC/work/LCFIPlus/./include/VertexFinderTearDown.h:49
#22 0x00007fffd625acca in lcfiplus::PrimaryVertexFinder::process() () at /home/ilc/jstrube/ILC/work/LCFIPlus/src/process.cc:81
#23 0x00007fffd62729f2 in LcfiplusProcessor::processEvent(EVENT::LCEvent*) () at /home/ilc/jstrube/ILC/work/LCFIPlus/src/LcfiplusProcessor.cc:234
#24 0x00007ffff7b9705c in marlin::ProcessorMgr::processEvent(EVENT::LCEvent*) () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v01-19-02/Marlin/v01-11/source/src/ProcessorMgr.cc:468
#25 0x00007ffff75133ed in SIO::SIOReader::readStream(int) () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v01-19-02/lcio/v02-08/src/cpp/src/SIO/SIOReader.cc:732
#26 0x0000000000412a57 in main () at /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v01-19-02/Marlin/v01-11/source/src/Marlin.cc:499

Not quite sure yet what's going on, but I'll keep digging.

@jstrube
Contributor

jstrube commented May 10, 2017

@suehara Have you seen something like this before? If you don't have time to look at this yourself right now, could you point us in the right direction?

@andresailer
Collaborator

Could you please try

export MALLOC_CHECK_=3

And then rerun Marlin and post the error message and stacktrace if there is one?
Thanks

@protopopescu
Contributor

protopopescu commented May 16, 2017

Ok, here's the error with MALLOC_CHECK_=3; it now says
*** Marlin: free(): invalid pointer: 0x0000000003c0a710 ***
stack.txt

@andresailer
Collaborator

Thanks! That is the same error that we see.

@protopopescu
Contributor

I've narrowed it down to a crash in Minimize() in geometry.cc (both options). I'm trying to understand whether it crashes because there's nothing to minimize, or because of an intrinsic ROOT Minimize() issue. With MALLOC_CHECK_=1 the code sometimes runs without crashing. Still digging ...

@protopopescu
Contributor

protopopescu commented May 19, 2017

It seems that the crash in PointFit() is caused by the fact that the entries points[i] with i>0 are unusable.

So, to summarise: the algorithms work fine for the first event, then the VertexRefiner somehow deletes or overwrites something, such that in the second event the points[i] that VertexFitterSimple passes to PointFit are junk or unusable for i>0.

@jstrube
Contributor

jstrube commented May 19, 2017

So that means all points are junk?
And in that case the size of the array should have been set to 0?

@andresailer
Collaborator

@protopopescu, Nacho, could you please test the changes from #21?

@jstrube
Contributor

jstrube commented May 23, 2017

@SaiLeR Many thanks for the pull request. I tested it and it looks good. I'd like an OK from another developer before merging, since I haven't used LCFIPlus personally in a while.
@protopopescu @bogdanmishchenko You can check it like this (from an existing LCFIPlus clone):

git checkout -b andresailer-fixParameters master
git pull https://github.com/andresailer/LCFIPlus.git fixParameters

@protopopescu
Contributor

I can confirm that replacing _map(ref._map) with _map() fixes the crash. Thanks for the fix, Andre!
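
For readers wondering why that one-token change matters, here is a minimal sketch of one way a `_map(ref._map)` initializer can corrupt the heap, assuming the map holds raw owning pointers. The class Params below is hypothetical and is not the LCFIPlus parameter class; the double value type and the method names are assumptions made only for illustration. Initializing `_map` from `ref._map` would copy the pointers themselves, so two objects would later delete the same memory. Starting from an empty map and deep-copying each entry avoids that.

```cpp
#include <map>
#include <string>

// Hypothetical parameter store that owns its heap-allocated values.
class Params {
 public:
  Params() : _map() {}

  // Copy constructor in the fixed style: begin with an empty map and
  // deep-copy every value. Writing _map(ref._map) here instead would
  // duplicate the raw pointers, and both destructors would free them.
  Params(const Params& ref) : _map() {
    for (const auto& entry : ref._map) {
      _map[entry.first] = new double(*entry.second);
    }
  }

  Params& operator=(const Params&) = delete;  // keep the sketch small

  ~Params() {
    for (auto& entry : _map) delete entry.second;
  }

  // Insert a new value or overwrite an existing one in place.
  void add(const std::string& key, double value) {
    auto it = _map.find(key);
    if (it == _map.end()) {
      _map[key] = new double(value);
    } else {
      *it->second = value;
    }
  }

  double get(const std::string& key) const { return *_map.at(key); }
  const double* address(const std::string& key) const { return _map.at(key); }

 private:
  std::map<std::string, double*> _map;
};
```

With this version, a copy made via `Params b(a);` holds the same values as `a` at different addresses, so destroying one object leaves the other intact. A double free of the aliased kind is typically reported by glibc as an invalid pointer or corrupted heap, often at a later, unrelated allocation.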

@nachogargar
Contributor

@jstrube I also confirm that the fix from André solves the issue. LCFIPlus is working properly both locally and on the Grid.
Thanks André!

jstrube added a commit that referenced this issue May 27, 2017
Fix parameter copy ctor
Closes #19 
Many thanks for the fix and the follow-up tests