migrating to multithreading of RPCPointProducer #9760

Merged: 10 commits, Jul 6, 2015

cmsbuild
Contributor

  • migrate the module to a stream producer
  • make the configuration parameters const
  • add override to inherited methods

Will be backported to 74X.
Automatically ported from CMSSW_7_5_X #9100 (original by @HuguesBrun).
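
For readers unfamiliar with the pattern, the three changes listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual diff: everything in the `sketch` namespace (`StreamProducerBase`, `ParameterSet`, `PointProducerSketch`, and the parameter names) is a hypothetical stand-in so the example compiles without the CMSSW framework headers.

```cpp
// Hypothetical stand-ins for framework types; NOT the real CMSSW classes.
namespace sketch {
struct Event {};
struct EventSetup {};
struct ParameterSet {
  bool debug;
  double extrapolationRange;
};

// Stand-in for a per-stream producer base class: each stream owns one
// module instance, so instances do not share mutable state.
class StreamProducerBase {
public:
  virtual ~StreamProducerBase() = default;
  virtual void produce(Event&, const EventSetup&) = 0;
};

class PointProducerSketch : public StreamProducerBase {
public:
  explicit PointProducerSketch(const ParameterSet& cfg)
      : debug_(cfg.debug), range_(cfg.extrapolationRange) {}

  // 'override' makes the compiler reject a signature that no longer
  // matches the base-class virtual.
  void produce(Event&, const EventSetup&) override { ++eventsSeen_; }

  bool debug() const { return debug_; }
  double range() const { return range_; }
  int eventsSeen() const { return eventsSeen_; }

private:
  // Configuration is read once in the constructor and never modified.
  const bool debug_;
  const double range_;
  int eventsSeen_ = 0;  // per-instance (per-stream) counter
};
}  // namespace sketch
```

Constructing `sketch::PointProducerSketch` from a `ParameterSet` and calling `produce` once increments only that instance's counter; the const members cannot be changed afterwards.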

@cmsbuild
Contributor Author

A new Pull Request was created by @cmsbuild for CMSSW_7_6_X.

migrating to multithreading of RPCPointProducer

It involves the following packages:

RecoLocalMuon/RPCRecHit

@cmsbuild, @cvuosalo, @slava77 can you please review it and eventually sign? Thanks.
@bellan, @jhgoh this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@HuguesBrun HuguesBrun force-pushed the moveRPCalcaPathToMultiThread branch from d09f458 to f01858e Compare June 30, 2015 19:18
@slava77
Contributor

slava77 commented Jun 30, 2015

@cmsbuild please test

@HuguesBrun the 76X PR picked up your changes automatically, since both are based on the same topic branch

@cmsbuild
Contributor Author

The tests are being triggered in jenkins.

@cmsbuild
Contributor Author

Pull request #9760 was updated. @cmsbuild, @cvuosalo, @slava77 can you please check and sign again.

@cmsbuild
Contributor Author

cmsbuild commented Jul 5, 2015

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_6_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @smuzaffar

@davidlange6
Contributor

+1

@Martin-Grunewald
Contributor

It looks like this PR is creating a crash in the frozen HLT menus for 25ns and 50ns:

Begin processing the 17th record. Run 1, Event 517, LumiSection 6 at 08-Jul-2015 07:47:29.711 CEST
%MSG-e FatalSystemSignal:  RPCPointProducer:hltRPCPointProducer  08-Jul-2015 07:47:30 CEST Run: 1 Event: 517
A fatal system signal has occurred: segmentation violation
%MSG


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
NOTE:The first few functions on the stack are artifacts of processing the signal and can be ignored

#0  0x0000003e40aac61e in waitpid () from /lib64/libc.so.6
#1  0x0000003e40a3e609 in do_system () from /lib64/libc.so.6
#2  0x00007f0d28313d37 in TUnixSystem::StackTrace() () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/external/slc6_amd64_gcc491/lib/libCore.so
#3  0x00007f0d2264c925 in sig_dostack_then_abort () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/libFWCoreServices.so
#4  <signal handler called>
#5  0x00007f0d09019a5e in std::_Rb_tree<RPCDetId, RPCDetId, std::_Identity<RPCDetId>, std::less<RPCDetId>, std::allocator<RPCDetId> >::_M_copy(std::_Rb_tree_node<RPCDetId> const*, std::_Rb_tree_node<RPCDetId>*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/pluginRecoLocalMuonRPCRecHit.so
#6  0x00007f0d09017e13 in CSCSegtoRPC::CSCSegtoRPC(edm::Handle<edm::RangeMap<CSCDetId, edm::OwnVector<CSCSegment, edm::ClonePolicy<CSCSegment> >, edm::ClonePolicy<CSCSegment> > >, edm::EventSetup const&, edm::Event const&, bool, double, ObjectMapCSC const*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/pluginRecoLocalMuonRPCRecHit.so
#7  0x00007f0d090220d5 in RPCPointProducer::produce(edm::Event&, edm::EventSetup const&) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/pluginRecoLocalMuonRPCRecHit.so
#8  0x00007f0d28ee7e79 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/libFWCoreFramework.so
#9  0x00007f0d28edbd1f in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventPrincipal&, edm::EventSetup const&, edm::ModuleCallingContext const*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/libFWCoreFramework.so

@HuguesBrun
Contributor

Hi @Martin-Grunewald ,

Which sample are you using to see this crash?
I tried running the HLT with the PR merged to make sure it does not change the RPC alca path behavior, and I did not see anything (I did that with ~5000 DY events).

Thank you,
Hugues

@slava77
Contributor

slava77 commented Jul 8, 2015

@Martin-Grunewald does this show up in the addOnTests?

@Martin-Grunewald
Contributor

It appears on ttbar MC in the v1 frozen HLT menus (those used, e.g., in 74X MC production).
(It is not seen in the addOn tests, as those run the HLT development menus.)

@Martin-Grunewald
Contributor

Traceback with line numbers:

Begin processing the 17th record. Run 1, Event 517, LumiSection 6 at 08-Jul-2015 17:59:17.259 CEST
%MSG-e FatalSystemSignal:  RPCPointProducer:hltRPCPointProducer  08-Jul-2015 17:59:17 CEST Run: 1 Event: 517
A fatal system signal has occurred: segmentation violation
%MSG


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
NOTE:The first few functions on the stack are artifacts of processing the signal and can be ignored

#0  0x0000003e40aac61e in waitpid () from /lib64/libc.so.6
#1  0x0000003e40a3e609 in do_system () from /lib64/libc.so.6
#2  0x00007f476a11ad37 in TUnixSystem::StackTrace() () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/external/slc6_amd64_gcc491/lib/libCore.so
#3  0x00007f47640e2925 in sig_dostack_then_abort () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/libFWCoreServices.so
#4  <signal handler called>
#5  0x00007f474ad17ac6 in __gnu_cxx::new_allocator<std::_Rb_tree_node<RPCDetId> >::construct<RPCDetId, RPCDetId const&> (this=0x7fffc86bc210, __p=0x7f4723e057e0) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/ext/new_allocator.h:120
#6  0x00007f474ad17235 in std::allocator_traits<std::allocator<std::_Rb_tree_node<RPCDetId> > >::_S_construct<RPCDetId, RPCDetId const&> (__a=..., __p=0x7f4723e057e0) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/alloc_traits.h:253
#7  0x00007f474ad164c9 in std::allocator_traits<std::allocator<std::_Rb_tree_node<RPCDetId> > >::construct<RPCDetId, RPCDetId const&> (__a=..., __p=0x7f4723e057e0) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/alloc_traits.h:399
#8  0x00007f474ad14836 in std::_Rb_tree<RPCDetId, RPCDetId, std::_Identity<RPCDetId>, std::less<RPCDetId>, std::allocator<RPCDetId> >::_M_create_node<RPCDetId const&> (this=0x7fffc86bc210) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/stl_tree.h:421
#9  0x00007f474ad13efe in std::_Rb_tree<RPCDetId, RPCDetId, std::_Identity<RPCDetId>, std::less<RPCDetId>, std::allocator<RPCDetId> >::_M_clone_node (this=0x7fffc86bc210, __x=0x656e6d6968635f56) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/stl_tree.h:445
#10 0x00007f474ad11010 in std::_Rb_tree<RPCDetId, RPCDetId, std::_Identity<RPCDetId>, std::less<RPCDetId>, std::allocator<RPCDetId> >::_M_copy (this=0x7fffc86bc210, __x=0x656e6d6968635f56, __p=0x7fffc86bc218) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/stl_tree.h:1207
#11 0x00007f474ad0e79d in std::_Rb_tree<RPCDetId, RPCDetId, std::_Identity<RPCDetId>, std::less<RPCDetId>, std::allocator<RPCDetId> >::_Rb_tree (this=0x7fffc86bc210, __x=...) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/stl_tree.h:676
#12 0x00007f474ad0ce2d in std::set<RPCDetId, std::less<RPCDetId>, std::allocator<RPCDetId> >::set (this=0x7fffc86bc210, __x=...) at /afs/cern.ch/cms/sw/ReleaseCandidates/volB/slc6_amd64_gcc491/external/gcc/4.9.1-cms/include/c++/4.9.1/bits/stl_set.h:197
#13 0x00007f474ad0c778 in ObjectMapCSC::getRolls (this=0x7f472d358cd0, cscstationindex=...) at /data/CMS0/CMSSW_7_6_X_2015-07-07-1100/src/RecoLocalMuon/RPCRecHit/interface/CSCSegtoRPC.h:45
#14 0x00007f474ad0aa48 in CSCSegtoRPC::CSCSegtoRPC (this=0x7fffc86bc550, allCSCSegments=..., iSetup=..., iEvent=..., debug=false, eyr=0.5, TheObjectCSC=0x7f472d358cd0) at /data/CMS0/CMSSW_7_6_X_2015-07-07-1100/src/RecoLocalMuon/RPCRecHit/src/CSCSegtoRPC.cc:167
#15 0x00007f474ad2516f in RPCPointProducer::produce (this=0x7f4738ead200, iEvent=..., iSetup=...) at /data/CMS0/CMSSW_7_6_X_2015-07-07-1100/src/RecoLocalMuon/RPCRecHit/src/RPCPointProducer.cc:90
#16 0x00007f476aceee79 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_6_X_2015-07-07-1100/lib/slc6_amd64_gcc491/libFWCoreFramework.so
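
A side note on the trace above: the node pointer `__x=0x656e6d6968635f56` in frames #9 and #10 does not look like the other addresses in the trace (which start with `0x7f47...` or `0x7fff...`). Read as bytes in little-endian order, it is entirely printable ASCII. The small helper below (the name `bytesAsAscii` is made up for this illustration) shows the decoding. This is only an observation about the value; the thread does not establish a cause, though a pointer that decodes to text may suggest that the set being copied in `ObjectMapCSC::getRolls` was not in a valid state.

```cpp
#include <cstdint>
#include <string>

// Render the 8 bytes of a 64-bit value in little-endian (x86-64 memory)
// order, replacing non-printable bytes with '.'.
std::string bytesAsAscii(std::uint64_t value) {
  std::string out;
  for (int i = 0; i < 8; ++i) {
    const unsigned char byte =
        static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    out += (byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte) : '.';
  }
  return out;
}
```

Calling `bytesAsAscii(0x656e6d6968635f56ULL)` yields `V_chimne`.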

@Martin-Grunewald
Contributor

To reproduce: make a developer area, then

cd src
cmsenv
rehash
git cms-addpkg HLTrigger/Configuration
git cms-checkdeps -A -a
scram build -j 4
cd HLTrigger/Configuration/test/
./runAll.csh 50ns_5e33_v1

The last command takes some time to produce the workflow cfg files, and then runs a subset of them.

@HuguesBrun
Contributor

Hi,
Thank you for the recipe. I just tried again to run the full menu with a configuration obtained with hltGetConfiguration, and it still does not crash...
Hugues

@Martin-Grunewald
Contributor

Yep, it is not every HLT menu, but some...
