reduce ML inference time in b-tag related jet taggers #25230

Closed
slava77 opened this issue Nov 14, 2018 · 31 comments

@slava77
Contributor

slava77 commented Nov 14, 2018

With #24918 included, b-tag-related ML inference accounts for around 20% of the miniAOD processing time.

Here are the averages (excluding peak [initialization?] time) from 1K events in workflow 136.8311, running with this PR on top of 10_4_0_pre1:

   ms/ev  module                                                     plugin
    0.33  pfDeepCSVJetTagsAK8PFPuppiSoftDropSubjets                  DeepFlavourJetTagsProducer
    0.41  pfDeepCSVJetTagsAK8Puppi                                   DeepFlavourJetTagsProducer
    0.94  pfDeepCSVJetTagsPuppi                                      DeepFlavourJetTagsProducer
   16.01  pfDeepDoubleCvLJetTagsSlimmedAK8DeepTags                   DeepDoubleXTFJetTagsProducer
   16.09  pfDeepDoubleBvLJetTagsSlimmedAK8DeepTags                   DeepDoubleXTFJetTagsProducer
   16.10  pfDeepDoubleCvBJetTagsSlimmedAK8DeepTags                   DeepDoubleXTFJetTagsProducer
   16.20  pfMassIndependentDeepDoubleCvLJetTagsSlimmedAK8DeepTags    DeepDoubleXTFJetTagsProducer
   16.58  pfMassIndependentDeepDoubleCvBJetTagsSlimmedAK8DeepTags    DeepDoubleXTFJetTagsProducer
   28.86  pfDeepBoostedJetTagsSlimmedAK8DeepTags                     DeepBoostedJetTagsProducer
   29.72  pfMassDecorrelatedDeepBoostedJetTagsSlimmedAK8DeepTags     DeepBoostedJetTagsProducer
  153.85  pfDeepFlavourJetTagsSlimmedDeepFlavour                     DeepFlavourTFJetTagsProducer

Total from above: 295.09 ms/ev

This issue is to keep track of the progress, aiming for a reduction.
The next BTV-related jet-tag update should not increase the total time.

@ferencek @mverzett @kskovpen @andrzejnovak @jmduarte @hqucms

@slava77
Contributor Author

slava77 commented Nov 14, 2018

assign reconstruction

@cmsbuild
Contributor

New categories assigned: reconstruction

@slava77,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@kpedro88
Contributor

For miniAOD, one way to reduce the total time used by ML tagging algorithms is to run them only on the jets for which the results will actually be used/stored. This is relevant for AK8 jets, since the low-pT ones are not stored. My understanding is that DeepAK8 has an enforced pT cut in the tagger, while DeepDoubleB does not, but maybe this is out of date.

Tagging a few more people: @gouskos @rappoccio

@rappoccio
Contributor

That's a good point. The same preselection could be used: run this only on the constituents of a preclustered sample with a pT cut on the jet.
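
For illustration, a minimal sketch of such a preselection as a CMSSW python config fragment; the module label and the 170 GeV threshold are assumptions for illustration, not the actual miniAOD sequence:

    import FWCore.ParameterSet.Config as cms

    # Hypothetical preselection: keep only AK8 jets above the miniAOD
    # storage threshold, so the expensive ML taggers run on fewer jets.
    # The label and the 170 GeV cut are illustrative assumptions.
    slimmedJetsAK8HighPt = cms.EDFilter("PATJetSelector",
        src = cms.InputTag("slimmedJetsAK8"),
        cut = cms.string("pt > 170."),
    )

    # Downstream tagger inputs would then point at slimmedJetsAK8HighPt
    # instead of slimmedJetsAK8.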

@kpedro88
Contributor

Also, @violatingcp found that for CNN inference, TensorFlow v1.10 is 2.4x faster than TensorFlow v1.6 (which we're currently using).
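
For reference, version-to-version comparisons like this can be done with a small standalone micro-benchmark; a sketch using the TF 1.x API (the toy CNN below is a placeholder, not the actual tagger graph):

    import time
    import numpy as np
    import tensorflow as tf  # TF 1.x API, as used in CMSSW at the time

    # Toy two-layer CNN; a stand-in for the real model, only meant to
    # compare inference speed across TF builds/versions.
    x = tf.placeholder(tf.float32, shape=(1, 64, 64, 3))
    y = tf.layers.conv2d(x, filters=32, kernel_size=3, activation=tf.nn.relu)
    y = tf.layers.conv2d(y, filters=32, kernel_size=3, activation=tf.nn.relu)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        data = np.random.rand(1, 64, 64, 3).astype(np.float32)
        sess.run(y, {x: data})  # warm-up
        start = time.time()
        for _ in range(100):
            sess.run(y, {x: data})
        print((time.time() - start) / 100, "s per inference")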

@slava77
Contributor Author

slava77 commented Nov 15, 2018 via email

@violatingcp
Contributor

Ah, so the TF v1.6 numbers used the CMSSW standard version. Indeed, it looks like the TF version I am using was built with AVX, but not AVX2. I can disable that and try to run.
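
As a quick check (a sketch assuming a Linux host), the SIMD extensions advertised by the CPU can be listed and compared against what the TF build was compiled for:

    # List the AVX variants the CPU advertises (Linux only); the TF build
    # can then be compared against this list.
    with open("/proc/cpuinfo") as f:
        flags = next(line for line in f if line.startswith("flags")).split()
    print(sorted(flag for flag in flags if flag.startswith("avx")))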

@slava77
Contributor Author

slava77 commented May 24, 2019

@ferencek @mverzett @kskovpen @andrzejnovak @jmduarte @hqucms
Was there any significant progress in recent months on speeding up the ML b-tags?
Please comment.
Thank you.

@mverzett
Contributor

Not really; we are looking into ways to reduce the input size and eventually the graph size, but we are a long way from a usable solution.

@hqucms
Contributor

hqucms commented Jun 27, 2019

@slava77
In some recent studies, we found a significant difference (4-5x) in inference speed between standalone MXNet and the MXNet in CMSSW. After some investigation, the cause seems to be the linking with the BLAS library. Standalone MXNet is statically linked to OpenBLAS. The one in CMSSW, despite being dynamically linked to OpenBLAS [0], does not seem to call into OpenBLAS when running inside cmsRun.

A simple test to show this:

[Test setup]
Using CMSSW_10_6_1 on lxplus7
Python cfg: RecoBTag/MXNet/test/test_deep_boosted_jet_cfg.py
Input file: /store/mc/RunIIFall17MiniAODv2/ZprimeToTT_M1000_W10_TuneCP2_13TeV-madgraphMLM-pythia8/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14_ext1-v1/90000/0C1D564B-DB28-E911-912C-0CC47A4D76D0.root (copied to local)

> cmsRun test_deep_boosted_jet_cfg.py inputFiles=file:TT-M1000.root maxEvents=1000 >& orig.log &                                                                                                            
TimeReport   0.020148     0.020148     0.020148  pfDeepBoostedJetTags
TimeReport   0.018684     0.018684     0.018684  pfMassDecorrelatedDeepBoostedJetTags

> LD_PRELOAD=/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/OpenBLAS/0.3.5/lib/libopenblas.so cmsRun test_deep_boosted_jet_cfg.py inputFiles=file:TT-M1000.root maxEvents=1000 >& preload_openblas_cmssw.log &
TimeReport   0.005091     0.005091     0.005091  pfDeepBoostedJetTags
TimeReport   0.004962     0.004962     0.004962  pfMassDecorrelatedDeepBoostedJetTags

With LD_PRELOAD we see a speedup of ~4x, now roughly consistent with standalone MXNet.

(BTW, this still leaves a ~20-30% difference, likely because the CMSSW OpenBLAS is compiled with DYNAMIC_ARCH=0 and is therefore unable to exploit newer instructions, e.g. AVX, available on the machine -- maybe we should also consider enabling DYNAMIC_ARCH?)

A search in the external libraries reveals two other libraries that provide the same cblas functions as OpenBLAS:

/cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_1/external/slc7_amd64_gcc700/lib/libcblas.so
0000000000007c44 T cblas_sgemm

/cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_1/external/slc7_amd64_gcc700/lib/libgslcblas.so
0000000000002c10 T cblas_sgemm

/cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_1/external/slc7_amd64_gcc700/lib/libopenblas.so
000000000008e010 T cblas_sgemm

It seems that one of them is loaded before OpenBLAS and therefore hides the OpenBLAS symbols. Any suggestion on how to fix this?

Thank you!

[0]

> ldd /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/mxnet-predict/1.2.1-pafccj2/lib/libmxnetpredict.so 
        linux-vdso.so.1 =>  (0x00007ffd9af8a000)
        libopenblas.so.0 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_1/external/slc7_amd64_gcc700/lib/libopenblas.so.0 (0x00007f9a63d0e000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f9a63b06000)
        libstdc++.so.6 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libstdc++.so.6 (0x00007f9a674df000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f9a63804000)
        libgcc_s.so.1 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libgcc_s.so.1 (0x00007f9a674c6000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9a635e8000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f9a6321b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9a67471000)
        libgfortran.so.4 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libgfortran.so.4 (0x00007f9a63048000)
        libquadmath.so.0 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libquadmath.so.0 (0x00007f9a63008000)
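
As an aside, the load-order shadowing described above can be reproduced outside cmsRun with a small ctypes sketch (an illustration assuming a CMSSW environment, so that both libraries resolve on the loader path):

    import ctypes

    # Load gslcblas first with global symbol visibility, then openblas,
    # mimicking the load order inside cmsRun.
    gsl = ctypes.CDLL("libgslcblas.so", mode=ctypes.RTLD_GLOBAL)
    openblas = ctypes.CDLL("libopenblas.so", mode=ctypes.RTLD_GLOBAL)

    def addr(lib):
        return ctypes.cast(lib.cblas_sgemm, ctypes.c_void_p).value

    # Resolving through the whole-process namespace (dlopen(NULL)) picks
    # the symbol from whichever global library was loaded first, so
    # gslcblas shadows openblas here.
    whole_process = ctypes.CDLL(None)
    print(addr(whole_process) == addr(gsl))       # True
    print(addr(whole_process) == addr(openblas))  # False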

@slava77
Contributor Author

slava77 commented Jun 27, 2019

@mrodozov @smuzaffar @fabiocos
please take a look and comment on #25230 (comment)
It looks like we have multiple libraries providing the symbols; perhaps libopenblas.so should replace the rest.

@smuzaffar
Contributor

@slava77, it looks like the ROOT MathMore library (which was loaded first) is linked against gslcblas. We are looking into the ROOT configuration to see if it can use openblas.

@mrodozov
Contributor

Input file: /store/mc/RunIIFall17MiniAODv2/ZprimeToTT_M1000_W10_TuneCP2_13TeV-madgraphMLM-pythia8/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14_ext1-v1/90000/0C1D564B-DB28-E911-912C-0CC47A4D76D0.root

@hqucms
The file is not there anymore. Could you put it somewhere on AFS, or anywhere else we can copy it from, so we can test with CMSSW?
Thanks

@hqucms
Contributor

hqucms commented Jun 28, 2019

Thank you for looking into this!
@mrodozov I copied the file to my EOS space:
/eos/cms/store/user/hqu/forTesting/TT-M1000.root

@smuzaffar
Contributor

cms-sw/cmsdist#5063 should allow us to link against openblas instead of gslcblas. Once it is integrated in an IB, we will inform you so that you can re-run the tests.

@smuzaffar
Contributor

@slava77 and @hqucms, running this test with cms-sw/cmsdist#5063 I get these numbers:

TimeReport   0.003751     0.003751     0.003751  pfDeepBoostedJetTags
TimeReport   0.003788     0.003788     0.003788  pfMassDecorrelatedDeepBoostedJetTags

@slava77, if you need to test it, I can include it in the GCC 8 IBs.

@kpedro88
Contributor

kpedro88 commented Jul 1, 2019

It would be great to backport this to 10_6_X so UL reco and analysis can profit from the speed improvement.

@slava77
Contributor Author

slava77 commented Jul 2, 2019

Tests in cms-sw/cmsdist#5063 showed a small regression in some b-tag discriminants.
We need to understand the origin of that before merging to the master branch or backporting.

@hqucms
Contributor

hqucms commented Jul 2, 2019

@slava77 and @hqucms, running this test with cms-sw/cmsdist#5063 I get these numbers:

TimeReport   0.003751     0.003751     0.003751  pfDeepBoostedJetTags
TimeReport   0.003788     0.003788     0.003788  pfMassDecorrelatedDeepBoostedJetTags

Thank you @smuzaffar !
These numbers are consistent with my tests -- they are better than the ones I quoted in #25230 (comment), which I think is due to enabling DYNAMIC_ARCH.

@fabiocos
Contributor

fabiocos commented Jul 2, 2019

@kpedro88 @slava77 as the production for the 2017 UL is about to start, if this is not merely technical, we need to understand to what extent we can afford to backport things.

@slava77
Contributor Author

slava77 commented Jul 2, 2019

@kpedro88 @slava77 as the production for the 2017 UL is about to start, if this is not merely technical, we need to understand to what extent we can afford to backport things.

The impact of cms-sw/cmsdist#5063 is small for the UL.
The backports may be more important for the releases where these taggers have to run in nano.

@hqucms
Contributor

hqucms commented Jul 7, 2019

Another thing: I recently came across a fairly new ML inference engine called ONNX Runtime and did some tests with it. It seems it can bring another ~1.5x speed-up over MXNet+OpenBLAS for DeepAK8. More interestingly, it seems to bring a ~3-5x speed-up for the AK4 DeepJet model compared to TensorFlow. It might be interesting to get this into CMSSW.
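
For context, standalone ONNX Runtime inference in python is only a few lines; a sketch (the model path and input shape are placeholders, not the actual DeepAK8/DeepJet graphs):

    import numpy as np
    import onnxruntime as ort

    # Load an exported model and run a dummy batch through it; useful
    # for standalone timing comparisons against TF/MXNet.
    session = ort.InferenceSession("model.onnx")  # placeholder path
    input_name = session.get_inputs()[0].name
    dummy = np.zeros((1, 100, 16), dtype=np.float32)  # assumed shape
    outputs = session.run(None, {input_name: dummy})
    print(outputs[0].shape)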

@fabiocos
Contributor

fabiocos commented Jul 8, 2019

@hqucms I think that we are mixing different things in this thread:

  • effect of the revised BLAS implementation used by MXNet: what is the outcome? Do we understand the regression? Is a backport safe in this respect?

  • possible alternative engines to MXNet: this is a new development that may of course be tested upon agreement with the RECO group, but I see it as a separate issue

@davidlange6
Contributor

davidlange6 commented Jul 8, 2019 via email

@hqucms
Contributor

hqucms commented Jul 8, 2019

@davidlange6 I can make a PR to cmsdist, since I have already tested it a bit.
Also, per @fabiocos's request, I opened a new issue (#27458) regarding ONNX Runtime to avoid mixing up the two fronts.

@davidlange6
Contributor

davidlange6 commented Jul 8, 2019 via email

@slava77
Contributor Author

slava77 commented Jul 8, 2019

@hqucms I think that we are mixing different things in this thread:

@fabiocos
this is a generic thread, opened in November last year. Actually, back then there was some promising development, unrelated to either OpenBLAS or ONNX, which did not materialize into a solution.

@fabiocos
Contributor

fabiocos commented Jul 9, 2019

@slava77 you are correct, but at some point the discussion was about the regression due to cms-sw/cmsdist#5063, and I believe it would be good to keep that issue distinct from whatever other development, that is all.

@slava77
Contributor Author

slava77 commented Jul 11, 2019

@slava77 you are correct, but at some point the discussion was about the regression due to cms-sw/cmsdist#5063, and I believe it would be good to keep that issue distinct from whatever other development, that is all.

So, it looks like this regression is an indication of random changes in the algorithm, also seen in #27488 (comment) and #27428 (comment).

I created #27504 to keep track of the problem

@slava77
Contributor Author

slava77 commented Feb 12, 2021

This could partly be closed now, after ONNX inference was implemented (#28112; via #27458).
However, the total cost is still fairly large after other additions.

Since new inference engines have shown up since late 2018, when this issue was opened, I will open a fresh one (#32883) and close this one.

slava77 closed this as completed Feb 12, 2021