reduce ML inference time in b-tag related jet taggers #25230

With #24918 included, b-tag-related ML inference is at around 20% of the miniAOD processing time.
Here are the averages (excluding peak [initialization?] time) from 1K events in workflow 136.8311, running with this PR on top of 10_4_0_pre1.
This issue is to keep track of the progress, aiming for a reduction.
The next BTV-related jet-tag update should not increase the total time.
@ferencek @mverzett @kskovpen @andrzejnovak @jmduarte @hqucms
assign reconstruction
A new Issue was created by @slava77 Slava Krutelyov. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
For miniAOD, one way to reduce the total time used by ML tagging algorithms is to run them only on the jets for which the results will actually be used/stored. This is relevant for AK8 jets, since the low-pt ones are not stored. My understanding is that DeepAK8 has an enforced pT cut in the tagger, while DeepDoubleB does not, but maybe this is out of date. Tagging a few more people: @gouskos @rappoccio
That's a good point. The same preselection could be used, running the tagger only on the constituents of a sample preclustered with a pT cut on the jets.
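A rough illustration of this preselection idea in CMSSW configuration terms; a minimal sketch assuming the PAT string-cut jet selector, where the module label, input collection, and 170 GeV threshold are illustrative values, not the production ones:

```python
import FWCore.ParameterSet.Config as cms

# Minimal sketch, not the actual miniAOD sequence: keep only AK8 jets above
# the storage threshold before the expensive tagger runs. The module label,
# input collection, and 170 GeV cut are illustrative values.
ak8JetsForDeepTag = cms.EDFilter("PATJetSelector",
    src = cms.InputTag("slimmedJetsAK8"),
    cut = cms.string("pt > 170."),
)

# The DeepAK8/DeepDoubleB producers would then take 'ak8JetsForDeepTag' as
# input, so low-pt jets never reach the ML inference step.
```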
Also, @violatingcp found that for CNN inference, TensorFlow v1.10 is 2.4x faster than TensorFlow v1.6 (which we're currently using). |
Is this based on a similarly built version? Recall that the CMSSW production version is compiled with SSE3 and does not use AVX/AVX512 instructions.
Ah, so TF v1.6 used the CMSSW standard version. Indeed, it looks like the TF version I am using uses AVX, but not AVX2. I can disable it and try to run.
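For reference, a quick way to see which SIMD extensions the host CPU advertises, to compare against what a TensorFlow binary was compiled for; a Linux-only sketch (note that SSE3 is reported as `pni` in /proc/cpuinfo):

```python
# Linux-only sketch: list the SIMD extensions this CPU advertises, to compare
# against what the TensorFlow binary was compiled for.
with open("/proc/cpuinfo") as f:
    flags = set(next(line for line in f if line.startswith("flags")).split())

# Note: SSE3 is reported as "pni" (Prescott New Instructions) in /proc/cpuinfo.
for isa in ("pni", "ssse3", "avx", "avx2", "avx512f"):
    print(isa, "yes" if isa in flags else "no")
```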
Not really, we are looking into ways to reduce the input size and eventually the graph size, but we are a long way from a usable solution.
@slava77 A simple test to show this: [Test setup]
When using [...] (BTW, this still leaves a ~20-30% difference, which is likely because the CMSSW OpenBLAS is compiled with [...]). A search in the external libraries reveals that there are two other libraries that can provide the same cblas functions as OpenBLAS: [...]
It seems that one of them is loaded before OpenBLAS and is therefore hiding the OpenBLAS symbols. Any suggestion on how to fix this? Thank you!

[0]
```
> ldd /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/mxnet-predict/1.2.1-pafccj2/lib/libmxnetpredict.so
	linux-vdso.so.1 => (0x00007ffd9af8a000)
	libopenblas.so.0 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_1/external/slc7_amd64_gcc700/lib/libopenblas.so.0 (0x00007f9a63d0e000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f9a63b06000)
	libstdc++.so.6 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libstdc++.so.6 (0x00007f9a674df000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f9a63804000)
	libgcc_s.so.1 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libgcc_s.so.1 (0x00007f9a674c6000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9a635e8000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f9a6321b000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f9a67471000)
	libgfortran.so.4 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libgfortran.so.4 (0x00007f9a63048000)
	libquadmath.so.0 => /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/lib64/libquadmath.so.0 (0x00007f9a63008000)
```
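One way to confirm which loaded library actually wins the symbol resolution for the cblas entry points is to ask the dynamic linker directly. A Linux-only sketch using ctypes, assuming some cblas provider is already loaded in the process:

```python
import ctypes

# dladdr(3) tells us which loaded shared object contains a given address.
class DlInfo(ctypes.Structure):
    _fields_ = [("dli_fname", ctypes.c_char_p),
                ("dli_fbase", ctypes.c_void_p),
                ("dli_sname", ctypes.c_char_p),
                ("dli_saddr", ctypes.c_void_p)]

libdl = ctypes.CDLL("libdl.so.2")
libdl.dladdr.argtypes = [ctypes.c_void_p, ctypes.POINTER(DlInfo)]
libdl.dladdr.restype = ctypes.c_int

# Handle to the process-wide symbol table: resolves cblas_sgemm the same
# way the already-loaded libraries see it (first match wins).
whole_process = ctypes.CDLL(None)
addr = ctypes.cast(whole_process.cblas_sgemm, ctypes.c_void_p)

info = DlInfo()
if libdl.dladdr(addr, ctypes.byref(info)):
    # If gslcblas was loaded first, its path appears here
    # instead of libopenblas.so.0.
    print(info.dli_fname.decode())
```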
@mrodozov @smuzaffar @fabiocos
@slava77, it looks like the ROOT MathMore library (which was loaded first) is linked against gslcblas. We are looking into the ROOT configuration to see if it can use OpenBLAS.
@hqucms
Thank you for looking into this!
cms-sw/cmsdist#5063 should allow us to link against OpenBLAS.
@slava77 and @hqucms, running this test with cms-sw/cmsdist#5063 I get these numbers:
```
TimeReport   0.003751   0.003751   0.003751  pfDeepBoostedJetTags
```
@slava77, if you need to test it then I can include it in the GCC 8 IBs.
It would be great to backport this to 10_6_X so that UL reco and analysis can profit from the speed improvement.
Tests in cms-sw/cmsdist#5063 showed a small regression in some b-tag discriminants.
Thank you, @smuzaffar!
The impact of cms-sw/cmsdist#5063 is small for the UL.
Another thing: I came across a fairly new ML inference engine called ONNX Runtime recently and did some tests with it. It seems that it can bring another ~1.5x speed-up compared to MXNet+OpenBLAS for DeepAK8. More interestingly, it seems to bring a ~3-5x speed-up for the AK4 DeepJet model compared to TensorFlow. Might be interesting to get this into CMSSW.
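For context, the ONNX Runtime Python API is quite small; a minimal inference sketch, where "model.onnx" and the (1, 100) feature shape are placeholders, not the actual DeepAK8/DeepJet setup:

```python
import numpy as np
import onnxruntime as ort

# Load an exported ONNX model and run one batch. The model path and input
# shape are placeholders for illustration only.
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

batch = np.random.rand(1, 100).astype(np.float32)  # dummy input features
outputs = session.run(None, {input_name: batch})
print(outputs[0])
```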
@hqucms I think that we are mixing different things in this thread:
• effect of the revised BLAS implementation used by MXNet: what is the outcome? Do we understand the regression? Is a backport safe in this respect?
• possible alternative engines to MXNet: this is a new development, which may of course be tested upon agreement with the RECO group, but I see it as a separate issue
I'll look at adding the ONNX Runtime tool to CMSSW - looks interesting... If it would be a way to avoid the non-supported use of the TensorFlow C++ interface, that would be great (and if it's faster, that's all the better; of course, one has to do a comparison with like-built libraries, which can make a big difference).
@davidlange6 I can make a PR to cmsdist since I have already tested it a bit. Also, per @fabiocos's request I opened a new issue (#27458) regarding ONNX Runtime, to avoid mixing up the two fronts.
Ah - that would be great. Thanks.
@slava77 you are correct, but at some point the discussion was about the regression due to cms-sw/cmsdist#5063, and I believe it would be good to keep that issue and whatever other development distinct, that's all.
So, it looks like this regression is an indication of random changes in the algorithm. I created #27504 to keep track of the problem.
This could be partly closed now that ONNX inference has been implemented (#28112, via #27458). Since new inference engines have shown up since late 2018, when this issue was opened, I will open a fresh one (#32883) and close this one.