ONNXRuntime-based implementation of DeepJet, DeepAK8 and DeepDoubleX #28112
Conversation
The code-checks are being triggered in jenkins.
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28112/12128
Code check has found code style and quality issues which could be resolved by applying the following patch(es).

The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28112/12129
A new Pull Request was created by @hqucms (Huilin Qu) for master. It involves the following packages: PhysicsTools/NanoAOD. The following packages do not have a category yet: PhysicsTools/ONNXRuntime. @perrotta, @cmsbuild, @fgolf, @slava77, @santocch, @peruzzim can you please review it and eventually sign? Thanks. cms-bot commands are listed here.
Just a bunch of basic questions.
// Returns: a std::vector<std::vector<float>>, with the order matched to `output_names`.
// When `output_names` is empty, will return all outputs ordered as in `getOutputNames()`.
FloatArrays run(const std::vector<std::string>& input_names,
                FloatArrays input_values,
Is this vector of vectors intentionally passed by value (leading to copying)?
Yes, but thinking again it may not be the best way -- the problem is that `Value::CreateTensor` needs a `T *` rather than a `const T *`, so here we would have to pass a non-const `FloatArrays &`, which does not seem very nice to me. Still, I changed it to passing by reference, as this object is typically fairly large.
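For illustration, here is a minimal standalone sketch (the helper name `makeTensor` and the includes are ours, not code from this PR) of the constraint being discussed: `Ort::Value::CreateTensor<float>()` wraps an existing buffer without copying, but it takes a mutable `float*`, which is what forces the non-const reference.

// Sketch only: shows why the wrapped input buffer cannot be const.
#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <vector>

using FloatArrays = std::vector<std::vector<float>>;

// Hypothetical helper: wraps one input group's buffer as an ONNXRuntime tensor.
Ort::Value makeTensor(std::vector<float>& values, const std::vector<int64_t>& shape) {
  auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  // CreateTensor demands a non-const float*; calling values.data() on a
  // const vector would not compile here.
  return Ort::Value::CreateTensor<float>(
      memory_info, values.data(), values.size(), shape.data(), shape.size());
}

Passing `FloatArrays&` rather than by value then avoids copying a typically large object while still satisfying this non-const requirement.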
std::vector<std::vector<unsigned int>> input_shapes_;              // shapes of each input group
std::unordered_map<std::string, PreprocessParams> prep_info_map_;  // preprocessing info for each input group

FloatArrays data_;
How big is the `data_`?
~5k floats.
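As a hedged sketch of what such a reusable buffer might look like (the names `makeBuffers` and `input_shapes` are ours, not taken from the PR), preallocating each input group once from its shape keeps those ~5k floats from being reallocated on every inference call:

#include <functional>
#include <numeric>
#include <vector>

using FloatArrays = std::vector<std::vector<float>>;

// Hypothetical: size one flat float buffer per input group from its shape.
FloatArrays makeBuffers(const std::vector<std::vector<unsigned int>>& input_shapes) {
  FloatArrays data;
  data.reserve(input_shapes.size());
  for (const auto& shape : input_shapes) {
    // Number of floats in this group = product of its dimensions.
    const unsigned int len =
        std::accumulate(shape.begin(), shape.end(), 1u, std::multiplies<unsigned int>());
    data.emplace_back(len, 0.f);
  }
  return data;
}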
+1

+1
@peruzzim when do you expect we may finally have scrutiny of this update on the xpog side?
+xpog |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
This PR adds a preliminary ONNXRuntime-based implementation of DeepJet and DeepAK8, following the proposal in #27458. The runtime of the DeepJet (DeepFlavour) b-tagger is reduced by ~7x, while for DeepAK8 the runtime is reduced by a few percent.
Timing benchmark based on 1000 JetHT Run2017F events (running the standard NanoAOD sequence), measured on a lxplus7 node:
[before]
[after]
(Similar speedup is also observed on an AMD CPU.)
Currently we set MLAS_DYNAMIC_CPU_ARCH=0 (introduced in hqucms/onnxruntime@7222aea) so that ONNXRuntime does not attempt to use AVX/AVX2-based kernels even if they are available on the machine. Allowing ONNXRuntime to dynamically switch to AVX/AVX2-based kernels on CPUs supporting these instructions can bring another speed-up of 1.5x ~ 2x (but then the results are not bitwise reproducible across different CPU architectures).
Overview of the speedups
(DeepTauID implementation is not included in this PR.)
More details in https://indico.cern.ch/event/855787/contributions/3601398/attachments/1929206/3194789/ML_inference_ONNXRuntime_RECOAT_20191018_H_Qu.pdf.
PR Dependencies:
PR validation:
`pstree` shows no additional threads being created during the job run.
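For reference, single-threaded behavior like this can be requested through session options; below is a minimal sketch using the public ONNXRuntime C++ API (the model path is a placeholder, and this is not necessarily the exact configuration used in the PR):

#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
  Ort::SessionOptions options;
  options.SetIntraOpNumThreads(1);  // no intra-op worker threads
  options.SetInterOpNumThreads(1);  // no inter-op worker threads
  Ort::Session session(env, "model.onnx", options);  // "model.onnx" is a placeholder
  return 0;
}

With such settings, tools like pstree should show only the calling thread during inference.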