
reduce ML inference time in b-tag and related jet taggers: focus on ParticleNet #32883

Closed · 1 of 5 tasks · slava77 opened this issue Feb 12, 2021 · 25 comments

@slava77 (Contributor) commented Feb 12, 2021

This is a replacement/refresh of #25230, where ML jet taggers in total accounted for 20% of the miniAOD time.

In a recent variant of reminiAOD (now the 2018 UL remini workflow 136.88811), jet tagging inference takes 15% of the miniAOD processing time, as measured in CMSSW_11_3_0_pre2:

   0.39 pfDeepCSVJetTagsAK8PFPuppiSoftDropSubjets          DeepFlavourJetTagsProducer
   0.51            pfDeepCSVJetTagsAK8Puppi          DeepFlavourJetTagsProducer
   1.06               pfDeepCSVJetTagsPuppi          DeepFlavourJetTagsProducer
   1.51 pfMassIndependentDeepDoubleCvLV2JetTagsSlimmedAK8DeepTags      DeepDoubleXONNXJetTagsProducer
   1.52 pfMassIndependentDeepDoubleCvBV2JetTagsSlimmedAK8DeepTags      DeepDoubleXONNXJetTagsProducer
   1.61 pfMassIndependentDeepDoubleBvLV2JetTagsSlimmedAK8DeepTags      DeepDoubleXONNXJetTagsProducer
   7.96 pfMassDecorrelatedDeepBoostedJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
   8.23 pfDeepBoostedJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  16.56 pfDeepFlavourJetTagsSlimmedDeepFlavour      DeepFlavourONNXJetTagsProducer
  17.62 pfHiggsInteractionNetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  65.43 pfMassDecorrelatedParticleNetJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  65.53 pfParticleNetJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  66.13 pfParticleNetAK4JetTagsSlimmedDeepFlavour       BoostedJetONNXJetTagsProducer
Total of the above: 254.06 ms/ev
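
(For context: per-module timings like the table above can be obtained with the CMSSW Timing service. The issue does not state which tool produced these numbers, so the fragment below is a minimal sketch of one way to get comparable output, assuming an existing `process`.)

```python
import FWCore.ParameterSet.Config as cms

# Minimal sketch: enable per-module timing in a cmsRun job.
# "process" is assumed to be the existing miniAOD workflow process.
process.Timing = cms.Service(
    "Timing",
    summaryOnly=cms.untracked.bool(True),   # print only the end-of-job summary
    useJobReport=cms.untracked.bool(True),  # also record timings in the job report
)
```
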
@cmsbuild (Contributor)

A new Issue was created by @slava77 Slava Krutelyov.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@slava77 (Contributor Author) commented Feb 12, 2021

assign reconstruction

@cmsbuild (Contributor)

New categories assigned: reconstruction

@slava77,@perrotta,@jpata you have been requested to review this Pull request/Issue and eventually sign? Thanks

@kpedro88 (Contributor)

@mialiu149 @jmduarte

@andrzejnovak (Contributor)

Analyzing the inputs with LRP/Integrated Gradients and removing roughly the 40% lowest-scoring variables reduced the inference time for DDX V2 by half (not counting model load/initialization).
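
(For illustration, a minimal integrated-gradients sketch in PyTorch for ranking input features by attribution; the model and shapes are hypothetical, and this is not the actual DDX training code.)

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Attribution per input feature, approximating the path integral of
    gradients along the straight line from a baseline to the input x."""
    if baseline is None:
        baseline = torch.zeros_like(x)          # "feature absent" reference
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    path = baseline + alphas * (x - baseline)   # (steps, n_features)
    path.requires_grad_(True)
    # Score of the predicted class, summed over path points so a single
    # backward pass yields the gradient at every interpolation step.
    scores = model(path).max(dim=1).values.sum()
    scores.backward()
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline) * avg_grad            # one attribution per feature

# Features whose mean |attribution| over a validation sample is smallest
# are the candidates for pruning before retraining.
```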

@slava77 (Contributor Author) commented Feb 12, 2021

> Analyzing the inputs with LRP/Integrated Gradients and removing roughly the 40% lowest-scoring variables reduced the inference time for DDX V2 by half (not counting model load/initialization).

Please clarify whether V2 already has this reduction or whether it is a possible further improvement.
Thank you.

@slava77 (Contributor Author) commented Feb 12, 2021

To rule out improvements already made in the last few months, I updated the timing values to 11_3_0_pre2 (instead of 11_2_0_pre9). The results did not change appreciably.

@andrzejnovak (Contributor)

> > Analyzing the inputs with LRP/Integrated Gradients and removing roughly the 40% lowest-scoring variables reduced the inference time for DDX V2 by half (not counting model load/initialization).
>
> Please clarify whether V2 already has this reduction or whether it is a possible further improvement.
> Thank you.

V2 already has this reduction. V2 is similar in time to V1 (after the ONNX update) even though it considers more inputs.

@riga (Contributor) commented Feb 18, 2021

ONNXRuntime was updated from 1.3.0 to 1.6.0 yesterday (cms-sw/cmsdist#6649) and should be available with IB CMSSW_11_3_X_2021-02-17-2300. Judging from a few of the commits, the update should also bring some performance improvements, so it may be worth checking the impact on the miniAOD time again.
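
(The per-model effect of such an update can also be estimated outside the full miniAOD job with a standalone micro-benchmark; a sketch using the onnxruntime Python API, with a hypothetical model file name and random inputs.)

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model file; the real taggers ship as .onnx files in the release.
sess = ort.InferenceSession("particlenet.onnx", providers=["CPUExecutionProvider"])

# Random feeds for every declared input, with symbolic dims set to 1.
feeds = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)

n = 200
start = time.perf_counter()
for _ in range(n):
    sess.run(None, feeds)
print(f"{(time.perf_counter() - start) / n * 1e3:.2f} ms/inference")
```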

@slava77 (Contributor Author) commented Feb 18, 2021

11_3_0_pre2 -> CMSSW_11_3_X_2021-02-17-2300, in ms/ev:

   0.39 -> 0.35 pfDeepCSVJetTagsAK8PFPuppiSoftDropSubjets          DeepFlavourJetTagsProducer
   0.51 -> 0.46          pfDeepCSVJetTagsAK8Puppi          DeepFlavourJetTagsProducer
   1.06 -> 0.99             pfDeepCSVJetTagsPuppi          DeepFlavourJetTagsProducer
   1.51 -> 1.48 pfMassIndependentDeepDoubleCvLV2JetTagsSlimmedAK8DeepTags      DeepDoubleXONNXJetTagsProducer
   1.52 -> 1.47 pfMassIndependentDeepDoubleCvBV2JetTagsSlimmedAK8DeepTags      DeepDoubleXONNXJetTagsProducer
   1.61 -> 1.54 pfMassIndependentDeepDoubleBvLV2JetTagsSlimmedAK8DeepTags      DeepDoubleXONNXJetTagsProducer
   7.96 -> 7.67 pfMassDecorrelatedDeepBoostedJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
   8.23 -> 8.03 pfDeepBoostedJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  16.56 -> 13.92 pfDeepFlavourJetTagsSlimmedDeepFlavour      DeepFlavourONNXJetTagsProducer
  17.62 -> 16.04 pfHiggsInteractionNetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  65.43 -> 64.01 pfMassDecorrelatedParticleNetJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  65.53 -> 63.97 pfParticleNetJetTagsSlimmedAK8DeepTags       BoostedJetONNXJetTagsProducer
  66.13 -> 62.91 pfParticleNetAK4JetTagsSlimmedDeepFlavour       BoostedJetONNXJetTagsProducer
Total of the above: 254.06 ms/ev -> 242.84 ms/ev. 

There is about a 5% reduction, which looks correlated with the use of ONNX rather than with the job generally running faster or other changes between the releases.

I would not consider the 5% reduction a significant enough effect to resolve this issue.

@hqucms (Contributor) commented Feb 18, 2021

Do we plan to enable AVX/AVX2 support in ONNXRuntime at some point, either explicitly or implicitly via the MLAS_DYNAMIC_CPU_ARCH flag? It will speed things up quite a lot (e.g., ~2x for ParticleNet).

@slava77 (Contributor Author) commented Apr 28, 2021

> Do we plan to enable AVX/AVX2 support in ONNXRuntime at some point, either explicitly or implicitly via the MLAS_DYNAMIC_CPU_ARCH flag? It will speed things up quite a lot (e.g., ~2x for ParticleNet).

What is the range of the "Dynamic"? Is it smart enough to stay with AVX2 or will it push for AVX512 wherever available regardless of possible frequency scaling implications?

Considering that I recently found out that we are effectively using dynamic dispatch in TF (#33442) and that operationally things were OK, I think it's reasonable to try it more widely.
@hqucms would you be available to make a PR to test the feature?
Let's try it.
Thanks.

@hqucms (Contributor) commented Apr 28, 2021

@slava77

Yes, the level of "dynamic" can be controlled:

  • MLAS_DYNAMIC_CPU_ARCH=0: no AVX
  • MLAS_DYNAMIC_CPU_ARCH=1: up to AVX
  • MLAS_DYNAMIC_CPU_ARCH=2: up to AVX2
  • MLAS_DYNAMIC_CPU_ARCH>2: up to AVX512

Sure, I can open a PR. Any suggestion on which level we want to use?
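
(To see which of these levels a given machine could actually exploit, a quick check of the CPU flags the host advertises is enough; a small Linux-only sketch, independent of ONNXRuntime itself.)

```python
# Check which vector extensions the host CPU advertises (Linux-only),
# i.e. the ceiling a dynamic-dispatch build could reach at runtime.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for isa in ("avx", "avx2", "avx512f"):
    print(f"{isa}: {'yes' if isa in flags else 'no'}")
```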

@slava77 (Contributor Author) commented Apr 28, 2021

> Sure, I can open a PR. Any suggestion on which level we want to use?

=2 looks reasonable; I'm not sure if we'd need to "regress" to =1.

@hqucms (Contributor) commented Apr 28, 2021

@slava77 OK I made the PR: cms-sw/cmsdist#6855.
What kind of tests do we want to do with it?

@slava77 (Contributor Author) commented Apr 28, 2021

> @slava77 OK I made the PR: cms-sw/cmsdist#6855.
> What kind of tests do we want to do with it?

Jenkins tests with timing monitored in miniAOD should be enough to confirm the benefits.

I guess that there will be small differences between the =0 and =2 versions.
So, merging may require a bit of a leap of faith that the differences will go away, given that the available infrastructure already has AVX2.
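
(One hypothetical way to quantify those differences offline: dump the tagger scores from two otherwise identical jobs built with =0 and =2 and compare them; the file names below are made up.)

```python
import numpy as np

# Tagger scores dumped from two otherwise identical jobs (hypothetical files).
ref = np.load("scores_arch0.npy")
new = np.load("scores_arch2.npy")

diff = np.abs(ref - new)
print(f"max abs diff: {diff.max():.3e}, mean abs diff: {diff.mean():.3e}")
# Differences at the level of float32 rounding would point to a change of
# SIMD kernels rather than a real regression.
print("within tolerance:", np.allclose(ref, new, rtol=1e-4, atol=1e-6))
```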

@hqucms (Contributor) commented Apr 28, 2021

> I guess that there will be small differences between the =0 and =2 versions.

@slava77 You are referring to the small numerical differences of the outputs, right?

@slava77 (Contributor Author) commented Apr 28, 2021

> @slava77 You are referring to the small numerical differences of the outputs, right?

Yes.

@jpata (Contributor) commented Apr 11, 2022

@emilbols please take note of this performance issue, and let us know the plans to address this.

@cms-sw/btv-pog-l2

@emilbols (Contributor)

> @emilbols please take note of this performance issue, and let us know the plans to address this.
>
> @cms-sw/btv-pog-l2

I believe after PR cms-sw/cmsdist#6855 there was a reduction for all the ONNX modules (see cms-sw/cmsdist#6855 (comment)). If I'm not mistaken, the table referenced here is from before that.

A simple thing that might be useful is to make sure each tagger only runs on the phase space where it is needed. For instance, I believe DeepJet runs on jets beyond |eta| = 2.5 and below pT = 20 GeV even though it is not used in that phase space (a configuration sketch follows this comment). On the actual ML inference side, we have to investigate further how to improve the situation. I will bring it up with the BTV conveners.

@hqucms @ademoor @riga @jmduarte @andrzejnovak
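
(A minimal CMSSW configuration sketch of the phase-space restriction suggested above; the selector label and exact cut are hypothetical, and a real change would have to preserve the collection and value-map wiring expected downstream.)

```python
import FWCore.ParameterSet.Config as cms

# "process" is assumed to be the existing miniAOD workflow process.
# Hypothetical pre-selection: only feed DeepJet the jets it is actually used on.
process.jetsForDeepJet = cms.EDFilter(
    "CandPtrSelector",
    src=cms.InputTag("slimmedJets"),
    cut=cms.string("pt > 20 && abs(eta) < 2.5"),
)

# The tagger would then consume the reduced collection instead of the full one
# (shown schematically; the real producer takes more inputs than just the jets).
process.pfDeepFlavourJetTagsSlimmedDeepFlavour.jets = cms.InputTag("jetsForDeepJet")
```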

@jpata (Contributor) commented Apr 13, 2022

Thanks for confirming.

Comparing 11_3_0 (which I think already includes the AVX2 fix) and 12_4_0_pre2 in Run 3 MINIAOD, 400 events, on the exact same machine:

  • BoostedJetONNXJetTagsProducer: 78ms (11%) -> 37ms (6%)
  • DeepFlavourONNXJetTagsProducer: 11ms (1.5%) -> 5ms (1%)

Additional improvements (e.g. not doing inference in unused phase space) would be useful.

@jpata (Contributor) commented May 5, 2022

+reconstruction

@jpata (Contributor) commented May 5, 2022

@cmsbuild please close

@cmsbuild (Contributor) commented May 5, 2022

This issue is fully signed and ready to be closed.
