Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ONNXRuntime-based implementation of DeepJet, DeepAK8 and DeepDoubleX #28112

Merged
merged 14 commits into from
Dec 5, 2019

Conversation

hqucms
Copy link
Contributor

@hqucms hqucms commented Oct 3, 2019

PR description:

This PR adds a preliminary ONNXRuntime-based implementation of DeepJet and DeepAK8, following the proposal in #27458. The runtime of DeepJet (DeepFlavour) b-tagger is reduced by ~7x, while for DeepAK8 the runtime is reduced by a few percent.

Timing benchmark based on 1000 JetHT Run2017F events (running the standard NanoAOD sequence), measured on a lxplus7 node:

[before]

TimeReport   0.057633     0.057633     0.057633  pfDeepFlavourJetTagsWithDeepInfo
TimeReport   0.003382     0.003382     0.003382  pfDeepBoostedJetTagsAK8WithDeepInfo
TimeReport   0.003456     0.003456     0.003456  pfMassDecorrelatedDeepBoostedJetTagsAK8WithDeepInfo

[after]

TimeReport   0.009149     0.009149     0.009149  pfDeepFlavourJetTagsWithDeepInfo
TimeReport   0.003199     0.003199     0.003199  pfDeepBoostedJetTagsAK8WithDeepInfo
TimeReport   0.003124     0.003124     0.003124  pfMassDecorrelatedDeepBoostedJetTagsAK8WithDeepInfo

(Similar speedup is also observed on an AMD CPU.)

Currently we set MLAS_DYNAMIC_CPU_ARCH=0 (introduced in hqucms/onnxruntime@7222aea) so ONNXRuntime does not attempt to use AVX/AVX2-based kernels even if they are available on the machine. Allowing ONNXRuntime to dynamically switch to AVX/AVX2-based kernels on CPUs supporting these instructions can bring another speed-up of 1.5x ~ 2x, e.g.,

MLAS_DYNAMIC_CPU_ARCH=99

TimeReport   0.005881     0.005881     0.005881  pfDeepFlavourJetTagsWithDeepInfo
TimeReport   0.001769     0.001769     0.001769  pfDeepBoostedJetTagsAK8WithDeepInfo
TimeReport   0.001763     0.001763     0.001763  pfMassDecorrelatedDeepBoostedJetTagsAK8WithDeepInfo

(But then the results are not bitwise reproducible across different CPU architectures).

Overview of the speedups

Screen Shot 2019-10-19 at 14 32 58

(DeepTauID implementation is not included in this PR.)

More details in https://indico.cern.ch/event/855787/contributions/3601398/attachments/1929206/3194789/ML_inference_ONNXRuntime_RECOAT_20191018_H_Qu.pdf.

PR Dependencies:

PR validation:

  • Local tests on a large file (~20k events) show changes in the DeepJet/DeepAK8 outputs only at ~ numerical precision level.
  • Thread-safety test: compared results from single and 4-, 8-thread runs and obtained consistent results.
  • No additional thread pool is created after hqucms/onnxruntime@04f3c76: pstree shows no additional threads being created during the job run.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2019

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2019

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28112/12128

  • This PR adds an extra 32KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2019

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2019

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28112/12129

  • This PR adds an extra 32KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 3, 2019

A new Pull Request was created by @hqucms (Huilin Qu) for master.

It involves the following packages:

PhysicsTools/NanoAOD
PhysicsTools/ONNXRuntime
RecoBTag/Configuration
RecoBTag/MXNet
RecoBTag/ONNXRuntime

The following packages do not have a category, yet:

PhysicsTools/ONNXRuntime
RecoBTag/ONNXRuntime
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@perrotta, @cmsbuild, @fgolf, @slava77, @santocch, @peruzzim can you please review it and eventually sign? Thanks.
@emilbols, @smoortga, @acaudron, @HeinerTholen, @JyothsnaKomaragiri, @mverzett, @ferencek, @gpetruc, @andrzejnovak, @pvmulder this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

Copy link
Contributor

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a bunch of basic questions.

PhysicsTools/ONNXRuntime/interface/ONNXRuntime.h Outdated Show resolved Hide resolved
PhysicsTools/ONNXRuntime/interface/ONNXRuntime.h Outdated Show resolved Hide resolved
PhysicsTools/ONNXRuntime/src/ONNXRuntime.cc Outdated Show resolved Hide resolved
PhysicsTools/ONNXRuntime/src/ONNXRuntime.cc Outdated Show resolved Hide resolved
// Returns: a std::vector<std::vector<float>>, with the order matched to `output_names`.
// When `output_names` is empty, will return all outputs ordered as in `getOutputNames()`.
FloatArrays run(const std::vector<std::string>& input_names,
FloatArrays input_values,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this vector of vectors intentionally passed by value (leading to copying)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but thinking again it may not be the best way -- The problem is that Value::CreateTensor needs T * rather than const T *, so here we have to pass a non-const FloatArrays & which does not seem very nice to me. Still, I change it to passing by reference, as this object is typically fairly large.

std::vector<std::vector<unsigned int>> input_shapes_; // shapes of each input group
std::unordered_map<std::string, PreprocessParams> prep_info_map_; // preprocessing info for each input group

FloatArrays data_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big is the data_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~5k floats.

@slava77
Copy link
Contributor

slava77 commented Nov 25, 2019

+1

for #28112 c02969d

@santocch
Copy link

+1

@fabiocos
Copy link
Contributor

@peruzzim @fgolf could you please check?

@slava77
Copy link
Contributor

slava77 commented Nov 29, 2019

@peruzzim @fgolf could you please check?

ping
the xpog signature is missing and it seems like there were not comments from xpog so far yet.

@smuzaffar smuzaffar modified the milestones: CMSSW_11_0_X, CMSSW_11_1_X Dec 2, 2019
@fabiocos
Copy link
Contributor

fabiocos commented Dec 3, 2019

@peruzzim when do you expect we may finally have a scrutiny of this update on the xpog side?

@peruzzim
Copy link
Contributor

peruzzim commented Dec 5, 2019

+xpog
for the minor configuration changes to PhysicsTools/NanoAOD/python/nano_cff.py

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 5, 2019

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

fabiocos commented Dec 5, 2019

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.