Optimize softmax cpu by parallel using openmp. #36

zhanghuanrong · 2018-11-27T19:06:16Z

No description provided.

ke1337 · 2018-11-27T23:54:29Z

onnxruntime/core/providers/cpu/math/softmax_shared.cc

-
-  // Put the intermediate result X - max(X) into Y by first copying X to Y, and then subtracting max from each entry
-  gsl::copy(gsl::make_span(Xdata, nd), gsl::make_span(Ydata, nd));
+  static const int kGROUP = 8;



Why this hardcoded number? Would it be more readable and efficient to use Eigen reduction and broadcast directly for the whole Softmax computation? https://eigen.tuxfamily.org/dox/group__TutorialReductionsVisitorsBroadcasting.html

zhanghuanrong · 2018-11-28

This constant controls the degree of parallelism. I could change it to let OpenMP decide dynamically.

I currently have no idea how to use Eigen for this kind of parallelism. Could you provide more information?

ke1337 · 2018-11-28

This? https://eigen.tuxfamily.org/dox/TopicMultiThreading.html

zhanghuanrong · 2018-11-28

I considered the same thing when I started working on this. I read that page before: Eigen has parallel GEMM and some other matrix algebra, but no generic support for controlling parallelism over row-wise, column-wise, or coefficient-wise operations. Somebody raised this issue years ago, yet no implementation has been accepted so far.
So the parallelism has to be controlled manually from outside for now.

Also, I changed the group count calculation logic. Please check again.
Thanks, Lei
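For readers following the thread, here is a minimal sketch of the general idea being discussed: computing a row-wise softmax while letting OpenMP split the rows across threads instead of hardcoding a group size. This is not the code from this PR. The function name `SoftmaxRows`, its parameters, and the row-major `n x d` layout are assumptions made for illustration, and each row is assumed to have at least one element.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the PR's implementation): softmax over each row of
// an n x d row-major matrix. Rows are independent, so the outer loop is the
// natural place to parallelize. Assumes d >= 1.
void SoftmaxRows(const float* x, float* y, int n, int d) {
  // OpenMP chooses how many threads to use and how to split the rows.
  // If the compiler is run without OpenMP enabled, the pragma is ignored
  // and the loop simply runs serially with the same result.
#pragma omp parallel for schedule(static)
  for (int i = 0; i < n; ++i) {
    const float* xi = x + static_cast<std::ptrdiff_t>(i) * d;
    float* yi = y + static_cast<std::ptrdiff_t>(i) * d;

    // Subtract the row maximum before exponentiating so exp() cannot overflow.
    const float row_max = *std::max_element(xi, xi + d);

    float sum = 0.0f;
    for (int j = 0; j < d; ++j) {
      yi[j] = std::exp(xi[j] - row_max);
      sum += yi[j];
    }

    // Normalize so the row sums to 1.
    for (int j = 0; j < d; ++j) {
      yi[j] /= sum;
    }
  }
}
```

Because each iteration touches only its own row of `x` and `y`, no synchronization is needed between threads. Whether this beats a fixed grouping depends on `n`, `d`, and the thread count, so it would need to be measured rather than assumed.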


* Add inference and evaluation for BERT QDQ
* Add fp16 and int8 flags
* Modify data reader to handle multiple features
* Set batch size to 1, since some QDQ models might have issues processing batch sizes greater than 1
* Add comment and ignore last commit
* Add evaluation for SQuAD v2.0
* Run evaluate script directly in e2e script

* enable MIGraphX EP on Windows
* [MIGraphX EP] Fix provider options
* fix formatting
* unify the package name for both rocm and migraphx
* fix compilation after moving to rocm6.2
* make STREAM_SYNC the default
* workaround hip sdk bug on windows
* Revert rename of private var for now

---------

Co-authored-by: Filip Jankovic <filip.jankovic@amd.com>
Co-authored-by: Ted Themistokleous <107195283+TedThemistokleous@users.noreply.github.com>
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>


Optimize softmax cpu by parallel using openmp.

e7bdfa0

ke1337 reviewed Nov 27, 2018


Better openmp parallel group count calculation in Softmax parallel running.

c530064

zhanghuanrong requested a review from a team as a code owner November 28, 2018 19:16

ke1337 previously approved these changes Nov 28, 2018


Simpler unused parameter in #if defined() switch.

6b00e6b

zhanghuanrong dismissed ke1337’s stale review via 6b00e6b November 28, 2018 19:32

ke1337 approved these changes Nov 29, 2018


zhanghuanrong merged commit cd1042c into master Nov 29, 2018

pranavsharma deleted the zhalei/softmax_optimize branch December 3, 2018 23:02

lanyuer mentioned this pull request May 16, 2022

crashed in construction of Ort::Env, xcode13, iPhoneX #11446

Closed

horror-proton mentioned this pull request Feb 21, 2024

A bug occurs when the program terminates #15174

Open

snnn pushed a commit that referenced this pull request Jun 20, 2024

[MIGraphX EP] enable compilation and execution on Windows (#36) (#21084)

1d7bf56

snnn added a commit that referenced this pull request Jun 21, 2024

Revert "[MIGraphX EP] enable compilation and execution on Windows (#36) (#21084)"

d9670c3

This reverts commit 1d7bf56.

FaithyQi mentioned this pull request Oct 22, 2024

[Mobile] null pointer dereference #22538

Open
