-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[ARM_CPU] Conv2d int8 intrinsic for cortex-A72 (#10310)
* [ARM_CPU] Conv2d int8 intrinsic for cortex-A72 Add an intrinsic that performs a dot product of 8 4-element vectors at once. Also conditionally inline fused operators into the main convolution loop depending on convolutions size. Small convolution = no inlining. Performance improves by ~20% on mobilenet on raspberry pi 4 and ~30% improvement on performance for the individual convolutions. * ignore incorrect lints * fixup fstring * revert changes to conv2d_NCHWc (not int8) * remove error check, apparently tests rely on it * refactor alter op layout
- Loading branch information
Tristan Konolige
authored
Feb 23, 2022
1 parent
97648d8
commit 6c6e873
Showing
10 changed files
with
679 additions
and
226 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.