
multiheadattention int8 quantization #5733

Merged — 8 commits merged into Tencent:master on Oct 15, 2024
Conversation

@nihui (Member) commented Oct 14, 2024

  • arch fallback
  • quantization tool (a generic sketch of the int8 weight quantization scheme follows this list)
  • coverage++
  • code clean
  • op doc
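
As a rough illustration of what an int8 quantization tool computes, here is a minimal per-tensor symmetric quantization sketch in C++. The function names, the 127/absmax scale choice, and round-to-nearest clamping are common conventions assumed for illustration, not the PR's actual tool code.

```cpp
// Minimal sketch of per-tensor symmetric int8 quantization (illustrative, not ncnn's tool code).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Compute a per-tensor scale so that the largest |w| maps to 127.
static float absmax_scale(const std::vector<float>& w)
{
    float absmax = 0.f;
    for (float v : w)
        absmax = std::max(absmax, std::fabs(v));
    return absmax > 0.f ? 127.f / absmax : 1.f;
}

// Quantize fp32 weights to int8 with the given scale (round to nearest, clamp to [-127, 127]).
static std::vector<int8_t> quantize_s8(const std::vector<float>& w, float scale)
{
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); i++)
    {
        int v = (int)std::lround(w[i] * scale);
        q[i] = (int8_t)std::min(127, std::max(-127, v));
    }
    return q;
}
```

The scale is stored alongside the quantized weights so the int8 gemm output can be dequantized back to floating point.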

@nihui nihui closed this Oct 15, 2024
@nihui nihui reopened this Oct 15, 2024
@nihui nihui changed the title [WIP] multiheadattention int8 quantization multiheadattention int8 quantization Oct 15, 2024
@nihui nihui merged commit 66b54cb into Tencent:master Oct 15, 2024
60 of 67 checks passed
@codecov-commenter commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 98.03279% with 6 lines in your changes missing coverage. Please review.

Project coverage is 94.68%. Comparing base (1c7af00) to head (85ddd8a).
Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/layer/multiheadattention.cpp | 97.35% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5733      +/-   ##
==========================================
- Coverage   95.15%   94.68%   -0.47%     
==========================================
  Files         793      553     -240     
  Lines      270315   200255   -70060     
==========================================
- Hits       257218   189618   -67600     
+ Misses      13097    10637    -2460     


@nihui (Member, Author) commented Oct 16, 2024

| mha * 12 (ms) | fp32 | fp16 | bf16 (disabled atm) | int8/fp32 | int8/fp16 | int8/bf16 (disabled atm) |
| --- | --- | --- | --- | --- | --- | --- |
| RES (kb) | 51180 | 27816 | 27744 | 15380 | 15744 | 15548 |
| qcom855plus big core 1t | 34.68 | 18.59 | 46.30 | 11.69 | 11.70 | 11.94 |
| qcom855plus small core 1t | 188.14 | 104.37 | 176.96 | 62.78 | 59.10 | 61.01 |
| mtk9000 big core 1t | 60.12 | 30.83 | 69.63 | 13.00 | 12.77 | 12.89 |
| mtk9000 small core 1t | 153.16 | 103.58 | 181.27 | 49.55 | 45.01 | 50.39 |

imx7d 1t: 863.34 / 1987.19 / 452.76 / 445.55 (only four of the six configurations reported for this armv7 board)

Thus, the armv7 gemm needs inline assembly optimization.
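
To make that point concrete, below is a minimal sketch (assumed for illustration, not taken from ncnn) of the kind of int8 dot-product micro-kernel an armv7 gemm inner loop relies on, written with ARMv7-compatible NEON intrinsics. A production kernel would go further with register blocking, multiple accumulators, and inline assembly to control register allocation and instruction scheduling; without such hand-tuned code, plain compiler-generated int8 loops give little or no speedup over fp32.

```cpp
// Sketch of an int8 dot-product micro-kernel with ARMv7 NEON intrinsics (illustrative only).
#include <arm_neon.h>
#include <cstdint>

// Accumulate sum(a[i] * b[i]) over n elements, n assumed to be a multiple of 8.
// The int32 result is later dequantized by dividing by (scale_a * scale_b).
static int32_t dot_s8_neon(const int8_t* a, const int8_t* b, int n)
{
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 8)
    {
        int8x8_t va = vld1_s8(a + i);       // load 8 signed bytes
        int8x8_t vb = vld1_s8(b + i);
        int16x8_t prod = vmull_s8(va, vb);  // widening multiply to int16
        acc = vpadalq_s16(acc, prod);       // pairwise add-accumulate into int32 lanes
    }
    // horizontal reduction of the 4 int32 lanes (armv7-compatible, no vaddvq)
    int32x2_t sum2 = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
    int32x2_t sum1 = vpadd_s32(sum2, sum2);
    return vget_lane_s32(sum1, 0);
}
```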
