INT8 quantized model performance is severely degraded on ARM Android; cannot reproduce the speedup reported in benchmark.md #10437

Open
YingkunZhou opened this issue Jan 23, 2024 · 2 comments

YingkunZhou commented Jan 23, 2024

Following the official documentation at https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/docs/performance/benchmark_tools.md, I ran

./lite/tools/build_android.sh --toolchain=clang --with_benchmark=ON full_publish

to build benchmark_bin, and ran it in Termux on Android.

The models I used are also the officially provided ones.

The development board is a Khadas Edge2, whose big cores are Cortex-A76, the same as in the Xiaomi MI9 (Snapdragon 855) mentioned in https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/docs/performance/benchmark.md. I locked the clock frequency at 2.2 GHz.

Below are the run logs.

Resnet50
./benchmark_bin --uncombined_model_dir=ResNet50 --input_shape=1,3,224,224 --backend=arm --repeats=20 --warmup=5 

======= Opt Info =======
Load paddle model from ResNet50
Save optimized model to ResNet50/opt.nb

======= Device Info =======
Brand: rockchip
Device: kedge2
Model: Edge2
Android Version: 13
Android API Level: 33

======= Model Info =======
optimized_model_file: ResNet50/opt.nb
input_data_path: All 1.f
input_shape: 1,3,224,224
output tensor num: 1
--- output tensor 0 ---
output shape(NCHW): 1 1000 
output tensor 0 elem num: 1000
output tensor 0 mean value: 0.001
output tensor 0 standard deviation: 0.00187264

======= Runtime Info =======
benchmark_bin version: 3c61295
threads: 1
power_mode: 0
warmup: 5
repeats: 20
result_path: 

======= Backend Info =======
backend: arm
cpu precision: fp32

======= Perf Info =======
Time(unit: ms):
init  = 92.089      
first = 592.211     
min   = 215.645     
max   = 216.656     
avg   = 215.811

ResNet50_quant
./benchmark_bin --uncombined_model_dir=ResNet50_quant --input_shape=1,3,224,224 --backend=arm --repeats=20 --warmup=5                                        

======= Opt Info =======
Load paddle model from ResNet50_quant
Save optimized model to ResNet50_quant/opt.nb

======= Device Info =======
Brand: rockchip
Device: kedge2
Model: Edge2
Android Version: 13
Android API Level: 33

======= Model Info =======
optimized_model_file: ResNet50_quant/opt.nb
input_data_path: All 1.f
input_shape: 1,3,224,224
output tensor num: 1
--- output tensor 0 ---
output shape(NCHW): 1 1000 
output tensor 0 elem num: 1000
output tensor 0 mean value: 0.001
output tensor 0 standard deviation: 0.0101229

======= Runtime Info =======
benchmark_bin version: 3c61295
threads: 1
power_mode: 0
warmup: 5
repeats: 20
result_path: 

======= Backend Info =======
backend: arm
cpu precision: fp32

======= Perf Info =======
Time(unit: ms):
init  = 35.849      
first = 532.699     
min   = 281.046     
max   = 281.470     
avg   = 281.205

To summarize, on the Cortex-A76 at 2.2 GHz:

  • Resnet50 (fp32): 216 ms
  • ResNet50_quant (int8): 281 ms

The performance figures for the MI9 given in the official documentation, on a Cortex-A76 at 2.84 GHz, are:

  • Resnet50 (fp32): 163 ms (163*2.84/2.2 = 210 ms, which matches what I measured on the Edge2)
  • ResNet50_quant (int8): 67 ms (67*2.84/2.2 = 86.5 ms, far below the 281 ms I actually measured)
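
The frequency scaling used in the comparison above can be sketched as a short script. It assumes latency is inversely proportional to clock frequency, which is only a rough approximation (it ignores memory bandwidth, cache sizes, and other SoC differences). The function and variable names are illustrative and not part of Paddle Lite; the numbers are taken from the logs and from benchmark.md as quoted above.

```python
# Rough sketch: scale the MI9 reference latencies to the Edge2 clock and
# compare them with the measured averages. Assumes latency ~ 1 / frequency.

MI9_GHZ = 2.84    # Cortex-A76 clock on the MI9 (Snapdragon 855)
EDGE2_GHZ = 2.2   # clock locked on the Khadas Edge2

# Reference latencies (ms) quoted from benchmark.md for the MI9.
reference_ms = {"ResNet50 (fp32)": 163.0, "ResNet50_quant (int8)": 67.0}
# Average latencies (ms) from the benchmark_bin logs on the Edge2.
measured_ms = {"ResNet50 (fp32)": 215.811, "ResNet50_quant (int8)": 281.205}


def scale_latency(latency_ms, from_ghz, to_ghz):
    """Estimate the latency at to_ghz from a latency measured at from_ghz."""
    return latency_ms * from_ghz / to_ghz


results = {}
for name, ref in reference_ms.items():
    expected = scale_latency(ref, MI9_GHZ, EDGE2_GHZ)
    slowdown = measured_ms[name] / expected
    results[name] = (expected, slowdown)
    print(f"{name}: expected {expected:.1f} ms, "
          f"measured {measured_ms[name]:.1f} ms, ratio {slowdown:.2f}x")
```

Under this assumption the fp32 model lands within a few percent of the scaled reference, while the int8 model is more than 3x slower than the scaled reference.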
Collaborator

qili93 commented Feb 5, 2024

Thank you for the feedback. We will look into what is causing this performance drop. Could you share the version of Paddle Lite you are using? Thanks!

@YingkunZhou
Author

Hello, it is the latest commit on the default branch: 3c61295

@cmcamdy cmcamdy closed this as completed Jan 24, 2025
@cmcamdy cmcamdy reopened this Jan 24, 2025