INT8 quantized model performance is severely degraded on ARM Android; cannot reproduce the speedup reported in benchmark.md #10437

YingkunZhou · 2024-01-23T06:11:02Z

I followed the official documentation: https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/docs/performance/benchmark_tools.md

./lite/tools/build_android.sh --toolchain=clang --with_benchmark=ON full_publish

This builds benchmark_bin, which I ran in Termux on Android.

The models are also the ones provided officially:

Resnet50: https://paddle-inference-dist.bj.bcebos.com/AI-Rank/mobile/ResNet50.tar.gz
ResNet50_quant: https://paddle-inference-dist.bj.bcebos.com/AI-Rank/mobile/ResNet50_quant.tar.gz

The board is a Khadas Edge2, and I locked the clock frequency at 2.2GHz. Its big cores are Cortex-A76, the same as in the Xiaomi MI9 (Snapdragon 855) listed in https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/docs/performance/benchmark.md

The run logs are below.

Resnet50

./benchmark_bin --uncombined_model_dir=ResNet50 --input_shape=1,3,224,224 --backend=arm --repeats=20 --warmup=5 

======= Opt Info =======
Load paddle model from ResNet50
Save optimized model to ResNet50/opt.nb

======= Device Info =======
Brand: rockchip
Device: kedge2
Model: Edge2
Android Version: 13
Android API Level: 33

======= Model Info =======
optimized_model_file: ResNet50/opt.nb
input_data_path: All 1.f
input_shape: 1,3,224,224
output tensor num: 1
--- output tensor 0 ---
output shape(NCHW): 1 1000 
output tensor 0 elem num: 1000
output tensor 0 mean value: 0.001
output tensor 0 standard deviation: 0.00187264

======= Runtime Info =======
benchmark_bin version: 3c61295
threads: 1
power_mode: 0
warmup: 5
repeats: 20
result_path: 

======= Backend Info =======
backend: arm
cpu precision: fp32

======= Perf Info =======
Time(unit: ms):
init  = 92.089      
first = 592.211     
min   = 215.645     
max   = 216.656     
avg   = 215.811

ResNet50_quant

./benchmark_bin --uncombined_model_dir=ResNet50_quant --input_shape=1,3,224,224 --backend=arm --repeats=20 --warmup=5                                        

======= Opt Info =======
Load paddle model from ResNet50_quant
Save optimized model to ResNet50_quant/opt.nb

======= Device Info =======
Brand: rockchip
Device: kedge2
Model: Edge2
Android Version: 13
Android API Level: 33

======= Model Info =======
optimized_model_file: ResNet50_quant/opt.nb
input_data_path: All 1.f
input_shape: 1,3,224,224
output tensor num: 1
--- output tensor 0 ---
output shape(NCHW): 1 1000 
output tensor 0 elem num: 1000
output tensor 0 mean value: 0.001
output tensor 0 standard deviation: 0.0101229

======= Runtime Info =======
benchmark_bin version: 3c61295
threads: 1
power_mode: 0
warmup: 5
repeats: 20
result_path: 

======= Backend Info =======
backend: arm
cpu precision: fp32

======= Perf Info =======
Time(unit: ms):
init  = 35.849      
first = 532.699     
min   = 281.046     
max   = 281.470     
avg   = 281.205
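
The two runs can be compared without copying numbers by hand. The following is only a minimal sketch: it assumes the `Perf Info` lines always have the `name = value` shape seen in the logs above, and `parse_perf` is a hypothetical helper, not part of Paddle Lite or benchmark_bin.

```python
import re

# Matches lines such as "avg   = 215.811" from the "Perf Info" block.
# The format is assumed from the logs in this issue, not from a documented spec.
PERF_LINE = re.compile(r"^(init|first|min|max|avg)[ \t]*=[ \t]*([0-9.]+)[ \t]*$",
                       re.MULTILINE)


def parse_perf(log_text: str) -> dict[str, float]:
    """Return the timings (in ms) found in one benchmark_bin log (hypothetical helper)."""
    return {name: float(value) for name, value in PERF_LINE.findall(log_text)}


# Perf Info excerpts copied from the two runs above.
fp32_log = """
init  = 92.089
first = 592.211
min   = 215.645
max   = 216.656
avg   = 215.811
"""

int8_log = """
init  = 35.849
first = 532.699
min   = 281.046
max   = 281.470
avg   = 281.205
"""

fp32 = parse_perf(fp32_log)
int8 = parse_perf(int8_log)

# A ratio above 1 means the quantized model is slower than the fp32 model.
ratio = int8["avg"] / fp32["avg"]
print(f"int8/fp32 avg latency ratio: {ratio:.2f}")
```

A ratio above 1 here means the quantized model ran slower than the fp32 model on this board.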

To summarize, on my Cortex-A76 at 2.2GHz:

Resnet50 (fp32): 216 ms
ResNet50_quant (int8): 281 ms

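The official documentation gives the following numbers for the MI9, Cortex-A76 at 2.84GHz:

Resnet50 (fp32): 163 ms (163*2.84/2.2 = 210 ms, which matches what I measured on the Edge2)
ResNet50_quant (int8): 67 ms (67*2.84/2.2 = 86.5 ms, far below the 281 ms I actually measured)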
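
The frequency scaling used above can be written out as a short script. This is only a sketch of that arithmetic: it assumes latency scales inversely with clock frequency (ignoring memory bandwidth, cache, and thermal effects), and `scale_latency` is an illustrative helper name, not part of any tool.

```python
def scale_latency(latency_ms: float, from_ghz: float, to_ghz: float) -> float:
    """Rescale a latency measured at from_ghz to the expected value at to_ghz.

    Assumption: latency is inversely proportional to clock frequency.
    """
    return latency_ms * from_ghz / to_ghz


MI9_GHZ, EDGE2_GHZ = 2.84, 2.2

# Reference numbers quoted from benchmark.md (MI9) and rounded measurements (Edge2).
reference_ms = {"fp32": 163.0, "int8": 67.0}
measured_ms = {"fp32": 216.0, "int8": 281.0}

expected_ms = {
    name: scale_latency(ms, MI9_GHZ, EDGE2_GHZ) for name, ms in reference_ms.items()
}

for name in reference_ms:
    slowdown = measured_ms[name] / expected_ms[name]
    print(f"{name}: expected {expected_ms[name]:.1f} ms, "
          f"measured {measured_ms[name]:.1f} ms, "
          f"measured/expected = {slowdown:.2f}")

# Speedup of int8 over fp32: above 1 means quantization helps.
reference_speedup = reference_ms["fp32"] / reference_ms["int8"]
measured_speedup = measured_ms["fp32"] / measured_ms["int8"]
print(f"int8 speedup: reference {reference_speedup:.2f}x, "
      f"measured {measured_speedup:.2f}x")
```

Under this assumption the fp32 result is within a few percent of the scaled reference, while the int8 result is more than three times slower than expected.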

qili93 · 2024-02-05T13:23:57Z

Thank you for the feedback. We will investigate what is causing this performance drop. Could you tell us which version of Paddle Lite you are using? Thanks!

YingkunZhou · 2024-02-16T09:29:19Z

Hello, it is the latest commit on the default branch: 3c61295

cmcamdy closed this as completed Jan 24, 2025

cmcamdy reopened this Jan 24, 2025

paddle-bot bot added the status/reopen label Jan 24, 2025
