CPU: ncnn, ONNXRuntime, OpenVINO
GPU: ncnn, TensorRT, PPLNN
- Ubuntu 18.04
- ncnn 20211208
- CUDA 11.3
- TensorRT 7.2.3.4
- Docker 20.10.8
- NVIDIA Tesla T4 Tensor Core GPU for TensorRT
- Static graph export
- Batch size is 1
- During testing, the average latency over 100 images from each dataset is computed
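The measurement protocol above (batch size 1, mean latency over 100 images) can be sketched as follows. This is a minimal illustration, not MMDeploy's profiler: `average_latency_ms`, `infer`, `dummy_infer`, and the `warmup` count are hypothetical names chosen for the example, and a GPU backend would also need to synchronize the device before each timestamp is taken.

```python
import time
from statistics import mean


def average_latency_ms(infer, images, warmup=10):
    """Return the mean per-image latency in milliseconds at batch size 1.

    `infer` is any callable that runs one forward pass on a single image.
    The first `warmup` images are run untimed so that one-off start-up
    costs do not skew the average.
    """
    for image in images[:warmup]:
        infer(image)

    timings = []
    for image in images:
        start = time.perf_counter()
        infer(image)  # one image per call, i.e. batch size 1
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Stand-in for a real backend call; it only records that it was invoked.
calls = []


def dummy_infer(image):
    calls.append(image)


images = list(range(100))  # placeholder for 100 dataset images
latency = average_latency_ms(dummy_infer, images, warmup=10)
print(f"mean latency over {len(images)} images: {latency:.4f} ms")
```

Replacing `dummy_infer` with a call into a real backend gives a number comparable in spirit to the tables on this page, although the exact figures depend on hardware, drivers, and backend versions.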
Users can obtain the speed results they need directly through model profiling. The results measured in our environment are listed below:

mmcls

| model | spatial | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT JetsonNano2GB fp32 (ms) | TensorRT JetsonNano2GB fp16 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) | ncnn SnapDragon888 fp32 (ms) | ncnn Adreno660 fp32 (ms) | Ascend Ascend310 fp32 (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet | 224x224 | 2.97 | 1.26 | 1.21 | 59.32 | 30.54 | 24.13 | 1.30 | 33.91 | 25.93 | 2.49 |
| ResNeXt | 224x224 | 4.31 | 1.42 | 1.37 | 88.10 | 49.18 | 37.45 | 1.36 | 133.44 | 69.38 | - |
| SE-ResNet | 224x224 | 3.41 | 1.66 | 1.51 | 74.59 | 48.78 | 29.62 | 1.91 | 107.84 | 80.85 | - |
| ShuffleNetV2 | 224x224 | 1.37 | 1.19 | 1.13 | 15.26 | 10.23 | 7.37 | 4.69 | 9.55 | 10.66 | - |


mmdet part1

| model | spatial | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | 320x320 | 14.76 | 24.92 | 24.92 | - | 18.07 |
| SSD-Lite | 320x320 | 8.84 | 9.21 | 8.04 | 1.28 | 19.72 |
| RetinaNet | 800x1344 | 97.09 | 25.79 | 16.88 | 780.48 | 38.34 |
| FCOS | 800x1344 | 84.06 | 23.15 | 17.68 | - | - |
| FSAF | 800x1344 | 82.96 | 21.02 | 13.50 | - | 30.41 |
| Faster R-CNN | 800x1344 | 88.08 | 26.52 | 19.14 | 733.81 | 65.40 |
| Mask R-CNN | 800x1344 | 104.83 | 58.27 | - | - | 86.80 |


mmdet part2

| model | spatial | ncnn SnapDragon888 fp32 (ms) | ncnn Adreno660 fp32 (ms) |
| --- | --- | --- | --- |
| MobileNetv2-YOLOv3 | 320x320 | 48.57 | 66.55 |
| SSD-Lite | 320x320 | 44.91 | 66.19 |
| YOLOX | 416x416 | 111.60 | 134.50 |


mmedit

| model | spatial | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| ESRGAN | 32x32 | 12.64 | 12.42 | 12.45 | - | 7.67 |
| SRCNN | 32x32 | 0.70 | 0.35 | 0.26 | 58.86 | 0.56 |


mmocr

| model | spatial | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | PPLNN T4 fp16 (ms) | ncnn SnapDragon888 fp32 (ms) | ncnn Adreno660 fp32 (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet | 640x640 | 10.70 | 5.62 | 5.00 | 34.84 | - | - |
| CRNN | 32x32 | 1.93 | 1.40 | 1.36 | - | 10.57 | 20.00 |


mmseg

| model | spatial | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| FCN | 512x1024 | 128.42 | 23.97 | 18.13 | 1682.54 | 27.00 |
| PSPNet | 1x3x512x1024 | 119.77 | 24.10 | 16.33 | 1586.19 | 27.26 |
| DeepLabV3 | 512x1024 | 226.75 | 31.80 | 19.85 | - | 36.01 |
| DeepLabV3+ | 512x1024 | 151.25 | 47.03 | 50.38 | 2534.96 | 34.80 |


mmcls

| model | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 | top-1 | 69.90 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | 69.91 |
| | top-5 | 89.43 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | 89.43 |
| ResNeXt-50 | top-1 | 77.90 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | - |
| | top-5 | 93.66 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | - |
| SE-ResNet-50 | top-1 | 77.74 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | - |
| | top-5 | 93.84 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | - |
| ShuffleNetV1 1.0x | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | - |
| | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | - |
| ShuffleNetV2 1.0x | top-1 | 69.55 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | - |
| | top-5 | 88.92 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | - |
| MobileNet V2 | top-1 | 71.86 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | 71.87 |
| | top-5 | 90.42 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | 90.42 |
| Vision Transformer | top-1 | 85.43 | 85.43 | - | 85.43 | 85.42 | - | - | 85.43 |
| | top-5 | 97.77 | 97.77 | - | 97.77 | 97.76 | - | - | 97.77 |


mmdet

| model | task | dataset | metric | Pytorch fp32 | TorchScript fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOV3 | Object Detection | COCO2017 | box AP | 33.7 | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - |
| SSD | Object Detection | COCO2017 | box AP | 25.5 | 25.5 | - | 25.5 | 25.5 | - | - | - |
| RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | 36.4 | - | 36.4 | 36.4 | 36.3 | 36.5 | 36.4 |
| FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | - | 36.6 | 36.5 | - | - | - |
| FSAF | Object Detection | COCO2017 | box AP | 37.4 | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - |
| YOLOX | Object Detection | COCO2017 | box AP | 40.5 | 40.3 | - | 40.3 | 40.3 | 29.3 | - | - |
| Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | 37.3 | - | 37.3 | 37.3 | 37.1 | 37.3 | - |
| ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | - | 39.4 | 39.4 | - | - | - |
| Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | - | 40.4 | 40.4 | - | 40.4 | - |
| GFL | Object Detection | COCO2017 | box AP | 40.2 | - | 40.2 | 40.2 | 40.0 | - | - | - |
| RepPoints | Object Detection | COCO2017 | box AP | 37.0 | - | - | 36.9 | - | - | - | - |
| DETR | Object Detection | COCO2017 | box AP | 40.1 | 40.1 | - | 40.1 | 40.1 | - | - | |
| Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | 38.1 | - | 38.1 | 38.1 | - | 38.0 | - |
| | | | mask AP | 34.7 | 34.7 | - | 33.7 | 33.7 | - | - | - |
| Swin-Transformer | Instance Segmentation | COCO2017 | box AP | 42.7 | - | 42.7 | 42.5 | 37.7 | - | - | - |
| | | | mask AP | 39.3 | - | 39.3 | 39.3 | 35.4 | - | - | - |


mmedit

| model | task | dataset | metric | Pytorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | NCNN fp32 | NCNN int8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4120 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 | - | - |
| | | | SSIM | 0.8099 | 0.8106 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 | - | - |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2619 | 28.2592 | 28.2592 | - | - | 28.2624 | - | - |
| | | | SSIM | 0.7778 | 0.7784 | 0.7764 | 0.7774 | - | - | 0.7765 | - | - |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6306 | 30.6444 | 30.6430 | - | - | 27.0426 | - | - |
| | | | SSIM | 0.8559 | 0.8565 | 0.8558 | 0.8558 | - | - | 0.8557 | - | - |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9252 | 27.9408 | 27.9408 | - | - | 27.9388 | - | - |
| | | | SSIM | 0.7846 | 0.7851 | 0.7839 | 0.7839 | - | - | 0.7839 | - | - |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2069 | 30.2300 | 30.2300 | - | - | 30.2294 | - | - |
| | | | SSIM | 0.8491 | 0.8497 | 0.8488 | 0.8488 | - | - | 0.8488 | - | - |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | - | 27.7016 | 27.7016 | - | - | 27.7049 | - | - |
| | | | SSIM | 0.8236 | - | 0.8122 | 0.8122 | - | - | 0.8123 | - | - |
| EDSRx4 | Super Resolution | Set5 | PSNR | 30.2223 | 30.2192 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - | 30.2194 | 29.9340 |
| | | | SSIM | 0.8500 | 0.8507 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - | 0.8498 | 0.8409 |
| EDSRx2 | Super Resolution | Set5 | PSNR | 35.7592 | - | - | - | - | - | - | 35.7733 | 35.4266 |
| | | | SSIM | 0.9372 | - | - | - | - | - | - | 0.9365 | 0.9334 |


mmocr

| model | task | dataset | metric | Pytorch fp32 | TorchScript fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7308 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 |
| | | | precision | 0.8714 | 0.8718 | 0.8714 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 |
| | | | hmean | 0.7950 | 0.7949 | 0.7950 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 |
| PSENet | TextDetection | ICDAR2015 | recall | 0.7526 | 0.7526 | 0.7526 | 0.7526 | 0.7520 | 0.7496 | - | 0.7526 |
| | | | precision | 0.8669 | 0.8669 | 0.8669 | 0.8669 | 0.8668 | 0.8550 | - | 0.8669 |
| | | | hmean | 0.8057 | 0.8057 | 0.8057 | 0.8057 | 0.8054 | 0.7989 | - | 0.8057 |
| PANet | TextDetection | ICDAR2015 | recall | 0.7401 | 0.7401 | 0.7401 | 0.7357 | 0.7366 | - | - | 0.7401 |
| | | | precision | 0.8601 | 0.8601 | 0.8601 | 0.8570 | 0.8586 | - | - | 0.8601 |
| | | | hmean | 0.7955 | 0.7955 | 0.7955 | 0.7917 | 0.7930 | - | - | 0.7955 |
| CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - |
| SAR | TextRecognition | IIIT5K | acc | 0.9517 | - | 0.9287 | - | - | - | - | - |
| SATRN | TextRecognition | IIIT5K | acc | 0.9470 | 0.9487 | 0.9487 | 0.9487 | 0.9483 | 0.9483 | - | - |

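The `hmean` rows in the mmocr table above appear to be the harmonic mean of precision and recall. That reading is an assumption inferred from the reported values, not something the page states. A small check, where the helper name `hmean` is ours:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Pytorch fp32 column of the mmocr table.
dbnet = hmean(precision=0.8714, recall=0.7310)
psenet = hmean(precision=0.8669, recall=0.7526)

print(f"DBNet  hmean: {dbnet:.4f}")   # 0.7950
print(f"PSENet hmean: {psenet:.4f}")  # 0.8057
```

Both results agree with the tabulated `hmean` values to four decimal places, which supports the assumption for these two rows.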

mmseg

| model | dataset | metric | Pytorch fp32 | TorchScript fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | mIoU | 72.25 | 72.36 | - | 72.36 | 72.35 | 74.19 | 72.35 | 72.35 |
| PSPNet | Cityscapes | mIoU | 78.55 | 78.66 | - | 78.26 | 78.24 | 77.97 | 78.09 | 78.67 |
| deeplabv3 | Cityscapes | mIoU | 79.09 | 79.12 | - | 79.12 | 79.12 | 78.96 | 79.12 | 79.06 |
| deeplabv3+ | Cityscapes | mIoU | 79.61 | 79.60 | - | 79.60 | 79.60 | 79.43 | 79.60 | 79.51 |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | 70.96 | - | 70.93 | 70.92 | 66.00 | 70.92 | - |
| UNet | Cityscapes | mIoU | 69.10 | - | - | 69.10 | 69.10 | 68.95 | - | - |
| ANN | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
| APCNet | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
| BiSeNetV1 | Cityscapes | mIoU | 74.44 | - | - | 74.44 | 74.43 | - | - | - |
| BiSeNetV2 | Cityscapes | mIoU | 73.21 | - | - | 73.21 | 73.21 | - | - | - |
| CGNet | Cityscapes | mIoU | 68.25 | - | - | 68.27 | 68.27 | - | - | - |
| EMANet | Cityscapes | mIoU | 77.59 | - | - | 77.59 | 77.6 | - | - | - |
| EncNet | Cityscapes | mIoU | 75.67 | - | - | 75.66 | 75.66 | - | - | - |
| ERFNet | Cityscapes | mIoU | 71.08 | - | - | 71.08 | 71.07 | - | - | - |
| FastFCN | Cityscapes | mIoU | 79.12 | - | - | 79.12 | 79.12 | - | - | - |
| GCNet | Cityscapes | mIoU | 77.69 | - | - | 77.69 | 77.69 | - | - | - |
| ICNet | Cityscapes | mIoU | 76.29 | - | - | 76.36 | 76.36 | - | - | - |
| ISANet | Cityscapes | mIoU | 78.49 | - | - | 78.49 | 78.49 | - | - | - |
| OCRNet | Cityscapes | mIoU | 74.30 | - | - | 73.66 | 73.67 | - | - | - |
| PointRend | Cityscapes | mIoU | 76.47 | 76.47 | - | 76.41 | 76.42 | - | - | - |
| Semantic FPN | Cityscapes | mIoU | 74.52 | - | - | 74.52 | 74.52 | - | - | - |
| STDC | Cityscapes | mIoU | 75.10 | - | - | 75.10 | 75.10 | - | - | - |
| STDC | Cityscapes | mIoU | 77.17 | - | - | 77.17 | 77.17 | - | - | - |
| UPerNet | Cityscapes | mIoU | 77.10 | - | - | 77.19 | 77.18 | - | - | - |
| Segmenter | ADE20K | mIoU | 44.32 | 44.29 | 44.29 | 44.29 | 43.34 | 43.35 | - | - |


mmpose

| model | task | dataset | metric | Pytorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HRNet | Pose Detection | COCO | AP | 0.748 | 0.748 | 0.748 | 0.748 | - | 0.748 |
| | | | AR | 0.802 | 0.802 | 0.802 | 0.802 | - | 0.802 |
| LiteHRNet | Pose Detection | COCO | AP | 0.663 | 0.663 | 0.663 | - | - | 0.663 |
| | | | AR | 0.728 | 0.728 | 0.728 | - | - | 0.728 |
| MSPN | Pose Detection | COCO | AP | 0.762 | 0.762 | 0.762 | 0.762 | - | 0.762 |
| | | | AR | 0.825 | 0.825 | 0.825 | 0.825 | - | 0.825 |
| Hourglass | Pose Detection | COCO | AP | 0.717 | 0.717 | 0.717 | 0.717 | - | 0.717 |
| | | | AR | 0.774 | 0.774 | 0.774 | 0.774 | - | 0.774 |


mmrotate

| model | task | dataset | metrics | Pytorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RotatedRetinaNet | Rotated Detection | DOTA-v1.0 | mAP | 0.698 | 0.698 | 0.698 | 0.697 | - | - |
| Oriented RCNN | Rotated Detection | DOTA-v1.0 | mAP | 0.756 | 0.756 | 0.758 | 0.730 | - | - |
| GlidingVertex | Rotated Detection | DOTA-v1.0 | mAP | 0.732 | - | 0.733 | 0.731 | - | - |
| RoI Transformer | Rotated Detection | DOTA-v1.0 | mAP | 0.761 | - | 0.758 | - | - | - |


mmaction2

| model | task | dataset | metrics | Pytorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TSN | Recognition | Kinetics-400 | top-1 | 69.71 | - | 69.71 | - | - | - |
| | | | top-5 | 88.75 | - | 88.75 | - | - | - |
| SlowFast | Recognition | Kinetics-400 | top-1 | 74.45 | - | 75.62 | - | - | - |
| | | | top-5 | 91.55 | - | 92.10 | - | - | - |

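- Because some datasets in the codebases, such as MMDet, contain images of various resolutions, the speed benchmark is obtained with the static configs in MMDeploy, while the performance benchmark is obtained with the dynamic ones.
- Some of TensorRT's int8 performance benchmarks require an NVIDIA card with Tensor Cores; otherwise performance drops sharply.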
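- DBNet uses `nearest` interpolation in the model `neck`, for which TensorRT-7 applies a strategy completely different from PyTorch's. To stay compatible with TensorRT-7, we rewrote the `neck` to use `bilinear` interpolation, which improves detection performance. To get performance that matches PyTorch, TensorRT-8+ is recommended, since its interpolation methods are the same as PyTorch's.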
- For mmpose models, `flip_test` must be set to `False` in the model config file.
- Some models may suffer a considerable accuracy loss in fp16 mode. Please adjust the model according to your situation.
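
For the mmpose note above, the change amounts to a single key in the model config. The fragment below is a sketch that assumes the MMPose 0.x top-down config layout, where `flip_test` sits under `model.test_cfg`. All other keys are omitted, and your config may place the option elsewhere.

```python
# Fragment of a model config in the Python dict style used by OpenMMLab.
# Assumption: flip_test lives under model.test_cfg; every other key of the
# real config (backbone, head, and so on) is left out of this sketch.
model = dict(
    test_cfg=dict(
        flip_test=False,  # turn off flip-test augmentation for deployment
    ),
)
```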