Full Validated Models

The tables below list the models validated with Intel® Neural Compressor, with the INT8 and FP32 accuracy and real-time latency measured for each.

TensorFlow 2.x models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 realtime (ms), CLX8280 1s 4c per instance | FP32 realtime (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|---|---|
| tensorflow | 2.5.0 | resnet50v1.0 | 74.24% | 74.27% | -0.04% | 7.64 | 21.54 | 2.82x |
| tensorflow | 2.5.0 | resnet50v1.5 | 76.94% | 76.46% | 0.63% | 9.54 | 24.28 | 2.54x |
| tensorflow | 2.5.0 | resnet101 | 77.21% | 76.45% | 0.99% | 12.92 | 30.65 | 2.37x |
| tensorflow | 2.5.0 | inception_v1 | 70.30% | 69.74% | 0.80% | 5.58 | 10.13 | 1.82x |
| tensorflow | 2.5.0 | inception_v2 | 74.27% | 73.97% | 0.41% | 6.78 | 12.42 | 1.83x |
| tensorflow | 2.5.0 | inception_v3 | 77.29% | 76.75% | 0.70% | 12.90 | 27.74 | 2.15x |
| tensorflow | 2.5.0 | inception_v4 | 80.36% | 80.27% | 0.11% | 21.00 | 54.42 | 2.59x |
| tensorflow | 2.5.0 | inception_resnet_v2 | 80.42% | 80.40% | 0.02% | 44.72 | 87.62 | 1.96x |
| tensorflow | 2.5.0 | mobilenetv1 | 73.93% | 70.96% | 4.19% | 2.96 | 9.88 | 3.34x |
| tensorflow | 2.5.0 | mobilenetv2 | 71.96% | 71.76% | 0.28% | 4.95 | 10.71 | 2.16x |
| tensorflow | 2.5.0 | ssd_resnet50_v1 | 37.91% | 38.00% | -0.24% | 145.96 | 422.11 | 2.89x |
| tensorflow | 2.5.0 | ssd_mobilenet_v1 | 23.02% | 23.13% | -0.48% | 12.19 | 26.85 | 2.20x |
| tensorflow | 2.5.0 | faster_rcnn_resnet101 | 30.33% | 30.38% | -0.16% | 152.71 | 541.75 | 3.55x |
| tensorflow | 2.5.0 | faster_rcnn_resnet101_saved | 30.37% | 30.38% | -0.03% | 151.55 | 613.76 | 4.05x |
| tensorflow | 2.5.0 | mask_rcnn_inception_v2 | 28.61% | 28.73% | -0.42% | 77.73 | 201.69 | 2.59x |
| tensorflow | 2.5.0 | wide_deep_large_ds | 77.61% | 77.67% | -0.08% | 1.24 | 1.86 | 1.50x |
| tensorflow | 2.5.0 | vgg16 | 72.13% | 70.89% | 1.75% | 16.91 | 61.21 | 3.62x |
| tensorflow | 2.5.0 | vgg19 | 72.35% | 71.01% | 1.89% | 20.58 | 74.47 | 3.62x |
| tensorflow | 2.5.0 | resnetv2_50 | 70.36% | 69.64% | 1.03% | 15.20 | 18.59 | 1.22x |
| tensorflow | 2.5.0 | resnetv2_101 | 72.58% | 71.87% | 0.99% | 25.54 | 34.33 | 1.34x |
| tensorflow | 2.5.0 | resnetv2_152 | 72.92% | 72.37% | 0.76% | 37.25 | 49.86 | 1.34x |
| tensorflow | 2.5.0 | densenet121 | 72.31% | 72.89% | -0.80% | 30.56 | 44.87 | 1.47x |
| tensorflow | 2.5.0 | densenet161 | 76.36% | 76.29% | 0.09% | 53.69 | 85.54 | 1.59x |
| tensorflow | 2.5.0 | densenet169 | 74.49% | 74.65% | -0.21% | 39.50 | 56.68 | 1.44x |
| tensorflow | 2.5.0 | ssd_resnet50_v1_ckpt | 37.89% | 38.00% | -0.29% | 142.82 | 481.75 | 3.37x |
| tensorflow | 2.5.0 | ssd_mobilenet_v1_ckpt | 23.02% | 23.13% | -0.48% | 12.22 | 32.22 | 2.64x |
| tensorflow | 2.5.0 | mask_rcnn_inception_v2_ckpt | 28.61% | 28.73% | -0.42% | 82.38 | 204.74 | 2.49x |
| tensorflow | 2.5.0 | efficientnet_b0 | 78.53% | 76.75% | 2.32% | 26.23 | 27.53 | 1.05x |
| tensorflow | 2.5.0 | resnet50_fashion | 78.05% | 78.12% | -0.09% | 3.11 | 6.89 | 2.22x |
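
The two derived columns follow directly from the formulas in the column headers. As a rough illustration, the sketch below recomputes them for the resnet50v1.0 row. The function names `acc_ratio` and `latency_ratio` are hypothetical helpers, not part of Intel® Neural Compressor, and rounding to two decimals is an assumption about how the published figures were formatted.

```python
def acc_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change, (INT8 - FP32) / FP32, returned as a fraction."""
    return (int8_acc - fp32_acc) / fp32_acc


def latency_ratio(fp32_ms: float, int8_ms: float) -> float:
    """Speedup of INT8 over FP32: FP32 latency divided by INT8 latency."""
    return fp32_ms / int8_ms


# Values taken from the resnet50v1.0 row (TensorFlow 2.5.0).
print(f"{acc_ratio(74.24, 74.27):.2%}")      # -0.04%
print(f"{latency_ratio(21.54, 7.64):.2f}x")  # 2.82x
```

Recomputing from the rounded table values may differ from a published ratio in the last digit, most visibly for very small latencies such as dlrm_fx (0.00 ms and 0.01 ms), where the listed ratio was presumably derived from unrounded measurements.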

TensorFlow 1.x models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 realtime (ms), CLX8280 1s 4c per instance | FP32 realtime (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|---|---|
| tensorflow | 1.15.0-up3 | bert_large_squad | 92.35 | 92.98 | -0.67% | 397.58 | 875.35 | 2.20x |
| tensorflow | 1.15.0-up3 | bert_base_mrpc | 86.03% | 86.52% | -0.57% | 42.25 | 75.95 | 1.80x |
| tensorflow | 1.15.0-up3 | resnet_v1_50_slim | 76.03% | 75.18% | 1.13% | 7.07 | 23.60 | 3.34x |
| tensorflow | 1.15.0-up3 | resnet_v1_101_slim | 77.12% | 76.40% | 0.94% | 12.53 | 43.21 | 3.45x |
| tensorflow | 1.15.0-up3 | resnet_v1_152_slim | 77.58% | 76.81% | 1.00% | 17.76 | 65.32 | 3.68x |
| tensorflow | 1.15.0-up3 | inception_v1_slim | 70.41% | 69.77% | 0.92% | 5.62 | 12.09 | 2.15x |
| tensorflow | 1.15.0-up3 | inception_v2_slim | 74.38% | 73.98% | 0.54% | 6.82 | 14.40 | 2.11x |
| tensorflow | 1.15.0-up3 | inception_v3_slim | 78.32% | 77.99% | 0.42% | 11.63 | 31.22 | 2.68x |
| tensorflow | 1.15.0-up3 | inception_v4_slim | 80.35% | 80.19% | 0.20% | 21.63 | 62.51 | 2.89x |
| tensorflow | 1.15.0-up3 | vgg16_slim | 72.16% | 70.89% | 1.79% | 17.09 | 60.87 | 3.56x |
| tensorflow | 1.15.0-up3 | vgg19_slim | 72.22% | 71.01% | 1.70% | 20.46 | 73.54 | 3.59x |
| tensorflow | 1.15.0-up3 | resnetv2_50_slim | 70.36% | 69.72% | 0.92% | 13.25 | 19.39 | 1.46x |
| tensorflow | 1.15.0-up3 | resnetv2_101_slim | 72.59% | 71.91% | 0.95% | 23.21 | 35.98 | 1.55x |
| tensorflow | 1.15.0-up3 | resnetv2_152_slim | 72.93% | 72.40% | 0.73% | 33.40 | 52.74 | 1.58x |

PyTorch models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 realtime (ms), CLX8280 1s 4c per instance | FP32 realtime (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|---|---|
| pytorch | 1.9.0+cpu | resnet18 | 69.58% | 69.76% | -0.26% | 13.59 | 24.97 | 1.84x |
| pytorch | 1.9.0+cpu | resnet50 | 75.87% | 76.13% | -0.34% | 25.67 | 54.12 | 2.11x |
| pytorch | 1.9.0+cpu | resnext101_32x8d | 79.09% | 79.31% | -0.28% | 62.44 | 147.88 | 2.37x |
| pytorch | 1.9.0+cpu | bert_base_mrpc | 88.16% | 88.73% | -0.64% | 41.33 | 81.93 | 1.98x |
| pytorch | 1.9.0+cpu | bert_base_cola | 58.29% | 58.84% | -0.93% | 39.30 | 86.58 | 2.20x |
| pytorch | 1.9.0+cpu | bert_base_sts-b | 88.65% | 89.27% | -0.70% | 39.46 | 86.97 | 2.20x |
| pytorch | 1.9.0+cpu | bert_base_sst-2 | 91.63% | 91.86% | -0.25% | 39.12 | 82.59 | 2.11x |
| pytorch | 1.9.0+cpu | bert_base_rte | 69.31% | 69.68% | -0.52% | 39.81 | 81.98 | 2.06x |
| pytorch | 1.9.0+cpu | bert_large_mrpc | 87.48% | 88.33% | -0.95% | 112.61 | 287.44 | 2.55x |
| pytorch | 1.9.0+cpu | bert_large_squad | 92.79 | 93.05 | -0.28% | 497.79 | 953.74 | 1.92x |
| pytorch | 1.9.0+cpu | bert_large_qnli | 91.12% | 91.82% | -0.76% | 112.43 | 291.10 | 2.59x |
| pytorch | 1.9.0+cpu | bert_large_rte | 72.92% | 72.56% | 0.50% | 148.60 | 287.03 | 1.93x |
| pytorch | 1.9.0+cpu | bert_large_cola | 62.85% | 62.57% | 0.45% | 112.54 | 283.38 | 2.52x |
| pytorch | 1.9.0+cpu | dlrm | 80.27% | 80.27% | 0.00% | 0.01 | 0.01 | 1.00x |
| pytorch | 1.9.0+cpu | inception_v3 | 69.39% | 69.54% | -0.21% | 29.40 | 52.01 | 1.77x |
| pytorch | 1.9.0+cpu | peleenet | 71.54% | 72.08% | -0.75% | 24.99 | 33.14 | 1.33x |
| pytorch | 1.9.0+cpu | yolo_v3 | 24.50% | 24.54% | -0.17% | 117.56 | 243.60 | 2.07x |
| pytorch | 1.9.0+cpu | se_resnext50_32x4d | 79.02% | 79.08% | -0.07% | 33.41 | 63.55 | 1.90x |
| pytorch | 1.9.0+cpu | mobilenet_v2 | 70.73% | 71.86% | -1.57% | 15.34 | 23.27 | 1.52x |
| pytorch | 1.9.0+cpu | blendcnn | 68.40% | 68.40% | 0.00% | 2.43 | 2.52 | 1.03x |
| pytorch | 1.9.0+cpu | gpt_wikitext | 60.06 | 60.20 | -0.23% | 545.94 | 590.43 | 1.08x |
| pytorch | 1.9.0+cpu | roberta_base_mrpc | 85.37% | 85.51% | -0.17% | 40.61 | 82.25 | 2.03x |
| pytorch | 1.9.0+cpu | camembert_base_mrpc | 84.72% | 84.22% | 0.60% | 44.23 | 83.24 | 1.88x |
| pytorch | 1.9.0+cpu | distilbert_base_mrpc | 81.17% | 80.99% | 0.21% | 26.24 | 45.65 | 1.74x |
| pytorch | 1.9.0+cpu | albert_base_mrpc | 88.77% | 88.50% | 0.31% | 303.38 | 374.12 | 1.23x |
| pytorch | 1.9.0+cpu | funnel_mrpc | 91.72% | 92.26% | -0.58% | 86.83 | 89.71 | 1.03x |
| pytorch | 1.9.0+cpu | bart_wnli | 49.30% | 52.11% | -5.41% | 321.66 | 363.76 | 1.13x |
| pytorch | 1.9.0+cpu | mbart_wnli | 56.34% | 56.34% | 0.00% | 175.87 | 342.64 | 1.95x |
| pytorch | 1.9.0+cpu | t5_wmt_en_ro | 24.39 | 24.52 | -0.55% | 2530.55 | 2674.40 | 1.06x |
| pytorch | 1.9.0+cpu | marianmt_wmt_en_ro | 22.39 | 22.23 | 0.72% | 3522.83 | 3758.02 | 1.07x |
| pytorch | 1.9.0+cpu | pegasus_billsum | 50.23 | 51.21 | -1.91% | 40000.00 | 62500.00 | 1.56x |
| pytorch | 1.9.0+cpu | rnnt | 92.48 | 92.55 | -0.08% | 182.23 | 554.61 | 3.04x |
| pytorch | 1.9.0+cpu | xlm-roberta-base_mrpc | 87.93% | 88.62% | -0.78% | 88.30 | 90.27 | 1.02x |
| pytorch | 1.9.0+cpu | flaubert_mrpc | 79.81% | 80.19% | -0.48% | 19.46 | 24.80 | 1.27x |
| pytorch | 1.9.0+cpu | barthez_mrpc | 83.25% | 83.81% | -0.66% | 69.93 | 104.06 | 1.49x |
| pytorch | 1.9.0+cpu | longformer_mrpc | 90.97% | 91.46% | -0.53% | 528.43 | 656.89 | 1.24x |
| pytorch | 1.9.0+cpu | layoutlm_mrpc | 81.22% | 78.01% | 4.12% | 48.18 | 88.37 | 1.83x |
| pytorch | 1.9.0+cpu | deberta_mrpc | 90.29% | 90.91% | -0.68% | 89.03 | 135.90 | 1.53x |
| pytorch | 1.9.0+cpu | squeezebert_mrpc | 87.96% | 87.65% | 0.36% | 47.68 | 56.26 | 1.18x |
| pytorch | 1.9.0+cpu | dlrm_fx | 80.19% | 80.27% | -0.10% | 0.00 | 0.01 | 1.67x |
| pytorch | 1.9.0+cpu | resnet18_fx | 69.61% | 69.76% | -0.22% | 13.42 | 26.41 | 1.97x |
| pytorch | 1.9.0+cpu | xlnet_base_mrpc | 89.43% | 89.47% | -0.04% | 101.99 | 128.57 | 1.26x |
| pytorch | 1.9.0+cpu | ctrl_mrpc | 82.00% | 82.00% | 0.00% | 474.58 | 1265.14 | 2.67x |
| pytorch | 1.9.0+cpu | xlm_mrpc | 80.50% | 79.56% | 1.18% | 177.14 | 536.52 | 3.03x |
| pytorch | 1.9.0+cpu | maskrcnn_fx | 37.70% | 37.80% | -0.26% | 116.62 | 179.57 | 1.54x |
| pytorch | 1.9.0+cpu | ssd_resnet34_fx | 19.511 | 19.63 | -0.61% | 378.40 | 1347.00 | 3.56x |

Quantization-aware training models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 realtime (ms), CLX8280 1s 4c per instance | FP32 realtime (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|---|---|
| pytorch | 1.9.0+cpu | resnet18_qat | 69.75% | 69.76% | -0.02% | 13.66 | 25.60 | 1.87x |
| pytorch | 1.9.0+cpu | resnet50_qat | 76.05% | 76.13% | -0.11% | 25.22 | 54.32 | 2.15x |
| pytorch | 1.9.0+cpu | resnet18_qat_fx | 69.72% | 69.76% | -0.05% | 13.53 | 26.72 | 1.97x |
| pytorch | 1.9.0+cpu | mobilenet_v2_qat | 71.45% | 71.86% | -0.56% | 15.29 | 22.79 | 1.49x |

MXNet models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 realtime (ms), CLX8280 1s 4c per instance | FP32 realtime (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|---|---|
| mxnet | 1.7.0 | resnet50v1 | 76.08% | 76.33% | -0.32% | 6.29 | 20.85 | 3.32x |
| mxnet | 1.7.0 | inceptionv3 | 77.73% | 77.64% | 0.11% | 11.18 | 31.76 | 2.84x |
| mxnet | 1.7.0 | mobilenet1.0 | 71.69% | 72.22% | -0.74% | 1.60 | 3.96 | 2.48x |
| mxnet | 1.7.0 | mobilenetv2_1.0 | 70.78% | 70.87% | -0.12% | 1.93 | 5.33 | 2.76x |
| mxnet | 1.7.0 | resnet18_v1 | 70.02% | 70.14% | -0.17% | 3.01 | 9.49 | 3.15x |
| mxnet | 1.7.0 | squeezenet1.0 | 56.74% | 56.96% | -0.38% | 2.38 | 6.24 | 2.62x |
| mxnet | 1.7.0 | ssd-resnet50_v1 | 80.21% | 80.23% | -0.03% | 37.68 | 178.55 | 4.74x |
| mxnet | 1.7.0 | ssd-mobilenet1.0 | 74.94% | 75.54% | -0.79% | 15.28 | 59.86 | 3.92x |
| mxnet | 1.7.0 | resnet152_v1 | 78.21% | 78.54% | -0.42% | 17.79 | 58.81 | 3.31x |

ONNX models

| Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 realtime (ms), CLX8280 1s 4c per instance | FP32 realtime (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
|---|---|---|---|---|---|---|---|---|
| onnxrt | 1.8.0 | resnet50_v1_5 | 73.83% | 73.99% | -0.22% | 11.99 | 20.62 | 1.72x |
| onnxrt | 1.8.0 | bert_base_mrpc_static | 85.29% | 86.03% | -0.86% | 14.34 | 32.15 | 2.24x |
| onnxrt | 1.8.0 | bert_base_mrpc_dynamic | 85.29% | 86.03% | -0.86% | 27.57 | 67.56 | 2.45x |
| onnxrt | 1.8.0 | vgg16 | 69.45% | 69.44% | 0.01% | 72.53 | 95.64 | 1.32x |
| onnxrt | 1.8.0 | ssd_mobilenet_v1 | 22.41% | 23.10% | -2.99% | 16.27 | 18.74 | 1.15x |
| onnxrt | 1.8.0 | ssd_mobilenet_v2 | 23.80% | 24.68% | -3.57% | 20.59 | 25.11 | 1.22x |
| onnxrt | 1.8.0 | distilbert_base_mrpc | 85.05% | 84.56% | 0.58% | 6.35 | 17.24 | 2.72x |
| onnxrt | 1.8.0 | mobilebert_mrpc | 86.03% | 86.27% | -0.28% | 15.40 | 17.52 | 1.14x |
| onnxrt | 1.8.0 | roberta_base_mrpc | 88.73% | 89.46% | -0.82% | 14.08 | 35.92 | 2.55x |
| onnxrt | 1.8.0 | resnet50-v1-12 | 74.77% | 74.97% | -0.27% | 11.13 | 20.29 | 1.82x |
| onnxrt | 1.8.0 | resnet_v1_5_mlperf | 76.11% | 76.47% | -0.47% | 12.66 | 20.51 | 1.62x |
| onnxrt | 1.8.0 | mobilenet_v3_mlperf | 75.24% | 75.39% | -0.20% | 3.84 | 5.76 | 1.50x |
| onnxrt | 1.8.0 | bert_squad_model_zoo | 79.93 | 80.67 | -0.91% | 91.35 | 168.07 | 1.84x |
| onnxrt | 1.8.0 | mobilebert_squad_mlperf | 89.72 | 90.03 | -0.34% | 115.82 | 122.00 | 1.05x |