The tables below list the models enabled by Intel® Neural Compressor, with INT8 and FP32 accuracy and real-time latency for each.

Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 Realtime Latency (ms), CLX8280 1s 4c per instance | FP32 Realtime Latency (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|---|
tensorflow | 2.5.0 | resnet50v1.0 | 74.24% | 74.27% | -0.04% | 7.64 | 21.54 | 2.82x |
tensorflow | 2.5.0 | resnet50v1.5 | 76.94% | 76.46% | 0.63% | 9.54 | 24.28 | 2.54x |
tensorflow | 2.5.0 | resnet101 | 77.21% | 76.45% | 0.99% | 12.92 | 30.65 | 2.37x |
tensorflow | 2.5.0 | inception_v1 | 70.30% | 69.74% | 0.80% | 5.58 | 10.13 | 1.82x |
tensorflow | 2.5.0 | inception_v2 | 74.27% | 73.97% | 0.41% | 6.78 | 12.42 | 1.83x |
tensorflow | 2.5.0 | inception_v3 | 77.29% | 76.75% | 0.70% | 12.90 | 27.74 | 2.15x |
tensorflow | 2.5.0 | inception_v4 | 80.36% | 80.27% | 0.11% | 21.00 | 54.42 | 2.59x |
tensorflow | 2.5.0 | inception_resnet_v2 | 80.42% | 80.40% | 0.02% | 44.72 | 87.62 | 1.96x |
tensorflow | 2.5.0 | mobilenetv1 | 73.93% | 70.96% | 4.19% | 2.96 | 9.88 | 3.34x |
tensorflow | 2.5.0 | mobilenetv2 | 71.96% | 71.76% | 0.28% | 4.95 | 10.71 | 2.16x |
tensorflow | 2.5.0 | ssd_resnet50_v1 | 37.91% | 38.00% | -0.24% | 145.96 | 422.11 | 2.89x |
tensorflow | 2.5.0 | ssd_mobilenet_v1 | 23.02% | 23.13% | -0.48% | 12.19 | 26.85 | 2.20x |
tensorflow | 2.5.0 | faster_rcnn_resnet101 | 30.33% | 30.38% | -0.16% | 152.71 | 541.75 | 3.55x |
tensorflow | 2.5.0 | faster_rcnn_resnet101_saved | 30.37% | 30.38% | -0.03% | 151.55 | 613.76 | 4.05x |
tensorflow | 2.5.0 | mask_rcnn_inception_v2 | 28.61% | 28.73% | -0.42% | 77.73 | 201.69 | 2.59x |
tensorflow | 2.5.0 | wide_deep_large_ds | 77.61% | 77.67% | -0.08% | 1.24 | 1.86 | 1.50x |
tensorflow | 2.5.0 | vgg16 | 72.13% | 70.89% | 1.75% | 16.91 | 61.21 | 3.62x |
tensorflow | 2.5.0 | vgg19 | 72.35% | 71.01% | 1.89% | 20.58 | 74.47 | 3.62x |
tensorflow | 2.5.0 | resnetv2_50 | 70.36% | 69.64% | 1.03% | 15.20 | 18.59 | 1.22x |
tensorflow | 2.5.0 | resnetv2_101 | 72.58% | 71.87% | 0.99% | 25.54 | 34.33 | 1.34x |
tensorflow | 2.5.0 | resnetv2_152 | 72.92% | 72.37% | 0.76% | 37.25 | 49.86 | 1.34x |
tensorflow | 2.5.0 | densenet121 | 72.31% | 72.89% | -0.80% | 30.56 | 44.87 | 1.47x |
tensorflow | 2.5.0 | densenet161 | 76.36% | 76.29% | 0.09% | 53.69 | 85.54 | 1.59x |
tensorflow | 2.5.0 | densenet169 | 74.49% | 74.65% | -0.21% | 39.50 | 56.68 | 1.44x |
tensorflow | 2.5.0 | ssd_resnet50_v1_ckpt | 37.89% | 38.00% | -0.29% | 142.82 | 481.75 | 3.37x |
tensorflow | 2.5.0 | ssd_mobilenet_v1_ckpt | 23.02% | 23.13% | -0.48% | 12.22 | 32.22 | 2.64x |
tensorflow | 2.5.0 | mask_rcnn_inception_v2_ckpt | 28.61% | 28.73% | -0.42% | 82.38 | 204.74 | 2.49x |
tensorflow | 2.5.0 | efficientnet_b0 | 78.53% | 76.75% | 2.32% | 26.23 | 27.53 | 1.05x |
tensorflow | 2.5.0 | resnet50_fashion | 78.05% | 78.12% | -0.09% | 3.11 | 6.89 | 2.22x |
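
The two ratio columns follow directly from the formulas in the column headers. As a minimal sketch (the function names `acc_ratio` and `latency_ratio` are illustrative and not part of Intel® Neural Compressor), the values in the tensorflow 2.5.0 resnet50v1.0 row can be reproduced like this:

```python
def acc_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change, (INT8 - FP32) / FP32, returned as a fraction."""
    return (int8_acc - fp32_acc) / fp32_acc


def latency_ratio(int8_ms: float, fp32_ms: float) -> float:
    """Speedup of INT8 over FP32: FP32 latency divided by INT8 latency."""
    return fp32_ms / int8_ms


# Inputs copied from the tensorflow 2.5.0 resnet50v1.0 row.
print(f"{acc_ratio(74.24, 74.27):.2%}")      # -0.04%
print(f"{latency_ratio(7.64, 21.54):.2f}x")  # 2.82x
```

A negative accuracy ratio means the INT8 model scored below the FP32 baseline. A latency ratio above 1.00x means the INT8 model is faster.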

Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 Realtime Latency (ms), CLX8280 1s 4c per instance | FP32 Realtime Latency (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|---|
tensorflow | 1.15.0-up3 | bert_large_squad | 92.35 | 92.98 | -0.67% | 397.58 | 875.35 | 2.20x |
tensorflow | 1.15.0-up3 | bert_base_mrpc | 86.03% | 86.52% | -0.57% | 42.25 | 75.95 | 1.80x |
tensorflow | 1.15.0-up3 | resnet_v1_50_slim | 76.03% | 75.18% | 1.13% | 7.07 | 23.60 | 3.34x |
tensorflow | 1.15.0-up3 | resnet_v1_101_slim | 77.12% | 76.40% | 0.94% | 12.53 | 43.21 | 3.45x |
tensorflow | 1.15.0-up3 | resnet_v1_152_slim | 77.58% | 76.81% | 1.00% | 17.76 | 65.32 | 3.68x |
tensorflow | 1.15.0-up3 | inception_v1_slim | 70.41% | 69.77% | 0.92% | 5.62 | 12.09 | 2.15x |
tensorflow | 1.15.0-up3 | inception_v2_slim | 74.38% | 73.98% | 0.54% | 6.82 | 14.40 | 2.11x |
tensorflow | 1.15.0-up3 | inception_v3_slim | 78.32% | 77.99% | 0.42% | 11.63 | 31.22 | 2.68x |
tensorflow | 1.15.0-up3 | inception_v4_slim | 80.35% | 80.19% | 0.20% | 21.63 | 62.51 | 2.89x |
tensorflow | 1.15.0-up3 | vgg16_slim | 72.16% | 70.89% | 1.79% | 17.09 | 60.87 | 3.56x |
tensorflow | 1.15.0-up3 | vgg19_slim | 72.22% | 71.01% | 1.70% | 20.46 | 73.54 | 3.59x |
tensorflow | 1.15.0-up3 | resnetv2_50_slim | 70.36% | 69.72% | 0.92% | 13.25 | 19.39 | 1.46x |
tensorflow | 1.15.0-up3 | resnetv2_101_slim | 72.59% | 71.91% | 0.95% | 23.21 | 35.98 | 1.55x |
tensorflow | 1.15.0-up3 | resnetv2_152_slim | 72.93% | 72.40% | 0.73% | 33.40 | 52.74 | 1.58x |

Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 Realtime Latency (ms), CLX8280 1s 4c per instance | FP32 Realtime Latency (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|---|
pytorch | 1.9.0+cpu | resnet18 | 69.58% | 69.76% | -0.26% | 13.59 | 24.97 | 1.84x |
pytorch | 1.9.0+cpu | resnet50 | 75.87% | 76.13% | -0.34% | 25.67 | 54.12 | 2.11x |
pytorch | 1.9.0+cpu | resnext101_32x8d | 79.09% | 79.31% | -0.28% | 62.44 | 147.88 | 2.37x |
pytorch | 1.9.0+cpu | bert_base_mrpc | 88.16% | 88.73% | -0.64% | 41.33 | 81.93 | 1.98x |
pytorch | 1.9.0+cpu | bert_base_cola | 58.29% | 58.84% | -0.93% | 39.30 | 86.58 | 2.20x |
pytorch | 1.9.0+cpu | bert_base_sts-b | 88.65% | 89.27% | -0.70% | 39.46 | 86.97 | 2.20x |
pytorch | 1.9.0+cpu | bert_base_sst-2 | 91.63% | 91.86% | -0.25% | 39.12 | 82.59 | 2.11x |
pytorch | 1.9.0+cpu | bert_base_rte | 69.31% | 69.68% | -0.52% | 39.81 | 81.98 | 2.06x |
pytorch | 1.9.0+cpu | bert_large_mrpc | 87.48% | 88.33% | -0.95% | 112.61 | 287.44 | 2.55x |
pytorch | 1.9.0+cpu | bert_large_squad | 92.79 | 93.05 | -0.28% | 497.79 | 953.74 | 1.92x |
pytorch | 1.9.0+cpu | bert_large_qnli | 91.12% | 91.82% | -0.76% | 112.43 | 291.10 | 2.59x |
pytorch | 1.9.0+cpu | bert_large_rte | 72.92% | 72.56% | 0.50% | 148.60 | 287.03 | 1.93x |
pytorch | 1.9.0+cpu | bert_large_cola | 62.85% | 62.57% | 0.45% | 112.54 | 283.38 | 2.52x |
pytorch | 1.9.0+cpu | dlrm | 80.27% | 80.27% | 0.00% | 0.01 | 0.01 | 1.00x |
pytorch | 1.9.0+cpu | inception_v3 | 69.39% | 69.54% | -0.21% | 29.40 | 52.01 | 1.77x |
pytorch | 1.9.0+cpu | peleenet | 71.54% | 72.08% | -0.75% | 24.99 | 33.14 | 1.33x |
pytorch | 1.9.0+cpu | yolo_v3 | 24.50% | 24.54% | -0.17% | 117.56 | 243.60 | 2.07x |
pytorch | 1.9.0+cpu | se_resnext50_32x4d | 79.02% | 79.08% | -0.07% | 33.41 | 63.55 | 1.90x |
pytorch | 1.9.0+cpu | mobilenet_v2 | 70.73% | 71.86% | -1.57% | 15.34 | 23.27 | 1.52x |
pytorch | 1.9.0+cpu | blendcnn | 68.40% | 68.40% | 0.00% | 2.43 | 2.52 | 1.03x |
pytorch | 1.9.0+cpu | gpt_wikitext | 60.06 | 60.20 | -0.23% | 545.94 | 590.43 | 1.08x |
pytorch | 1.9.0+cpu | roberta_base_mrpc | 85.37% | 85.51% | -0.17% | 40.61 | 82.25 | 2.03x |
pytorch | 1.9.0+cpu | camembert_base_mrpc | 84.72% | 84.22% | 0.60% | 44.23 | 83.24 | 1.88x |
pytorch | 1.9.0+cpu | distilbert_base_mrpc | 81.17% | 80.99% | 0.21% | 26.24 | 45.65 | 1.74x |
pytorch | 1.9.0+cpu | albert_base_mrpc | 88.77% | 88.50% | 0.31% | 303.38 | 374.12 | 1.23x |
pytorch | 1.9.0+cpu | funnel_mrpc | 91.72% | 92.26% | -0.58% | 86.83 | 89.71 | 1.03x |
pytorch | 1.9.0+cpu | bart_wnli | 49.30% | 52.11% | -5.41% | 321.66 | 363.76 | 1.13x |
pytorch | 1.9.0+cpu | mbart_wnli | 56.34% | 56.34% | 0.00% | 175.87 | 342.64 | 1.95x |
pytorch | 1.9.0+cpu | t5_wmt_en_ro | 24.39 | 24.52 | -0.55% | 2530.55 | 2674.40 | 1.06x |
pytorch | 1.9.0+cpu | marianmt_wmt_en_ro | 22.39 | 22.23 | 0.72% | 3522.83 | 3758.02 | 1.07x |
pytorch | 1.9.0+cpu | pegasus_billsum | 50.23 | 51.21 | -1.91% | 40000.00 | 62500.00 | 1.56x |
pytorch | 1.9.0+cpu | rnnt | 92.48 | 92.55 | -0.08% | 182.23 | 554.61 | 3.04x |
pytorch | 1.9.0+cpu | xlm-roberta-base_mrpc | 87.93% | 88.62% | -0.78% | 88.30 | 90.27 | 1.02x |
pytorch | 1.9.0+cpu | flaubert_mrpc | 79.81% | 80.19% | -0.48% | 19.46 | 24.80 | 1.27x |
pytorch | 1.9.0+cpu | barthez_mrpc | 83.25% | 83.81% | -0.66% | 69.93 | 104.06 | 1.49x |
pytorch | 1.9.0+cpu | longformer_mrpc | 90.97% | 91.46% | -0.53% | 528.43 | 656.89 | 1.24x |
pytorch | 1.9.0+cpu | layoutlm_mrpc | 81.22% | 78.01% | 4.12% | 48.18 | 88.37 | 1.83x |
pytorch | 1.9.0+cpu | deberta_mrpc | 90.29% | 90.91% | -0.68% | 89.03 | 135.90 | 1.53x |
pytorch | 1.9.0+cpu | squeezebert_mrpc | 87.96% | 87.65% | 0.36% | 47.68 | 56.26 | 1.18x |
pytorch | 1.9.0+cpu | dlrm_fx | 80.19% | 80.27% | -0.10% | 0.00 | 0.01 | 1.67x |
pytorch | 1.9.0+cpu | resnet18_fx | 69.61% | 69.76% | -0.22% | 13.42 | 26.41 | 1.97x |
pytorch | 1.9.0+cpu | xlnet_base_mrpc | 89.43% | 89.47% | -0.04% | 101.99 | 128.57 | 1.26x |
pytorch | 1.9.0+cpu | ctrl_mrpc | 82.00% | 82.00% | 0.00% | 474.58 | 1265.14 | 2.67x |
pytorch | 1.9.0+cpu | xlm_mrpc | 80.50% | 79.56% | 1.18% | 177.14 | 536.52 | 3.03x |
pytorch | 1.9.0+cpu | maskrcnn_fx | 37.70% | 37.80% | -0.26% | 116.62 | 179.57 | 1.54x |
pytorch | 1.9.0+cpu | ssd_resnet34_fx | 19.511 | 19.63 | -0.61% | 378.40 | 1347.00 | 3.56x |

Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 Realtime Latency (ms), CLX8280 1s 4c per instance | FP32 Realtime Latency (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|---|
pytorch | 1.9.0+cpu | resnet18_qat | 69.75% | 69.76% | -0.02% | 13.66 | 25.60 | 1.87x |
pytorch | 1.9.0+cpu | resnet50_qat | 76.05% | 76.13% | -0.11% | 25.22 | 54.32 | 2.15x |
pytorch | 1.9.0+cpu | resnet18_qat_fx | 69.72% | 69.76% | -0.05% | 13.53 | 26.72 | 1.97x |
pytorch | 1.9.0+cpu | mobilenet_v2_qat | 71.45% | 71.86% | -0.56% | 15.29 | 22.79 | 1.49x |

Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 Realtime Latency (ms), CLX8280 1s 4c per instance | FP32 Realtime Latency (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|---|
mxnet | 1.7.0 | resnet50v1 | 76.08% | 76.33% | -0.32% | 6.29 | 20.85 | 3.32x |
mxnet | 1.7.0 | inceptionv3 | 77.73% | 77.64% | 0.11% | 11.18 | 31.76 | 2.84x |
mxnet | 1.7.0 | mobilenet1.0 | 71.69% | 72.22% | -0.74% | 1.60 | 3.96 | 2.48x |
mxnet | 1.7.0 | mobilenetv2_1.0 | 70.78% | 70.87% | -0.12% | 1.93 | 5.33 | 2.76x |
mxnet | 1.7.0 | resnet18_v1 | 70.02% | 70.14% | -0.17% | 3.01 | 9.49 | 3.15x |
mxnet | 1.7.0 | squeezenet1.0 | 56.74% | 56.96% | -0.38% | 2.38 | 6.24 | 2.62x |
mxnet | 1.7.0 | ssd-resnet50_v1 | 80.21% | 80.23% | -0.03% | 37.68 | 178.55 | 4.74x |
mxnet | 1.7.0 | ssd-mobilenet1.0 | 74.94% | 75.54% | -0.79% | 15.28 | 59.86 | 3.92x |
mxnet | 1.7.0 | resnet152_v1 | 78.21% | 78.54% | -0.42% | 17.79 | 58.81 | 3.31x |

Framework | Version | Model | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | INT8 Realtime Latency (ms), CLX8280 1s 4c per instance | FP32 Realtime Latency (ms), CLX8280 1s 4c per instance | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|---|
onnxrt | 1.8.0 | resnet50_v1_5 | 73.83% | 73.99% | -0.22% | 11.99 | 20.62 | 1.72x |
onnxrt | 1.8.0 | bert_base_mrpc_static | 85.29% | 86.03% | -0.86% | 14.34 | 32.15 | 2.24x |
onnxrt | 1.8.0 | bert_base_mrpc_dynamic | 85.29% | 86.03% | -0.86% | 27.57 | 67.56 | 2.45x |
onnxrt | 1.8.0 | vgg16 | 69.45% | 69.44% | 0.01% | 72.53 | 95.64 | 1.32x |
onnxrt | 1.8.0 | ssd_mobilenet_v1 | 22.41% | 23.10% | -2.99% | 16.27 | 18.74 | 1.15x |
onnxrt | 1.8.0 | ssd_mobilenet_v2 | 23.80% | 24.68% | -3.57% | 20.59 | 25.11 | 1.22x |
onnxrt | 1.8.0 | distilbert_base_mrpc | 85.05% | 84.56% | 0.58% | 6.35 | 17.24 | 2.72x |
onnxrt | 1.8.0 | mobilebert_mrpc | 86.03% | 86.27% | -0.28% | 15.40 | 17.52 | 1.14x |
onnxrt | 1.8.0 | roberta_base_mrpc | 88.73% | 89.46% | -0.82% | 14.08 | 35.92 | 2.55x |
onnxrt | 1.8.0 | resnet50-v1-12 | 74.77% | 74.97% | -0.27% | 11.13 | 20.29 | 1.82x |
onnxrt | 1.8.0 | resnet_v1_5_mlperf | 76.11% | 76.47% | -0.47% | 12.66 | 20.51 | 1.62x |
onnxrt | 1.8.0 | mobilenet_v3_mlperf | 75.24% | 75.39% | -0.20% | 3.84 | 5.76 | 1.50x |
onnxrt | 1.8.0 | bert_squad_model_zoo | 79.93 | 80.67 | -0.91% | 91.35 | 168.07 | 1.84x |
onnxrt | 1.8.0 | mobilebert_squad_mlperf | 89.72 | 90.03 | -0.34% | 115.82 | 122.00 | 1.05x |
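
For readers who want to process these rows programmatically, the following is a rough sketch of a parser for the pipe-delimited data rows above. The `Row` type, `parse_row`, `is_consistent`, and the `0.01` tolerance are all hypothetical helpers introduced for illustration. The parser accepts accuracy cells with or without a trailing `%`, since some rows report the metric without one.

```python
from typing import NamedTuple


class Row(NamedTuple):
    framework: str
    version: str
    model: str
    int8_acc: float
    fp32_acc: float
    acc_ratio_pct: float   # reported Acc Ratio, in percent
    int8_ms: float
    fp32_ms: float
    latency_ratio: float   # reported FP32/INT8 ratio


def parse_row(line: str) -> Row:
    """Split one data row into typed fields, dropping '%' and 'x' suffixes."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 9:
        raise ValueError(f"expected 9 cells, got {len(cells)}: {line!r}")
    numbers = [float(c.rstrip("%x")) for c in cells[3:]]
    return Row(cells[0], cells[1], cells[2], *numbers)


def is_consistent(row: Row, tol: float = 0.01) -> bool:
    """Check the reported ratios against values recomputed from the row.

    The tolerance is an assumed allowance for two-decimal rounding.
    Rows whose latency is rounded to 0.00 cannot be recomputed and
    are reported as not consistent.
    """
    if row.int8_ms == 0 or row.fp32_acc == 0:
        return False
    acc_pct = (row.int8_acc - row.fp32_acc) / row.fp32_acc * 100
    speedup = row.fp32_ms / row.int8_ms
    return (abs(acc_pct - row.acc_ratio_pct) <= tol
            and abs(speedup - row.latency_ratio) <= tol)


row = parse_row(
    "onnxrt | 1.8.0 | resnet50_v1_5 | 73.83% | 73.99% | -0.22% | 11.99 | 20.62 | 1.72x |"
)
print(row.model, is_consistent(row))
```

Because the tables round every value to two decimals, a recomputed ratio can differ slightly from the printed one. A failed check therefore marks a row for a closer look and does not by itself show that the reported value is wrong.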