The Intel® Neural Compressor library is released as part of the Intel® oneAPI AI Analytics Toolkit (AI Kit). The AI Kit provides a consolidated package of Intel's latest deep learning and machine learning optimizations in one place for ease of development. Along with Neural Compressor, the AI Kit includes Intel-optimized versions of deep learning frameworks (such as TensorFlow and PyTorch) and high-performing Python libraries to streamline end-to-end data science and AI workflows on Intel architectures.
You can install just the library from binary or source, or you can get the Intel-optimized framework together with the library by installing the Intel® oneAPI AI Analytics Toolkit.
# install from pip
pip install neural-compressor
# install from conda
conda install neural-compressor -c conda-forge -c intel
# install from source
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install
The AI Kit, which includes the library, is distributed through many common channels, including Intel's website, YUM, APT, Anaconda, and more. Select and download the AI Kit distribution package that best suits you, then follow the Get Started Guide for post-installation instructions.
Download AI Kit | AI Kit Get Started Guide |
---|
Prerequisites
The following prerequisites and requirements must be satisfied for a successful installation:
- Python version: 3.6, 3.7, 3.8, or 3.9
- Download and install Anaconda.
- Create a virtual environment named nc in Anaconda:
# Here we install Python 3.7 as an example. You can also choose Python 3.6, 3.8, or 3.9.
conda create -n nc python=3.7
conda activate nc
# install from pip
pip install neural-compressor
# install from conda
conda install neural-compressor -c conda-forge -c intel
# install from source
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install
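After running one of the install paths above, a quick sanity check can confirm that the interpreter is in the supported range and that the package can be found. The following is a minimal sketch, not part of the library: the helper names `is_supported_python` and `is_installed` are hypothetical, and it assumes the package installs under the import name `neural_compressor`.

```python
import importlib.util
import sys

# Python (major, minor) versions listed in the prerequisites above.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8), (3, 9)}


def is_supported_python(version_info=None):
    """Return True if (major, minor) is one of the listed Python versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python supported:", is_supported_python())
    # Assumption: the pip/conda package exposes the module `neural_compressor`.
    print("neural_compressor found:", is_installed("neural_compressor"))
```

Run the script inside the activated nc environment so that it checks the interpreter and site-packages you actually installed into.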
Examples are provided to demonstrate the usage of Intel® Neural Compressor in different frameworks: TensorFlow, PyTorch, MXNet, and ONNX Runtime. Hello World examples are also available.
View Neural Compressor Documentation for getting started, deep dive, and advanced resources to help you use and develop Neural Compressor.
Intel® Neural Compressor supports systems based on Intel 64 architecture or compatible processors, specially optimized for the following CPUs:
- Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, and Ice Lake)
- future Intel Xeon Scalable processor (code name Sapphire Rapids)
Intel® Neural Compressor requires installing the Intel-optimized framework version for the supported DL framework you use: TensorFlow, PyTorch, MXNet, or ONNX Runtime.
Note: Intel Neural Compressor supports Intel-optimized and official frameworks for some TensorFlow versions. Refer to Supported Frameworks for specifics.
Platform | OS | Python | Framework | Version |
---|---|---|---|---|
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | TensorFlow | 2.5.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 1.15.0 UP1, 1.15.0 UP2, 1.15.0 UP3, 1.15.2 |
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | PyTorch | 1.5.0+cpu, 1.6.0+cpu, 1.8.0+cpu, IPEX |
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | MXNet | 1.7.0, 1.6.0 |
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | ONNX Runtime | 1.6.0, 1.7.0, 1.8.0 |
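The validated framework versions listed above can be captured in a small lookup for use in setup scripts. This is only an illustrative sketch: the dictionary keys and the `is_validated` helper are hypothetical names, and the version strings are transcribed from the table rather than read from the library.

```python
# Validated framework versions, transcribed from the table above.
# The lowercase keys are hypothetical labels chosen for this sketch.
VALIDATED_VERSIONS = {
    "tensorflow": [
        "2.5.0", "2.4.0", "2.3.0", "2.2.0", "2.1.0",
        "1.15.0 UP1", "1.15.0 UP2", "1.15.0 UP3", "1.15.2",
    ],
    "pytorch": ["1.5.0+cpu", "1.6.0+cpu", "1.8.0+cpu", "IPEX"],
    "mxnet": ["1.7.0", "1.6.0"],
    "onnxruntime": ["1.6.0", "1.7.0", "1.8.0"],
}


def is_validated(framework, version):
    """Return True if the (framework, version) pair appears in the table."""
    return version in VALIDATED_VERSIONS.get(framework.lower(), [])


if __name__ == "__main__":
    print(is_validated("TensorFlow", "2.4.0"))
    print(is_validated("mxnet", "1.5.0"))
```

A version missing from this lookup is not necessarily unusable; it only means the table does not list it.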
Intel® Neural Compressor provides numerous examples that demonstrate low accuracy loss together with a performance gain from INT8 quantization. A full list of quantized models on various frameworks is available in the Model List.
Framework | Version | Model | Dataset | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|
tensorflow | 2.4.0 | resnet50v1.5 | ImageNet | 76.70% | 76.50% | 0.26% | 3.23x |
tensorflow | 2.4.0 | Resnet101 | ImageNet | 77.20% | 76.40% | 1.05% | 2.42x |
tensorflow | 2.4.0 | inception_v1 | ImageNet | 70.10% | 69.70% | 0.57% | 1.88x |
tensorflow | 2.4.0 | inception_v2 | ImageNet | 74.10% | 74.00% | 0.14% | 1.96x |
tensorflow | 2.4.0 | inception_v3 | ImageNet | 77.20% | 76.70% | 0.65% | 2.36x |
tensorflow | 2.4.0 | inception_v4 | ImageNet | 80.00% | 80.30% | -0.37% | 2.59x |
tensorflow | 2.4.0 | inception_resnet_v2 | ImageNet | 80.10% | 80.40% | -0.37% | 1.97x |
tensorflow | 2.4.0 | Mobilenetv1 | ImageNet | 71.10% | 71.00% | 0.14% | 2.88x |
tensorflow | 2.4.0 | ssd_resnet50_v1 | Coco | 37.90% | 38.00% | -0.26% | 2.97x |
tensorflow | 2.4.0 | mask_rcnn_inception_v2 | Coco | 28.90% | 29.10% | -0.69% | 2.66x |
tensorflow | 2.4.0 | vgg16 | ImageNet | 72.50% | 70.90% | 2.26% | 3.75x |
tensorflow | 2.4.0 | vgg19 | ImageNet | 72.40% | 71.00% | 1.97% | 3.79x |
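The two derived columns follow directly from the formulas in the table header. The sketch below recomputes the accuracy ratio for the resnet50v1.5 and inception_v4 rows. The function names are hypothetical, and the latency values are made-up placeholders, since the table reports only the ratio and not the raw latencies.

```python
def acc_ratio(int8_acc, fp32_acc):
    """Relative accuracy change: (INT8 - FP32) / FP32."""
    return (int8_acc - fp32_acc) / fp32_acc


def latency_ratio(fp32_latency, int8_latency):
    """Speedup factor: FP32 latency divided by INT8 latency."""
    return fp32_latency / int8_latency


if __name__ == "__main__":
    # resnet50v1.5 row: INT8 76.70%, FP32 76.50%
    print("{:.2%}".format(acc_ratio(76.70, 76.50)))  # 0.26%
    # inception_v4 row: INT8 80.00%, FP32 80.30%
    print("{:.2%}".format(acc_ratio(80.00, 80.30)))  # -0.37%
    # Hypothetical latencies in milliseconds, for illustration only.
    print("{:.2f}x".format(latency_ratio(10.0, 4.0)))  # 2.50x
```

A positive accuracy ratio means the INT8 model scored higher than the FP32 baseline, and a negative one means it scored lower. Recomputing from the rounded table values may differ from a published ratio in the last digit for some rows.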
Framework | Version | Model | Dataset | INT8 Tuning Accuracy | FP32 Accuracy Baseline | Acc Ratio [(INT8-FP32)/FP32] | Realtime Latency Ratio [FP32/INT8] |
---|---|---|---|---|---|---|---|
pytorch | 1.5.0+cpu | resnet50 | ImageNet | 75.96% | 76.13% | -0.23% | 2.63x |
pytorch | 1.5.0+cpu | resnext101_32x8d | ImageNet | 79.12% | 79.31% | -0.24% | 2.61x |
pytorch | 1.6.0a0+24aac32 | bert_base_mrpc | MRPC | 88.90% | 88.73% | 0.19% | 1.98x |
pytorch | 1.6.0a0+24aac32 | bert_base_cola | COLA | 59.06% | 58.84% | 0.37% | 2.19x |
pytorch | 1.6.0a0+24aac32 | bert_base_sts-b | STS-B | 88.40% | 89.27% | -0.97% | 2.28x |
pytorch | 1.6.0a0+24aac32 | bert_base_sst-2 | SST-2 | 91.51% | 91.86% | -0.37% | 2.30x |
pytorch | 1.6.0a0+24aac32 | bert_base_rte | RTE | 69.31% | 69.68% | -0.52% | 2.15x |
pytorch | 1.6.0a0+24aac32 | bert_large_mrpc | MRPC | 87.45% | 88.33% | -0.99% | 2.73x |
pytorch | 1.6.0a0+24aac32 | bert_large_squad | SQUAD | 92.85% | 93.05% | -0.21% | 2.01x |
pytorch | 1.6.0a0+24aac32 | bert_large_qnli | QNLI | 91.20% | 91.82% | -0.68% | 2.69x |