Optimizing TensorFlow models with 8-bit quantization using the Neural Network Compression Framework (NNCF) of OpenVINO™.
This tutorial demonstrates how to use NNCF 8-bit quantization to optimize a TensorFlow model for inference with the OpenVINO™ Toolkit. For more advanced usage, refer to these examples.
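As a quick orientation, below is a minimal sketch of the NNCF call at the heart of this workflow, assuming a Keras model and a `tf.data` dataset. The MobileNetV2 placeholder and the random calibration data are illustrative stand-ins, not the model and data used later in the tutorial.

```python
import numpy as np
import tensorflow as tf

from nncf import NNCFConfig
from nncf.tensorflow import create_compressed_model, register_default_init_args

# Placeholder Keras model; the tutorial fine-tunes ResNet-18 on Imagenette.
model = tf.keras.applications.MobileNetV2(weights=None, classes=10)

# Minimal NNCF configuration: 8-bit quantization of weights and activations.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 224, 224, 3]},  # NHWC input shape
    "compression": {"algorithm": "quantization"},
})

# NNCF needs sample data to initialize quantization ranges; a small random
# dataset stands in here for the real training set.
calibration = tf.data.Dataset.from_tensor_slices(
    (np.random.rand(32, 224, 224, 3).astype("float32"),
     np.random.randint(0, 10, size=32))
).batch(8)
nncf_config = register_default_init_args(nncf_config, calibration, batch_size=8)

# Insert fake-quantization operations into the graph; returns a compression
# controller and a new Keras model that is then fine-tuned as usual.
compression_ctrl, quantized_model = create_compressed_model(model, nncf_config)
```

After this call, `quantized_model` is compiled and trained with the standard `fit()` loop, which is what restores the accuracy lost to quantization.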
To speed up download and training, this tutorial uses a ResNet-18 model with the Imagenette dataset. Imagenette is a subset of 10 easily classified classes from the ImageNet dataset.
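For reference, Imagenette is available through TensorFlow Datasets. A hedged sketch of loading and preprocessing it follows; the `imagenette/160px-v2` config name reflects the TFDS catalog at the time of writing and may differ between TFDS versions, and the 64-pixel input size is an illustrative choice.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the 160px variant of Imagenette as (image, label) pairs.
train_ds, validation_ds = tfds.load(
    "imagenette/160px-v2",
    split=["train", "validation"],
    shuffle_files=True,
    as_supervised=True,
)

IMG_SIZE = 64  # a small input resolution keeps fine-tuning fast

def preprocess(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return tf.cast(image, tf.float32) / 255.0, label

train_ds = train_ds.map(preprocess).batch(128).prefetch(tf.data.AUTOTUNE)
validation_ds = validation_ds.map(preprocess).batch(128).prefetch(tf.data.AUTOTUNE)
```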
This tutorial consists of the following steps:
- Fine-tuning of the `FP32` model.
- Transforming the original `FP32` model to `INT8`.
- Using fine-tuning to restore the accuracy.
- Exporting the optimized and original models to Frozen Graph and then to OpenVINO IR (a conversion sketch follows this list).
- Measuring and comparing the performance of the models.
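To make the export step concrete, here is a sketch using the `openvino` Python API (available in OpenVINO 2023.1 and later), which converts a Keras model to IR directly; older releases, and the frozen-graph path named above, go through the Model Optimizer (`mo`) tool instead. The MobileNetV2 placeholder is an assumption for illustration.

```python
import tensorflow as tf
import openvino as ov

# Placeholder standing in for the fine-tuned FP32 or INT8 Keras model.
model = tf.keras.applications.MobileNetV2(weights=None, classes=10)

# Convert the in-memory Keras model to an OpenVINO model and serialize it
# to IR files (model.xml + model.bin).
ov_model = ov.convert_model(model)
ov.save_model(ov_model, "model.xml")

# Load the IR back and compile it for CPU inference.
core = ov.Core()
compiled_model = core.compile_model("model.xml", "CPU")
```

Performance is then typically measured with OpenVINO's bundled `benchmark_app` tool, for example `benchmark_app -m model.xml -d CPU`, run once for the original model and once for the quantized one.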
If you have not installed all required dependencies, follow the Installation Guide.