Accelerate Inference of NLP Models with the Post-Training Quantization API of NNCF

This tutorial demonstrates how to apply INT8 quantization to the Natural Language Processing model BERT using the Post-Training Quantization API of NNCF. It uses the Hugging Face BERT PyTorch model, fine-tuned for the Microsoft Research Paraphrase Corpus (MRPC) task. The code in this tutorial is designed to be extensible to custom models and datasets.
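To give a rough intuition for what INT8 post-training quantization does to a single tensor, the following sketch derives a scale from a few calibration samples and maps floating-point values to 8-bit integers and back. It is a simplified, dependency-free illustration of symmetric per-tensor quantization and is not NNCF's implementation. The function names (`calibrate_scale`, `quantize`, `dequantize`) and the sample values are hypothetical and chosen only for this example.

```python
def calibrate_scale(samples, num_bits=8):
    """Derive a symmetric per-tensor scale from calibration samples.

    `samples` is a list of lists of floats (hypothetical calibration data).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for row in samples for v in row)
    # Fall back to 1.0 so an all-zero calibration set does not divide by zero.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round values to the nearest integer step and clamp to the signed range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in q_values]


# Hypothetical calibration data standing in for activations observed
# while running a small set of representative inputs.
calibration_batches = [[0.5, -1.27, 0.02], [1.0, -0.3, 0.75]]
scale = calibrate_scale(calibration_batches)

# The last value lies outside the calibrated range and is clamped.
q = quantize([0.5, -1.27, 2.0], scale)
restored = dequantize(q, scale)
print(scale, q, restored)
```

Values inside the calibrated range are recovered to within one quantization step, while values outside it saturate at the range limit. This is why post-training quantization relies on a representative calibration dataset.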

Notebook Contents

The tutorial consists of the following steps:

If you have not installed all required dependencies, follow the Installation Guide.