PyTorch-Quantization is a toolkit for training and evaluating PyTorch models with simulated quantization. Quantization can be added to a model automatically or manually, which allows the model to be tuned for accuracy and performance. The quantization scheme is compatible with NVIDIA's high-performance integer kernels, which leverage integer Tensor Cores. The quantized model can be exported to ONNX and imported to an upcoming version of TensorRT.
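To make "simulated quantization" concrete, here is a minimal sketch in plain Python. It does not use the toolkit's API. The function name `fake_quantize` and the symmetric, per-tensor, max-calibrated scheme are assumptions chosen for illustration and are not taken from the toolkit's source. Values are rounded onto a signed integer grid and immediately mapped back to floats, so the rest of the model still computes in floating point while seeing the rounding error.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to a signed integer grid, then dequantize them again.

    Illustrative assumption: symmetric, per-tensor scaling derived from the
    largest absolute value in `values`. Not the toolkit's implementation.
    """
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        # Nothing to scale; every value is already exactly representable.
        return list(values)

    qmax = 2 ** (num_bits - 1) - 1      # 127 for 8 bits
    scale = qmax / amax                 # maps amax onto qmax

    result = []
    for v in values:
        q = round(v * scale)            # snap to the integer grid
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        result.append(q / scale)        # return to floating point
    return result


if __name__ == "__main__":
    print(fake_quantize([0.0, 0.5, -1.0]))
```

With this scheme the absolute error per value is bounded by half a quantization step, that is `amax / (2 * qmax)`.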
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT/tools/pytorch-quantization
Install prerequisites
pip install -r requirements.txt
pip install torch
Build and install pytorch-quantization
python setup.py install
pytorch-quantization is preinstalled in the NVIDIA NGC PyTorch containers since release 20.12, e.g. nvcr.io/nvidia/pytorch:20.12-py3
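Whichever install route is used, a quick check that the package is visible to the current Python environment can be sketched as follows. It relies only on the standard library; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="pytorch-quantization"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pytorch-quantization is not installed in this environment")
    else:
        print("pytorch-quantization", version)
```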
- PyTorch Quantization Toolkit user guide
- Quantization Basics whitepaper