Added AdaRound and RNN QAT results #654

Merged 1 commit on Jun 18, 2021
104 changes: 91 additions & 13 deletions README.md
@@ -79,34 +79,110 @@ Some recently added features include

## Results

AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.


<h4>DFQ</h4>

The DFQ method applied to several popular networks, such as MobileNet-v2 and ResNet-50, results in less than 0.9%
loss in accuracy all the way down to 8-bit quantization, in an automated way without any training data.

<table style="width:50%">
<tr>
<th style="width:80px">Models</th>
<th>FP32</th>
<th>INT8 Simulation </th>
</tr>
<tr>
<td>MobileNet v2 (top1)</td>
<td align="center">71.72%</td>
<td align="center">71.08%</td>
</tr>
<tr>
<td>ResNet 50 (top1)</td>
<td align="center">76.05%</td>
<td align="center">75.45%</td>
</tr>
<tr>
<td>DeepLab v3 (mIOU)</td>
<td align="center">72.65%</td>
<td align="center">71.91%</td>
</tr>
</table>
<br>
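As a rough sketch of how such numbers can be reproduced with the AIMET PyTorch API (import paths and signatures are assumed from AIMET's documented interface and may differ across releases; the calibration callback is a placeholder):

```python
import torch
from torchvision import models

# Assumed AIMET import paths; these may differ between AIMET releases.
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

model = models.mobilenet_v2(pretrained=True).eval()
dummy_input = torch.rand(1, 3, 224, 224)

# Data-free equalization: folds batch norms and rescales weights across
# adjacent layers, using no training data.
equalize_model(model, input_shapes=(1, 3, 224, 224))

# Simulate 8-bit quantization of weights and activations.
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def calibrate(model, _):
    # Placeholder calibration pass; a few representative batches suffice.
    model(dummy_input)

sim.compute_encodings(calibrate, None)
# Evaluating sim.model gives the "INT8 Simulation" accuracy above.
```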

<h4>AdaRound (Adaptive Rounding)</h4>
<h5>ADAS Object Detect</h5>
<p>For this example ADAS object detection model, which was challenging to quantize to 8-bit precision,
AdaRound can recover the accuracy to within 1% of the FP32 accuracy.</p>
<table style="width:50%">
<tr>
<th style="width:80px" colspan="15">Configuration</th>
<th>mAP - Mean Average Precision</th>
</tr>
<tr>
<td colspan="15">FP32</td>
<td align="center">82.20%</td>
</tr>
<tr>
<td colspan="15">Nearest Rounding (INT8 weights, INT8 acts)</td>
<td align="center">49.85%</td>
</tr>
<tr>
<td colspan="15">AdaRound (INT8 weights, INT8 acts)</td>
<td align="center" bgcolor="#add8e6">81.21%</td>
</tr>
</table>
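<p>For reference, a minimal AdaRound sketch against the AIMET PyTorch API might look as follows. The model and
calibration data are stand-ins (the ADAS model itself is not public), and argument names are assumptions based on
AIMET's documented interface.</p>

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Assumed AIMET import paths; they may differ between releases.
from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters
from aimet_torch.quantsim import QuantizationSimModel

model = models.resnet18(pretrained=True).eval()  # stand-in for the ADAS model
dummy_input = torch.rand(1, 3, 224, 224)

# Small unlabeled calibration set (synthetic placeholder).
calib_loader = DataLoader(TensorDataset(torch.rand(64, 3, 224, 224)),
                          batch_size=8)

# AdaRound learns whether to round each weight up or down using the
# calibration data, rather than rounding to nearest.
params = AdaroundParameters(data_loader=calib_loader, num_batches=8)
model = Adaround.apply_adaround(model, dummy_input, params,
                                path='./adaround/',
                                filename_prefix='model',
                                default_param_bw=8)

# Simulate INT8 weights and acts, freezing the AdaRounded weight encodings.
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)
sim.set_and_freeze_param_encodings(encoding_path='./adaround/model.encodings')
sim.compute_encodings(lambda m, _: m(dummy_input), None)
```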

<h5>DeepLabv3 Semantic Segmentation</h5>
<p>For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to
4-bit precision without a significant drop in accuracy.</p>
<table style="width:50%">
<tr>
<th style="width:80px" colspan="15">Configuration</th>
<th>mIOU - Mean Intersection over Union</th>
</tr>
<tr>
<td colspan="15">FP32</td>
<td align="center">72.94%</td>
</tr>
<tr>
<td colspan="15">Nearest Rounding (INT4 weights, INT8 acts)</td>
<td align="center">6.09%</td>
</tr>
<tr>
<td colspan="15">AdaRound (INT4 weights, INT8 acts)</td>
<td align="center" bgcolor="#add8e6">70.86%</td>
</tr>
</table>
<br>
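<p>Assuming the same sketch as above, 4-bit weights with 8-bit activations is only a bit-width change (argument
names again assumed):</p>

```python
# 4-bit weights, 8-bit activations; other arguments as in the sketch above.
model = Adaround.apply_adaround(model, dummy_input, params,
                                path='./adaround/', filename_prefix='model',
                                default_param_bw=4)
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=4, default_output_bw=8)
```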

<h4>Quantization for Recurrent Models</h4>
<p>AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU).
Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with
minimal drop in accuracy.</p>

<table style="width:50%">
<tr>
<th>DeepSpeech2 <br>(using bi-directional LSTMs)</th>
<th>Word Error Rate</th>
</tr>
<tr>
<td>FP32</td>
<td align="center">9.92%</td>
</tr>
<tr>
<td>INT8</td>
<td align="center">10.22%</td>
</tr>
</table>

<br>
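<p>As an illustrative sketch, QAT with AIMET amounts to fine-tuning the quantization-simulation model with a normal
training loop. The tiny LSTM below is a stand-in for DeepSpeech2, and the loss is a placeholder (a real speech model
would use CTC loss); import paths and signatures are assumed.</p>

```python
import torch

from aimet_torch.quantsim import QuantizationSimModel  # assumed import path

# Stand-in recurrent model; in practice this would be DeepSpeech2 with
# bi-directional LSTMs.
model = torch.nn.LSTM(input_size=161, hidden_size=256, bidirectional=True)
dummy_input = torch.rand(50, 1, 161)  # (seq_len, batch, features)

sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)
sim.compute_encodings(lambda m, _: m(dummy_input), None)

# QAT: train through the simulated quantization ops so the weights adapt
# to 8-bit precision.
optimizer = torch.optim.Adam(sim.model.parameters(), lr=1e-5)
for _ in range(3):  # placeholder epochs
    output, _ = sim.model(dummy_input)
    loss = output.pow(2).mean()  # placeholder loss; use CTC loss for speech
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```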

<h4>Model Compression</h4>
<p>AIMET can also significantly compress models. For popular models, such as ResNet-50 and ResNet-18,
compression with spatial SVD plus channel pruning achieves 50% MAC (multiply-accumulate) reduction while retaining
accuracy within approx. 1% of the original uncompressed model.</p>

<table style="width:50%">
<tr>
@@ -116,16 +192,18 @@ AIMET can also significantly compress models. For popular models, such as Resnet
</tr>
<tr>
<td>ResNet18 (top1)</td>
<td align="center">69.76%</td>
<td align="center">68.56%</td>
</tr>
<tr>
<td>ResNet 50 (top1)</td>
<td align="center">76.05%</td>
<td align="center">75.75%</td>
</tr>
</table>

<br>
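<p>A rough sketch of the spatial SVD stage with AIMET's ModelCompressor (channel pruning would run as a second pass
in the combined scheme quoted above). The evaluation callback is a placeholder, and class and argument names are
assumptions based on AIMET's documented API.</p>

```python
from decimal import Decimal

import torch
from torchvision import models

# Assumed AIMET import paths; they may differ between releases.
from aimet_common.defs import (CompressionScheme, CostMetric,
                               GreedySelectionParameters)
from aimet_torch.compress import ModelCompressor
from aimet_torch.defs import SpatialSvdParameters

model = models.resnet18(pretrained=True).eval()

def eval_callback(model, iterations, use_cuda):
    # Placeholder: return top-1 accuracy measured on a validation set.
    return 0.0

# Greedy search for per-layer ratios that hit ~50% of the original MACs.
greedy_params = GreedySelectionParameters(target_comp_ratio=Decimal('0.5'))
auto_params = SpatialSvdParameters.AutoModeParams(greedy_params)
params = SpatialSvdParameters(mode=SpatialSvdParameters.Mode.auto,
                              params=auto_params)

compressed_model, stats = ModelCompressor.compress_model(
    model, eval_callback=eval_callback, eval_iterations=10,
    input_shape=(1, 3, 224, 224),
    compress_scheme=CompressionScheme.spatial_svd,
    cost_metric=CostMetric.mac, parameters=params)
# compressed_model is then fine-tuned to recover accuracy.
```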

## Installation Instructions
To install and use the pre-built version of the AIMET package, please follow one of the links below:
- [Install and run AIMET in *Ubuntu* environment](./packaging/install.md)