TensorRT OSS 22.04 release #1923

Merged
merged 25 commits into from
Apr 14, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
- 051411a update perf number for rel-8.2 (ttyio, Mar 29, 2022)
- 5d72b75 demo/BERT enhancements (rajeevsrao, Feb 22, 2022)
- c27e444 T5/GPT2 notebooks fix for CI/CD (vinhngx, Feb 25, 2022)
- 1555d0e Use cuda:0 instead of cuda in GPT/T5 notebooks (ttyio, Mar 14, 2022)
- e42d114 demo/Tacotron2 enhancements (rajeevsrao, Mar 22, 2022)
- 7714418 Bump version to 8.2.4.2 for DLFW 22.04 (rajeevsrao, Mar 21, 2022)
- eb866a9 1583 - sublicense ieee/half.h under Apache2 (rajeevsrao, Mar 15, 2022)
- 305cdb5 PyramidROIAlign plugin refactor (wraveane, Feb 23, 2022)
- 43a6049 TRT crashes on Windows with OSS C&R plugin (samurdhikaru, Mar 2, 2022)
- 7838ad3 fix pyramidROIAlignPlugin: assertions used not existing members (theHamsta, Mar 21, 2022)
- 526366c Detectron 2 Mask R-CNN R50-FPN python sample (azhurkevich, Mar 21, 2022)
- ac7fba5 remove sampleNMT from codebase (shuyuelan, Feb 18, 2022)
- 0d8188b modify readme in efficientdet and efficientnet (shuyuelan, Mar 1, 2022)
- a69b32b remove download_pgms.py instruction in readme (shuyuelan, Mar 7, 2022)
- 58a0d7d Samples readme cleanup (shuyuelan, Mar 11, 2022)
- da9af36 Add model export script for sampleOnnxMnistCoordConvAC (shuyuelan, Feb 22, 2022)
- 9bef660 Update onnx-graphsurgeon to v0.3.17 (rajeevsrao, Apr 13, 2022)
- a63d33f Change torch version from 1.8.1 to 1.10.2+cu113 in requirements (shuyuelan, Feb 14, 2022)
- 26210e9 Add TensorRT Engine Explorer v0.1.0 (rajeevsrao, Apr 13, 2022)
- b650d1b Update CHANGELOG for 22.04 (rajeevsrao, Apr 13, 2022)
- 919c099 Fix usage of tactic source in demo/BERT (rajeevsrao, Apr 6, 2022)
- 1aa946b Fix hangs at IndexErrors when TF is imported after TRT (samurdhikaru, Mar 25, 2022)
- 9bb2283 Update copyright headers with SPDX identifiers (rajeevsrao, Apr 11, 2022)
- 9821699 Update CHANGELOG for 22.04 (rajeevsrao, Apr 13, 2022)
- 413cfc3 Update CHANGELOG.md (rajeevsrao, Apr 13, 2022)

21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,26 @@
 # TensorRT OSS Release Changelog
 
+## [22.04](https://github.com/NVIDIA/TensorRT/releases/tag/22.04) - 2022-04-13
+### Added
+- TensorRT Engine Explorer v0.1.0 [README](tools/experimental/trt-engine-explorer/README.md)
+- Detectron 2 Mask R-CNN R50-FPN python [sample](samples/python/detectron2/README.md)
+- Model export script for sampleOnnxMnistCoordConvAC
+
+### Changed
+- Updated base TensorRT version to 8.2.4.2
+- Updated copyright headers with SPDX identifiers
+- Updated onnx-graphsurgeon v0.3.17 [CHANGELOG](tools/onnx-graphsurgeon/CHANGELOG.md)
+- `PyramidROIAlign` plugin refactor and bug fixes
+- Fixed `MultilevelCropAndResize` crashes on Windows
+- [#1583](https://github.com/NVIDIA/TensorRT/issues/1583) - sublicense ieee/half.h under Apache2
+- Updated demo/BERT performance tables for rel-8.2
+- [#1774](https://github.com/NVIDIA/TensorRT/issues/1774) Fix python hangs at IndexErrors when TF is imported after TensorRT
+- Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
+- Cleaned up sample READMEs
+
+### Removed
+- sampleNMT removed from samples
+
 ## [22.03](https://github.com/NVIDIA/TensorRT/releases/tag/22.03) - 2022-03-23
 ### Added
 - EfficientDet sample enhancements

5 changes: 3 additions & 2 deletions CMakeLists.txt
@@ -1,11 +1,12 @@
 #
-# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
-# http://www.apache.org/licenses/LICENSE-2.0
+# http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,

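This hunk swaps the 2021 copyright line for `SPDX-FileCopyrightText` and `SPDX-License-Identifier` tags, matching the changelog entry "Updated copyright headers with SPDX identifiers". A rough way to find files that still lack the tag is to scan each file's first few lines. The sketch below is hypothetical: the function name `missing_spdx`, the file patterns it scans, and the 20-line header window are illustrative assumptions, not a check that exists in the repository.

```bash
# missing_spdx DIR [MAX_LINES]
# Hypothetical helper: prints (sorted, one per line) every CMakeLists.txt,
# *.cmake, *.h, *.cpp, *.cu, *.py or *.sh file under DIR whose first
# MAX_LINES lines (default 20, an assumed header window) contain no
# "SPDX-License-Identifier:" tag.
missing_spdx() {
    local dir="$1" max_lines="${2:-20}"
    find "$dir" -type f \( -name 'CMakeLists.txt' -o -name '*.cmake' -o -name '*.h' \
        -o -name '*.cpp' -o -name '*.cu' -o -name '*.py' -o -name '*.sh' \) -print0 |
        while IFS= read -r -d '' f; do
            # awk exits 0 as soon as the tag appears within the window and
            # exits 1 otherwise (window exceeded, tag absent, or empty file).
            if ! awk -v n="$max_lines" \
                'NR > n { exit } /SPDX-License-Identifier:/ { found = 1; exit } END { exit !found }' "$f"; then
                printf '%s\n' "$f"
            fi
        done | sort
}
```

For example, `missing_spdx demo/BERT` would list matching files under `demo/BERT` whose header has no tag. Letting `awk` stop at the header window avoids piping `head` into `grep -q`, which can report a spurious failure under `set -o pipefail` when `grep` exits early and `head` is killed by `SIGPIPE`.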
4 changes: 4 additions & 0 deletions LICENSE
@@ -276,7 +276,11 @@
- Copyright (c) 2000 The NetBSD Foundation, Inc.


> parsers/common/ieee_half.h
> samples/common/half.h
> third_party/ieee/half.h

The MIT License

Copyright (c) 2012-2017 Christian Rau <rauy@users.sourceforge.net>

10 changes: 5 additions & 5 deletions README.md
@@ -15,7 +15,7 @@ This repository contains the Open Source Software (OSS) components of NVIDIA Ten
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.2.3.0
+* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.2.4.2
 
 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
@@ -70,16 +70,16 @@ To build the TensorRT-OSS components, you will first need the following software
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-8.2.3.0.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-8.2.3.0
+tar -xvzf TensorRT-8.2.4.2.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-8.2.4.2
 ```
 
 **Example: Windows on x86-64 with cuda-11.4**
 
 ```powershell
 cd ~\Downloads
-Expand-Archive .\TensorRT-8.2.3.0.Windows10.x86_64.cuda-11.4.cudnn8.2.zip
-$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.2.3.0'
+Expand-Archive .\TensorRT-8.2.4.2.Windows10.x86_64.cuda-11.4.cudnn8.2.zip
+$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.2.4.2'
 $Env:PATH += 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\'
 ```
 

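The README hunk bumps the same version string in several places per platform: the package list, the archive name, and `TRT_LIBPATH`. A minimal sketch, assuming the Linux archive naming shown in the README (`TensorRT-<version>.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz`), builds both names from a single version value. The helper names `trt_archive_name` and `trt_libpath` are hypothetical, and archives for other CUDA or cuDNN builds carry different suffixes.

```bash
# trt_archive_name VERSION
# Prints the expected Linux GA tarball name (assumed cuda-11.4 / cudnn8.2 build).
trt_archive_name() {
    printf 'TensorRT-%s.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz\n' "$1"
}

# trt_libpath BASE_DIR VERSION
# Prints the directory the tarball extracts to, i.e. the value for TRT_LIBPATH.
trt_libpath() {
    printf '%s/TensorRT-%s\n' "$1" "$2"
}

# Example use (not executed here): take the version from the repo's VERSION file.
#   ver=$(tr -d '[:space:]' < VERSION)
#   cd ~/Downloads
#   tar -xvzf "$(trt_archive_name "$ver")"
#   export TRT_LIBPATH="$(trt_libpath "$(pwd)" "$ver")"
```

Reading the version from the repository's `VERSION` file keeps the archive name and `TRT_LIBPATH` in step with it. On Windows, note that PowerShell expands `$(Get-Location)` only inside double-quoted strings; with single quotes, as in the README, `TRT_LIBPATH` receives the literal text `$(Get-Location)\TensorRT-8.2.4.2`.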
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-8.2.3.0
+8.2.4.2

5 changes: 3 additions & 2 deletions demo/BERT/CMakeLists.txt
@@ -1,11 +1,12 @@
 #
-# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
-# http://www.apache.org/licenses/LICENSE-2.0
+# http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,

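The demo/BERT/README.md latency tables report, for each sequence length and batch size, the 95th-percentile, 99th-percentile, and average latency. This diff does not show how `scripts/inference_benchmark.sh` derives those columns, so the sketch below assumes the nearest-rank percentile definition (the value at 1-based rank ceil(p/100 * N) among the sorted samples); the function name `latency_summary` is hypothetical.

```bash
# latency_summary: read one latency sample (ms) per line on stdin and print
# "p95 p99 avg" with two decimals.
# Assumption: nearest-rank percentiles; the real benchmark script may
# interpolate between samples instead.
latency_summary() {
    LC_ALL=C sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            # ceil(p * N / 100) as (p * N + 99) / 100 truncated to an integer
            r95 = int((95 * NR + 99) / 100)
            r99 = int((99 * NR + 99) / 100)
            printf "%.2f %.2f %.2f\n", v[r95], v[r99], sum / NR
        }'
}
```

For example, `seq 20 | latency_summary` prints `19.00 20.00 10.50`. Because both percentiles index the same sorted list and the 99th-percentile rank is never lower than the 95th, the 99th-percentile value can never be smaller than the 95th-percentile value, whereas the average can exceed both when a few samples are very slow.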
200 changes: 100 additions & 100 deletions demo/BERT/README.md
@@ -434,78 +434,78 @@ Our results for BERT were obtained by running the `scripts/inference_benchmark.s
 | Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
 |-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
 | | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 0.33 | 0.97 | 0.58 | 0.75 | 0.75 | 0.72 |
-| 128 | 2 | 0.78 | 0.79 | 0.63 | 0.84 | 1.07 | 0.84 |
-| 128 | 4 | 0.76 | 0.98 | 0.76 | 1.13 | 1.46 | 1.14 |
-| 128 | 8 | 1.08 | 1.08 | 0.98 | 1.66 | 1.81 | 1.66 |
-| 128 | 12 | 1.26 | 1.63 | 1.27 | 2.07 | 2.07 | 2.07 |
-| 128 | 16 | 1.47 | 1.48 | 1.47 | 2.48 | 2.49 | 2.48 |
-| 128 | 24 | 2.13 | 2.13 | 2.13 | 3.47 | 3.49 | 3.46 |
-| 128 | 32 | 2.54 | 2.83 | 2.54 | 4.37 | 4.40 | 4.34 |
-| 128 | 64 | 4.58 | 4.59 | 4.54 | 8.70 | 8.79 | 8.65 |
-| 128 | 128 | 9.04 | 9.06 | 8.97 | 17.05 | 17.07 | 16.90 |
-| 384 | 1 | 1.15 | 1.15 | 1.15 | 1.43 | 1.44 | 1.43 |
-| 384 | 2 | 1.37 | 1.37 | 1.37 | 1.84 | 2.21 | 1.84 |
-| 384 | 4 | 1.73 | 1.74 | 1.73 | 2.47 | 2.48 | 2.47 |
-| 384 | 8 | 2.51 | 2.51 | 2.51 | 3.77 | 3.80 | 3.76 |
-| 384 | 12 | 3.61 | 3.62 | 3.61 | 5.36 | 5.37 | 5.30 |
-| 384 | 16 | 4.39 | 4.40 | 4.38 | 7.32 | 7.32 | 7.24 |
-| 384 | 24 | 6.24 | 6.24 | 6.23 | 10.50 | 10.51 | 10.41 |
-| 384 | 32 | 8.42 | 8.50 | 8.42 | 14.32 | 14.44 | 14.27 |
-| 384 | 64 | 16.48 | 16.52 | 16.36 | 27.51 | 27.54 | 27.33 |
-| 384 | 128 | 31.71 | 31.78 | 31.58 | | | |
+| 128 | 1 | 0.72 | 0.72 | 0.59 | 0.66 | 0.81 | 0.65 |
+| 128 | 2 | 0.68 | 0.68 | 0.64 | 0.97 | 0.97 | 0.79 |
+| 128 | 4 | 0.99 | 0.99 | 0.79 | 1.02 | 1.29 | 1.02 |
+| 128 | 8 | 0.94 | 1.21 | 0.94 | 1.38 | 1.39 | 1.38 |
+| 128 | 12 | 1.22 | 1.23 | 1.22 | 1.91 | 1.92 | 1.91 |
+| 128 | 16 | 1.40 | 1.40 | 1.40 | 2.19 | 2.20 | 2.19 |
+| 128 | 24 | 1.93 | 1.94 | 1.92 | 3.37 | 3.38 | 3.34 |
+| 128 | 32 | 2.48 | 2.48 | 2.47 | 4.08 | 4.14 | 4.07 |
+| 128 | 64 | 4.31 | 4.31 | 4.27 | 8.08 | 8.09 | 8.00 |
+| 128 | 128 | 8.37 | 8.38 | 8.31 | 16.14 | 16.21 | 16.02 |
+| 384 | 1 | 1.15 | 1.47 | 1.15 | 1.30 | 1.65 | 1.31 |
+| 384 | 2 | 1.34 | 1.72 | 1.35 | 1.66 | 1.67 | 1.66 |
+| 384 | 4 | 1.69 | 1.70 | 1.69 | 2.27 | 2.28 | 2.27 |
+| 384 | 8 | 2.29 | 2.30 | 2.28 | 3.67 | 3.70 | 3.66 |
+| 384 | 12 | 3.46 | 3.46 | 3.45 | 5.06 | 5.08 | 5.01 |
+| 384 | 16 | 4.20 | 4.20 | 4.19 | 6.73 | 6.75 | 6.67 |
+| 384 | 24 | 5.94 | 5.95 | 5.94 | 9.86 | 9.87 | 9.75 |
+| 384 | 32 | 7.93 | 7.94 | 7.92 | 13.56 | 13.61 | 13.44 |
+| 384 | 64 | 15.48 | 15.49 | 15.39 | 26.09 | 26.26 | 25.94 |
+| 384 | 128 | 29.92 | 29.95 | 29.68 | 51.65 | 51.71 | 51.02 |
 
 ##### BERT Large
 
 | Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
 |-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
 | | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.24 | 1.56 | 1.24 | 1.73 | 2.11 | 1.73 |
-| 128 | 2 | 1.49 | 1.49 | 1.49 | 2.20 | 2.20 | 2.20 |
-| 128 | 4 | 1.91 | 1.92 | 1.91 | 3.22 | 3.23 | 3.22 |
-| 128 | 8 | 2.94 | 2.94 | 2.93 | 4.84 | 4.84 | 4.83 |
-| 128 | 12 | 3.34 | 3.34 | 3.34 | 5.95 | 5.96 | 5.90 |
-| 128 | 16 | 4.63 | 4.64 | 4.62 | 7.98 | 7.99 | 7.90 |
-| 128 | 24 | 5.87 | 5.88 | 5.87 | 11.05 | 11.08 | 10.94 |
-| 128 | 32 | 7.99 | 7.99 | 7.98 | 14.74 | 14.77 | 14.59 |
-| 128 | 64 | 14.74 | 17.74 | 14.56 | 28.09 | 28.25 | 27.85 |
-| 128 | 128 | 28.32 | 23.38 | 28.03 | 54.38 | 54.40 | 54.12 |
-| 384 | 1 | 2.80 | 2.80 | 2.80 | 3.49 | 3.49 | 3.48 |
-| 384 | 2 | 3.12 | 3.13 | 3.12 | 4.71 | 4.72 | 4.71 |
-| 384 | 4 | 4.27 | 4.27 | 4.27 | 6.70 | 6.71 | 6.70 |
-| 384 | 8 | 7.66 | 7.67 | 7.66 | 12.41 | 12.53 | 12.37 |
-| 384 | 12 | 10.07 | 10.08 | 10.07 | 17.63 | 17.76 | 17.56 |
-| 384 | 16 | 13.34 | 13.34 | 13.33 | 23.40 | 23.46 | 23.19 |
-| 384 | 24 | 19.36 | 19.38 | 19.22 | 34.34 | 34.36 | 34.10 |
-| 384 | 32 | 25.56 | 25.60 | 25.56 | 44.94 | 44.98 | 44.78 |
-| 384 | 64 | 49.84 | 49.92 | 49.60 | 87.26 | 87.56 | 86.77 |
-| 384 | 128 | 97.66 | 97.78 | 97.06 | 170.85 | 171.00 | 170.08 |
+| 128 | 1 | 1.24 | 1.25 | 1.24 | 1.58 | 1.60 | 1.58 |
+| 128 | 2 | 1.51 | 1.52 | 1.51 | 2.00 | 2.02 | 2.00 |
+| 128 | 4 | 1.83 | 1.84 | 1.82 | 2.95 | 2.96 | 2.95 |
+| 128 | 8 | 2.69 | 2.70 | 2.68 | 4.44 | 4.45 | 4.43 |
+| 128 | 12 | 3.11 | 3.12 | 3.11 | 5.25 | 5.30 | 5.23 |
+| 128 | 16 | 4.05 | 4.06 | 4.05 | 7.65 | 7.72 | 7.63 |
+| 128 | 24 | 5.24 | 5.25 | 5.23 | 10.14 | 10.16 | 10.09 |
+| 128 | 32 | 7.01 | 7.07 | 7.01 | 13.89 | 13.89 | 13.77 |
+| 128 | 64 | 13.15 | 13.18 | 13.05 | 26.10 | 26.13 | 26.00 |
+| 128 | 128 | 25.29 | 25.32 | 25.21 | 51.69 | 51.77 | 51.38 |
+| 384 | 1 | 2.66 | 2.66 | 2.66 | 3.09 | 3.10 | 3.09 |
+| 384 | 2 | 3.03 | 3.05 | 3.03 | 4.14 | 4.15 | 4.14 |
+| 384 | 4 | 4.04 | 4.05 | 4.04 | 5.99 | 5.99 | 5.93 |
+| 384 | 8 | 7.13 | 7.14 | 7.13 | 11.60 | 11.62 | 11.47 |
+| 384 | 12 | 9.21 | 9.22 | 9.20 | 16.33 | 16.34 | 16.09 |
+| 384 | 16 | 12.37 | 12.39 | 12.36 | 22.14 | 22.22 | 21.98 |
+| 384 | 24 | 17.51 | 17.52 | 17.49 | 32.44 | 32.56 | 32.29 |
+| 384 | 32 | 23.38 | 23.40 | 23.14 | 43.12 | 43.23 | 42.73 |
+| 384 | 64 | 45.20 | 45.25 | 45.07 | 83.75 | 83.92 | 83.15 |
+| 384 | 128 | 88.18 | 88.26 | 88.01 | 163.61 | 164.08 | 162.62 |
 
 ##### Megatron Large with Sparsity
 
 | Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
 |-----------------|------------|-----------------|-----------------|---------|
 | | | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.16 | 1.46 | 1.17 |
-| 128 | 2 | 1.44 | 1.45 | 1.44 |
-| 128 | 4 | 1.69 | 1.7 | 1.69 |
-| 128 | 8 | 2.34 | 2.34 | 2.34 |
-| 128 | 12 | 2.8 | 2.8 | 2.8 |
-| 128 | 16 | 3.7 | 3.71 | 3.7 |
-| 128 | 24 | 4.63 | 4.63 | 4.62 |
-| 128 | 32 | 6.33 | 6.33 | 6.32 |
-| 128 | 64 | 11.34 | 11.35 | 11.24 |
-| 128 | 128 | 21.18 | 21.19 | 21.06 |
-| 384 | 1 | 1.61 | 1.61 | 1.61 |
-| 384 | 2 | 2.19 | 2.19 | 2.18 |
-| 384 | 4 | 3.31 | 3.31 | 3.31 |
-| 384 | 8 | 5.48 | 5.48 | 5.47 |
-| 384 | 12 | 7.69 | 7.7 | 7.69 |
-| 384 | 16 | 10.02 | 10.02 | 10.01 |
-| 384 | 24 | 14.15 | 14.15 | 14.14 |
-| 384 | 32 | 18.41 | 18.56 | 18.4 |
-| 384 | 64 | 35.71 | 35.73 | 35.44 |
-| 384 | 128 | 68.52 | 68.55 | 68.19 |
+| 128 | 1 | 1.16 | 1.48 | 1.17 |
+| 128 | 2 | 1.41 | 1.42 | 1.41 |
+| 128 | 4 | 1.87 | 1.88 | 1.87 |
+| 128 | 8 | 2.84 | 2.84 | 2.83 |
+| 128 | 12 | 3.30 | 3.31 | 3.30 |
+| 128 | 16 | 4.40 | 4.42 | 4.39 |
+| 128 | 24 | 5.86 | 5.87 | 5.85 |
+| 128 | 32 | 7.67 | 7.68 | 7.67 |
+| 128 | 64 | 13.81 | 13.82 | 13.79 |
+| 128 | 128 | 27.00 | 27.02 | 26.80 |
+| 384 | 1 | 1.72 | 1.78 | 1.72 |
+| 384 | 2 | 2.35 | 2.36 | 2.35 |
+| 384 | 4 | 3.80 | 3.81 | 3.80 |
+| 384 | 8 | 6.70 | 6.71 | 6.70 |
+| 384 | 12 | 8.98 | 8.99 | 8.97 |
+| 384 | 16 | 12.38 | 12.39 | 12.37 |
+| 384 | 24 | 17.52 | 17.54 | 17.51 |
+| 384 | 32 | 22.82 | 22.89 | 22.64 |
+| 384 | 64 | 43.78 | 43.90 | 43.59 |
+| 384 | 128 | 85.23 | 85.25 | 84.61 |
 
 
 #### Inference performance: NVIDIA T4 (16GB)
@@ -517,49 +517,49 @@ Our results were obtained by running the `scripts/inference_benchmark.sh --gpu T
 | Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
 |-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
 | | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.55 | 1.57 | 1.33 | 2.00 | 2.06 | 1.93 |
-| 128 | 2 | 1.78 | 2.06 | 1.75 | 2.54 | 2.58 | 2.49 |
-| 128 | 4 | 2.80 | 2.88 | 2.74 | 4.25 | 4.34 | 4.16 |
-| 128 | 8 | 4.48 | 4.56 | 4.42 | 8.13 | 8.74 | 7.88 |
-| 128 | 12 | 6.28 | 6.31 | 6.12 | 11.67 | 12.12 | 11.30 |
-| 128 | 16 | 8.92 | 9.11 | 8.78 | 17.24 | 17.79 | 16.70 |
-| 128 | 24 | 12.70 | 12.84 | 12.53 | 24.48 | 24.85 | 24.90 |
-| 128 | 32 | 17.90 | 18.41 | 17.59 | 33.02 | 33.51 | 32.65 |
-| 128 | 64 | 34.80 | 34.83 | 34.31 | 65.38 | 65.43 | 64.28 |
-| 128 | 128 | 68.16 | 68.46 | 67.05 | 130.77 | 131.01 | 129.19 |
-| 384 | 1 | 2.47 | 2.53 | 2.43 | 3.76 | 3.81 | 3.69 |
-| 384 | 2 | 3.87 | 3.95 | 3.81 | 6.31 | 6.43 | 6.21 |
-| 384 | 4 | 7.15 | 7.18 | 6.97 | 12.16 | 12.22 | 12.03 |
-| 384 | 8 | 14.09 | 12.11 | 13.73 | 25.45 | 25.83 | 24.94 |
-| 384 | 12 | 20.99 | 21.12 | 20.66 | 38.15 | 38.38 | 37.51 |
-| 384 | 16 | 27.49 | 27.65 | 27.08 | 50.90 | 51.36 | 50.04 |
-| 384 | 24 | 41.93 | 42.17 | 41.36 | 77.25 | 78.16 | 76.05 |
-| 384 | 32 | 54.65 | 54.87 | 54.06 | 102.44 | 103.09 | 101.30 |
-| 384 | 64 | 109.78 | 110.42 | 108.24 | 200.58 | 201.20 | 198.68 |
-| 384 | 128 | 227.46 | 228.80 | 223.92 | 401.33 | 402.14 | 399.24 |
+| 128 | 1 | 1.64 | 1.68 | 1.37 | 2.00 | 2.15 | 1.96 |
+| 128 | 2 | 1.88 | 2.11 | 1.82 | 2.77 | 2.78 | 2.70 |
+| 128 | 4 | 2.70 | 2.70 | 2.63 | 4.55 | 4.57 | 4.48 |
+| 128 | 8 | 4.73 | 4.97 | 4.64 | 9.33 | 10.22 | 8.85 |
+| 128 | 12 | 6.63 | 6.73 | 6.55 | 12.82 | 13.19 | 12.39 |
+| 128 | 16 | 9.45 | 9.77 | 9.31 | 18.08 | 18.63 | 17.35 |
+| 128 | 24 | 14.07 | 14.35 | 13.63 | 27.77 | 28.77 | 26.88 |
+| 128 | 32 | 19.75 | 20.59 | 19.12 | 37.42 | 37.79 | 36.66 |
+| 128 | 64 | 37.78 | 38.34 | 37.02 | 72.84 | 72.88 | 71.84 |
+| 128 | 128 | 74.62 | 75.10 | 73.61 | 147.01 | 147.83 | 145.46 |
+| 384 | 1 | 2.59 | 2.63 | 2.51 | 4.12 | 4.16 | 4.03 |
+| 384 | 2 | 4.11 | 4.13 | 3.98 | 6.85 | 7.38 | 6.62 |
+| 384 | 4 | 7.43 | 7.48 | 7.32 | 13.43 | 13.80 | 12.93 |
+| 384 | 8 | 14.94 | 15.08 | 14.73 | 28.62 | 29.65 | 27.84 |
+| 384 | 12 | 22.51 | 22.86 | 22.05 | 42.67 | 43.18 | 41.88 |
+| 384 | 16 | 30.07 | 30.77 | 29.10 | 57.08 | 57.56 | 56.18 |
+| 384 | 24 | 45.34 | 45.90 | 44.62 | 87.62 | 88.20 | 85.69 |
+| 384 | 32 | 60.10 | 60.50 | 58.77 | 118.02 | 118.76 | 115.03 |
+| 384 | 64 | 121.20 | 121.69 | 118.76 | 235.94 | 237.30 | 230.79 |
+| 384 | 128 | 243.66 | 244.15 | 242.68 | 447.69 | 448.97 | 445.64 |
 
 ##### BERT Large
 
 | Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
 |-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
 | | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 3.59 | 3.61 | 3.51 | 5.10 | 5.18 | 5.02 |
-| 128 | 2 | 4.93 | 5.03 | 4.83 | 7.72 | 7.73 | 7.58 |
-| 128 | 4 | 8.15 | 8.19 | 7.93 | 13.67 | 13.85 | 13.56 |
-| 128 | 8 | 14.21 | 14.23 | 13.89 | 26.88 | 27.66 | 26.35 |
-| 128 | 12 | 22.41 | 22.47 | 21.91 | 41.04 | 41.29 | 40.30 |
-| 128 | 16 | 29.30 | 29.83 | 28.82 | 55.04 | 55.27 | 54.05 |
-| 128 | 24 | 44.60 | 44.63 | 43.92 | 81.24 | 82.28 | 79.59 |
-| 128 | 32 | 60.88 | 61.48 | 58.97 | 114.13 | 114.47 | 112.78 |
-| 128 | 64 | 111.78 | 112.02 | 110.77 | 224.24 | 225.02 | 221.97 |
-| 128 | 128 | 223.99 | 224.28 | 222.33 | 417.56 | 418.54 | 415.33 |
-| 384 | 1 | 7.18 | 7.27 | 7.07 | 11.74 | 11.96 | 11.51 |
-| 384 | 2 | 12.22 | 12.25 | 11.92 | 21.47 | 21.61 | 20.97 |
-| 384 | 4 | 35.95 | 36.43 | 35.63 | 42.03 | 42.35 | 41.36 |
-| 384 | 8 | 47.06 | 47.22 | 46.41 | 83.16 | 83.51 | 82.06 |
-| 384 | 12 | 66.04 | 66.04 | 65.89 | 127.10 | 127.99 | 127.46 |
-| 384 | 16 | 87.98 | 88.45 | 87.13 | 164.13 | 165.12 | 161.96 |
-| 384 | 24 | 132.56 | 132.96 | 131.24 | 262.76 | 263.68 | 258.96 |
-| 384 | 32 | 179.44 | 180.61 | 176.66 | 329.99 | 331.67 | 325.59 |
-| 384 | 64 | 352.81 | 353.39 | 350.21 | 684.19 | 686.39 | 674.76 |
-| 384 | 128 | 706.85 | 707.73 | 704.38 | 1318.74 | 1320.22 | 1315.10 |
+| 128 | 1 | 3.72 | 3.81 | 3.67 | 5.69 | 5.76 | 5.57 |
+| 128 | 2 | 5.08 | 5.27 | 5.00 | 8.52 | 8.64 | 8.40 |
+| 128 | 4 | 8.51 | 8.54 | 8.32 | 14.93 | 15.23 | 14.53 |
+| 128 | 8 | 14.77 | 14.89 | 14.58 | 28.77 | 29.22 | 28.25 |
+| 128 | 12 | 23.07 | 23.22 | 22.74 | 46.08 | 46.10 | 45.28 |
+| 128 | 16 | 30.29 | 30.56 | 29.57 | 60.23 | 60.93 | 58.35 |
+| 128 | 24 | 48.58 | 48.70 | 47.77 | 90.67 | 91.77 | 89.92 |
+| 128 | 32 | 64.13 | 64.80 | 63.15 | 117.89 | 118.47 | 116.12 |
+| 128 | 64 | 127.74 | 128.46 | 125.80 | 243.10 | 243.52 | 241.59 |
+| 128 | 128 | 242.26 | 242.86 | 240.10 | 465.64 | 466.77 | 463.31 |
+| 384 | 1 | 7.50 | 7.54 | 7.31 | 12.56 | 12.67 | 12.37 |
+| 384 | 2 | 12.46 | 12.58 | 12.23 | 23.09 | 23.11 | 22.57 |
+| 384 | 4 | 24.93 | 25.10 | 24.58 | 47.41 | 47.43 | 46.68 |
+| 384 | 8 | 50.73 | 50.95 | 49.83 | 93.40 | 94.03 | 92.25 |
+| 384 | 12 | 72.95 | 73.36 | 71.97 | 140.44 | 141.25 | 138.23 |
+| 384 | 16 | 95.85 | 96.26 | 94.21 | 186.44 | 187.16 | 184.91 |
+| 384 | 24 | 145.08 | 145.57 | 143.04 | 281.55 | 282.20 | 279.78 |
+| 384 | 32 | 188.62 | 189.24 | 187.12 | 375.30 | 375.91 | 372.80 |
+| 384 | 64 | 376.59 | 377.52 | 374.39 | 760.16 | 760.96 | 757.81 |
+| 384 | 128 | 758.68 | 759.85 | 754.89 | 1459.63 | 1460.42 | 1457.38 |
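In any percentile table the 99th-percentile latency can never be lower than the 95th-percentile latency, which makes the BERT latency rows easy to sanity-check mechanically. The helper below is a hypothetical sketch, not part of the repository: it reads markdown rows (with or without a leading `+`, `-`, or space diff marker), assumes the 8-column layout of the BERT Base and BERT Large tables (sequence length, batch size, INT8 p95/p99/average, FP16 p95/p99/average), skips the 5-column Megatron table and any row with empty cells, warns when a 99th percentile is below the 95th, and prints the FP16-to-INT8 ratio of average latencies.

```bash
# check_latency_rows: read markdown table rows on stdin; for each complete
# data row print "seq batch int8_avg fp16_avg speedup", preceded by a WARN
# line for any precision whose 99th percentile is below its 95th.
# Hypothetical helper; assumes the 8-column layout
# | seq | batch | INT8 p95 | INT8 p99 | INT8 avg | FP16 p95 | FP16 p99 | FP16 avg |
check_latency_rows() {
    awk '
        {
            line = $0
            sub(/^[-+ ]/, "", line)                    # drop one diff marker, if any
            n = split(line, c, "|")
            if (n != 10) next                          # not an 8-cell row
            for (i = 2; i <= 9; i++) gsub(/^ +| +$/, "", c[i])
            if (c[2] !~ /^[0-9]+$/) next               # header or separator row
            for (i = 4; i <= 9; i++) if (c[i] == "") next   # missing measurements
            check("INT8", c[4], c[5])
            check("FP16", c[7], c[8])
            if (c[6] + 0 > 0)
                printf "%s %s %.2f %.2f %.2fx\n", c[2], c[3], c[6], c[9], c[9] / c[6]
        }
        # Warn when the 99th percentile is below the 95th (impossible for real data).
        function check(prec, p95, p99) {
            if (p99 + 0 < p95 + 0)
                printf "WARN seq=%s batch=%s %s p99 %.2f < p95 %.2f\n", c[2], c[3], prec, p99, p95
        }'
}
```

Applied to the old (removed) rows of this diff, it would flag the A100 BERT Large row for sequence length 128, batch size 128 (INT8 p99 23.38 ms < p95 28.32 ms) and the T4 BERT Base row for sequence length 384, batch size 8 (INT8 p99 12.11 ms < p95 14.09 ms); the corresponding replacement rows pass.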