Releases: NVIDIA/TensorRT
22.08
Commit used by the 22.08 TensorRT NGC container.
Changelog
Updated TensorRT version to 8.4.2 - see the TensorRT 8.4.2 release notes for more information
Changed
- Updated default protobuf version to 3.20.x
- Updated ONNX-TensorRT submodule version to 22.08 tag
- Updated sampleIOFormats and sampleAlgorithmSelector to use ONNX models over Caffe
Fixes
- Fixed missing serialization member in CustomClipPlugin plugin
- Fixed various Python import issues
Added
- Added new DeBERTa demo
- Added version 2 for disentangledAttentionPlugin to support DeBERTa v2
Removed
- None
22.07
Commit used by the 22.07 TensorRT NGC container.
Changelog
Added
- polygraphy-trtexec-plugin tool for Polygraphy
- Multi-profile support for demoBERT
- KV cache support for HF BART demo
Changed
- Updated ONNX-GS to v0.3.20
Removed
- None
TensorRT OSS v8.4.1 GA
TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.
- Updates since TensorRT 8.2.1 GA release.
- Please refer to the TensorRT 8.4.1 GA release notes for more information.
Key Features and Updates:
- Samples enhancements
  - Added Detectron2 Mask R-CNN R50-FPN python sample
  - Added a quickstart guide for NVIDIA Triton deployment workflow.
  - Added onnx export script for sampleOnnxMnistCoordConvAC
  - Removed sampleNMT.
  - Removed usage of deprecated TensorRT APIs in samples.
- EfficientDet sample
  - Added support for EfficientDet Lite and AdvProp models.
  - Added dynamic batch support.
  - Added mixed precision engine builder.
- HuggingFace transformer demo
  - Added BART model.
  - Performance speedup of GPT-2 greedy search using GPU implementation.
  - Fixed GPT2 onnx export failure due to 2G file size limitation.
  - Extended Megatron LayerNorm plugins to support larger hidden sizes.
  - Added performance benchmarking mode.
  - Enable tf32 format by default.
- demoBERT enhancements
  - Add --duration flag to perf benchmarking script.
  - Fixed import of nvinfer_plugins library in demoBERT on Windows.
- Torch-QAT toolkit
  - quant_bert.py module removed. It is now upstreamed to HuggingFace QDQBERT.
  - Use axis0 as default for deconv.
  - #1939 - Fixed path in classification_flow example.
- Plugin enhancements
  - Added Disentangled attention plugin, DisentangledAttention_TRT, to support DeBERTa model.
  - Added Multiscale deformable attention plugin, MultiscaleDeformableAttnPlugin_TRT, to support DDETR model.
  - Added new plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin.
  - Refactored EfficientNMS plugin to support TF-TRT and implicit batch mode.
  - fp16 support for pillarScatterPlugin.
- Build containers
  - Updated default cuda versions to 11.6.2.
  - CentOS Linux 8 has reached End-of-Life on Dec 31, 2021. The corresponding container has been removed from TensorRT-OSS.
  - Install devtoolset-8 for updated g++ versions in CentOS7 container.
- Tooling enhancements
  - Added Tensorflow Quantization Toolkit v0.1.0 for Quantization-Aware-Training of Tensorflow 2.x Keras models.
  - Added TensorRT Engine Explorer v0.1.2 for inspecting TensorRT engine plans and associated inference profiling data.
  - Updated Polygraphy to v0.38.0.
  - Updated onnx-graphsurgeon to v0.3.19.
- trtexec enhancements
  - Added --layerPrecisions and --layerOutputTypes flags for specifying layer-wise precision and output type constraints.
  - Added --memPoolSize flag to specify the size of workspace as well as the DLA memory pools via a unified interface. Correspondingly, the --workspace flag has been deprecated.
  - "End-To-End Host Latency" metric has been removed. Use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
  - Use enqueueV2() instead of enqueue() when engine has explicit batch dimensions.
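The trtexec flag changes listed above can be illustrated with a small command builder. This is a minimal sketch, not part of the release: the helper name build_trtexec_args and its parameters are hypothetical, and the spec syntax (pool:sizeInMiB for --memPoolSize, layerName:precision pairs for --layerPrecisions, and the need for --precisionConstraints to make layer constraints effective) is assumed from the trtexec help text and should be checked against `trtexec --help` for your version.

```python
def build_trtexec_args(onnx_path, workspace_mib=None, layer_precisions=None):
    """Assemble a trtexec argument list (hypothetical helper, for illustration).

    workspace_mib    -- workspace pool size in MiB, passed via --memPoolSize
                        instead of the deprecated --workspace flag.
    layer_precisions -- dict mapping layer name (or "*") to a precision
                        such as "fp16" or "fp32".
    """
    args = ["trtexec", f"--onnx={onnx_path}"]
    if workspace_mib is not None:
        # Assumed spec format: pool:sizeInMiB
        args.append(f"--memPoolSize=workspace:{workspace_mib}")
    if layer_precisions:
        # Assumption: layer-wise constraints only take effect when
        # precision constraints are set to "obey" or "prefer".
        args.append("--precisionConstraints=obey")
        spec = ",".join(f"{name}:{prec}" for name, prec in layer_precisions.items())
        args.append(f"--layerPrecisions={spec}")
    return args


if __name__ == "__main__":
    print(" ".join(build_trtexec_args(
        "model.onnx",
        workspace_mib=1024,
        layer_precisions={"*": "fp16", "conv1": "fp32"},
    )))
```

The returned list can be passed to subprocess.run on a machine where trtexec is installed; the layer name conv1 is a placeholder.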
22.06
Commit used by the 22.06 TensorRT NGC container.
Changelog
Added
- None
Changed
- Disentangled attention (DMHA) plugin refactored
- ONNX parser updated to 8.2GA
Removed
- None
22.05
Commit used by the 22.05 TensorRT NGC container.
Changelog
Added
- Disentangled attention plugin for DeBERTa
- DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
- Performance benchmarking mode to HuggingFace demo
Changed
- Updated base TensorRT version to 8.2.5.1
- Updated onnx-graphsurgeon to v0.3.19 (see CHANGELOG)
- fp16 support for pillarScatterPlugin
- #1939 - Fixed path in quantization classification_flow
- Fixed GPT2 onnx export failure due to 2G limitation
- Use axis0 as default for deconv in pytorch-quantization toolkit
- Updated onnx export script for CoordConvAC sample
- Install devtoolset-8 for updated g++ version in CentOS7 container
Removed
- Usage of deprecated TensorRT APIs in samples removed
- quant_bert.py module removed from pytorch-quantization
22.04
Commit used by the 22.04 TensorRT NGC container.
Changelog
Added
- TensorRT Engine Explorer v0.1.0 README
- Detectron 2 Mask R-CNN R50-FPN python sample
- Model export script for sampleOnnxMnistCoordConvAC
Changed
- Updated base TensorRT version to 8.2.4.2
- Updated copyright headers with SPDX identifiers
- Updated onnx-graphsurgeon to v0.3.17 (see CHANGELOG)
- PyramidROIAlign plugin refactor and bug fixes
- Fixed MultilevelCropAndResize crashes on Windows
- #1583 - sublicense ieee/half.h under Apache2
- Updated demo/BERT performance tables for rel-8.2
- #1774 Fix python hangs at IndexErrors when TF is imported after TensorRT
- Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
- Cleaned up sample READMEs
Removed
- sampleNMT removed from samples
22.03
Commit used by the 22.03 TensorRT NGC container.
Changelog
Added
- EfficientDet sample enhancements
  - Added support for EfficientDet Lite and AdvProp models.
  - Added dynamic batch support.
  - Added mixed precision engine builder.
Changed
- Better decoupling of HuggingFace demo tests
22.02
Commit used by the 22.02 TensorRT NGC container.
Changelog
Added
- New plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin
Changed
- Extend Megatron LayerNorm plugins to support larger hidden sizes
- Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
- Update base TensorRT version to 8.2.3.0
- GPT-2 greedy search speedup - now runs on GPU
- Updates to TensorRT developer tools
- Updated ONNX parser to v8.2.3.0
- Minor updates and bugfixes
  - Samples: TFOD, GPT-2, demo/BERT
  - Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS
Removed
- Unused source file(s) in demo/BERT
22.01
Commit used by the 22.01 TensorRT NGC container.
TensorRT OSS v8.2.1 GA
TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
- Updates since TensorRT 8.2.0 EA release.
- Please refer to the TensorRT 8.2.1 GA release notes for more information.
- ONNX parser v8.2.1
  - Removed duplicate constant layer checks that caused some performance regressions
  - Fixed expand dynamic shape calculations
  - Added parser-side checks for Scatter layer support
- Sample updates
  - Added Tensorflow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
  - Multiple enhancements in HuggingFace transformer demos
    - Added multi-batch support
    - Fixed resultant performance regression in batchsize=1
    - Fixed T5 large/T5-3B accuracy issues
    - Added notebooks for T5 and GPT-2
    - Added CPU benchmarking option
  - Deprecated kSTRICT_TYPES (strict type constraints). Equivalent behaviour now achieved by setting PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS
  - Removed sampleMovieLens
  - Renamed sampleReformatFreeIO to sampleIOFormats
  - Add idleTime option for samples to control qps
  - Specify default value for precisionConstraints
  - Fixed reporting of TensorRT build version in trtexec
  - Fixed combineDescriptions typo in trtexec/tracer.py
  - Fixed usages of kDIRECT_IO
- Plugin updates
  - EfficientNMS plugin support extended to TF-TRT, and for clang builds.
  - Sanitize header definitions for BERT fused MHA plugin
  - Separate C++ and cu files in splitPlugin to avoid PTX generation (required for CUDA enhanced compatibility support)
  - Enable C++14 build for plugins
- ONNX tooling updates
  - onnx-graphsurgeon upgraded to v0.3.14
  - Polygraphy upgraded to v0.33.2
  - pytorch-quantization toolkit upgraded to v2.1.2
- Build and container fixes
  - Add SM86 target to default GPU_ARCHS for platforms with cuda-11.1+
  - Remove deprecated SM_35 and add SM_60 to default GPU_ARCHS
  - Skip CUB builds for cuda 11.0+ #1455
  - Fixed cuda-10.2 container build failures in Ubuntu 20.04
  - Add native ARM server build container
  - Install devtoolset-8 for updated g++ version in CentOS7
  - Added a note on supporting c++14 builds for CentOS7
  - Fixed docker build for large UIDs #1373
  - Updated README instructions for Jetpack builds
- demo enhancements
  - Updated Tacotron2 instructions and add CPU benchmarking
  - Fixed issues in demoBERT python notebook
- Documentation updates
  - Updated Python documentation for add_reduce, add_top_k, and ISoftMaxLayer
  - Renamed default GitHub branch to main and updated hyperlinks
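The kSTRICT_TYPES deprecation noted under Sample updates above replaces one builder flag with three. A minimal sketch of setting the three replacement flags from Python follows. The helper name apply_strict_type_equivalents is hypothetical; it assumes a config object exposing set_flag() (as IBuilderConfig does in the TensorRT Python API) and a flag enum with the three member names from the release notes (trt.BuilderFlag in TensorRT 8.2 and later).

```python
# Flag names taken from the release note above.
STRICT_TYPE_EQUIVALENT_FLAGS = (
    "PREFER_PRECISION_CONSTRAINTS",
    "DIRECT_IO",
    "REJECT_EMPTY_ALGORITHMS",
)


def apply_strict_type_equivalents(config, builder_flag):
    """Set the three flags that together stand in for the deprecated
    strict-types flag (hypothetical helper, for illustration).

    config       -- object with a set_flag(flag) method,
                    e.g. builder.create_builder_config()
    builder_flag -- enum or namespace holding the flag members,
                    e.g. tensorrt.BuilderFlag
    """
    for name in STRICT_TYPE_EQUIVALENT_FLAGS:
        config.set_flag(getattr(builder_flag, name))
    return config


# Intended usage, assuming tensorrt is installed and `builder` exists:
#   import tensorrt as trt
#   config = builder.create_builder_config()
#   apply_strict_type_equivalents(config, trt.BuilderFlag)
```

Passing the enum in as an argument keeps the helper importable on machines without TensorRT; in real code you would likely reference trt.BuilderFlag directly.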