Releases: NVIDIA/TensorRT

21.10

05 Oct 17:03

Commit used by the 21.10 TensorRT NGC container.

Changelog

Added

  • Benchmark script for demoBERT-Megatron
  • Dynamic Input Shape support for EfficientNMS plugin
  • Support empty dimensions in ONNX
  • INT32 and dynamic clips through elementwise in ONNX parser

Changed

  • Bump TensorRT version to 8.0.3.4
  • Use static shape for only single batch single sequence input in demo/BERT
  • Revert to using the native FC layer in demo/BERT; use FCPlugin only on older GPUs.
  • Update demo/Tacotron2 for TensorRT 8.0
  • Updates to TensorRT developer tools
    • Polygraphy v0.33.0
      • Added various examples, a CLI User Guide and how-to guides.
      • Added experimental support for DLA.
      • Added a data to-input tool that can combine inputs/outputs created by --save-inputs/--save-outputs.
      • Added a PluginRefRunner which provides CPU reference implementations for TensorRT plugins
      • Made several performance improvements in the Polygraphy CUDA wrapper.
      • Removed the to-json tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
    • Bugfixes and documentation updates in pytorch-quantization toolkit.
  • Bumped up package versions: tensorflow-gpu 2.5.1, pillow 8.3.2
  • ONNX parser enhancements and bugfixes
    • Update ONNX submodule to v1.8.0
    • Update convDeconvMultiInput function to properly handle deconvs
    • Update RNN documentation
    • Update QDQ axis assertion
    • Fix bidirectional activation alpha and beta values
    • Fix opset10 Resize
    • Fix shape tensor unsqueeze
    • Mark BOOL tiles as unsupported
    • Remove unnecessary shape tensor checks

Removed

  • N/A

TensorRT OSS v8.2.0 EA

05 Oct 19:03
Pre-release

TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.

Added

  • Demo applications showcasing TensorRT inference of HuggingFace Transformers.
    • Support is currently extended to GPT-2 and T5 models.
  • Added support for the following ONNX operators:
    • Einsum
    • IsNan
    • GatherND
    • Scatter
    • ScatterElements
    • ScatterND
    • Sign
    • Round
  • Added support for building TensorRT Python API on Windows.

Updated

  • Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
    • Added three new IExecutionContext APIs, getEnqueueEmitsProfile(), setEnqueueEmitsProfile(), and reportToProfiler(), which can be used to collect layer profiling info when inference is launched as a CUDA graph (a sketch follows this list).
    • Eliminated the global logger; each Runtime, Builder or Refitter now has its own logger.
    • Added new operators: IAssertionLayer, IConditionLayer, IEinsumLayer, IIfConditionalBoundaryLayer, IIfConditionalOutputLayer, IIfConditionalInputLayer, and IScatterLayer.
    • Added new IGatherLayer modes: kELEMENT and kND
    • Added new ISliceLayer modes: kFILL, kCLAMP, and kREFLECT
    • Added new IUnaryLayer operators: kSIGN and kROUND
    • Added a new runtime class, IEngineInspector, that can be used to inspect detailed information about an engine, including the layer parameters, the chosen tactics, the precision used, etc. (a usage sketch also follows this list).
    • ProfilingVerbosity enums have been updated to show their functionality more explicitly.
  • Updated TensorRT OSS container defaults to CUDA 11.4
  • Updated CMake to target C++14 builds.
  • Updated the following ONNX operators:
    • Gather and GatherElements implementations to natively support negative indices
    • Pad layer to support ND padding, along with edge and reflect padding mode support
    • If layer with general performance improvements.
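
For illustration, here is a rough sketch of how the new IExecutionContext profiling APIs fit together with a CUDA graph launch. This is not code from the release: it assumes an already deserialized engine, an execution context with an IProfiler attached via setProfiler(), and pre-allocated device bindings; error checking is omitted.

```cpp
// Sketch: collect layer profiling info when inference is launched as a CUDA graph.
// `context` already has an IProfiler attached; `bindings` holds the device buffer
// pointers for the engine's inputs and outputs.
#include <cuda_runtime_api.h>
#include "NvInfer.h"

void profileGraphLaunch(nvinfer1::IExecutionContext* context, void** bindings)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Do not emit profiling data at enqueue time: during capture the enqueue only
    // records work into the graph instead of executing it.
    context->setEnqueueEmitsProfile(false);

    // Capture one enqueue into a CUDA graph and instantiate it.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

    // Launch the graph, then report the collected layer timings to the profiler.
    cudaGraphLaunch(graphExec, stream);
    cudaStreamSynchronize(stream);
    context->reportToProfiler();

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
}
```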

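A minimal usage sketch for IEngineInspector follows. It assumes the engine was built with ProfilingVerbosity::kDETAILED so that detailed layer information is available; the method and enum names reflect the 8.2 API as documented, but treat this as illustrative rather than release code.

```cpp
// Sketch: inspect a deserialized engine with IEngineInspector.
// Assumes `engine` was built with ProfilingVerbosity::kDETAILED
// (set via IBuilderConfig::setProfilingVerbosity).
#include <iostream>
#include <memory>
#include "NvInfer.h"

void dumpEngineInfo(nvinfer1::ICudaEngine* engine)
{
    auto inspector = std::unique_ptr<nvinfer1::IEngineInspector>(engine->createEngineInspector());

    // Detailed information for the first layer, formatted as JSON.
    std::cout << inspector->getLayerInformation(0, nvinfer1::LayerInformationFormat::kJSON) << "\n";

    // Information for the whole engine in a single call.
    std::cout << inspector->getEngineInformation(nvinfer1::LayerInformationFormat::kJSON) << "\n";
}
```
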
Removed

  • Removed sampleMLP.
  • Several flags of trtexec have been deprecated:
    • --explicitBatch flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
    • --explicitPrecision flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
    • --nvtxMode=[verbose|default|none] has been deprecated in favor of --profilingVerbosity=[detailed|layer_names_only|none] to show its functionality more explicitly.

Signed-off-by: Rajeev Rao rajeevrao@nvidia.com

21.09

22 Sep 17:28

Commit used by the 21.09 TensorRT NGC container.

Changelog

Added

  • Add ONNX2TRT_VERSION overwrite in CMake.

Changed

  • Updates to TensorRT developer tools
  • Fix assertion in EfficientNMSPlugin

Removed

  • N/A

21.08

05 Aug 20:15

Commit used by the 21.08 TensorRT NGC container.

Changelog

Added

  • N/A

Changed

  • Updated samples and plugins directory structure
  • Updates to TensorRT developer tools
  • README fix to update build command for native aarch64 builds.

Removed

  • N/A

21.07

21 Jul 18:21

Commit used by the 21.07 TensorRT NGC container

Corresponds to the TensorRT OSS v8.0.1 release; see the v8.0.1 release notes below for details.

TensorRT OSS v8.0.1

02 Jul 23:37

TensorRT OSS release corresponding to TensorRT 8.0.1.6 GA release.

Added

  • Added support for the following ONNX operators: Celu, CumSum, EyeLike, GatherElements, GlobalLpPool, GreaterOrEqual, LessOrEqual, LpNormalization, LpPool, ReverseSequence, and SoftmaxCrossEntropyLoss.
  • Overhauled the Resize ONNX operator, now fully supporting the following modes:
    • Coordinate Transformation modes: half_pixel, pytorch_half_pixel, tf_half_pixel_for_nn, asymmetric, and align_corners.
    • Modes: nearest, linear.
    • Nearest Modes: floor, ceil, round_prefer_floor, round_prefer_ceil.
  • Added support for the multi-input ONNX ConvTranspose operator.
  • Added support for 3D spatial dimensions in ONNX InstanceNormalization.
  • Added support for generic 2D padding in ONNX.
  • ONNX QuantizeLinear and DequantizeLinear operators leverage IQuantizeLayer and IDequantizeLayer (see the sketch after this list).
    • Added support for tensor scales.
    • Added support for per-axis quantization.
  • Added EfficientNMS_TRT, EfficientNMS_ONNX_TRT plugins and experimental support for ONNX NonMaxSuppression operator.
  • Added ScatterND plugin.
  • Added TensorRT QuickStart Guide.
  • Added new samples: engine_refit_onnx_bidaf, which builds an engine from the ONNX BiDAF model and refits it with new weights; and efficientdet and efficientnet samples demonstrating object detection with TensorRT.
  • Added support for Ubuntu 20.04 and RedHat/CentOS 8.3.
  • Added Python 3.9 support.
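
As a rough sketch of the layers the Q/DQ import maps to, the snippet below builds the same structure by hand. It is illustrative only, not code from the release: it uses the addQuantize/addDequantize network APIs introduced with TensorRT 8 and a hypothetical helper addQdqWeights, and shows how a per-axis scale constant drives IQuantizeLayer/IDequantizeLayer, mirroring per-channel QuantizeLinear/DequantizeLinear nodes.

```cpp
// Sketch: per-axis (per-channel) Q/DQ using IQuantizeLayer / IDequantizeLayer,
// the layers the ONNX parser now emits for QuantizeLinear / DequantizeLinear.
// `network` is an existing INetworkDefinition; weights and scales are illustrative.
#include <vector>
#include "NvInfer.h"

void addQdqWeights(nvinfer1::INetworkDefinition* network,
                   nvinfer1::ITensor* weights,              // e.g. KCRS convolution weights
                   const std::vector<float>& perChannelScale)
{
    // One scale value per output channel (axis 0 of the weights tensor).
    nvinfer1::Weights scaleWeights{nvinfer1::DataType::kFLOAT,
                                   perChannelScale.data(),
                                   static_cast<int64_t>(perChannelScale.size())};
    nvinfer1::Dims scaleDims{1, {static_cast<int32_t>(perChannelScale.size())}};
    auto* scale = network->addConstant(scaleDims, scaleWeights)->getOutput(0);

    // Quantize to INT8 along axis 0, then dequantize back to float.
    auto* q = network->addQuantize(*weights, *scale);
    q->setAxis(0);
    auto* dq = network->addDequantize(*q->getOutput(0), *scale);
    dq->setAxis(0);
    // dq->getOutput(0) now feeds the convolution in place of the raw weights.
}
```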

Changed

  • Update Polygraphy to v0.30.3.
  • Update ONNX-GraphSurgeon to v0.3.10.
  • Update Pytorch Quantization toolkit to v2.1.0.
  • Notable TensorRT API updates
    • TensorRT now declares APIs with the noexcept keyword. All TensorRT classes that an application inherits from (such as IPluginV2) must guarantee that methods called by TensorRT do not throw uncaught exceptions, or the behavior is undefined.
    • Destructors for classes with destroy() methods were previously protected. They are now public, enabling the use of smart pointers for these classes. The destroy() methods are deprecated (a sketch follows this list).
  • Moved RefitMap API from ONNX parser to core TensorRT.
  • Various bugfixes for plugins, samples and ONNX parser.
  • Port demoBERT to tensorflow2 and update UFF samples to leverage nvidia-tensorflow1 container.
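
Taken together, the noexcept requirement and the public destructors change how applications structure TensorRT code. The sketch below is illustrative rather than part of the release notes: a logger override marked noexcept, and TensorRT objects owned by std::unique_ptr instead of explicit destroy() calls.

```cpp
#include <iostream>
#include <memory>
#include "NvInfer.h"

// Overridden TensorRT callbacks are declared noexcept in 8.0 and must not throw.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

int main()
{
    Logger logger;

    // Destructors are public in 8.0, so plain unique_ptr replaces destroy().
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    // ... populate the network and build; no destroy() calls are needed.
    return 0;
}
```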

Removed

  • IPlugin and IPluginFactory interfaces were deprecated in TensorRT 6.0 and have been removed in TensorRT 8.0. We recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt and IPluginV2IOExt interfaces. For more information, refer to Migrating Plugins From TensorRT 6.x Or 7.x To TensorRT 8.x.x.
    • For plugins based on IPluginV2DynamicExt and IPluginV2IOExt, certain methods with legacy function signatures (derived from IPluginV2 and IPluginV2Ext base classes) which were deprecated and marked for removal in TensorRT 8.0 will no longer be available.
  • Removed samplePlugin since it showcased IPluginExt interface, which is no longer supported in TensorRT 8.0.
  • Removed sampleMovieLens and sampleMovieLensMPS.
  • Removed Dockerfile for Ubuntu 16.04. TensorRT 8.0 debians for Ubuntu 16.04 require Python 3.5, while the minimum Python version required for TensorRT OSS is 3.6.
  • Removed support for PowerPC builds, consistent with TensorRT GA releases.

Notes

  • The Caffe Parser and UFF Parser were deprecated in TensorRT 7.0. They are still tested and functional in TensorRT 8.0; however, we plan to remove support in a future release. Ensure you migrate your workflow to use tf2onnx, keras2onnx or TensorFlow-TensorRT (TF-TRT).

Signed-off-by: Rajeev Rao rajeevrao@nvidia.com

21.06

23 Jun 19:08

Commit used by the 21.06 TensorRT NGC container

Changelog

Added

  • Add switch for batch-agnostic mode in NMS plugin
  • Add missing model.py in uff_custom_plugin sample

Changed

  • Update to Polygraphy v0.29.2
  • Update to ONNX-GraphSurgeon v0.3.9
  • Fix numerical errors for float type in NMS/batchedNMS plugins
  • Update demoBERT input dimensions to match Triton requirement #1051
  • Optimize TLT MaskRCNN plugins:
    • Enable FP16 precision in multilevelCropAndResizePlugin and multilevelProposeROIPlugin
    • Algorithm optimizations for NMS kernels and the ROIAlign kernel
    • Fix invalid CUDA config issue when the batch size is larger than 32
    • Fix issues found on Jetson Nano

Removed

  • Removed fcplugin from demoBERT to improve inference latency on GA100/Turing

21.05

19 May 21:18

Commit used by the 21.05 TensorRT NGC container

Changelog

Added

  • Extended support for ONNX operator InstanceNormalization to 5D tensors
  • Support negative indices in ONNX Gather operator
  • Add support for importing ONNX double-typed weights as float
  • ONNX-GraphSurgeon (v0.3.7) support for models with externally stored weights

Changed

  • Update ONNX-TensorRT to 21.05
  • Relicense ONNX-TensorRT under Apache2
  • demoBERT builder fixes for multi-batch
  • Speed up demoBERT build using global timing cache and disable cuDNN tactics
  • Standardize python package versions across OSS samples
  • Bugfixes in multilevelProposeROI and bertQKV plugin
  • Fix memory leaks in the samples logger

21.04

12 Apr 21:57

Commit used by the 21.04 TensorRT NGC container

Changelog

Added

  • SM86 kernels for BERT MHA plugin
  • Added opset 13 support for Softmax, LogSoftmax, Squeeze, and Unsqueeze.
  • Added support for the EyeLike and GatherElements operators.

Changed

  • Updated TensorRT version to v7.2.3.4.
  • Update to ONNX-TensorRT 21.03
  • ONNX-GraphSurgeon (v0.3.4) - updates fold_constants to correctly exit early.
  • Set default CUDA_INSTALL_DIR #798
  • Plugin bugfixes, QKV kernels for SM86
  • Fixed GroupNorm CMakeFile for cu sources #1083
  • Permit groupadd with non-unique GID in build containers #1091
  • Avoid reinterpret_cast #146
  • Clang-format plugins and samples
  • Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp #1028
  • Update BERT plugin documentation.

Removed

  • Removed extra terminate call in InstanceNorm

21.03

10 Mar 19:32

Commit used by the 21.03 TensorRT NGC container

Changelog

Added

  • Optimized FP16 NMS/batchedNMS plugins, using n-bit radix sort and based on IPluginV2DynamicExt
  • ProposalDynamic and CropAndResizeDynamic plugins based on IPluginV2DynamicExt

Changed

  • N/A

Removed

  • N/A