Releases: NVIDIA/TensorRT

22.08

17 Aug 00:14

Commit used by the 22.08 TensorRT NGC container.

Changelog

Updated TensorRT version to 8.4.2 - see the TensorRT 8.4.2 release notes for more information

Changed

  • Updated default protobuf version to 3.20.x
  • Updated ONNX-TensorRT submodule version to 22.08 tag
  • Updated sampleIOFormats and sampleAlgorithmSelector to use ONNX models over Caffe

Fixes

  • Fixed missing serialization member in CustomClipPlugin plugin
  • Fixed various Python import issues

Added

  • Added new DeBERTa demo
  • Added version 2 for disentangledAttentionPlugin to support DeBERTa v2

Removed

  • None

22.07

22 Jul 02:46

Commit used by the 22.07 TensorRT NGC container.

Changelog

Added

  • polygraphy-trtexec-plugin tool for Polygraphy
  • Multi-profile support for demoBERT
  • KV cache support for HF BART demo

Changed

  • Updated ONNX-GS to v0.3.20

Removed

  • None

TensorRT OSS v8.4.1 GA

14 Jun 21:25

TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.

Key Features and Updates:

  • Samples enhancements

  • EfficientDet sample

    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.
  • HuggingFace transformer demo

    • Added BART model.
    • Sped up GPT-2 greedy search with a GPU implementation.
    • Fixed GPT-2 ONNX export failure caused by the 2 GB file size limitation.
    • Extended Megatron LayerNorm plugins to support larger hidden sizes.
    • Added performance benchmarking mode.
    • Enabled TF32 format by default.
  • demoBERT enhancements

    • Added --duration flag to perf benchmarking script.
    • Fixed import of nvinfer_plugins library in demoBERT on Windows.
  • Torch-QAT toolkit

    • quant_bert.py module removed. It is now upstreamed to HuggingFace QDQBERT.
    • Use axis0 as default for deconv.
    • #1939 - Fixed path in classification_flow example.
  • Plugin enhancements

  • Build containers

    • Updated default CUDA version to 11.6.2.
    • CentOS Linux 8 reached End-of-Life on Dec 31, 2021. The corresponding container has been removed from TensorRT-OSS.
    • Installed devtoolset-8 for updated g++ versions in CentOS7 container.
  • Tooling enhancements

  • trtexec enhancements

    • Added --layerPrecisions and --layerOutputTypes flags for specifying layer-wise precision and output type constraints.
    • Added --memPoolSize flag to specify the size of the workspace as well as the DLA memory pools via a unified interface. Correspondingly, the --workspace flag has been deprecated.
    • The "End-To-End Host Latency" metric has been removed. Use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
    • Use enqueueV2() instead of enqueue() when the engine has explicit batch dimensions.

22.06

09 Jun 02:54

Commit used by the 22.06 TensorRT NGC container.

Changelog

Added

  • None

Changed

  • Disentangled attention (DMHA) plugin refactored
  • ONNX parser updated to 8.2GA

Removed

  • None

22.05

13 May 21:52

Commit used by the 22.05 TensorRT NGC container.

Changelog

Added

  • Disentangled attention plugin for DeBERTa
  • DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
  • Performance benchmarking mode to HuggingFace demo

Changed

  • Updated base TensorRT version to 8.2.5.1
  • Updated onnx-graphsurgeon to v0.3.19 (see its CHANGELOG)
  • Added FP16 support for pillarScatterPlugin
  • #1939 - Fixed path in quantization classification_flow
  • Fixed GPT-2 ONNX export failure caused by the 2 GB file size limitation
  • Use axis0 as default for deconv in pytorch-quantization toolkit
  • Updated onnx export script for CoordConvAC sample
  • Installed devtoolset-8 for updated g++ version in CentOS7 container

Removed

  • Usage of deprecated TensorRT APIs in samples removed
  • quant_bert.py module removed from pytorch-quantization

22.04

14 Apr 01:19

Commit used by the 22.04 TensorRT NGC container.

Changelog

Added

  • TensorRT Engine Explorer v0.1.0 (see its README)
  • Detectron 2 Mask R-CNN R50-FPN python sample
  • Model export script for sampleOnnxMnistCoordConvAC

Changed

  • Updated base TensorRT version to 8.2.4.2
  • Updated copyright headers with SPDX identifiers
  • Updated onnx-graphsurgeon to v0.3.17 (see its CHANGELOG)
  • PyramidROIAlign plugin refactor and bug fixes
  • Fixed MultilevelCropAndResize crashes on Windows
  • #1583 - Sublicensed ieee/half.h under Apache 2.0
  • Updated demo/BERT performance tables for rel-8.2
  • #1774 - Fixed Python hang on IndexError when TensorFlow is imported after TensorRT
  • Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
  • Cleaned up sample READMEs

Removed

  • sampleNMT removed from samples

22.03

24 Mar 05:20

Commit used by the 22.03 TensorRT NGC container.

Changelog

Added

  • EfficientDet sample enhancements
    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.

Changed

  • Better decoupling of HuggingFace demo tests

22.02

04 Feb 18:40

Commit used by the 22.02 TensorRT NGC container.

Changelog

Added

Changed

  • Extended Megatron LayerNorm plugins to support larger hidden sizes
  • Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
  • Updated base TensorRT version to 8.2.3.0
  • GPT-2 greedy search speedup - now runs on GPU
  • Updates to TensorRT developer tools
  • Updated ONNX parser to v8.2.3.0
  • Minor updates and bugfixes
    • Samples: TFOD, GPT-2, demo/BERT
    • Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS

Removed

  • Unused source file(s) in demo/BERT

22.01

24 Jan 23:49

Commit used by the 22.01 TensorRT NGC container.

TensorRT OSS v8.2.1 GA

24 Nov 18:19

TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.

  • Updates since TensorRT 8.2.0 EA release.

  • Please refer to the TensorRT 8.2.1 GA release notes for more information.

  • ONNX parser v8.2.1

    • Removed duplicate constant layer checks that caused some performance regressions
    • Fixed expand dynamic shape calculations
    • Added parser-side checks for Scatter layer support
  • Sample updates

    • Added TensorFlow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
    • Multiple enhancements in HuggingFace transformer demos
      • Added multi-batch support
      • Fixed resultant performance regression in batchsize=1
      • Fixed T5 large/T5-3B accuracy issues
      • Added notebooks for T5 and GPT-2
      • Added CPU benchmarking option
    • Deprecated kSTRICT_TYPES (strict type constraints). Equivalent behavior is now achieved by setting PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS
    • Removed sampleMovieLens
    • Renamed sampleReformatFreeIO to sampleIOFormats
    • Added idleTime option for samples to control QPS
    • Specified a default value for precisionConstraints
    • Fixed reporting of TensorRT build version in trtexec
    • Fixed combineDescriptions typo in trtexec/tracer.py
    • Fixed usages of kDIRECT_IO
  • Plugin updates

    • EfficientNMS plugin support extended to TF-TRT, and for clang builds.
    • Sanitized header definitions for BERT fused MHA plugin
    • Separated C++ and cu files in splitPlugin to avoid PTX generation (required for CUDA enhanced compatibility support)
    • Enabled C++14 build for plugins
  • ONNX tooling updates

  • Build and container fixes

    • Added SM86 target to default GPU_ARCHS for platforms with CUDA 11.1+
    • Removed deprecated SM_35 and added SM_60 to default GPU_ARCHS
    • Skipped CUB builds for CUDA 11.0+ #1455
    • Fixed CUDA 10.2 container build failures in Ubuntu 20.04
    • Added native ARM server build container
    • Installed devtoolset-8 for updated g++ version in CentOS7
    • Added a note on supporting C++14 builds for CentOS7
    • Fixed docker build for large UIDs #1373
    • Updated README instructions for Jetpack builds
  • demo enhancements

    • Updated Tacotron2 instructions and added CPU benchmarking
    • Fixed issues in demoBERT python notebook
  • Documentation updates

    • Updated Python documentation for add_reduce, add_top_k, and ISoftMaxLayer
    • Renamed default GitHub branch to main and updated hyperlinks
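
The kSTRICT_TYPES deprecation noted under Sample updates replaces one builder flag with three. A minimal Python sketch of that migration follows. The helper name `apply_strict_types_replacement` is hypothetical. It assumes the three flag names from the note are exposed as members of `tensorrt.BuilderFlag` and that the builder config provides `set_flag`, as in the TensorRT Python API. The flag enum is passed in as a parameter, so the snippet itself does not import TensorRT.

```python
# Flag names taken from the release note above; per the note, setting all
# three gives behavior equivalent to the deprecated strict-types flag.
STRICT_TYPES_REPLACEMENT_FLAGS = (
    "PREFER_PRECISION_CONSTRAINTS",
    "DIRECT_IO",
    "REJECT_EMPTY_ALGORITHMS",
)


def apply_strict_types_replacement(config, builder_flag_enum):
    """Set the three replacement flags on a builder config.

    Hypothetical helper. `config` is expected to behave like
    tensorrt.IBuilderConfig (it must offer set_flag), and
    `builder_flag_enum` like tensorrt.BuilderFlag.
    """
    for name in STRICT_TYPES_REPLACEMENT_FLAGS:
        config.set_flag(getattr(builder_flag_enum, name))
    return config


# Assumed usage with a real TensorRT installation (not executed here):
#
#   import tensorrt as trt
#   builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
#   config = builder.create_builder_config()
#   apply_strict_types_replacement(config, trt.BuilderFlag)
```

Looking the flags up by name means the helper raises an `AttributeError` on a TensorRT version that lacks any of them, rather than silently skipping it.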