TensorRT OSS v8.2.0 EA
Pre-release
TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.
Added
- Demo applications showcasing TensorRT inference of HuggingFace Transformers.
  - Support is currently extended to GPT-2 and T5 models.
- Added support for the following ONNX operators:
  - `Einsum`
  - `IsNan`
  - `GatherND`
  - `Scatter`
  - `ScatterElements`
  - `ScatterND`
  - `Sign`
  - `Round`
- Added support for building TensorRT Python API on Windows.
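
To make one of the newly supported operators concrete, here is a minimal pure-Python sketch of ONNX `GatherND` semantics for the `batch_dims=0` case. It only illustrates what the operator computes; it is not TensorRT's implementation, and the helper name `gather_nd` is hypothetical.

```python
from typing import Any, List


def gather_nd(data: List[Any], indices: List[Any]) -> Any:
    """Reference sketch of ONNX GatherND semantics with batch_dims=0.

    `data` and `indices` are nested Python lists. Each innermost list of
    `indices` is read as an index tuple into the leading axes of `data`.
    """
    # An innermost row is a flat list of ints: use it to walk into `data`.
    if indices and all(isinstance(i, int) for i in indices):
        result = data
        for i in indices:
            result = result[i]
        return result
    # Otherwise recurse over the outer axes of `indices`.
    return [gather_nd(data, row) for row in indices]


if __name__ == "__main__":
    matrix = [[0, 1], [2, 3]]
    # Full index tuples select single elements.
    print(gather_nd(matrix, [[0, 0], [1, 1]]))
    # Partial index tuples select whole rows.
    print(gather_nd(matrix, [[1], [0]]))
```

When an index tuple is shorter than the rank of `data`, the result keeps the remaining trailing axes, which is why the second call returns rows rather than scalars.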
Updated
- Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
  - Added three new `IExecutionContext` APIs, `getEnqueueEmitsProfile()`, `setEnqueueEmitsProfile()`, and `reportToProfiler()`, which can be used to collect layer profiling info when the inference is launched as a CUDA graph.
  - Eliminated the global logger; each `Runtime`, `Builder`, or `Refitter` now has its own logger.
  - Added new operators: `IAssertionLayer`, `IConditionLayer`, `IEinsumLayer`, `IIfConditionalBoundaryLayer`, `IIfConditionalOutputLayer`, `IIfConditionalInputLayer`, and `IScatterLayer`.
  - Added new `IGatherLayer` modes: `kELEMENT` and `kND`.
  - Added new `ISliceLayer` modes: `kFILL`, `kCLAMP`, and `kREFLECT`.
  - Added new `IUnaryLayer` operators: `kSIGN` and `kROUND`.
  - Added new runtime class `IEngineInspector` that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
  - `ProfilingVerbosity` enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to CUDA 11.4.
- Updated CMake to target C++14 builds.
- Updated the following ONNX operators:
  - `Gather` and `GatherElements` implementations to natively support negative indices.
  - `Pad` layer to support ND padding, along with `edge` and `reflect` padding mode support.
  - `If` layer with general performance improvements.
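
The operator updates above concern negative indices for `Gather` and the `edge` and `reflect` padding modes for `Pad`. The sketch below shows, in one dimension and pure Python, what those behaviors mean according to the ONNX operator definitions. It is an illustration of the semantics only, not TensorRT code, and the function names `normalize_index`, `gather_1d`, and `pad_1d` are hypothetical.

```python
from typing import List, Sequence


def normalize_index(index: int, axis_size: int) -> int:
    """Map an index in [-axis_size, axis_size - 1] to [0, axis_size - 1]."""
    if not -axis_size <= index < axis_size:
        raise IndexError(f"index {index} out of range for axis of size {axis_size}")
    return index + axis_size if index < 0 else index


def gather_1d(data: Sequence[float], indices: Sequence[int]) -> List[float]:
    """1-D Gather that accepts negative indices, counting from the end."""
    return [data[normalize_index(i, len(data))] for i in indices]


def pad_1d(data: Sequence[float], before: int, after: int, mode: str) -> List[float]:
    """1-D padding in `edge` or `reflect` mode."""
    n = len(data)
    if n == 0:
        raise ValueError("cannot pad an empty sequence")
    out = []
    for pos in range(-before, n + after):
        if mode == "edge":
            # Clamp to the nearest border element.
            src = min(max(pos, 0), n - 1)
        elif mode == "reflect":
            # Mirror around the border elements without repeating them.
            period = 2 * (n - 1) if n > 1 else 1
            src = pos % period
            if src >= n:
                src = period - src
        else:
            raise ValueError(f"unsupported mode: {mode}")
        out.append(data[src])
    return out


if __name__ == "__main__":
    print(gather_1d([10, 20, 30], [0, -1, -3]))
    print(pad_1d([1, 2, 3], 2, 2, "edge"))
    print(pad_1d([1, 2, 3], 2, 2, "reflect"))
```

ND padding applies the same per-axis rule independently along each padded axis.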
Removed
- Removed `sampleMLP`.
- Several flags of `trtexec` have been deprecated:
  - `--explicitBatch` flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
  - `--explicitPrecision` flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
  - `--nvtxMode=[verbose|default|none]` has been deprecated in favor of `--profilingVerbosity=[detailed|layer_names_only|none]` to show its functionality more explicitly.
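
Scripts that still pass the deprecated flags could be updated with a small helper along these lines. This is a sketch under assumptions: the helper `migrate_trtexec_args` is hypothetical and not part of `trtexec`, and the value mapping (`verbose` to `detailed`, `default` to `layer_names_only`, `none` to `none`) is inferred from the order in which the values are listed above.

```python
from typing import List

# Assumed mapping, inferred from the order in which the notes list the values.
NVTX_TO_PROFILING = {
    "verbose": "detailed",
    "default": "layer_names_only",
    "none": "none",
}

# Flags the notes describe as deprecated and having no effect.
NO_EFFECT_FLAGS = {"--explicitBatch", "--explicitPrecision"}


def migrate_trtexec_args(args: List[str]) -> List[str]:
    """Return a copy of `args` with the deprecated flags dropped or renamed."""
    migrated = []
    for arg in args:
        if arg in NO_EFFECT_FLAGS:
            # These flags no longer change behavior, so they can be removed.
            continue
        if arg.startswith("--nvtxMode="):
            value = arg.split("=", 1)[1]
            if value not in NVTX_TO_PROFILING:
                raise ValueError(f"unknown --nvtxMode value: {value}")
            migrated.append(f"--profilingVerbosity={NVTX_TO_PROFILING[value]}")
            continue
        migrated.append(arg)
    return migrated


if __name__ == "__main__":
    # "model.onnx" is a placeholder file name.
    old_args = ["--onnx=model.onnx", "--explicitBatch", "--nvtxMode=verbose"]
    print(migrate_trtexec_args(old_args))
```

All other arguments pass through unchanged, so the helper can be applied to existing command lines without affecting unrelated options.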
Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>