Skip to content

Releases: pytorch/TensorRT

TRTorch v0.4.1

06 Oct 19:14
92d6851
Compare
Choose a tag to compare

TRTorch v0.4.1

Bug Fixes for Module Ignorelist for Partial Compilation, trtorch.Device, Version updates for PyTorch, TensorRT, cuDNN

Target Platform Changes

This is the first patch of TRTorch v0.4. It now targets by default PyTorch 1.9.1, TensorRT 8.0.3.4 and cuDNN 8.2.4.15 and CUDA 11.1. Older versions of PyTorch, TensorRT, cuDNN are still supported in the same manner as TRTorch v0.4.0

Module Ignorelist for Partial Compilation

There was an issue with the pass marking modules to be ignored during compilation where it unsafely assumed that methods are named forward all the way down the module tree. While this was fine for 1.8.0, with PyTorch 1.9.0, the TorchScript codegen changed slightly to sometimes use methods of other names for modules which reduce trivially to a functional api. This fix now will identify method calls as the recursion point and then use those method calls to select modules to recurse on. It will also check to verify existence of these modules and methods before recursing. Finally this pass was run by default even if the ignore list was empty causing issues for users not using the feature. Therefore this pass is now disabled unless explicitly enabled

trtorch.Device

Some of the constructors for trtorch.Device would not work or incorrectly configure the device. This patch will fix those issues.

Dependencies

- Bazel 4.0.0
- LibTorch 1.9.1
- CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.3.4
- TensorRT 8.0.3.4

0.4.1 (2021-10-06)

Bug Fixes

  • //core/lowering: Fixes module level fallback recursion (2fc612d)
  • Move some lowering passes to graph level logging (0266f41)
  • //py: Fix trtorch.Device alternate contructor options (ac26841)

Operators Supported

Operators Currently Supported Through Converters

  • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
  • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
  • aten::abs(Tensor self) -> (Tensor)
  • aten::acos(Tensor self) -> (Tensor)
  • aten::acosh(Tensor self) -> (Tensor)
  • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
  • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
  • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
  • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::asin(Tensor self) -> (Tensor)
  • aten::asinh(Tensor self) -> (Tensor)
  • aten::atan(Tensor self) -> (Tensor)
  • aten::atanh(Tensor self) -> (Tensor)
  • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
  • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
  • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::ceil(Tensor self) -> (Tensor)
  • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
  • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
  • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
  • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
  • aten::cos(Tensor self) -> (Tensor)
  • aten::cosh(Tensor self) -> (Tensor)
  • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
  • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
  • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
  • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
  • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::erf(Tensor self) -> (Tensor)
  • aten::exp(Tensor self) -> (Tensor)
  • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
  • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
  • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
  • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
  • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
  • aten::floor(Tensor self) -> (Tensor)
  • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
  • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::gelu(Tensor self) -> (Tensor)
  • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
  • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
  • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
  • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
  • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
  • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
  • aten::log(Tensor self) -> (Tensor)
  • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
  • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
  • aten::matmul(Tensor self, Tensor other) -> (Tensor)
  • aten::max(Tensor self) -> (Tensor)
  • aten::max.other(Tensor self, Tensor other) -> (Tensor)
  • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
  • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
  • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
  • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::min(Tensor self) -> (Tensor)
  • aten::min.other(Tensor self, Tensor other) -> (Tensor)
  • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
  • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
  • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::neg(Tensor self) -> (Tensor)
  • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
  • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
  • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
  • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
  • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
  • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
  • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::reciprocal(Tensor self) -> (Tensor)
  • aten::relu(Tensor input) -> (Tensor)
  • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
  • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
  • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
  • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
  • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
  • aten::reshape(Tensor self, int[] shape) -> (Tensor)
  • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
  • aten::sig...
Read more

TRTorch v0.4.0

24 Aug 21:49
Compare
Choose a tag to compare

TRTorch v0.4.0

Support for PyTorch 1.9, TensorRT 8.0. Introducing INT8 Execution for QAT models, Module Based Partial Compilation, Auto Device Configuration, Input Class, Usability Improvements, New Converters, Bug Fixes

Target Platform Changes

This is the fourth beta release of TRTorch, targeting PyTorch 1.9, CUDA 11.1 (on x86_64, CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0 with backwards compatible source for TensorRT 7.1. On aarch64 TRTorch targets Jetpack 4.6 primarily with backwards compatibile source for Jetpack 4.5. When building on Jetson, the flag --platforms //toolchains:jetpack_4.x must be now be provided for C++ compilation to select the correct dependency paths. For python by default it is assumed the Jetpack version is 4.6. To override this add the --jetpack-version 4.5 flag when building.

TensorRT 8.0

This release adds support for compiling models trained with Quantization aware training (QAT) allowing users using the TensorRT PyTorch Quantization Toolkit (https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization) to compile their models using TRTorch. For more information and a tutorial, refer to https://www.github.com/NVIDIA/TRTorch/tree/v0.4.0/examples/int8/qat. It also adds support for sparsity via the sparse_weights flag in the compile spec. This allows TensorRT to utilize specialized hardware in Ampere GPUs to minimize unnecessary computation and therefore increase computational efficiency.

Partial Compilation

In v0.4.0 the partial compilation feature of TRTorch can now be considered beta level stability. New in this release is the ability to specify entire PyTorch modules to run in PyTorch explicitly as part of partial compilation. This should let users isolate troublesome code easily when compiling. Again, feedback on this feature is greatly appreciated.

Automatic Device Configuration at Runtime

v0.4.0 also changes the "ABI" of TRTorch to now include information about the target device for the program. Programs compiled with v0.4.0 will look for and select the most compatible available device. The rules used are: Any valid device option must have the same SM capability as the device building the engine. From there, TRTorch prefers the same device (e.g. Built on A100 so A100 is better than A30) and finally prefers the same device ID. Users will be warned if this selected device is not the current active device in the course of execution as overhead may be incurred in transferring input tensors from the current device to the target device. Users can then modify their code to avoid this. Due to this ABI change, existing compiled TRTorch programs are incompatible with the TRTorch v0.4.0 runtime. From v0.4.0 onwards an internal ABI version will check program compatibility. This ABI version is only incremented with breaking changes to the ABI.

API Changes (Input, enabled_precisions, Device)

TRTorch v0.4.0 changes the API for specifying Input shapes and data types to provide users more control over configuration. The new API makes use of the class trtorch.Input which lets users set the shape (or shape range) as well as memory layout and expected data type. These input specs are set in the input field of the CompileSpec.

"inputs": [
        trtorch.Input((1, 3, 224, 224)), # Static input shape for input #1
        trtorch.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channel_last,
        ) # Dynamic input shape for input #2, input type int and channel last format
    ],

The legacy input_shapes field and associated usage with lists of tuples/InputRanges should now be considered deprecated. They remain usable in v0.4.0 but will be removed in the next release. Similarly, the compile spec field op_precision is now also deprecated in favor of enabled_precisions. enabled_precisions is a set containing the data types that kernels will be allowed to use. Whereas setting op_precision = torch.int8 would implicitly enable FP32 and FP16 kernels as well, now enabled_precisions should be set as {torch.float32, torch.float16, torch.int8} to do the same. In order to maintain similar behavior to normal PyTorch, if FP16 is the lowest precision enabled but no explicit data type is set for the inputs to the model, the expectation will be that inputs will be in FP16 . For other cases (FP32, INT8) FP32 is the default, similar to PyTorch and previous versions of TRTorch. Finally in the Python API, a class trtorch.Device has been added. While users can continue to use torch.Device or other torch APIs, trtorch.Device allows for better control for the specific use cases of compiling with TRTorch (e.g. setting DLA core and GPU fallback). This class is very similar to the C++ version with a couple additions of syntactic sugar to make the class easier and more familiar to use:

trtorch.Device("dla:0", allow_gpu_fallback=False) #Set device as DLA Core 0 (implicitly sets the GPU managing DLA cores as the GPU and sets fallback to false)

trtorch.Device can be used instead of a dictionary in the compile spec if desired.

trtorchc has been updated to reflect these API changes. Users can set the shape, dtype and format of inputs from the command line using the following format "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]@DTYPE%FORMAT" e.g. (3, 3, 32,32)@f16%NHWC. -p is now a repeatable flag to enable multiple precisions. Also added are repeatable flags --ffm and --ffo to mark specific modules and operators for running in PyTorch respectively. To use these two options, --allow-torch-fallback should be set. Options for embedding serialized engines (--embed-engine) and sparsity (--sparse-weights) added as well.

Usability

Finally, TRTorch v0.4.0 also now includes the ability to provide backtraces for locations in your model which TRTorch does not support. This can help in identifying locations in the model that might need to change for TRTorch support or modules which should run fully in PyTorch via partial compilation.

Dependencies

- Bazel 4.0.0
- LibTorch 1.9.0
- CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.2.3
- TensorRT 8.0.1.6

0.4.0 (2021-08-24)

  • feat(serde)!: Refactor CudaDevice struct, implement ABI versioning, (9327cce)
  • feat(//py)!: Implementing top level python api changes to reflect new (482265f)
  • feat(//cpp)!: Changes to TRTorch C++ api reflecting Input and (08b4942)
  • feat!: Pytorch 1.9 version bump (a12d249)
  • feat(//core/runtime)!: Better and more portable names for engines (6eb3bb2)

Bug Fixes

  • //core/conversion/conversionctx: Guard final engine building (dfa9ae8)
  • //core/lowering: use lower_info as parameter (370aeb9)
  • //cpp/ptq: fixing bad accuracy in just the example code (7efa11d)
  • //py: Fix python setup.py with new libtrtorch.so location (68ba63c)
  • //tests: fix optional jetson tests (4c32a83)
  • //tests: use right type for masked_fill test (4a5c28f)
  • aten::cat: support neg dim for cat (d8ca182)
  • aten::select and aten::var: Fix converters to handle negative axes (3a734a2)
  • aten::slice: Allow slicing of pytorch tensors (50f012e)
  • aten::tensor: Last dim doesnt always get written right (b68d4aa)
  • aten::tensor: Last dim doesnt always get written right (38744bc)
  • Address review comments, fix failing tests due to bool mishandling (13eef91)
  • Final working version of QAT in TRTorch (521a0cb)
  • fix aten::sub.scalar operator (9a09514)
  • Fix linear lowering pass, lift layer_norm scale layer restriction and matmul layer nbdims restriction (930d582)
  • Fix testcases using old InputRange API (ff87956)
  • Fix TRT8 engine capability flags (2b69742)
  • Fix warnings thrown by noexcept functions (c5f7eea)
  • Fix warnings thrown by noexcept functions (ddc8950)
  • Minor fixes to qat scripts (b244423)
  • Restrict TRTorch to compile only forward methods (9f006d5)
  • Transfer calibration data to gpu when it is not a batch (23739cb)
  • typo in aten::batch_norm (d47f48f)
  • qat: Rescale input data for C++ application (9dc6061)
  • Use len() to get size of datase...
Read more

TRTorch v0.3.0

14 May 00:55
Compare
Choose a tag to compare

TRTorch v0.3.0

Support for PyTorch 1.8.x (by default 1.8.1), Introducing Plugin Library, PTQ from Python, Arbitrary TRT engine embedding, Preview Release of Partial Compilation, New Converters, Bug Fixes

This is the third beta release of TRTorch, targeting PyTorch 1.8.x, CUDA 11.1 (on x86_64), TensorRT 7.2, cuDNN 8. TRTorch 0.3.0 binary releases target PyTorch 1.8.1 specifically, these builds are not compatible with 1.8.0, though the source code remains compatible with any PyTorch 1.8.x version. On aarch64 TRTorch targets JetPack 4.5.x. This release introduces libtrtorch_plugins.so. This library is a portable distribution of all TensorRT plugins used in TRTorch. The intended usecase is to support TRTorch programs that utilize TensorRT plugins deployed on systems with only the runtime library available or in the case that TRTorch was used to create a TensorRT engine to be run outside the TRTorch runtime, which makes uses of TRTorch plugins. An example on how to use this library can be found here: https://www.github.com/NVIDIA/TRTorch/tree/v0.3.0/examples/sample_rt_app. TRTorch 0.3.0 also now allows users to repurpose PyTorch Dataloaders to do post training quantization in Python similar to the workflow supported in C++ currently. It also introduces a new API to wrap arbitrary TensorRT engines in a PyTorch Module wrapper, making the serializable by torch.jit.save and completely compatible with other PyTorch modules. Finally, TRTorch 0.3.0 also includes a preview of the new partial compilation capability of the TRTorch compiler. With this feature, users can now instruct TRTorch to keep operations that are not supported but TRTorch/TensorRT in PyTorch. Partial compilation should be considered alpha stability and we are seeking feedback on bugs, pain points and feature requests surrounding using this feature.

Dependencies:

- Bazel 4.0.0
- LibTorch 1.8.1 (on x86_64), 1.8.0 (on aarch64)
- CUDA 11.1 (on x86_64, by default , newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.1.1
- TensorRT 7.2.3.4

0.3.0 (2021-05-13)

Bug Fixes

  • //plugins: Readding cuBLAS BUILD to allow linking of libnvinfer_plugin on Jetson (a8008f4)

  • //tests/../concat: Concat test fix (2432fb8)

  • //tests/core/partitioning: Fixing some issues with the partition (ff89059)

  • erase the repetitive nodes in dependency analysis (80b1038)

  • fix a typo for debug (c823ebd)

  • fix typo bug (e491bb5)

  • aten::linear: Fixes new issues in 1.8 that cause script based (c5057f8)

  • register the torch_fallback attribute in Python API (8b7919f)

  • support expand/repeat with IValue type input (a4882c6)

  • support shape inference for add_, support non-tensor arguments for segmented graphs (46950bb)

  • feat!: Updating versions of CUDA, cuDNN, TensorRT and PyTorch (71c4dcb)

  • feat(WORKSPACE)!: Updating PyTorch version to 1.8.1 (c9aa99a)

Features

  • //.github: Linter throws 1 when there needs to be style changes to (a39dea7)
  • //core: New API to register arbitrary TRT engines in TorchScript (3ec836e)
  • //core/conversion/conversionctx: Adding logging for truncated (96245ee)
  • //core/partitioing: Adding ostream for Partition Info (b3589c5)
  • //core/partitioning: Add an ostream implementation for (ee536b6)
  • //core/partitioning: Refactor top level partitioning API, fix a bug with (abc63f6)
  • //core/plugins: Gating plugin logging based on global config (1d5a088)
  • added user level API for fallback (f4c29b4)
  • allow users to set fallback block size and ops (6d3064a)
  • insert nodes by dependencies for nonTensor inputs/outputs (4e32eff)
  • support aten::arange converter (014e381)
  • support aten::transpose with negative dim (4a1d2f3)
  • support Int/Bool and other constants' inputs/outputs for TensorRT segments (54e407e)
  • support prim::Param for fallback inputs (ec2bbf2)
  • support prim::Param for input type after refactor (3cebe97)
  • support Python APIs for Automatic Fallback (100b090)
  • support the case when the injected node is not supported in dependency analysis (c67d8f6)
  • support truncate long/double to int/float with option (740eb54)
  • Try to submit review before exit (9a9d7f0)
  • update truncate long/double python api (69e49e8)
  • //docker: Adding Docker 21.03 (9b326e8)
  • update truncate long/double warning message (60dba12)
  • //docker: Update CI container (df63467)
  • //py: Allowing people using the PyTorch backend to use TRTorch/TRT (6c3e0ad)
  • //py: Catch when bazel is not in path and error out when running (1da999d)
  • //py: Gate partial compilation from to_backend API (bf1b2d8)
  • //py: New API to embed engine in new module (88d07a9)
  • aten::floor: Adds floor.int evaluator (a6a46e5)

BREAKING CHANGES

  • PyTorch version has been bumped to 1.8.0
    Default CUDA version is CUDA 11.1
    TensorRT version is TensorRT 7.2.3.4
    cuDNN version is now cuDNN 8.1

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

  • Due to issues with compatability between PyTorch 1.8.0
    and 1.8.1 in the Torch Python API, TRTorch 0.3.0 compiled for 1.8.0 does not
    work with PyTorch 1.8.1 and will show an error about use_input_stats.
    If you see this error make sure the version of libtorch you are
    compiling with is PyTorch 1.8.1

TRTorch 0.3.0 will target PyTorch 1.8.1. There is no backwards
compatability with 1.8.0. If you need this specific version compile from
source with the dependencies in WORKSPACE changed

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

Supported Operators in TRTorch v0.3.0

Operators Currently Supported Through Converters

  • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
  • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
  • aten::abs(Tensor self) -> (Tensor)
  • aten::acos(Tensor self) -> (Tensor)
  • aten::acosh(Tensor self) -> (Tensor)
  • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
  • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::asin(Tensor self) -> (Tensor)
  • aten::asinh(Tensor self) -> (Tensor)
  • aten::atan(Tensor self) -> (Tensor)
  • aten::atanh(Tensor self) -> (Tensor)
  • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
  • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::ceil(Tensor self) -> (Tensor)
  • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
  • aten::cos(Tensor self) -> (Tensor)
  • aten::cosh(Tensor self) -> (Tensor)
  • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
  • ate...
Read more

TRTorch v0.2.0

26 Feb 03:20
Compare
Choose a tag to compare

TRTorch v0.2.0

Support for PyTorch 1.7.x, Multi Device APIs, Runtime Library, New Converters, Bug Fixes

This is the second beta release of TRTorch, targeting PyTorch 1.7.x, CUDA 11.0 (on x86_64), TensorRT 7.2 and cuDNN 8. TRTorch 0.2.0 for aarch64 targets JetPack 4.5.x. It updates the to_backend integration for PyTorch to reflect changes in the PyTorch API. A new API has been added to disable the newly introduced TF32 data format used on Ampere as TF32 is now the default FP32 format used in TRTorch. APIs have been solidified for runtime configuration of the active CUDA device to let users choose what device a program is deserialized on. This API will continue to change as we further define the serialization format and work with the PyTorch team to make runtime device configuration more ergonomic. You can follow this work here: #311. This PR also formalizes DLA support in TRTorch, adding APIs and capabilities to target DLA on Jetson and DRIVE platforms. v0.2.0 also includes a new shared library libtrtorchrt.so. This library only contains the runtime components of TRTorch and is suitable for use in situations where device footprint is extremely limited. libtrtorch.so can be linked to C++ applications and loaded into Python scripts and will load all necessary trtorch runtime components into the torch runtime allowing users to run TRTorch applications without the full compiler. v0.2.0 also adds support for Python 3.9.

Dependencies:

- Bazel 4.0.0
- Libtorch 1.7.1 (on x86_64), 1.7.0 (on aarch64)
- CUDA 11.0 (by default, newer CUDA 11 supported with compatible PyTorch build)
- cuDNN 8.0.5
- TensorRT 7.2.2

v0.2.0 (2021-02-25)

  • refactor!: Update bazel and trt versions (0618b6b)

Bug Fixes

  • //core/conversion/conversionctx: Fix memory leak in conversion (6f83b41)
  • //core/lowering: fix debug message for bn dim check removal pass (86bb5b7)
  • //py: Fix bounds for enum macros (6b942e5)
  • aten::expand: Fix compiler warning for unused out ITensor (5b0f584)
  • aten::expand: Fix compiler warnings in the expand converter (51b09d4)
  • aten::flatten: Fixing flatten converter to handle dynamic batch (00f2d78)
  • aten::max_pool2d: Supressing error due to not filling in stride in (ed3c185)
  • aten::zeros: verify zeros produces a tensor correctly (00d2d0c)
  • remove_to: bug in remove_to.cpp, replace outputs()[0] with inputs()[0] (6c5118a)
  • setup.py: Broaden the supported pytorch versions to handle jetson (e94a040)
  • test_op_aliasing: Fix the renamed op (91c3c80)
  • tests: Fix broken elementwise tests (22ed944)

Features

  • support true_divide, floor_divide, max, min, rsub (a35fbf1)
  • //.github: Moving to python directly (ece114c)
  • //core/conversion: Adding a check to detect programs that will (a3d4144)
  • //core/lowering: Adding a new pass to handle new dim checks for (3d14cda)
  • //cpp/api/lib: New runtime only library (6644a9e)
  • //notebooks: Update notebooks container for 0.1.0 (a5851ff)
  • //py: [to_backend] adding device specification support for (6eeba1c), closes #286
  • aten::leaky_relu_: Adding alias for inplace leaky relu (bc53411)
  • aten::softmax: Adding support for any neg index (abc29a2)
  • aten::squeeze|aten::unsqueeze: adding BUILD files for new squeeze (9e0a1d7)
  • aten::sum: Allow for negative indices less than -1 (769bbc9)
  • aten::topk: Add a debug message noting that sorted is always true (81f1e9d)
  • aten::topk: Adding BUILD files for topk op (22e6a6b)
  • disable_tf32: Add a new API to disable TF32 (536983b)
  • interpolate: Adding support for .vec variants and overhauling test (0cda1cc)
  • interpolate: Addressing the linear, scale factor, align corners edge case (92e3818)
  • supportedops: Application to dump a list of supported operators (872d9a3)

BREAKING CHANGES

  • Version of bazel has been bumped to 4.0.0
    Version of TensorRT has been bumped to 7.2.2.3

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

  • The device API has now changed. Device settings are configured via a device struct which encapsulates information on selected device ids and types.

Supported Operators in TRTorch v0.2.0

Operators Currently Supported Through Converters

  • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
  • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
  • aten::abs(Tensor self) -> (Tensor)
  • aten::acos(Tensor self) -> (Tensor)
  • aten::acosh(Tensor self) -> (Tensor)
  • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
  • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::asin(Tensor self) -> (Tensor)
  • aten::asinh(Tensor self) -> (Tensor)
  • aten::atan(Tensor self) -> (Tensor)
  • aten::atanh(Tensor self) -> (Tensor)
  • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
  • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::ceil(Tensor self) -> (Tensor)
  • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
  • aten::cos(Tensor self) -> (Tensor)
  • aten::cosh(Tensor self) -> (Tensor)
  • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
  • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
  • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
  • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::erf(Tensor self) -> (Tensor)
  • aten::exp(Tensor self) -> (Tensor)
  • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
  • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
  • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
  • aten::floor(Tensor self) -> (Tensor)
  • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
  • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
  • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
  • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
  • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
  • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
  • aten::log(Tensor self) -> (Tensor)
  • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=Non...
Read more

TRTorch v0.1.0

23 Oct 23:29
Compare
Choose a tag to compare

TRTorch v0.1.0

Direct PyTorch integration via backend API, support for Ampere, support for simple branch and loop cases

This is the first "beta" release of TRTorch, introducing direct integration into PyTorch via the new Backend API. This release also contains an NGC based Dockerfile for users looking to use TRTorch on Ampere, using NGC's patched version of PyTorch. Note that compiled programs from older versions of TRTorch are not compatible with the TRTorch 0.1.0 runtime due to an ABI change. There are now example Jupyter notebooks which demonstrate various features of the compiler included in the documentation.

New Ops:

  • prelu
  • lstm_cell
  • power
  • conv3d
  • narrow

Dependencies:

  • Bazel 3.4.1
  • Libtorch 1.6.0
  • CUDA 10.2 (by default, CUDA 11 supported with compatible PyTorch build)
  • cuDNN 7.6.5 (by default, cuDNN 8 supported with compatible PyTorch build)
  • TensorRT 7.0.0 (by default, TensorRT 7.1 supported with compatible PyTorch build)

Changelog

v0.1.0 (2020-10-23)

Bug Fixes

  • added some fixes, trt/jit output still mismatches (723ac1d)

  • added test cases to explicitly check hidden/cell state outputs (d7c3164)

  • cleaned up logic, added case where bias doesn't exist for LSTM cell converter (a3e1093)

  • //core/conversion/evaluator: Custom to IValue that handles int[] (68c934a)

  • //docker: Workaround only shared libraries being available in (50c7eda)

  • //py: Fix long description section of setup.py (efd2099)

  • //tests: Add stride to complete tensors (af5d28e)

  • //tests/accuracy: Fix int8 accuracy test for new PTQ api (a53bea7)

  • //tests/core/converters/activations: Complete tensors in prelu test (0e90f78)

  • docsrc: Update docsrc container for bazel 3.4.1 (4eb53b5)

  • fix(Windows)!: Fix dependency resolution for local builds (858d8c3)

  • chore!: Update dependencies to PyTorch 1.6.0 (8eda27d)

  • chore!: Bumping version numbers to 0.1.0 (b84c90b)

  • refactor(//core)!: Introducing a binding convention that will address (5a105c6)

  • refactor!: Renaming extra info to compile spec to be more consistent (b8fa228)

Features

  • //core/conversion/converters: LSTMCell converter (8c61248)
  • //core/conversion/var: created ITensorOrFreeze() method, to replace functionality of Var::ITensor() (2ccf8d0)
  • //core/converters: Add power layer conversion support and minor README edits (a801506)
  • //core/lowering: Add functionalization pass to replace implace (90a9ed6), closes #30
  • //docker: Adding CUDA11 based container for Ampere support (970d775)
  • started working on lstm_cell converter (546d790)
  • //py: Initial compiliant implementation of the to_backend api for (59113cf)
  • //third_party/tensorrt: Add back TensorRT static lib in a cross (d3c2e7e)
  • aten::prelu: Basic prelu support (8bc4369)
  • aten::prelu: Implement the multi-channel version of prelu and (c066581)
  • finished logic for LSTM cell, now to test (a88cfaf)

BREAKING CHANGES

  • Users on Windows trying to use cuDNN 8 must manually
    configure third_party/cudnn/local/BUILD to use cuDNN 8.

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

  • Support for Python 3.5 is being dropped with this
    update

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

  • Version is being bumped to version 0.1.0a0 to target
    PyTorch 1.6.0

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

  • This changes the "ABI" of compiled TRTorch programs and
    the runtime and breaks backwards compatability between the runtime in
    0.1.0+ and programs compiled pre-0.1.0

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

  • This changes the top level api for setting the
    specification for compilation, a simple find and replace should allow
    users to port forward

Signed-off-by: Naren Dasan naren@narendasan.com
Signed-off-by: Naren Dasan narens@nvidia.com

TRTorch v0.0.3

18 Jul 05:42
Compare
Choose a tag to compare
TRTorch v0.0.3 Pre-release
Pre-release

TRTorch v0.0.3

aarch64 toolchain, Revised PTQ API, PyTorch 1.5.1, support for cuDNN 8.0, TensorRT 7.1 (with compatible PyTorch build)

This is the thrid alpha release of TRTorch. It bumps the target PyTorch version to 1.5.1 and introduces support for cuDNN 8.0 and TensorRT 7.1, however this is only supported in cases where PyTorch has been compiled with the same cuDNN version. This release also introduces formal support for aarch64, however pre-compiled binaries will not be available until we can deliver python packages for aarch64 for all supported version of python. Note some idiosyncrasies when it comes to working with PyTorch on aarch64, if you are using PyTorch compiled by NVIDIA for aarch64 the ABI version is CXX11 instead of the pre CXX11 ABI found on PyTorch on x86_64. When compiling the Python API for TRTorch add the --use-cxx11-abi flag to the command and do not use the --config=pre-cxx11-abi flag when building the C++ library (more instructions on native aarch64 compilation in the documentation). This release also introduces a breaking change to the C++ API where now in order to use logging or ptq APIs a separate header file must be included. Look at the implementation of trtorchc or ptq for example usage.

Dependencies:

  • Bazel 3.3.1
  • Libtorch 1.5.1
  • CUDA 10.2
  • cuDNN 7.6.5 (by default, cuDNN 8 supported with compatable PyTorch build)
  • TensorRT 7.0.0 (by default, TensorRT 7.1 supported with compatable PyTorch build)

Changelog

  • feat!: Lock bazel version (25f4371)
  • refactor(//cpp/api)!: Refactoring ptq to use includes but seperate from (d2f8a59)

Bug Fixes

  • //core: Do not compile hidden methods (6bd1a3f)
  • //core/conversion: Check for calibrator before setting int8 mode (3afd209)
  • //core/conversion: Supress unnecessary debug messages (2b23874)
  • //core/conversion/conversionctx: Check both tensor and eval maps (2d65ece)
  • //core/conversion/conversionctx: In the case of strict types and (3611778)
  • //core/conversion/converters: Fix plugin implementation for TRT 7 (94d6a0f)
  • //core/conversion/converters/impl: 1d case not working (f42562b)
  • //core/conversion/converters/impl: code works for interpolate2d/3d, doesn't work for 1d yet (e4cb117)
  • //core/conversion/converters/impl: Fix interpolate.cpp (b6942a2)
  • //core/conversion/converters/impl/element_wise: Fix broadcast (a9f33e4)
  • //core/conversion/evaluators: A couple fixes for evaluators (07ba980)
  • //core/lowering: Conv2D -> _convolution pass was triggering conv (ca2b5f9)
  • //cpp: Remove deprecated script namespace (d70760f)
  • //cpp/api: Better inital condition for the dataloader iterator to (8d22bdd)
  • //cpp/api: Remove unecessary destructor in ptq class (fc70267)
  • //cpp/api: set a default for calibrator (825be69)
  • //cpp/benchmark: reorder benchmark so FP16 bn issue in JIT doesnt (98527d2)
  • //cpp/ptq: Default version of the app should not resize images (de3cbc4)
  • //cpp/ptq: Enable FP16 kernels for INT8 applications (26709cc)
  • //cpp/ptq: Enable FP16 kernels for INT8 applications (e1c5416)
  • //cpp/ptq: remove some logging from ptq app (b989c7f)
  • //cpp/ptq: Tracing model in eval mode wrecks accuracy in Libtorch (54a24b3)
  • //cpp/trtorchc: Refactor trtorchc to use new C++ API (789e1be), closes #132
  • //cpp/trtorchc: Support building trtorchc with the pre_cxx11_abi (172d4d5)
  • //docs: add nojekyll file (2a02cd5)
  • //docs: fix version links (11555f7)
  • //notebooks: Fix WORKSPACE template file to reflect new build system layout (c8ea9b7)
  • //py: Build system issues (c1de126)
  • //py: Ignore generated version file (9e37dc1)
  • //py: Lib path incorrect (ff2b13c)
  • //tests: Duplicated tensorrt dep (5cd697e)
  • //third_party/tensorrt: Fix include dir for library headers (22ed5cf)
  • //third_party/tensorrt: Fix TensorRT paths for local x86 builds (73d804b)
  • aarch64: fixes and issues for aarch64 toolchain (9a6cccd)
  • aten::_convolution: out channels was passed in incorrectly for (ee727f8)
  • aten::_convolution: Pass dummy bias when there is no bias (b20671c)
  • aten::batch_norm: A new batch norm implementation that hopefully (6461872)
  • aten::batchnorm|aten::view: Fix converter implementation for (bf651dd)
  • aten::contiguous: Blacklist aten::contiguous from conversion (b718121)
  • aten::flatten: Fixes dynamic shape for flatten (4eb20bb)
  • fixed FP16 bug, fixed README, addressed some other PR comments (d9c0e84)
  • aten::neg: Fix a index bug in neg (1b2cde4)
  • aten::size, other aten evaluators: Removes aten::size converter in (c83447e)
  • BUILD: modified BUILD (a0d8586)
  • trying to resolve interpolate plugin problems (f0fefaa)
  • core/conversion/converters/impl: fix error message in interpolate (5ddab8b)
  • Address issues in PR (cd24f26)
  • bypass jeykll, also add PR template (a41c400)
  • first commit (4f1a9df)
  • Fix pre CXX11 ABI python builds and regen docs (42013ab)
  • fixed interpolate_plugin to handle dynamically sized inputs for adaptive_pool2d (7794c78)
  • need to fix gather converter (024a6b2)
  • plugin: trying to fix bug in plugin (cafcced)
  • pooling: fix the tests and the 1D pooling cases (a90e6db)
  • RunGraphEngineDynamic fixed to work with dynamically sized input tensors (6308190)

Features

  • //:libtrtorch: Ship trtorchc with the tarball (d647447)
  • //core/compiler: Multiple outputs supported now via tuple (f9af574)
  • //core/conversion: Adds the ability to evaluate loops (dcb1474)
  • //core/conversion: Compiler can now create graphs (9d1946e)
  • //core/conversion: Evaluation of static conditionals works now (6421f3d)
  • //core/conversion/conversionctx: Make op precision available at (78a1c61)
  • //core/conversion/converters: Throw a warning if a converter is (6cce381)
  • //core/conversion/converters/impl: added support for aten::stack (415378e)
  • //core/conversion/converters/impl: added support for linear1d and bilinear2d ops (4416d1f)
  • //core/conversion/converters/impl: added support for trilinear3d op (bb46e70)
  • //core/conversion/converters/impl: all function schemas for upsample_nearest (1b50484)
  • //core/conversion/converters/impl: logic implemented ([7f12160](https://github.com/...
Read more

TRTorch v0.0.2

17 May 02:00
3f57189
Compare
Choose a tag to compare
TRTorch v0.0.2 Pre-release
Pre-release

TRTorch v0.0.2

Python API & PyTorch 1.5.0 Support

  • This is a second alpha release of TRTorch. It bumps support for PyTorch to 1.5.0 and introduces a Python distribution for TRTorch.
  • Also now includes full documentation https://nvidia.github.io/TRTorch
  • Adds support for Post Training Quantization in C++

Dependencies

  • Libtorch 1.5.0
  • CUDA 10.2
  • cuDNN 7.6.5
  • TensorRT 7.0.0

Changelog

Bug Fixes

  • //core/conversion: Check for calibrator before setting int8 mode (3afd209)
  • //core/conversion/conversionctx: Check both tensor and eval maps (2d65ece)
  • //core/conversion/converters/impl/element_wise: Fix broadcast (a9f33e4)
  • //cpp: Remove deprecated script namespace (d70760f)
  • //cpp/api: Better inital condition for the dataloader iterator to (8d22bdd)
  • //cpp/api: Remove unecessary destructor in ptq class (fc70267)
  • //cpp/api: set a default for calibrator (825be69)
  • //cpp/ptq: remove some logging from ptq app (b989c7f)
  • Address issues in PR (cd24f26)
  • //cpp/ptq: Tracing model in eval mode wrecks accuracy in Libtorch (54a24b3)
  • //docs: add nojekyll file (2a02cd5)
  • //docs: fix version links (11555f7)
  • //py: Build system issues (c1de126)
  • //py: Ignore generated version file (9e37dc1)
  • bypass jeykll, also add PR template (a41c400)

Features

  • //core/conversion/conversionctx: Make op precision available at (78a1c61)
  • //core/conversion/converters/impl/shuffle: Implement aten::resize (353f2d2)
  • //core/execution: Type checking for the executor, now is the (2dd1ba3)
  • //core/lowering: New freeze model pass and new exception (4acc3fd)
  • //core/quantization: skeleton of INT8 PTQ calibrator (dd443a6)
  • //core/util: New logging level for Graph Dumping (90c44b9)
  • //cpp/api: Adding max batch size setting (1b25542)
  • //cpp/api: Functional Dataloader based PTQ (f022dfe)
  • //cpp/api: Remove the extra includes in the API header (2f86f84)
  • //cpp/ptq: Add a feature to the dataset to use less than the full (5f36f47)
  • //cpp/ptq/training: Training recipe for VGG16 Classifier on (676bf56)
  • //lowering: centralize lowering and try to use PyTorch Conv2DBN folding (fad4a10)
  • //py: API now produces valid engines that are consumable by (72bc1f7)
  • //py: Inital introduction of the Python API (7088245)
  • //py: Manylinux container and build system for multiple python (639c2a3)
  • //py: Working portable package (482ef2c)
  • //tests: New optional accuracy tests to check INT8 and FP16 (df74136)
  • //cpp/api: Working INT8 Calibrator, also resolves #41 (5c0d737)
  • aten::flatten: Adds a converter for aten flatten since MM is the (d945eb9)
  • aten::matmul|aten::addmm: Adds support for aten::matmul and (c5b6202)
  • Support non cxx11-abi builds for use in python api (83e0ed6)
  • aten::size [static]: Implement a aten::size converter for static input size (0548540)
  • conv2d_to_convolution: A pass to map aten::conv2d to _convolution (2c5c0d5)

Initial Release

08 Apr 01:41
Compare
Choose a tag to compare
Initial Release Pre-release
Pre-release

TRTorch v0.0.1

Initial Release

  • This is the initial alpha release of TRTorch. Supports basic compilation of TorchScript Modules, networks similar to ResNet50, Mobilenet, simple feed forward networks.
  • C++ Based API
    • Can save converted models to PLAN file for use in TensorRT Apps
    • Compile module and continue running with JIT interpreter accelerated by TensorRT
  • Supports FP32 and FP16 execution
  • Sample application to show how to use the compiler

Dependencies

  • Libtorch 1.4.0
  • CUDA 10.1
  • cuDNN 7.6
  • TensorRT 6.0.1