Bug Description
When performing inference with a Torch-TRT converted BART network (https://huggingface.co/facebook/bart-base), a runtime shape error is encountered: an elementwise add fails because a reshaped tensor's final dimension does not match that of the other operand. Note that compilation of the model succeeds; the error is thrown only at inference time.
To Reproduce
Steps to reproduce the behavior:
1. Run torch_tensorrt.compile with the BART model as input, using fp32 precision.
2. Choose two fixed-size inputs, each of shape [1, 128], and enable truncate_long_and_double with a 12 GB workspace.
3. Pass in model keyword args to disable the attention and hidden-state outputs.
4. Run inference using the compiled model on two sample inputs (a sketch of these steps follows).
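A minimal sketch of the reproduction, assuming the two inputs are input_ids and attention_mask, that the keyword args in step 3 are the Hugging Face flags output_attentions, output_hidden_states, and return_dict, and that the model is traced before compilation (none of these specifics are given in the original report):

```python
import torch
import torch_tensorrt
from transformers import BartModel

# Assumed reading of step 3: Hugging Face config flags disable the extra
# attention / hidden-state outputs so the traced graph returns only tensors.
model = BartModel.from_pretrained(
    "facebook/bart-base",
    output_attentions=False,
    output_hidden_states=False,
    return_dict=False,
).eval().cuda()

# Two fixed-size inputs of shape [1, 128]; assumed to be input_ids and
# attention_mask.
input_ids = torch.randint(0, 1000, (1, 128), dtype=torch.int32).cuda()
attention_mask = torch.ones(1, 128, dtype=torch.int32).cuda()

traced = torch.jit.trace(model, (input_ids, attention_mask))

trt_model = torch_tensorrt.compile(
    traced,
    inputs=[
        torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),
        torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),
    ],
    enabled_precisions={torch.float},  # fp32 precision
    truncate_long_and_double=True,
    workspace_size=12 << 30,  # 12 GB workspace
)

# Step 4: inference on the sample inputs -- this is where the error surfaces.
outputs = trt_model(input_ids, attention_mask)
```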
Expected behavior
The model should successfully perform inference with Torch-TRT. Specifically, internal shape issues should either be caught at compile time or otherwise not cause errors.
Environment
Torch-TensorRT Version: 1.4.0.dev0+81f2dabb
PyTorch Version: 1.14.0.dev20221114+cu116
CPU Architecture: Intel Xeon CPU
OS: Ubuntu 20.04
How you installed PyTorch: pip
Build command you used: python setup.py develop
Are you using local sources or building from archives: local
Python version: 3.8.13
CUDA version: 11.6
Additional context
The problem currently seems to be related to Torch-TensorRT flattening input tensors in a way that is inconsistent with the analogous PyTorch behavior. Two potentially relevant operations are aten::mul and aten::add, which appear frequently in the BART graph as replacements for the linear layer, inserted by the LinearToAddMM lowering pass (TensorRT/core/lowering/passes/linear_to_addmm.cpp, lines 47 to 61 at commit aa93a12).
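For context, after this lowering the graph computes each linear layer as an explicit matmul-plus-add. The sketch below is a rough Python equivalent of the rewritten computation, not the pass itself; the addmm-style alpha/beta scaling, which is one way aten::mul can enter the graph, is an assumption:

```python
import torch

def lowered_linear(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # Rough Python equivalent of the graph produced by the LinearToAddMM
    # lowering: aten::linear becomes a matmul plus a (scaled) add, which is
    # where the aten::mul and aten::add nodes enter the BART graph.
    alpha, beta = 1.0, 1.0  # addmm-style scaling factors (assumed values)
    return beta * bias + alpha * torch.matmul(x, weight.t())
```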
Temporary Solution
A temporary fix for this problem is to add an argument to the compilation arguments in torch_tensorrt.compile. This solution works because it happens to exclude the problematic code from TensorRT conversion, which suggests the issue could potentially be related to the aten::mul operator itself.
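The exact argument from the original report is not preserved here. Since the workaround operates by keeping the problematic code in Torch, one plausible reconstruction (an assumption, not the confirmed snippet) excludes aten::mul from conversion via torch_executed_ops, reusing the traced module from the sketch above:

```python
trt_model = torch_tensorrt.compile(
    traced,
    inputs=[
        torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),
        torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),
    ],
    enabled_precisions={torch.float},
    truncate_long_and_double=True,
    workspace_size=12 << 30,
    # Assumed reconstruction of the workaround: run aten::mul in Torch
    # instead of converting it to TensorRT.
    torch_executed_ops=["aten::mul"],
)
```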
Related Issues
Potentially related to Issue #1455, as a similar error appears under certain compilation configurations for that model as well.
Additional Note
The bug appears to be nondeterministic: after recompiling and rerunning inference with the model many times, inference ultimately completes successfully.
Follow-up Diagnosis
It appears that the reshaped tensor _75 has a different final dimension from the other tensor _21, causing the add to fail. The operator causing the shape mismatch likely lies within, or arises from, Torch-TensorRT or TensorRT, as the mismatch does not occur during the Torch-only dry run in partitioning (the error is thrown only at inference time).
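To illustrate the failure mode (with made-up shapes; the actual dimensions of _75 and _21 are not given in the report), an elementwise add fails whenever a reshape changes one operand's final dimension:

```python
import torch

hidden = torch.randn(1, 128, 768)
residual = torch.randn(1, 128, 768)

# A flattening inconsistent with the PyTorch-side shapes: same number of
# elements, but a different final dimension (hypothetical shapes).
reshaped = hidden.reshape(1, 192, 512)

try:
    _ = reshaped + residual  # (1, 192, 512) vs (1, 128, 768)
except RuntimeError as e:
    print(e)  # size mismatch on the last dimension, as in the bug
```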