🐛 [Bug] RuntimeError when attempting to compile Encoder model in Sockeye #833
Comments
@narendasan @ncomly-nvidia @blchu Thanks for working on this! Do we know what the remaining challenges are for compiling the full encoder module?
There is a limitation in TensorRT that causes engine building to fail even with workarounds in place. We are trying to find the root cause so we can determine whether to work around it at the Torch-TRT level or whether TensorRT itself needs an improvement.
Thanks for digging into this issue! It looks like a Fairseq transformer that uses a similar encoder runs successfully on TensorRT, so all of the operations should be supported (blog post). Has the compile error been traced to a specific operator? We can also look at temporary workarounds at the Sockeye level.
@mjdenkowski We recently updated our TensorRT version which addresses our previous issues during engine building. Currently the status is:
That sounds like great progress! We use dynamic shapes throughout our model, so padding inputs isn't a good fit for our use case. Is there a timeline for supporting the other two operators? Sockeye implements a standard transformer encoder, so supporting these ops should be broadly helpful for compiling various transformer/BERT/attention implementations.
Thanks again @narendasan and @blchu! Is there an active issue for supporting the remaining two operators, or do we need to open a new one?
We are tracking it internally and using this issue as a reference. I don't think we need a separate issue for the specific operators.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
@blchu any updates on converter progress? |
Bug Description
When compiling the encoder transformer model for Sockeye inference, Torch-TensorRT throws a `RuntimeError`.
To Reproduce
Steps to reproduce the behavior:
docker run --gpus all --rm -it nvcr.io/nvidia/pytorch:21.11-py3
Stack trace and logs:
Expected behavior
The model should compile successfully without error and then translate sentences.
Environment
How you installed PyTorch (`conda`, `pip`, `libtorch`, source): NGC Container