🐛 [Bug] RuntimeError when attempting to compile Encoder model in Sockeye #833

Closed
blchu opened this issue Jan 29, 2022 · 9 comments · Fixed by #839 or #1313

blchu commented Jan 29, 2022

Bug Description

When compiling the encoder transformer model for Sockeye inference, Torch-TensorRT throws a runtime error.

To Reproduce

Steps to reproduce the behavior:

  1. Start a Docker container: docker run --gpus all --rm -it nvcr.io/nvidia/pytorch:21.11-py3
  2. Run the following to download+preprocess data and train a basic model:
git clone https://github.com/blchu/sockeye.git -b tensorrt_blchu
tail -n 4 sockeye/requirements/requirements.txt > requirements.txt.tmp \
    && mv requirements.txt.tmp sockeye/requirements/requirements.txt
pip install -e ./sockeye
git clone https://github.com/rsennrich/subword-nmt.git
export PYTHONPATH=$(pwd)/subword-nmt:$PYTHONPATH

wget http://data.statmt.org/wmt17/translation-task/preprocessed/de-en/corpus.tc.de.gz
wget http://data.statmt.org/wmt17/translation-task/preprocessed/de-en/corpus.tc.en.gz
gunzip corpus.tc.de.gz
gunzip corpus.tc.en.gz
curl https://data.statmt.org/wmt17/translation-task/preprocessed/de-en/dev.tgz | tar xvzf -

head -n 32768 corpus.tc.de > corpus.tc.de.tmp && mv corpus.tc.de.tmp corpus.tc.de
head -n 32768 corpus.tc.en > corpus.tc.en.tmp && mv corpus.tc.en.tmp corpus.tc.en

python -m learn_joint_bpe_and_vocab --input corpus.tc.de corpus.tc.en \
                                    -s 3000 \
                                    -o bpe.codes \
                                    --write-vocabulary bpe.vocab.de bpe.vocab.en

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 < corpus.tc.de > corpus.tc.BPE.de
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < corpus.tc.en > corpus.tc.BPE.en

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 < newstest2016.tc.de > newstest2016.tc.BPE.de
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < newstest2016.tc.en > newstest2016.tc.BPE.en

python -m sockeye.prepare_data_pt \
                        -s corpus.tc.BPE.de \
                        -t corpus.tc.BPE.en \
                        -o train_data \
                        --shared-vocab

torchrun --no_python --nproc_per_node 1 sockeye-train \
         --prepared-data train_data \
         --validation-source newstest2016.tc.BPE.de \
         --validation-target newstest2016.tc.BPE.en \
         --output model \
         --batch-size 2048 \
         --update-interval 1 \
         --checkpoint-interval 1 \
         --max-updates 1 \
         --decoder ssru_transformer \
         --shared-vocab \
         --seed 1 \
         --quiet-secondary-workers
  3. Run the translate command to compile with Torch-TensorRT; this is where the error occurs:
sockeye-translate \
    --input newstest2016.tc.BPE.de \
    --output out \
    --model model \
    --dtype float16 \
    --beam-size 5 \
    --batch-size 64 \
    --output-type benchmark

Stack trace and logs:

WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected invalid timing cache, setup a local cache instead
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. In order to use a new logger, first destroy all existing builder, runner or refitter objects.

WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
[ERROR:root] Uncaught exception
Traceback (most recent call last):
  File "/opt/conda/bin/sockeye-translate", line 33, in <module>
    sys.exit(load_entry_point('sockeye', 'console_scripts', 'sockeye-translate')())
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 43, in main
    run_translate(args)
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 147, in run_translate
    read_and_translate(translator=translator,
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 234, in read_and_translate
    chunk_time = translate(output_handler, chunk, translator)
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 257, in translate
    trans_outputs = translator.translate(trans_inputs)
  File "/workspace/temp/sockeye/sockeye/inference_pt.py", line 807, in translate
    batch_translations = self._translate_np(*self._get_inference_input(translator_inputs))  # type: ignore
  File "/workspace/temp/sockeye/sockeye/inference_pt.py", line 995, in _translate_np
    return self._get_best_translations(self._search(source,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/temp/sockeye/sockeye/beam_search_pt.py", line 778, in forward
    model_states, estimated_reference_lengths = self._inference.encode_and_initialize(source, source_length)
  File "/workspace/temp/sockeye/sockeye/beam_search_pt.py", line 70, in encode_and_initialize
    states, predicted_output_length = self._model.encode_and_initialize(inputs, valid_length, self._const_lr)
  File "/workspace/temp/sockeye/sockeye/model_pt.py", line 234, in encode_and_initialize
    source_encoded, source_encoded_lengths = self.encode(inputs, valid_length=valid_length)
  File "/workspace/temp/sockeye/sockeye/model_pt.py", line 200, in encode
    self.traced_encoder = torch_tensorrt.compile(self.traced_encoder,
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 97, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: [Error thrown at ./core/conversion/var/Var_inl.h:38] Expected ivalue->isInt() to be true but got false
Requested unwrapping of arg IValue assuming it was l however type is NoneType

Expected behavior

The model should compile without error and translate sentences.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.0.0a0
  • PyTorch Version (e.g. 1.0): 1.11.0a0+b6df043
  • CPU Architecture: x86_64 (Intel Xeon Platinum 8259CL)
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, libtorch, source): NGC Container
  • Python version: 3.8.12
  • CUDA version: 11.5
  • GPU models and configuration: Tesla T4
@blchu blchu added the bug Something isn't working label Jan 29, 2022
@narendasan narendasan mentioned this issue Jan 31, 2022
6 tasks
@narendasan narendasan reopened this Feb 2, 2022
@mjdenkowski

@narendasan @ncomly-nvidia @blchu Thanks for working on this!

Do we know what the remaining challenges are for compiling the full encoder module?

@narendasan
Collaborator

There is a limitation in TensorRT that causes engine building to fail even if we add workarounds. We are trying to find the root cause to determine whether we can work around it at the Torch-TRT level or whether we need an improvement to TensorRT.

@mjdenkowski

Thanks for digging into this issue!

It looks like a Fairseq transformer that uses a similar encoder runs successfully on TensorRT, so all of the operations should be supported (blog post).

Has the compile error been traced to a specific operator? We can also look at temporary workarounds at the Sockeye level.

@narendasan narendasan mentioned this issue Mar 8, 2022
6 tasks
@narendasan
Collaborator

narendasan commented Mar 8, 2022

@mjdenkowski We recently updated our TensorRT version which addresses our previous issues during engine building.

Currently the status is:

  • With aten::pow support (#918), we can partially compile the module, which consists of 4 blocks. Two blocks remain in PyTorch because we don't have converters for the following ops:
    - aten::bitwise_not(Tensor self) -> (Tensor)
    - aten::repeat_interleave.self_int(Tensor self, int repeats, int? dim=None, *, int? output_size=None) -> (Tensor)
  • When Sockeye executes the built Torch-TRT module, I occasionally see errors related to unexpected input shape (i.e. the input tensor is smaller than the set input shape [64, 96, 512]).
    • Here we have a couple of ways forward:
      1. If padding the input to a uniform size is acceptable then you should be good to go
      2. If you need support for dynamic shape, we need to address the two unsupported ops since we don't currently support using partial compilation and dynamic shape at the same time
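For option 1, a rough sketch of what padding to a uniform size could look like on the caller side. This is not Sockeye or Torch-TensorRT code: pad_batch is a hypothetical helper, pad_id=0 is an assumed padding token, and the defaults of 64 and 96 simply mirror the first two dimensions of the [64, 96, 512] shape above (assumed here to be batch size and sequence length). It pads token-id lists in plain Python so the shape logic is easy to follow.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    batch: Sequence[Sequence[int]],
    batch_size: int = 64,   # assumed: first dim of the fixed input shape
    seq_len: int = 96,      # assumed: second dim of the fixed input shape
    pad_id: int = 0,        # assumed padding token id
) -> Tuple[List[List[int]], List[int]]:
    """Pad a variable-size batch of token ids to exactly (batch_size, seq_len).

    Returns the padded batch and the original length of every row.
    Filler rows added to reach batch_size have length 0, so the caller
    can drop their outputs afterwards.
    """
    if len(batch) > batch_size:
        raise ValueError(f"batch has {len(batch)} rows, limit is {batch_size}")

    padded: List[List[int]] = []
    lengths: List[int] = []
    for seq in batch:
        if len(seq) > seq_len:
            raise ValueError(f"sequence of length {len(seq)} exceeds {seq_len}")
        # Right-pad each real sequence up to seq_len.
        padded.append(list(seq) + [pad_id] * (seq_len - len(seq)))
        lengths.append(len(seq))

    # Add all-padding rows so the batch dimension is also uniform.
    for _ in range(batch_size - len(batch)):
        padded.append([pad_id] * seq_len)
        lengths.append(0)

    return padded, lengths
```

The returned lengths would still need to be passed to the model so that padded positions are masked, and the cost is wasted compute on short or partially filled batches.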

@mjdenkowski

That sounds like great progress!

We use dynamic shapes throughout our model, so padding inputs isn't a good fit for our use case.

Is there a timeline for supporting the other two operators? Sockeye implements a standard transformer encoder, so supporting these ops should be broadly helpful for compiling various transformer/BERT/attention implementations.
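In the meantime, one possible Sockeye-level workaround would be to express the two ops through other PyTorch calls. The sketch below is only an illustration under assumptions: the helper names are hypothetical, it covers only boolean masks and the dim=0 case of repeat_interleave, and whether the replacement ops have Torch-TensorRT converters would still need to be checked.

```python
import torch


def invert_bool_mask(mask: torch.Tensor) -> torch.Tensor:
    """Equivalent to aten::bitwise_not for boolean tensors only."""
    return torch.logical_not(mask)


def repeat_interleave_dim0(x: torch.Tensor, repeats: int) -> torch.Tensor:
    """Equivalent to x.repeat_interleave(repeats, dim=0) for an int `repeats`.

    Adds a new axis after dim 0, broadcasts it to `repeats`, then folds it
    back into dim 0 so each row appears `repeats` times in a row.
    """
    expanded = x.unsqueeze(1).expand(x.shape[0], repeats, *x.shape[1:])
    return expanded.reshape(x.shape[0] * repeats, *x.shape[1:])


if __name__ == "__main__":
    x = torch.arange(6).reshape(3, 2)
    print(torch.equal(repeat_interleave_dim0(x, 2),
                      torch.repeat_interleave(x, 2, dim=0)))
```

Both helpers rely on reshape copying the expanded view, so memory use matches the original op.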

@mjdenkowski

Thanks again @narendasan and @blchu!

Is there an active issue for supporting the remaining two operators or do we need to open a new issue?

@narendasan
Collaborator

We have it tracked internally and are using this issue as a reference. I don't think we need an additional issue for the specific operators.

@github-actions

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

@ncomly-nvidia
Contributor

@blchu any updates on converter progress?
