Support onnx #12
Tried it just now. I was able to export to ONNX using ...
@baudm I cannot convert to ONNX; the main problem comes from ...
Can you share example code for loading a checkpoint into the model architecture?
import torch
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
dummy_input = torch.rand(1, 3, *parseq.hparams.img_size) # (1, 3, 32, 128) by default
# To ONNX
parseq.to_onnx('parseq.onnx', dummy_input, opset_version=14) # opset v14 or newer is required
# To TorchScript
parseq.to_torchscript('parseq-ts.pt')
@baudm The model converted successfully to ONNX, but I cannot load the ONNX model. I am asking a PyTorch expert to resolve it; if it gets fixed I will share the final ONNX model.
@baudm After some days I tried to fix the ONNX export but could not. I would be very happy if you could give a few example lines for running inference with the TorchScript model you converted.
I get this error: ...
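For anyone looking for a starting point, here is a minimal TorchScript inference sketch. It is not code from the maintainers; the file name matches the export snippet above, and the (1, 3, 32, 128) input is the default image size.

import torch

# Placeholder path matching the TorchScript export above.
model = torch.jit.load('parseq-ts.pt')
model.eval()

# Random input with the default (1, 3, 32, 128) shape; a real crop should be
# resized to 32x128 and normalized the same way as during training.
x = torch.rand(1, 3, 32, 128)

with torch.no_grad():
    logits = model(x)  # output shape depends on how the model was exported, e.g. (1, T, 95)

print(logits.shape)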
@baudm I have a similar issue with loading the converted ONNX model. I can successfully convert the model to ONNX, but when I try to load it and check whether the model is well-formed, I get the error below.
import torch
import onnx
# Load PyTorch model
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
dummy_input = torch.rand(1, 3, *parseq.hparams.img_size)
# Convert to ONNX
parseq.to_onnx('pairseq.onnx', dummy_input, opset_version=14)
# Load the ONNX model
onnx_model = onnx.load('pairseq.onnx')
# Check ONNX model
onnx.checker.check_model(onnx_model, full_check=True)
---------------------------------------------------------------------------
InferenceError Traceback (most recent call last)
Input In [1], in <cell line: 15>()
12 onnx_model = onnx.load('pairseq.onnx')
14 # Check ONNX model
---> 15 onnx.checker.check_model(onnx_model, full_check=True)
File /opt/venv/lib/python3.8/site-packages/onnx/checker.py:108, in check_model(model, full_check)
106 C.check_model(protobuf_string)
107 if full_check:
--> 108 onnx.shape_inference.infer_shapes(model, check_type=True, strict_mode=True)
File /opt/venv/lib/python3.8/site-packages/onnx/shape_inference.py:34, in infer_shapes(model, check_type, strict_mode, data_prop)
32 if isinstance(model, (ModelProto, bytes)):
33 model_str = model if isinstance(model, bytes) else model.SerializeToString()
---> 34 inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode, data_prop)
35 return onnx.load_from_string(inferred_model_str)
36 elif isinstance(model, str):
InferenceError: [ShapeInferenceError] (op_type:CumSum, node name: CumSum_2527): x typestr: T, has unsupported type: tensor(bool)
Waiting for ONNX and TensorRT conversion.
ONNX export is successful with this change:
tgt_padding_mask = (((tgt_in == self.eos_id) * 2).cumsum(-1) > 0)  # mask tokens beyond the first EOS token
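For context on why this helps (an explanatory sketch, not code from the repo): cumsum over a bool tensor is exported as ONNX's CumSum op on tensor(bool), which the checker rejects, so the comparison result has to be promoted to an integer dtype first; multiplying the mask by 2 is one way to force that promotion. A toy illustration, with eos_id = 0 chosen only for the example:

import torch

eos_id = 0                                 # illustrative value; the model uses self.eos_id
tgt_in = torch.tensor([[3, 5, 0, 7, 9]])   # toy token sequence with an EOS at position 2

# Original formulation: cumsum runs directly on a bool tensor.
mask_bool = (tgt_in == eos_id).cumsum(-1) > 0

# Workaround: promote the bool mask to an integer dtype before cumsum.
mask_int = (tgt_in == eos_id).int().cumsum(-1) > 0

assert torch.equal(mask_bool, mask_int)    # same result, but the int version exports cleanly
print(mask_int)                            # tensor([[False, False,  True,  True,  True]])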
@mcmingchang Can you elaborate? I can convert the model to ONNX but I can't use it with ONNX Runtime. When building the ONNX model I get the following message: ...
Did anyone already successfully build and run the ONNX model? Can someone share it?
I am having similar issues with ONNX; any leads on it?
@ashishpapanai Maybe we are waiting for an expert to solve it.
parseq/strhub/models/parseq/system.py, line 147 in 8fa5100
This is the offending code fragment. You can comment it out (disabling the iterative-refinement branch of the code) before exporting the ONNX model. I tried it and ...
Thank you for answering, @baudm.
Thank you @baudm, I tried the code below with ...
@allenwu5 Oh yeah, this is even better. Setting ...
UPDATE: As of commit ed3d847, the following export works:
import torch
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
dummy_input = torch.rand(1, 3, *parseq.hparams.img_size) # (1, 3, 32, 128) by default
# To ONNX
parseq.to_onnx('parseq.onnx', dummy_input, opset_version=14) # opset v14 or newer is required
@baudm The output size changes across different conversions.
I am facing a similar issue with the output shape.
@baudm Thank you! It works. I can also convert the ONNX model to TensorRT and achieve the same results. I benchmarked inference time across the Torch, ONNX Runtime, and TensorRT models (3x32x128, bs=1, averaged over 100 samples): the TensorRT FP32 model is 4x faster than the Torch model. The TensorRT model was served with Triton Inference Server.
@baudm Thanks for the advice, the export to ONNX works now. @huyhoang17 I'm also running the model on a Triton server and I'm able to make an inference request, which returns a result that I convert back with the Triton client's as_numpy function; this gives me an array of shape [1, 7, 95]. Do you have any advice on how to extract the label and confidence scores from this array?
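One way to decode that array, sketched under assumptions rather than taken from the repo: treat the last dimension of 95 as one end-of-sequence class (index 0) plus the 94-character default charset in training order, which is how the reference tokenizer is laid out; verify the ordering against your own training charset before relying on it.

import numpy as np

# Assumed: the default 94-character charset in training order, with class 0 = '[E]' (end of sequence).
CHARSET = ("0123456789abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode(logits):
    """logits: a (1, T, 95) array returned by the exported model."""
    probs = softmax(logits[0])        # (T, 95) per-position class probabilities
    ids = probs.argmax(-1)            # greedy choice per position
    label, confs = "", []
    for t, cls in enumerate(ids):
        if cls == 0:                  # end-of-sequence class: stop decoding
            break
        label += CHARSET[cls - 1]     # shift by one to skip the EOS class
        confs.append(float(probs[t, cls]))
    return label, confs

A single word-level score can then be taken as, for example, the product or the mean of the per-character confidences.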
@huyhoang17 How did you make the output dimensions equal to [1, max_label_length, 95]?
@ashishpapanai Here is the example code; you should set both params, decode_ar=False and refine_iters=0. Library versions: ...
import onnx
from strhub.models.utils import load_from_checkpoint

# To ONNX
device = "cuda"
ckpt_path = "..."
onnx_path = "..."
img = ...  # example input tensor of shape (1, 3, 32, 128)
parseq = load_from_checkpoint(ckpt_path)
parseq.refine_iters = 0
parseq.decode_ar = False
parseq = parseq.to(device).eval()
parseq.to_onnx(onnx_path, img, do_constant_folding=True, opset_version=14)  # opset v14 or newer is required

# Check
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model, full_check=True)  # passes
@huyhoang17 I would love to see your code for ONNX inference; I am very interested in and impressed by your speed testing!
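Not @huyhoang17's actual script, but here is a minimal ONNX Runtime sketch of what such an inference pipeline can look like. The file names are placeholders, and the preprocessing (resize to 32x128, scale to [0, 1], normalize with mean 0.5 and std 0.5) mirrors the repo's default transform; adjust it if your training setup differs.

import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession('parseq.onnx', providers=['CPUExecutionProvider'])

img = Image.open('word_crop.png').convert('RGB').resize((128, 32), Image.BICUBIC)
x = np.asarray(img, dtype=np.float32) / 255.0   # HWC in [0, 1]
x = (x - 0.5) / 0.5                             # normalize to [-1, 1]
x = x.transpose(2, 0, 1)[None]                  # -> (1, 3, 32, 128)

input_name = session.get_inputs()[0].name
logits = session.run(None, {input_name: x})[0]  # e.g. (1, T, 95)
print(logits.shape)

The resulting logits can then be decoded with a routine like the one sketched earlier in the thread.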
I am getting an OpenVINO IR model compile time of 81 minutes with AR decoding enabled, which is way too long. Is there anything known in the community that I can do to optimize the model?
Is it possible to train and convert a model with dynamic shapes (an image of shape -1 x c x -1 x -1)?
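This is not answered definitively in the thread, but as a hedged sketch: a dynamic batch dimension can be requested by forwarding dynamic_axes through to_onnx (Lightning passes extra keyword arguments on to torch.onnx.export). Fully dynamic height and width are unlikely to work as-is, since the ViT encoder's patch and position embeddings are tied to the training image size; treat that as an assumption, not a tested result. The input/output names below are arbitrary.

import torch

parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
parseq.decode_ar = False   # the export settings that worked earlier in this thread
parseq.refine_iters = 0
dummy_input = torch.rand(1, 3, *parseq.hparams.img_size)

parseq.to_onnx(
    'parseq_dyn_batch.onnx',
    dummy_input,
    opset_version=14,
    input_names=['image'],
    output_names=['logits'],
    dynamic_axes={'image': {0: 'batch'}, 'logits': {0: 'batch'}},
)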
Hello.
I am using ... Has anyone run into this issue, or does anyone know how to solve it?
I have converted the PyTorch model to ONNX and then to a TRT engine file. I ran inference with the ONNX model and it works fine. I used the same pre- and post-processing methods for the ONNX and TensorRT scripts. Version: TensorRT-8.5.3.1_cuda11.
Hi @huyhoang17, I am having some issues with ONNX -> TRT. trtexec --onnx=poc_sim.onnx --saveEngine=poc.engine --workspace=4096 --verbose converts fine, but when I load it in Triton, I get this error: ...
You should use 8.5.3.1.
Now my issue is with Triton:
E0420 08:10:14.327730 1 logging.cc:43] 1: [convBaseRunner.cpp::execute::271] Error Code 1: Cask (Cask convolution execution)
Triton: nvcr.io/nvidia/tritonserver:23.03-py3
I went through a lot of trial and error because my training server and inference server are different. The key is to do the TensorRT conversion inside the same Triton container you will serve with. Now you have a TensorRT model that will work on the machine that runs the Triton server.
@kino0924 I have different training and inference servers, so thanks for the tip about doing the TensorRT conversion in the Triton container; that is a good idea. Have you noticed that TensorRT inference is much faster than ONNX inference? I haven't benchmarked; I assumed they would be close.
@jturner116 I did not get as dramatic an improvement as @huyhoang17, but it was definitely worth it.
I've converted the model to TensorRT (8.5.3.1) successfully; I only run inference ... I would be happy if you could share what the difference is between the Triton server and the TensorRT 8.5.3.1 Docker image. I don't use the Triton server; I only use a Docker image with TensorRT 8.5.3.1 installed.
Triton server is designed for inferencing.
How do you handle the performance degradation with ONNX or TensorRT, @jturner116 @phamkhactu? I have also encountered this issue and no one talks about it.
@RickyGunawan09 I found a kind of hacky solution mentioned in #66: if I give an example input with my max character length (25, I think) in the ONNX export instead of a random tensor, I don't notice the performance degradation. If I were smarter I might be able to figure out why that works, but maybe it makes some sense given the EOS token.
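To make that workaround concrete, here is a hedged sketch (not the exact code from #66): export with a real, preprocessed crop of a maximum-length word instead of a random tensor. SceneTextDataModule.get_transform is the preprocessing helper used in the repo's README; the image path is a placeholder.

import torch
from PIL import Image
from strhub.data.module import SceneTextDataModule

parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
parseq.decode_ar = False
parseq.refine_iters = 0

transform = SceneTextDataModule.get_transform(parseq.hparams.img_size)

# Placeholder: a crop whose text is as long as max_label_length (25 by default).
example = transform(Image.open('long_word_crop.png').convert('RGB')).unsqueeze(0)

parseq.to_onnx('parseq.onnx', example, do_constant_folding=True, opset_version=14)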
I've just tested again: the performance degradation in ONNX appears with bad image input. Have you ever tested with a bad image, @jturner116?
@phamkhactu Right, people often use random tensors for ONNX export, but I used this image ... EDIT: Sorry, I just realized you probably meant bad image input to the ONNX model. I will test this again too, thanks for the heads-up.
@phamkhactu I tested random tensors against both the ONNX and the original model, and the outputs match within np.testing.assert_allclose(to_numpy(torch_pred), to_numpy(onnx_pred), rtol=1e-03, atol=1e-03). Very acceptable for my case.
@baudm @phamkhactu Hi, I have converted my PARSeq model successfully with decode_ar=False, refine_iters=2. However, the converted ONNX model does not give stable predictions: sometimes redundant repeated characters are generated. For example: ...
@Gavinic You can check again whether you use ...
I converted the model to ONNX successfully, but the TRT engine results are very bad. The full description is in this link: NVIDIA/TensorRT#3136.
Has anyone had problems converting ONNX to TensorRT 8.6.1? @keivanmoazami
@kino0924 Use an ONNX modifier and add Cast layers as explained in the image.
I have already converted the pretrained parseq_tiny model parameters to ONNX, but I wasn't able to extract the post-processing. Is there any Python code for ONNX inference, @baudm?
@z3lz You can use a script like this for post-processing after converting to ONNX:
logits = torch.from_numpy(ort_outs[0])
token_decoder = TokenDecoder()
...
Can the model be converted to an ONNX model?