
Problems when transforming dynamic input models and quantifying static models #729

Open
Shirifo opened this issue Jan 16, 2025 · 7 comments
Labels: Bug, Dynamic batch / Dynamic shape, OP:Add, OP:Concat, TODO, Undefined dimension

Comments

Shirifo commented Jan 16, 2025

Issue Type

Others

OS

Linux

onnx2tf version number

onnx2tf 1.26.3

onnx version number

1.16.1

onnxruntime version number

1.18.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.18.0

Download URL for ONNX

https://drive.google.com/drive/folders/1BWeNDI2PMmORZqT-ZPkkrmozNMgGxWX5?usp=drive_link

Parameter Replacement JSON

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "Add_216",
      "param_target": "inputs",
      "param_name": "onnx____Add_428",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_216",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_429",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_239",
      "param_target": "inputs",
      "param_name": "onnx____Add_451",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_239",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_452",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_383",
      "param_target": "inputs",
      "param_name": "onnx____Add_607",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_383",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_608",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_406",
      "param_target": "inputs",
      "param_name": "onnx____Add_630",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_406",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_631",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_550",
      "param_target": "inputs",
      "param_name": "onnx____Add_786",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_550",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_787",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_573",
      "param_target": "inputs",
      "param_name": "onnx____Add_809",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_573",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_810",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_717",
      "param_target": "inputs",
      "param_name": "onnx____Add_965",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_717",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_966",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_740",
      "param_target": "inputs",
      "param_name": "onnx____Add_988",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_740",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_989",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_877",
      "param_target": "inputs",
      "param_name": "onnx____Add_1137",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_877",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_1138",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_900",
      "param_target": "inputs",
      "param_name": "onnx____Add_1160",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_900",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_1161",
      "post_process_transpose_perm": [0,3,1,2]
    }
  ]
}
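The eighteen entries above follow one pattern per Add node: transpose the second input NCHW→NHWC before the op, and transpose the output back afterwards. A JSON like this can be generated with a short script; the `(op_name, input, output)` pairs are copied from the JSON above, and `make_entries` is a hypothetical helper for illustration, not part of onnx2tf:

```python
import json

# (op_name, input_tensor, output_tensor) triples copied from the JSON above.
PAIRS = [
    ("Add_216", "onnx____Add_428", "onnx____Transpose_429"),
    ("Add_239", "onnx____Add_451", "onnx____Transpose_452"),
    # ... the remaining Add nodes follow the same pattern
]

def make_entries(op_name, input_name, output_name):
    """Build the pre/post transpose replacement pair for one Add node."""
    return [
        {
            "op_name": op_name,
            "param_target": "inputs",
            "param_name": input_name,
            "pre_process_transpose_perm": [0, 2, 3, 1],   # NCHW -> NHWC
        },
        {
            "op_name": op_name,
            "param_target": "outputs",
            "param_name": output_name,
            "post_process_transpose_perm": [0, 3, 1, 2],  # NHWC -> NCHW
        },
    ]

operations = [e for p in PAIRS for e in make_entries(*p)]
replace_json = {"format_version": 1, "operations": operations}
print(json.dumps(replace_json, indent=2))
```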

Description

Hi @PINTO0309 , thanks for all of your great work.
I am trying to convert an ONNX model with dynamic inputs to TFlite.
I have 3 problems.

1. The Concat operator problem is difficult to solve.

I use the command:
onnx2tf -i dynamics_rife.onnx
And get

INFO: 42 / 480
INFO: onnx_op_type: ConvTranspose onnx_op_name: ConvTranspose_53
INFO:  input_name.1: onnx____ConvTranspose_204 shape: [1, 16, 'unk__5', 'unk__6'] dtype: float32
INFO:  input_name.2: encode.cnn3.weight shape: [16, 4, 4, 4] dtype: float32
INFO:  input_name.3: encode.cnn3.bias shape: [4] dtype: float32
INFO:  output_name.1: f0 shape: [1, 4, 'unk__7', 'unk__8'] dtype: float32
2025-01-14 10:36:06.649150228 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Concat node. Name:'Concat_76' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 2 and 1
ERROR: The trace log is below.
Traceback (most recent call last):
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'Concat_76' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 2 and 1

Do you have a solution to this problem?

2. When I use a static ONNX model, the Concat works but I get another error:

INFO: 82 / 371
INFO: onnx_op_type: Add onnx_op_name: Add_219
INFO:  input_name.1: onnx::Cast_415 shape: [1, 2, 256, 512] dtype: float32
INFO:  input_name.2: onnx____Add_431 shape: [1, 2, 256, 512] dtype: float32
INFO:  output_name.1: onnx____Transpose_432 shape: [1, 2, 256, 512] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
Dimensions must be equal, but are 512 and 256 for '{{node tf.math.add_28/Add}} = AddV2[T=DT_FLOAT](Placeholder, tf.math.add_28/Add/y)' with input shapes: [1,512,2,256], [1,256,512,2].

Call arguments received by layer "tf.math.add_28" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 512, 2, 256), dtype=float32)
  • y=tf.Tensor(shape=(1, 256, 512, 2), dtype=float32)
  • name='Add_219'

I referred to this link to write a JSON file to make the changes: https://github.com/PINTO0309/onnx2tf/issues/103
I hit a strange problem: I couldn't transpose x (onnx::Cast_415) to (1, 256, 512, 2), so I instead chose to transpose y (onnx____Add_431) and the output (onnx____Transpose_432) to get the correct result, as follows:

  {
    "op_name": "Add_219",
    "param_target": "inputs",
    "param_name": "onnx____Add_431",
    "pre_process_transpose_perm": [0,2,3,1]
  },
  {
    "op_name": "Add_219",
    "param_target": "outputs",
    "param_name": "onnx____Transpose_432",
    "post_process_transpose_perm": [0,3,1,2]
  },

After modifying all of the Add nodes I was able to get TFLite output, but its deviation from the original ONNX model was much larger than expected. This seems to be a GridSample problem.
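The effect of the two perms can be checked on plain shape tuples; `apply_perm` below is only an illustration of what `pre_process_transpose_perm` / `post_process_transpose_perm` do to a tensor layout, not onnx2tf's actual code:

```python
def apply_perm(shape, perm):
    """Reorder a shape tuple the way a Transpose with `perm` would."""
    return tuple(shape[p] for p in perm)

nchw = (1, 2, 256, 512)
nhwc = apply_perm(nchw, [0, 2, 3, 1])   # pre-process: NCHW -> NHWC
back = apply_perm(nhwc, [0, 3, 1, 2])   # post-process: NHWC -> NCHW

print(nhwc)  # (1, 256, 512, 2) -- matches the shape of y in the error above
print(back)  # (1, 2, 256, 512) -- round-trips to the original layout
```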

3. After int8 quantization, the obtained model output is completely wrong, and the order of the two outputs is reversed. Do you know how to solve this? Below is the command:
   onnx2tf -i static_rife_sim.onnx -prf replace.json -oiqt
@PINTO0309 PINTO0309 added the TODO, Undefined dimension, and Dynamic batch / Dynamic shape labels Jan 16, 2025

PINTO0309 commented Jan 16, 2025

This is not a solution. I only record what I have tried because I don't have time to work on it.

This model will probably terminate abnormally for all inferences except for those with an input resolution that is a multiple of 64.

  • 512x640 -> Success
    sit4onnx -if dynamics_rife_sim.onnx -fs 1 6 512 640 -fs 1 -oep cuda
    
    INFO: file: dynamics_rife_sim.onnx
    INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
    INFO: input_name.1: x shape: [1, 6, 512, 640] dtype: float32
    INFO: input_name.2: timestep shape: [1] dtype: float32
    INFO: test_loop_count: 10
    INFO: total elapsed time:  144.0567970275879 ms
    INFO: avg elapsed time per pred:  14.405679702758789 ms
    INFO: output_name.1: flow shape: [1, 4, 512, 640] dtype: float32
    INFO: output_name.2: merged shape: [1, 3, 512, 640] dtype: float32
    
  • 480x640 -> Abort
    sit4onnx -if dynamics_rife_sim.onnx -fs 1 6 480 640 -fs 1 -oep cuda
    
    INFO: file: dynamics_rife_sim.onnx
    INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
    INFO: input_name.1: x shape: [1, 6, 480, 640] dtype: float32
    INFO: input_name.2: timestep shape: [1] dtype: float32
    Traceback (most recent call last):
      File "/home/b920405/.local/bin/sit4onnx", line 8, in <module>
        sys.exit(main())
      File "/home/b920405/.local/lib/python3.10/site-packages/sit4onnx/onnx_inference_test.py", line 519, in main
        final_results = inference(
      File "/home/b920405/.local/lib/python3.10/site-packages/sit4onnx/onnx_inference_test.py", line 363, in inference
        results = onnx_session.run(
      File "/home/b920405/.local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
        return self._sess.run(output_names, input_feed, run_options)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'Concat_378' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 480 and 512
    


Shirifo commented Jan 16, 2025

@PINTO0309 Thank you for your quick reply!!!
You are right. In fact, when I checked the model structure near the GridSample, the model seems to support inputs that are integer multiples of 128.

Do your experiments show that although the ONNX model supports inputs that are integer multiples of 64/128 (which counts as dynamic input), it must be fixed to static shapes when converting to TFLite?

As you have tried, I can successfully get the float32 TFLite model if I fix the model to static input; the static model I uploaded to Google Drive is (256*512). And, to use the quantization command, I fixed the other input directly to 0.5 in the model, then changed the input to two images.
At this stage, I hit problems 2 and 3. I could not change the first input, x=tf.Tensor(shape=(1, 512, 2, 256), dtype=float32), to (1, 256, 512, 2), but instead changed the second input y to the shape of the first one, and then modified the output. I'm not sure whether that makes a difference.
The float32 TFLite result has only a small loss, but after quantization the result is completely wrong.
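One common cause of "completely wrong" int8 output is a badly calibrated scale for some tensor. As a generic illustration (not the author's pipeline or onnx2tf's code), the affine int8 scheme `q = round(x/scale) + zero_point` shows how much the error grows when the scale is calibrated for a much wider range than the actual values:

```python
def quantize(x, scale, zero_point):
    """Affine int8 quantization: q = clamp(round(x/scale) + zp, -128, 127)."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = 0.73

# A scale calibrated for values in [-1, 1] keeps the round-trip error
# below one quantization step ...
good_scale = 2.0 / 255
err_good = abs(dequantize(quantize(x, good_scale, 0), good_scale, 0) - x)
print(err_good < good_scale)  # True

# ... but a scale calibrated for a [-50, 50] range makes each step huge,
# so the same value comes back far coarser.
bad_scale = 100.0 / 255
err_bad = abs(dequantize(quantize(x, bad_scale, 0), bad_scale, 0) - x)
print(err_bad > err_good)  # True: error grows by roughly two orders of magnitude
```

Comparing float32 and int8 interpreter outputs layer by layer on the same input is the usual way to find which op introduces the error.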


PINTO0309 commented Jan 16, 2025

Do your experiments show that although the ONNX model supports inputs that are integer multiples of 64/128 (which counts as dynamic input), it must be fixed to static shapes when converting to TFLite?

No. To be precise, since a model with dynamic input sizes cannot determine the correct dimensional positions during the conversion process, a tentative dummy-sized inference tensor is generated at the beginning of the onnx2tf process and used for shape estimation.

There is no guarantee that the input tensor is an image, and it is necessary to assume all types of tensor other than 4D, such as audio data and sensor data, so if there are undefined dimensions in the shape of the ONNX input tensor, a fixed size 1 is set and dummy inference is performed.

onnx_inputs = gs_graph.inputs
input_names: List[str] = [inp.name for inp in onnx_inputs]
input_sizes: List[int] = [inp.shape for inp in onnx_inputs]
new_input_sizes = []
for input_size in input_sizes:
    new_input_size = []
    for idx, dim in enumerate(input_size):
        if idx == 0 and input_sizes[0][0] is not None \
            and not isinstance(input_sizes[0][0], str) \
            and len(input_sizes[0]) == len(input_size) \
            and (dim is None or isinstance(dim, str)):
            # Batch size assignment for input OPs
            new_input_size.append(input_sizes[0][0])
        elif dim is None or isinstance(dim, str):
            # Fixed and assigned 1
            new_input_size.append(1)
        else:
            # Assign input shape as is
            new_input_size.append(dim)
    new_input_sizes.append(new_input_size)
input_sizes = new_input_sizes
input_dtypes: List[Any] = [inp.dtype for inp in onnx_inputs]
input_datas = {}

Therefore, in this case, since the dummy data x:[1, 6, 1, 1] and timestep:[1] were used, a discrepancy arose in the shape estimation process inside onnx2tf, and the shape of the Concat became an unexpected size.

This cannot be fixed immediately. A function that allows users to specify hints about the tensor shape, such as

onnx2tf \
-i xxx.onnx \
--dummy_tensor_shape x:1,6,512,640 \
--dummy_tensor_shape timestep:1

needs to be added, and it will take a long time to implement. TFLite (LiteRT) supports dynamic tensor inference.
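The `--dummy_tensor_shape` flag above is only a proposal and does not exist in onnx2tf at the time of writing. As a sketch, parsing such `name:d1,d2,...` hints into the shape table used for dummy inference could look like this (`parse_shape_hints` is a hypothetical helper):

```python
def parse_shape_hints(hints):
    """Parse 'name:d1,d2,...' strings into {name: [d1, d2, ...]}.

    Sketch for the proposed --dummy_tensor_shape option; the real flag
    does not exist in onnx2tf at the time of writing.
    """
    shapes = {}
    for hint in hints:
        name, _, dims = hint.rpartition(":")
        if not name:
            raise ValueError(f"expected 'name:d1,d2,...', got {hint!r}")
        shapes[name] = [int(d) for d in dims.split(",")]
    return shapes

hints = ["x:1,6,512,640", "timestep:1"]
print(parse_shape_hints(hints))  # {'x': [1, 6, 512, 640], 'timestep': [1]}
```

Shapes parsed this way would replace the fixed size 1 currently assigned to undefined dimensions before the dummy inference runs.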

@PINTO0309 PINTO0309 added the Bug and OP:Concat labels Jan 16, 2025

PINTO0309 commented Jan 16, 2025

As you say, it seems that an error occurs even when it is fixed to a static shape. It is probably a bug in the dimension judgment processing of Concat or Add. It seems you have generated a fairly rare pattern.

onnx2tf \
-i dynamics_rife_sim.onnx \
-cotof \
-ois x:1,6,512,640 \
-ois timestep:1

INFO: 76 / 371
INFO: onnx_op_type: Div onnx_op_name: Div_238
INFO:  input_name.1: onnx____Div_425 shape: [1, 1, 512, 640] dtype: float32
INFO:  input_name.2: onnx::Div_433 shape: [] dtype: float32
INFO:  output_name.1: onnx____Concat_434 shape: [1, 1, 512, 640] dtype: float32
INFO: tf_op_type: divide
INFO:  input.1.x: name: tf.strided_slice_10/StridedSlice:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.2.y: shape: () dtype: float32 
INFO:  output.1.output: name: tf.math.divide_1/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 

INFO: 77 / 371
INFO: onnx_op_type: Div onnx_op_name: Div_252
INFO:  input_name.1: onnx____Div_439 shape: [1, 1, 512, 640] dtype: float32
INFO:  input_name.2: onnx::Div_447 shape: [] dtype: float32
INFO:  output_name.1: onnx____Concat_448 shape: [1, 1, 512, 640] dtype: float32
INFO: tf_op_type: divide
INFO:  input.1.x: name: tf.strided_slice_11/StridedSlice:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.2.y: shape: () dtype: float32 
INFO:  output.1.output: name: tf.math.divide_2/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 

:
:

INFO: 80 / 371
INFO: onnx_op_type: Concat onnx_op_name: Concat_253
INFO:  input_name.1: onnx____Concat_434 shape: [1, 1, 512, 640] dtype: float32
INFO:  input_name.2: onnx____Concat_448 shape: [1, 1, 512, 640] dtype: float32
INFO:  output_name.1: onnx____Add_449 shape: [1, 2, 512, 640] dtype: float32
INFO: tf_op_type: concat
INFO:  input.1.input0: name: tf.math.divide_1/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.2.input1: name: tf.math.divide_2/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.3.axis: val: 3 
INFO:  output.1.output: name: tf.concat_1/concat:0 shape: (1, 512, 640, 2) dtype: <dtype: 'float32'> 

:
:

INFO: 82 / 371
INFO: onnx_op_type: Add onnx_op_name: Add_254
INFO:  input_name.1: onnx::Cast_419 shape: [1, 2, 512, 640] dtype: float32
INFO:  input_name.2: onnx____Add_449 shape: [1, 2, 512, 640] dtype: float32
INFO:  output_name.1: onnx____Transpose_450 shape: [1, 2, 512, 640] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 312, in print_wrapper_func
    result = func(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 385, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 55, in get_replacement_parameter_wrapper_func
    func(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/ops/Add.py", line 281, in make_node
    merge_two_consecutive_identical_ops_into_one(
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 5454, in merge_two_consecutive_identical_ops_into_one
    tf.math.add(
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/python/ops/weak_tensor_ops.py", line 142, in wrapper
    return op(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/b920405/.local/lib/python3.10/site-packages/tf_keras/src/layers/core/tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/tf_keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.math.add_28" (type TFOpLambda).

Dimensions must be equal, but are 640 and 512 for '{{node tf.math.add_28/Add}} = AddV2[T=DT_FLOAT](Placeholder, tf.math.add_28/Add/y)' with input shapes: [1,640,2,512], [1,512,640,2].

Call arguments received by layer "tf.math.add_28" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 640, 2, 512), dtype=float32)
  • y=tf.Tensor(shape=(1, 512, 640, 2), dtype=float32)
  • name='Add_254'

ERROR: input_onnx_file_path: dynamics_rife_sim.onnx
ERROR: onnx_op_name: Add_254
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC or an irregular channel arrangement other than NCHW, use the -kt or -kat option.
ERROR: Also, for models that include NonMaxSuppression in the post-processing, try the -onwdt option.


Shirifo commented Jan 16, 2025

My original model came from Practical RIFE.
This is a frame interpolation model and I hope to deploy it on mobile.

I modified the IFNet file in the 4.25lite version because some of its operations cannot be directly converted to ONNX.

The static model I uploaded, together with replace.json, can be successfully converted to float32 without much loss of accuracy.
Although my replace operations look a little weird hahaha

@PINTO0309 PINTO0309 added the OP:Add label Jan 16, 2025
PINTO0309 added a commit that referenced this issue Jan 17, 2025
PINTO0309 added a commit that referenced this issue Jan 17, 2025

PINTO0309 commented Jan 17, 2025

Fixed, but only for the static-shape model.

ValueError: Exception encountered when calling layer "tf.math.add_28" (type TFOpLambda).

Dimensions must be equal, but are 640 and 512 for '{{node tf.math.add_28/Add}} = AddV2[T=DT_FLOAT](Placeholder, tf.math.add_28/Add/y)' with input shapes: [1,640,2,512], [1,512,640,2].

Call arguments received by layer "tf.math.add_28" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 640, 2, 512), dtype=float32)
  • y=tf.Tensor(shape=(1, 512, 640, 2), dtype=float32)
  • name='Add_254'

https://github.com/PINTO0309/onnx2tf/releases/tag/1.26.4

onnx2tf -i static_rife_sim.onnx -cotof

Image


Shirifo commented Jan 20, 2025

Thanks for your help!
I will continue to investigate the cause of quantization error.
