
Problems when transforming dynamic input models and quantifying static models #729

Open
Shirifo opened this issue Jan 16, 2025 · 7 comments
Labels: Bug, Dynamic batch / Dynamic shape, OP:Add, OP:Concat, TODO, Undefined dimension

Comments

Shirifo commented Jan 16, 2025

Issue Type

Others

OS

Linux

onnx2tf version number

onnx2tf 1.26.3

onnx version number

1.16.1

onnxruntime version number

1.18.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.18.0

Download URL for ONNX

https://drive.google.com/drive/folders/1BWeNDI2PMmORZqT-ZPkkrmozNMgGxWX5?usp=drive_link

Parameter Replacement JSON

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "Add_216",
      "param_target": "inputs",
      "param_name": "onnx____Add_428",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_216",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_429",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_239",
      "param_target": "inputs",
      "param_name": "onnx____Add_451",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_239",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_452",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_383",
      "param_target": "inputs",
      "param_name": "onnx____Add_607",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_383",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_608",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_406",
      "param_target": "inputs",
      "param_name": "onnx____Add_630",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_406",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_631",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_550",
      "param_target": "inputs",
      "param_name": "onnx____Add_786",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_550",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_787",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_573",
      "param_target": "inputs",
      "param_name": "onnx____Add_809",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_573",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_810",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_717",
      "param_target": "inputs",
      "param_name": "onnx____Add_965",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_717",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_966",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_740",
      "param_target": "inputs",
      "param_name": "onnx____Add_988",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_740",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_989",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_877",
      "param_target": "inputs",
      "param_name": "onnx____Add_1137",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_877",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_1138",
      "post_process_transpose_perm": [0,3,1,2]
    },
    {
      "op_name": "Add_900",
      "param_target": "inputs",
      "param_name": "onnx____Add_1160",
      "pre_process_transpose_perm": [0,2,3,1]
    },
    {
      "op_name": "Add_900",
      "param_target": "outputs",
      "param_name": "onnx____Transpose_1161",
      "post_process_transpose_perm": [0,3,1,2]
    }
  ]
}
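The eighteen entries above follow one pattern per Add node: transpose the second input NCHW→NHWC before the op, and transpose the output back afterwards. A JSON like this can be generated with a short script; the `(op_name, input, output)` pairs are copied from the JSON above, and `make_entries` is a hypothetical helper for illustration, not part of onnx2tf:

```python
import json

# (op_name, input_tensor, output_tensor) triples copied from the JSON above.
PAIRS = [
    ("Add_216", "onnx____Add_428", "onnx____Transpose_429"),
    ("Add_239", "onnx____Add_451", "onnx____Transpose_452"),
    # ... the remaining Add nodes follow the same pattern
]

def make_entries(op_name, input_name, output_name):
    """Build the pre/post transpose replacement pair for one Add node."""
    return [
        {
            "op_name": op_name,
            "param_target": "inputs",
            "param_name": input_name,
            "pre_process_transpose_perm": [0, 2, 3, 1],   # NCHW -> NHWC
        },
        {
            "op_name": op_name,
            "param_target": "outputs",
            "param_name": output_name,
            "post_process_transpose_perm": [0, 3, 1, 2],  # NHWC -> NCHW
        },
    ]

operations = [e for p in PAIRS for e in make_entries(*p)]
replace_json = {"format_version": 1, "operations": operations}
print(json.dumps(replace_json, indent=2))
```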

Description

Hi @PINTO0309 , thanks for all of your great work.
I am trying to convert an ONNX model with dynamic inputs to TFlite.
I have 3 problems.

1. The Concat operator problem is difficult to solve.

I use the command:
onnx2tf -i dynamics_rife.onnx
And get

INFO: 42 / 480
INFO: onnx_op_type: ConvTranspose onnx_op_name: ConvTranspose_53
INFO:  input_name.1: onnx____ConvTranspose_204 shape: [1, 16, 'unk__5', 'unk__6'] dtype: float32
INFO:  input_name.2: encode.cnn3.weight shape: [16, 4, 4, 4] dtype: float32
INFO:  input_name.3: encode.cnn3.bias shape: [4] dtype: float32
INFO:  output_name.1: f0 shape: [1, 4, 'unk__7', 'unk__8'] dtype: float32
2025-01-14 10:36:06.649150228 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Concat node. Name:'Concat_76' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 2 and 1
ERROR: The trace log is below.
Traceback (most recent call last):
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'Concat_76' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 2 and 1

Do you have a solution to this problem?

2. When I use a static ONNX model, the Concat works but I get another error:

INFO: 82 / 371
INFO: onnx_op_type: Add onnx_op_name: Add_219
INFO:  input_name.1: onnx::Cast_415 shape: [1, 2, 256, 512] dtype: float32
INFO:  input_name.2: onnx____Add_431 shape: [1, 2, 256, 512] dtype: float32
INFO:  output_name.1: onnx____Transpose_432 shape: [1, 2, 256, 512] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
Dimensions must be equal, but are 512 and 256 for '{{node tf.math.add_28/Add}} = AddV2[T=DT_FLOAT](Placeholder, tf.math.add_28/Add/y)' with input shapes: [1,512,2,256], [1,256,512,2].

Call arguments received by layer "tf.math.add_28" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 512, 2, 256), dtype=float32)
  • y=tf.Tensor(shape=(1, 256, 512, 2), dtype=float32)
  • name='Add_219'

I referred to this link to write a JSON file to make the changes: https://github.com/PINTO0309/onnx2tf/issues/103
I hit a strange problem: I couldn't transpose x (onnx::Cast_415) to (1, 256, 512, 2), so I instead chose to transpose y (onnx____Add_431) and the output (onnx____Transpose_432) to get the correct result, as follows:

  {
    "op_name": "Add_219",
    "param_target": "inputs",
    "param_name": "onnx____Add_431",
    "pre_process_transpose_perm": [0,2,3,1]
  },
  {
    "op_name": "Add_219",
    "param_target": "outputs",
    "param_name": "onnx____Transpose_432",
    "post_process_transpose_perm": [0,3,1,2]
  },

After modifying all of the Add nodes I was able to get TFLite output, but its deviation from the original ONNX model was much larger than expected. This seems to be a GridSample problem.
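The effect of the two perms can be checked on plain shape tuples; `apply_perm` below is only an illustration of what `pre_process_transpose_perm` / `post_process_transpose_perm` do to a tensor layout, not onnx2tf's actual code:

```python
def apply_perm(shape, perm):
    """Reorder a shape tuple the way a Transpose with `perm` would."""
    return tuple(shape[p] for p in perm)

nchw = (1, 2, 256, 512)
nhwc = apply_perm(nchw, [0, 2, 3, 1])   # pre-process: NCHW -> NHWC
back = apply_perm(nhwc, [0, 3, 1, 2])   # post-process: NHWC -> NCHW

print(nhwc)  # (1, 256, 512, 2) -- matches the shape of y in the error above
print(back)  # (1, 2, 256, 512) -- round-trips to the original layout
```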

3. After int8 quantization, the obtained model output is completely wrong, and the order of the two outputs is reversed. Do you know how to solve this? Below is the command:
   onnx2tf -i static_rife_sim.onnx -prf replace.json -oiqt
@PINTO0309 PINTO0309 added the TODO, Undefined dimension, and Dynamic batch / Dynamic shape labels Jan 16, 2025

PINTO0309 commented Jan 16, 2025

This is not a solution. I only record what I have tried because I don't have time to work on it.

This model will probably terminate abnormally for all inferences except for those with an input resolution that is a multiple of 64.

  • 512x640 -> Success
    sit4onnx -if dynamics_rife_sim.onnx -fs 1 6 512 640 -fs 1 -oep cuda
    
    INFO: file: dynamics_rife_sim.onnx
    INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
    INFO: input_name.1: x shape: [1, 6, 512, 640] dtype: float32
    INFO: input_name.2: timestep shape: [1] dtype: float32
    INFO: test_loop_count: 10
    INFO: total elapsed time:  144.0567970275879 ms
    INFO: avg elapsed time per pred:  14.405679702758789 ms
    INFO: output_name.1: flow shape: [1, 4, 512, 640] dtype: float32
    INFO: output_name.2: merged shape: [1, 3, 512, 640] dtype: float32
    
  • 480x640 -> Abort
    sit4onnx -if dynamics_rife_sim.onnx -fs 1 6 480 640 -fs 1 -oep cuda
    
    INFO: file: dynamics_rife_sim.onnx
    INFO: providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
    INFO: input_name.1: x shape: [1, 6, 480, 640] dtype: float32
    INFO: input_name.2: timestep shape: [1] dtype: float32
    Traceback (most recent call last):
      File "/home/b920405/.local/bin/sit4onnx", line 8, in <module>
        sys.exit(main())
      File "/home/b920405/.local/lib/python3.10/site-packages/sit4onnx/onnx_inference_test.py", line 519, in main
        final_results = inference(
      File "/home/b920405/.local/lib/python3.10/site-packages/sit4onnx/onnx_inference_test.py", line 363, in inference
        results = onnx_session.run(
      File "/home/b920405/.local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
        return self._sess.run(output_names, input_feed, run_options)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'Concat_378' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 480 and 512
    


Shirifo commented Jan 16, 2025

@PINTO0309 Thank you for your quick reply!!!
You are right. In fact, when I checked the model structure near the GridSample, the model seems to support inputs that are integer multiples of 128.

Do your experiments show that although the ONNX model supports inputs that are integer multiples of 64/128 (which counts as dynamic input), it must be fixed to static shapes when converting to TFLite?

As you have tried, I can successfully get the float32 TFLite model if I fix the model to static input; the static model I uploaded to Google Drive is (256*512). And, to use the quantization command, I fixed the other input directly to 0.5 in the model, then changed the input to two images.
At this stage, I hit problems 2 and 3. I could not change the first input, x=tf.Tensor(shape=(1, 512, 2, 256), dtype=float32), to (1, 256, 512, 2), but instead changed the second input y to the shape of the first one, and then modified the output. I'm not sure whether that makes a difference.
The float32 TFLite result has only a small loss, but after quantization the result is completely wrong.
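One common cause of "completely wrong" int8 output is a badly calibrated scale for some tensor. As a generic illustration (not the author's pipeline or onnx2tf's code), the affine int8 scheme `q = round(x/scale) + zero_point` shows how much the error grows when the scale is calibrated for a much wider range than the actual values:

```python
def quantize(x, scale, zero_point):
    """Affine int8 quantization: q = clamp(round(x/scale) + zp, -128, 127)."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = 0.73

# A scale calibrated for values in [-1, 1] keeps the round-trip error
# below one quantization step ...
good_scale = 2.0 / 255
err_good = abs(dequantize(quantize(x, good_scale, 0), good_scale, 0) - x)
print(err_good < good_scale)  # True

# ... but a scale calibrated for a [-50, 50] range makes each step huge,
# so the same value comes back far coarser.
bad_scale = 100.0 / 255
err_bad = abs(dequantize(quantize(x, bad_scale, 0), bad_scale, 0) - x)
print(err_bad > err_good)  # True: error grows by roughly two orders of magnitude
```

Comparing float32 and int8 interpreter outputs layer by layer on the same input is the usual way to find which op introduces the error.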


PINTO0309 commented Jan 16, 2025

Do your experiments show that although the ONNX model supports inputs that are integer multiples of 64/128 (which counts as dynamic input), it must be fixed to static shapes when converting to TFLite?

No. To be precise, since a model with dynamic input sizes cannot determine the correct dimensional positions during the conversion process, a tentative dummy-sized inference tensor is generated at the beginning of the onnx2tf process and used for shape estimation.

There is no guarantee that the input tensor is an image, and it is necessary to assume all types of tensor other than 4D, such as audio data and sensor data, so if there are undefined dimensions in the shape of the ONNX input tensor, a fixed size 1 is set and dummy inference is performed.

onnx_inputs = gs_graph.inputs
input_names: List[str] = [inp.name for inp in onnx_inputs]
input_sizes: List[int] = [inp.shape for inp in onnx_inputs]
new_input_sizes = []
for input_size in input_sizes:
    new_input_size = []
    for idx, dim in enumerate(input_size):
        if idx == 0 and input_sizes[0][0] is not None \
            and not isinstance(input_sizes[0][0], str) \
            and len(input_sizes[0]) == len(input_size) \
            and (dim is None or isinstance(dim, str)):
            # Batch size assignment for input OPs
            new_input_size.append(input_sizes[0][0])
        elif dim is None or isinstance(dim, str):
            # Fixed and assigned 1
            new_input_size.append(1)
        else:
            # Assign input shape as is
            new_input_size.append(dim)
    new_input_sizes.append(new_input_size)
input_sizes = new_input_sizes
input_dtypes: List[Any] = [inp.dtype for inp in onnx_inputs]
input_datas = {}

Therefore, in this case, since the dummy data x:[1, 6, 1, 1] and timestep:[1] were used, a discrepancy arose in the shape estimation process inside onnx2tf, and the shape of the Concat became an unexpected size.

This cannot be fixed immediately. A function that allows users to specify hints about the tensor shape, such as

onnx2tf \
-i xxx.onnx \
--dummy_tensor_shape x:1,6,512,640 \
--dummy_tensor_shape timestep:1

needs to be added, and it will take a long time to implement. TFLite (LiteRT) supports dynamic tensor inference.
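The `--dummy_tensor_shape` flag above is only a proposal and does not exist in onnx2tf at the time of writing. As a sketch, parsing such `name:d1,d2,...` hints into the shape table used for dummy inference could look like this (`parse_shape_hints` is a hypothetical helper):

```python
def parse_shape_hints(hints):
    """Parse 'name:d1,d2,...' strings into {name: [d1, d2, ...]}.

    Sketch for the proposed --dummy_tensor_shape option; the real flag
    does not exist in onnx2tf at the time of writing.
    """
    shapes = {}
    for hint in hints:
        name, _, dims = hint.rpartition(":")
        if not name:
            raise ValueError(f"expected 'name:d1,d2,...', got {hint!r}")
        shapes[name] = [int(d) for d in dims.split(",")]
    return shapes

hints = ["x:1,6,512,640", "timestep:1"]
print(parse_shape_hints(hints))  # {'x': [1, 6, 512, 640], 'timestep': [1]}
```

Shapes parsed this way would replace the fixed size 1 currently assigned to undefined dimensions before the dummy inference runs.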

@PINTO0309 PINTO0309 added the Bug and OP:Concat labels Jan 16, 2025

PINTO0309 commented Jan 16, 2025

As you say, it seems that an error occurs even when it is fixed to a static shape. It is probably a bug in the dimension judgment processing of Concat or Add. It seems you have generated a fairly rare pattern.

onnx2tf \
-i dynamics_rife_sim.onnx \
-cotof \
-ois x:1,6,512,640 \
-ois timestep:1

INFO: 76 / 371
INFO: onnx_op_type: Div onnx_op_name: Div_238
INFO:  input_name.1: onnx____Div_425 shape: [1, 1, 512, 640] dtype: float32
INFO:  input_name.2: onnx::Div_433 shape: [] dtype: float32
INFO:  output_name.1: onnx____Concat_434 shape: [1, 1, 512, 640] dtype: float32
INFO: tf_op_type: divide
INFO:  input.1.x: name: tf.strided_slice_10/StridedSlice:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.2.y: shape: () dtype: float32 
INFO:  output.1.output: name: tf.math.divide_1/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 

INFO: 77 / 371
INFO: onnx_op_type: Div onnx_op_name: Div_252
INFO:  input_name.1: onnx____Div_439 shape: [1, 1, 512, 640] dtype: float32
INFO:  input_name.2: onnx::Div_447 shape: [] dtype: float32
INFO:  output_name.1: onnx____Concat_448 shape: [1, 1, 512, 640] dtype: float32
INFO: tf_op_type: divide
INFO:  input.1.x: name: tf.strided_slice_11/StridedSlice:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.2.y: shape: () dtype: float32 
INFO:  output.1.output: name: tf.math.divide_2/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 

:
:

INFO: 80 / 371
INFO: onnx_op_type: Concat onnx_op_name: Concat_253
INFO:  input_name.1: onnx____Concat_434 shape: [1, 1, 512, 640] dtype: float32
INFO:  input_name.2: onnx____Concat_448 shape: [1, 1, 512, 640] dtype: float32
INFO:  output_name.1: onnx____Add_449 shape: [1, 2, 512, 640] dtype: float32
INFO: tf_op_type: concat
INFO:  input.1.input0: name: tf.math.divide_1/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.2.input1: name: tf.math.divide_2/truediv:0 shape: (1, 512, 640, 1) dtype: <dtype: 'float32'> 
INFO:  input.3.axis: val: 3 
INFO:  output.1.output: name: tf.concat_1/concat:0 shape: (1, 512, 640, 2) dtype: <dtype: 'float32'> 

:
:

INFO: 82 / 371
INFO: onnx_op_type: Add onnx_op_name: Add_254
INFO:  input_name.1: onnx::Cast_419 shape: [1, 2, 512, 640] dtype: float32
INFO:  input_name.2: onnx____Add_449 shape: [1, 2, 512, 640] dtype: float32
INFO:  output_name.1: onnx____Transpose_450 shape: [1, 2, 512, 640] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 312, in print_wrapper_func
    result = func(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 385, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 55, in get_replacement_parameter_wrapper_func
    func(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/ops/Add.py", line 281, in make_node
    merge_two_consecutive_identical_ops_into_one(
  File "/home/b920405/.local/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 5454, in merge_two_consecutive_identical_ops_into_one
    tf.math.add(
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/python/ops/weak_tensor_ops.py", line 142, in wrapper
    return op(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/b920405/.local/lib/python3.10/site-packages/tf_keras/src/layers/core/tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/tf_keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.math.add_28" (type TFOpLambda).

Dimensions must be equal, but are 640 and 512 for '{{node tf.math.add_28/Add}} = AddV2[T=DT_FLOAT](Placeholder, tf.math.add_28/Add/y)' with input shapes: [1,640,2,512], [1,512,640,2].

Call arguments received by layer "tf.math.add_28" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 640, 2, 512), dtype=float32)
  • y=tf.Tensor(shape=(1, 512, 640, 2), dtype=float32)
  • name='Add_254'

ERROR: input_onnx_file_path: dynamics_rife_sim.onnx
ERROR: onnx_op_name: Add_254
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC or an irregular channel arrangement other than NCHW, use the -kt or -kat option.
ERROR: Also, for models that include NonMaxSuppression in the post-processing, try the -onwdt option.


Shirifo commented Jan 16, 2025

My original model came from Practical RIFE.
This is a frame interpolation model and I hope to deploy it on mobile.

I modified the IFNet file in the 4.25lite version because some of its operations cannot be directly converted to ONNX.

The static model I uploaded, together with replace.json, can be successfully converted to float32 without much loss of accuracy.
Although my replace operations look a little weird hahaha

@PINTO0309 PINTO0309 added the OP:Add label Jan 16, 2025
PINTO0309 added a commit that referenced this issue Jan 17, 2025
PINTO0309 added a commit that referenced this issue Jan 17, 2025

PINTO0309 commented Jan 17, 2025

Fixed, but only for the static-shape model.

ValueError: Exception encountered when calling layer "tf.math.add_28" (type TFOpLambda).

Dimensions must be equal, but are 640 and 512 for '{{node tf.math.add_28/Add}} = AddV2[T=DT_FLOAT](Placeholder, tf.math.add_28/Add/y)' with input shapes: [1,640,2,512], [1,512,640,2].

Call arguments received by layer "tf.math.add_28" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 640, 2, 512), dtype=float32)
  • y=tf.Tensor(shape=(1, 512, 640, 2), dtype=float32)
  • name='Add_254'

https://github.com/PINTO0309/onnx2tf/releases/tag/1.26.4

onnx2tf -i static_rife_sim.onnx -cotof

Image


Shirifo commented Jan 20, 2025

Thanks for your help!
I will continue to investigate the cause of quantization error.
