[BERT-Squad] INT8 quantization: The input data type must be Float32. #248
Please guide me on how to work around the Auto Calibration Check. I do not need the model to be accurate; I only need it for hardware latency profiling. (See lines 722 to 733 in df183f1.)
If you do not need to perform INT8 quantization with this tool alone, the following method is the easiest. The SavedModel signature can be inspected as follows:
saved_model_cli show --dir saved_model/ --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_ids:0
inputs['input_mask'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_mask:0
inputs['segment_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_segment_ids:0
inputs['unique_ids_raw_output___9'] tensor_info:
dtype: DT_INT64
shape: (1)
name: serving_default_unique_ids_raw_output___9:0
The given SavedModel SignatureDef contains the following output(s):
outputs['unique_ids_0'] tensor_info:
dtype: DT_INT64
shape: (1)
name: PartitionedCall:0
outputs['unstack_0'] tensor_info:
dtype: DT_FLOAT
shape: (1, 256)
name: PartitionedCall:1
outputs['unstack_1'] tensor_info:
dtype: DT_FLOAT
shape: (1, 256)
name: PartitionedCall:2
Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'RestoreV2', 'Tanh', 'Sub', 'FloorMod', 'Sqrt', 'Cast', 'Const', 'MergeV2Checkpoints', 'NoOp', 'GatherV2', 'Reshape', 'Select', 'Pack', 'ExpandDims', 'BatchMatMulV2', 'SaveV2', 'MatMul', 'Pow', 'ShardedFilename', 'StringJoin', 'Less', 'PartitionedCall', 'Softmax', 'Placeholder', 'Split', 'StaticRegexFullMatch', 'Mean', 'Squeeze', 'StridedSlice', 'OneHot', 'ConcatV2', 'Transpose', 'Identity', 'Reciprocal', 'StatefulPartitionedCall', 'AddV2', 'Mul', 'Fill'}
Next, simply follow the official tutorial to write and run a few lines of quantization source code.
import tensorflow as tf
def representative_dataset():
    for data in dataset:
        yield {
            "unique_ids_raw_output___9": data.unique_id,
            "segment_ids": data.segment_id,
            "input_mask": data.mask,
            "input_ids": data.input_id,
        }
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_quant_model = converter.convert()
with open('saved_model/int8_model.tflite', 'wb') as w:
    w.write(tflite_quant_model)
It will be far easier to understand than reading my messy source code. Note that the above sample code has not been tested; if an error occurs anywhere, please modify it yourself and try again. Ref: #222
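For reference, a minimal sketch of how the dataset iterable used above could be built. This is an untested illustration and not part of the original comment: the SimpleNamespace container and the example token ids are assumptions, the field names mirror the attribute accesses in the sample, and the shapes/dtypes follow the SavedModel signature shown above.
import numpy as np
from types import SimpleNamespace

MAX_SEQ_LEN = 256  # matches the (1, 256) inputs in the signature

def make_example(unique_id, token_ids):
    # Pad the token ids to the fixed sequence length; the mask marks real tokens.
    input_ids = np.zeros((1, MAX_SEQ_LEN), dtype=np.int64)
    input_mask = np.zeros((1, MAX_SEQ_LEN), dtype=np.int64)
    input_ids[0, :len(token_ids)] = token_ids
    input_mask[0, :len(token_ids)] = 1
    return SimpleNamespace(
        unique_id=np.array([unique_id], dtype=np.int64),
        input_id=input_ids,
        mask=input_mask,
        segment_id=np.zeros((1, MAX_SEQ_LEN), dtype=np.int64),
    )

# In practice the token ids would come from the BERT tokenizer applied to real
# question/context pairs; the values below are placeholders.
dataset = [
    make_example(1, [101, 2054, 2003, 1996, 3007, 1029, 102]),
    make_example(2, [101, 2073, 2106, 2009, 4148, 1029, 102]),
]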
Hi @PINTO0309,
import tensorflow as tf
import numpy as np
# Output of command: saved_model_cli show --dir saved_model/ --all
# The given SavedModel SignatureDef contains the following input(s):
# inputs['input_ids_0'] tensor_info:
# dtype: DT_INT64
# shape: (1, 256)
# name: serving_default_input_ids_0:0
# inputs['input_mask_0'] tensor_info:
# dtype: DT_INT64
# shape: (1, 256)
# name: serving_default_input_mask_0:0
# inputs['segment_ids_0'] tensor_info:
# dtype: DT_INT64
# shape: (1, 256)
# name: serving_default_segment_ids_0:0
# inputs['unique_ids_raw_output___9_0'] tensor_info:
# dtype: DT_INT64
# shape: (1)
# name: serving_default_unique_ids_raw_output___9_0:0
def representative_dataset():
    yield {
        'input_ids_0': np.array([1 for i in range(256)]),
        'input_mask_0': np.array([1 for i in range(256)]),
        'segment_ids_0': np.array([1 for i in range(256)]),
        'unique_ids_raw_output___9_0': np.array([1]),
    }
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_quant_model = converter.convert()
with open('saved_model/int8_model.tflite', 'wb') as w:
    w.write(tflite_quant_model)
Thanks for the help!
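Since the original goal was latency profiling rather than accuracy, here is a minimal, untested sketch of timing the resulting int8_model.tflite with the TFLite Interpreter; the zero-filled dummy inputs and the run count are assumptions, and only the shapes/dtypes reported by the interpreter matter for this purpose.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='saved_model/int8_model.tflite')
interpreter.allocate_tensors()

# Feed every input a zero tensor of the dtype/shape the interpreter reports.
for detail in interpreter.get_input_details():
    dummy = np.zeros(detail['shape'], dtype=detail['dtype'])
    interpreter.set_tensor(detail['index'], dummy)

interpreter.invoke()  # warm-up run
runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")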
@InputBlackBoxOutput From your representative_dataset:
Your calibration data is so simple, is that OK?
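By way of illustration, a representative_dataset that feeds several varied samples instead of a single constant one could look like the untested sketch below; the random token ids are a stand-in for real tokenized SQuAD text (an assumption), and the shapes/dtypes follow the signature printed above.
import numpy as np

VOCAB_SIZE = 30522  # BERT-base uncased vocabulary size
MAX_SEQ_LEN = 256
rng = np.random.default_rng(0)

def representative_dataset():
    # A handful of varied samples; real tokenized question/context pairs
    # would give more realistic activation ranges.
    for unique_id in range(1, 9):
        yield {
            'input_ids_0': rng.integers(0, VOCAB_SIZE, size=(1, MAX_SEQ_LEN), dtype=np.int64),
            'input_mask_0': np.ones((1, MAX_SEQ_LEN), dtype=np.int64),
            'segment_ids_0': np.zeros((1, MAX_SEQ_LEN), dtype=np.int64),
            'unique_ids_raw_output___9_0': np.array([unique_id], dtype=np.int64),
        }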
Hi @MrRace
Issue Type
Others
onnx2tf version number
1.7.25
onnx version number
1.13.1
tensorflow version number
2.12.0rc1
Download URL for ONNX
https://github.com/onnx/models/blob/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
Parameter Replacement JSON
None
Description
Hi,
I am trying to convert and INT8-quantize a BERT ONNX model. I am using the following command in my setup on Google Colab.
Output:
I believe the model has int64 as its input data type, which causes onnx2tf to fail. Is there a workaround for this?
Thanks for creating such a fantastic tool!