
[BERT-Squad] INT8 quantization: The input data type must be Float32. #248

Closed
InputBlackBoxOutput opened this issue Mar 14, 2023 · 5 comments
Labels: Quantization, Transformer

Comments

@InputBlackBoxOutput

Issue Type

Others

onnx2tf version number

1.7.25

onnx version number

1.13.1

tensorflow version number

2.12.0rc1

Download URL for ONNX

https://github.com/onnx/models/blob/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx

Parameter Replacement JSON

None

Description

Hi,

I am trying to convert and INT8-quantize a BERT ONNX model. I am using the following command in my setup on Google Colab.

onnx2tf --output_integer_quantized_tflite -i {MODEL}.onnx -b 1 > {MODEL}.log

Output:

Model convertion started
============================================================

ERROR: For INT8 quantization, the input data type must be Float32. Also, if --quant_calib_input_op_name_np_data_path is not specified, all input OPs must assume 4D tensor image data. INPUT Name: unique_ids_raw_output___9:0 INPUT Shape: ['unk__492'] INPUT dtype: int64

I believe the model has int64 as its input data type, causing onnx2tf to fail. Is there a workaround for this?

(screenshot attached)
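
For reference, a minimal sketch (assuming the onnx Python package is installed) that lists each graph input's name, dtype, and shape, which should confirm the int64 inputs:

import onnx

# List every graph input with its dtype and shape.
model = onnx.load('bertsquad-12.onnx')
for graph_input in model.graph.input:
    tensor_type = graph_input.type.tensor_type
    dtype_name = onnx.TensorProto.DataType.Name(tensor_type.elem_type)
    dims = [d.dim_param or d.dim_value for d in tensor_type.shape.dim]
    print(graph_input.name, dtype_name, dims)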

Thanks for creating such a fantastic tool!

@InputBlackBoxOutput (Author)

Please guide me on how to work around the AUTO calibration check. I do not need the model to be accurate; I only need it for hardware latency profiling.

onnx2tf/onnx2tf/onnx2tf.py

Lines 722 to 733 in df183f1

# AUTO calib 4D check
if output_integer_quantized_tflite \
    and quant_calib_input_op_name_np_data_path is None \
    and (graph_input.dtype != np.float32 or len(graph_input.shape) != 4):
    print(
        f'{Color.RED}ERROR:{Color.RESET} ' +
        f'For INT8 quantization, the input data type must be Float32. ' +
        f'Also, if --quant_calib_input_op_name_np_data_path is not specified, ' +
        f'all input OPs must assume 4D tensor image data. ' +
        f'INPUT Name: {graph_input.name} INPUT Shape: {graph_input.shape} INPUT dtype: {graph_input.dtype}'
    )
    sys.exit(1)

@PINTO0309 (Owner)

PINTO0309 commented Mar 14, 2023

If you do not insist on performing the INT8 quantization with this tool alone, the following method is the easiest.

The -osd option outputs a saved_model.pb in the saved_model folder with everything required for quantization; that is, a default signature named serving_default is embedded in the .pb.

onnx2tf -i bertsquad-12.onnx -b 1 -osd
saved_model_cli show --dir saved_model/ --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_ids'] tensor_info:
        dtype: DT_INT64
        shape: (1, 256)
        name: serving_default_input_ids:0
    inputs['input_mask'] tensor_info:
        dtype: DT_INT64
        shape: (1, 256)
        name: serving_default_input_mask:0
    inputs['segment_ids'] tensor_info:
        dtype: DT_INT64
        shape: (1, 256)
        name: serving_default_segment_ids:0
    inputs['unique_ids_raw_output___9'] tensor_info:
        dtype: DT_INT64
        shape: (1)
        name: serving_default_unique_ids_raw_output___9:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['unique_ids_0'] tensor_info:
        dtype: DT_INT64
        shape: (1)
        name: PartitionedCall:0
    outputs['unstack_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 256)
        name: PartitionedCall:1
    outputs['unstack_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 256)
        name: PartitionedCall:2
  Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'RestoreV2', 'Tanh', 'Sub', 'FloorMod', 'Sqrt', 'Cast', 'Const', 'MergeV2Checkpoints', 'NoOp', 'GatherV2', 'Reshape', 'Select', 'Pack', 'ExpandDims', 'BatchMatMulV2', 'SaveV2', 'MatMul', 'Pow', 'ShardedFilename', 'StringJoin', 'Less', 'PartitionedCall', 'Softmax', 'Placeholder', 'Split', 'StaticRegexFullMatch', 'Mean', 'Squeeze', 'StridedSlice', 'OneHot', 'ConcatV2', 'Transpose', 'Identity', 'Reciprocal', 'StatefulPartitionedCall', 'AddV2', 'Mul', 'Fill'}

Next, simply follow the official tutorial to write and run a few lines of quantization source code.
https://www.tensorflow.org/lite/performance/post_training_quantization

import tensorflow as tf

def representative_dataset():
  # `dataset` is a placeholder for your own calibration samples; each yielded
  # dict must map the signature's input names to arrays of matching shape/dtype.
  for data in dataset:
    yield {
      "unique_ids_raw_output___9": data.unique_id,
      "segment_ids": data.segment_id,
      "input_mask": data.mask,
      "input_ids": data.input_id,
    }

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()
with open('saved_model/int8_model.tflite', 'wb') as w:
  w.write(tflite_quant_model)

It will be far easier to understand than reading my messy source code. Note that the above sample code has not been tested; if an error occurs anywhere, please modify it yourself and try again.

Ref: #222

PINTO0309 added the Quantization label on Mar 14, 2023
PINTO0309 changed the title from "INT8 quantization: The input data type must be Float32." to "[BERT-Squad] INT8 quantization: The input data type must be Float32." on Mar 14, 2023
PINTO0309 added the Transformer label on Mar 14, 2023
@InputBlackBoxOutput (Author)

Hi @PINTO0309,
I got it working. Here's what I did:

import tensorflow as tf
import numpy as np

# Output of command: saved_model_cli show --dir saved_model/ --all
  # The given SavedModel SignatureDef contains the following input(s):
  #   inputs['input_ids_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1, 256)
  #       name: serving_default_input_ids_0:0
  #   inputs['input_mask_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1, 256)
  #       name: serving_default_input_mask_0:0
  #   inputs['segment_ids_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1, 256)
  #       name: serving_default_segment_ids_0:0
  #   inputs['unique_ids_raw_output___9_0'] tensor_info:
  #       dtype: DT_INT64
  #       shape: (1)
  #       name: serving_default_unique_ids_raw_output___9_0:0

def representative_dataset():
    # Dummy calibration data: accuracy is irrelevant here, only a successful conversion matters.
    yield {
      'input_ids_0': np.array([1 for i in range(256)]),
      'input_mask_0': np.array([1 for i in range(256)]),
      'segment_ids_0': np.array([1 for i in range(256)]),
      'unique_ids_raw_output___9_0': np.array([1]),
    }

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

with open('saved_model/int8_model.tflite', 'wb') as w:
  w.write(tflite_quant_model)

Thanks for the help!
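
Since the goal here is only hardware latency profiling, a minimal timing sketch with the TFLite interpreter (untested; assumes the int8_model.tflite path written above and dummy inputs) might look like this:

import time
import numpy as np
import tensorflow as tf

# Load the quantized model produced by the script above.
interpreter = tf.lite.Interpreter(model_path='saved_model/int8_model.tflite')
interpreter.allocate_tensors()

# Fill every input with zeros of the expected shape and dtype.
for detail in interpreter.get_input_details():
    dummy = np.zeros(detail['shape'], dtype=detail['dtype'])
    interpreter.set_tensor(detail['index'], dummy)

# Average latency over a fixed number of invocations.
runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
elapsed = time.perf_counter() - start
print(f'average latency: {elapsed / runs * 1000:.2f} ms')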

@MrRace

MrRace commented Apr 26, 2023

@InputBlackBoxOutput From your representative_dataset:

def representative_dataset():
    yield {
      'input_ids_0': np.array([1 for i in range(256)]),
      'input_mask_0': np.array([1 for i in range(256)]),
      'segment_ids_0': np.array([1 for i in range(256)]),
      'unique_ids_raw_output___9_0': np.array([1]),
    }

Your calibration data is very simple; is that OK?

@InputBlackBoxOutput (Author)

InputBlackBoxOutput commented Apr 26, 2023

Hi @MrRace,
I wanted to convert the model for profiling purposes only, hence quantization accuracy was not taken into account during conversion. You will have to modify the code to build a proper representative dataset.
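
For anyone who does care about accuracy, a hedged sketch of a representative dataset built from real tokenized question/context pairs (assumes the transformers package; the uncased vocabulary and the samples list are illustrative assumptions, and the input names follow the signature shown earlier):

import numpy as np
from transformers import BertTokenizer

# Illustrative vocabulary choice; verify it matches the one the model was trained with.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Replace with a few hundred real question/context pairs for meaningful calibration.
samples = [
    ('Who wrote Hamlet?', 'Hamlet is a tragedy written by William Shakespeare.'),
]

def representative_dataset():
    for idx, (question, context) in enumerate(samples):
        enc = tokenizer(
            question,
            context,
            padding='max_length',
            truncation=True,
            max_length=256,
            return_tensors='np',
        )
        yield {
            'input_ids_0': enc['input_ids'].astype(np.int64),
            'input_mask_0': enc['attention_mask'].astype(np.int64),
            'segment_ids_0': enc['token_type_ids'].astype(np.int64),
            'unique_ids_raw_output___9_0': np.array([idx], dtype=np.int64),
        }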
