[BERT-Squad] INT8 quantization: The input data type must be Float32. #248
Please guide me on how to work around the Auto Calibration Check. I do not need the model to be accurate; I only need it for hardware latency profiling. (See lines 722 to 733 in df183f1.)
If you do not need to perform INT8 quantization with this tool alone, the following method is the easiest. The SavedModel signature can be inspected as follows:
saved_model_cli show --dir saved_model/ --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_ids:0
inputs['input_mask'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_input_mask:0
inputs['segment_ids'] tensor_info:
dtype: DT_INT64
shape: (1, 256)
name: serving_default_segment_ids:0
inputs['unique_ids_raw_output___9'] tensor_info:
dtype: DT_INT64
shape: (1)
name: serving_default_unique_ids_raw_output___9:0
The given SavedModel SignatureDef contains the following output(s):
outputs['unique_ids_0'] tensor_info:
dtype: DT_INT64
shape: (1)
name: PartitionedCall:0
outputs['unstack_0'] tensor_info:
dtype: DT_FLOAT
shape: (1, 256)
name: PartitionedCall:1
outputs['unstack_1'] tensor_info:
dtype: DT_FLOAT
shape: (1, 256)
name: PartitionedCall:2
Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'RestoreV2', 'Tanh', 'Sub', 'FloorMod', 'Sqrt', 'Cast', 'Const', 'MergeV2Checkpoints', 'NoOp', 'GatherV2', 'Reshape', 'Select', 'Pack', 'ExpandDims', 'BatchMatMulV2', 'SaveV2', 'MatMul', 'Pow', 'ShardedFilename', 'StringJoin', 'Less', 'PartitionedCall', 'Softmax', 'Placeholder', 'Split', 'StaticRegexFullMatch', 'Mean', 'Squeeze', 'StridedSlice', 'OneHot', 'ConcatV2', 'Transpose', 'Identity', 'Reciprocal', 'StatefulPartitionedCall', 'AddV2', 'Mul', 'Fill'}
Next, simply follow the official tutorial to write and run a few lines of quantization source code.
import tensorflow as tf
def representative_dataset():
    for data in dataset:
        yield {
            "unique_ids_raw_output___9": data.unique_id,
            "segment_ids": data.segment_id,
            "input_mask": data.mask,
            "input_ids": data.input_id,
        }
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_quant_model = converter.convert()
with open('saved_model/int8_model.tflite', 'wb') as w:
    w.write(tflite_quant_model)
It will be far easier to understand than reading my messy source code. Note that the above sample code has not been tested; if an error occurs anywhere, please modify it yourself and try again. Ref: #222
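For reference, a minimal sketch of how the dataset iterable used above could be built. This is an untested illustration and not part of the original comment: the SimpleNamespace container and the example token ids are assumptions, the field names mirror the attribute accesses in the sample, and the shapes/dtypes follow the SavedModel signature shown above.
import numpy as np
from types import SimpleNamespace

MAX_SEQ_LEN = 256  # matches the (1, 256) inputs in the signature

def make_example(unique_id, token_ids):
    # Pad the token ids to the fixed sequence length; the mask marks real tokens.
    input_ids = np.zeros((1, MAX_SEQ_LEN), dtype=np.int64)
    input_mask = np.zeros((1, MAX_SEQ_LEN), dtype=np.int64)
    input_ids[0, :len(token_ids)] = token_ids
    input_mask[0, :len(token_ids)] = 1
    return SimpleNamespace(
        unique_id=np.array([unique_id], dtype=np.int64),
        input_id=input_ids,
        mask=input_mask,
        segment_id=np.zeros((1, MAX_SEQ_LEN), dtype=np.int64),
    )

# In practice the token ids would come from the BERT tokenizer applied to real
# question/context pairs; the values below are placeholders.
dataset = [
    make_example(1, [101, 2054, 2003, 1996, 3007, 1029, 102]),
    make_example(2, [101, 2073, 2106, 2009, 4148, 1029, 102]),
]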
Hi @PINTO0309,
import tensorflow as tf
import numpy as np
# Output of command: saved_model_cli show --dir saved_model/ --all
# The given SavedModel SignatureDef contains the following input(s):
# inputs['input_ids_0'] tensor_info:
# dtype: DT_INT64
# shape: (1, 256)
# name: serving_default_input_ids_0:0
# inputs['input_mask_0'] tensor_info:
# dtype: DT_INT64
# shape: (1, 256)
# name: serving_default_input_mask_0:0
# inputs['segment_ids_0'] tensor_info:
# dtype: DT_INT64
# shape: (1, 256)
# name: serving_default_segment_ids_0:0
# inputs['unique_ids_raw_output___9_0'] tensor_info:
# dtype: DT_INT64
# shape: (1)
# name: serving_default_unique_ids_raw_output___9_0:0
def representative_dataset():
    yield {
        'input_ids_0': np.array([1 for i in range(256)]),
        'input_mask_0': np.array([1 for i in range(256)]),
        'segment_ids_0': np.array([1 for i in range(256)]),
        'unique_ids_raw_output___9_0': np.array([1]),
    }
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_quant_model = converter.convert()
with open('saved_model/int8_model.tflite', 'wb') as w:
    w.write(tflite_quant_model)
Thanks for the help!
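Since the original goal was latency profiling rather than accuracy, here is a minimal, untested sketch of timing the resulting int8_model.tflite with the TFLite Interpreter; the zero-filled dummy inputs and the run count are assumptions, and only the shapes/dtypes reported by the interpreter matter for this purpose.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='saved_model/int8_model.tflite')
interpreter.allocate_tensors()

# Feed every input a zero tensor of the dtype/shape the interpreter reports.
for detail in interpreter.get_input_details():
    dummy = np.zeros(detail['shape'], dtype=detail['dtype'])
    interpreter.set_tensor(detail['index'], dummy)

interpreter.invoke()  # warm-up run
runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")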
@InputBlackBoxOutput From your representative_dataset:
Your calibration data is so simple, is that OK?
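By way of illustration, a representative_dataset that feeds several varied samples instead of a single constant one could look like the untested sketch below; the random token ids are a stand-in for real tokenized SQuAD text (an assumption), and the shapes/dtypes follow the signature printed above.
import numpy as np

VOCAB_SIZE = 30522  # BERT-base uncased vocabulary size
MAX_SEQ_LEN = 256
rng = np.random.default_rng(0)

def representative_dataset():
    # A handful of varied samples; real tokenized question/context pairs
    # would give more realistic activation ranges.
    for unique_id in range(1, 9):
        yield {
            'input_ids_0': rng.integers(0, VOCAB_SIZE, size=(1, MAX_SEQ_LEN), dtype=np.int64),
            'input_mask_0': np.ones((1, MAX_SEQ_LEN), dtype=np.int64),
            'segment_ids_0': np.zeros((1, MAX_SEQ_LEN), dtype=np.int64),
            'unique_ids_raw_output___9_0': np.array([unique_id], dtype=np.int64),
        }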
Hi @MrRace
Issue Type
Others
onnx2tf version number
1.7.25
onnx version number
1.13.1
tensorflow version number
2.12.0rc1
Download URL for ONNX
https://github.com/onnx/models/blob/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
Parameter Replacement JSON
None
Description
Hi,
I am trying to convert and INT8-quantize a BERT ONNX model. I am using the following command in my setup on Google Colab.
Output:
I believe the model has int64 as its input data type, which causes onnx2tf to fail. Is there a workaround for this?
Thanks for creating such a fantastic tool!