[Quant Tool] Introduce get_qdq_config() helper to get QDQ configurations (microsoft#22677)
### Description
Introduces the `get_qdq_config()` function to get a quantization
configuration for a full integer QDQ model. This function provides an
easier way of specifying commonly used options and sets convenient
defaults. Specifically:

- Instead of requiring the user to pass a dictionary of `extra_options`,
the new interface adds function parameters for common settings:
  - All calibrator settings
  - Whether activations/weights are symmetric
  - Whether to keep or fuse relu/clip into Q
  - Minimum real range for quantization
  - Dictionary of tensor-level quantization overrides (see the second example below).
- Automatically scans the input floating-point model and fills out the
operator types to quantize. Otherwise, only a limited number of operator
types would be quantized by default.
- Detects if the input model uses external data. If so, ensures that the
generated QDQ model also uses external data.
- Detects if the model will use newly introduced quantization types
(int4/int16) with an older opset. If so, forces the use of the
`com.microsoft` domain for Q/DQ ops, which support all types.
- Automatically enables the "extra option" called
`ForceQuantizeNoInputCheck` to ensure data movement operators (e.g.,
Transpose) are always quantized.
- The user can pass a function to indicate which nodes to exclude from
quantization.
- The user can still pass their own `extra_options` to override any of
the above if necessary.
 
```python
from onnxruntime.quantization import CalibrationMethod, QuantType, get_qdq_config, quantize  # , ...

# Get QDQ configuration
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    calibrate_method=CalibrationMethod.Percentile,
    calibrate_args={"percentile": 99.98},  # Converted to extra_options
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
    nodes_to_exclude=["Mul"], # Could also be a function. Ex: `lambda model, node: node.op_type == "Softmax"`

    # Other options converted to extra_options:
    min_real_range=0.0001,
    keep_removable_activations=True,
    activation_symmetric=True,
    weight_symmetric=True,
)

# Quantize model
quantize(float_model_path, qdq_model_path, qdq_config)
```
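
The same configuration can also carry tensor-level overrides and raw `extra_options` on top of the generated defaults. The sketch below illustrates this; the tensor name (`conv1.weight`), the override keys shown, and the `QuantizeBias` option are illustrative assumptions rather than part of this change, and the placeholder variables (`float_model`, `data_reader`, etc.) are the same as in the example above.

```python
from onnxruntime.quantization import QuantType, get_qdq_config, quantize

qdq_config = get_qdq_config(
    float_model,
    data_reader,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    # Tensor-level overrides: quantize the named initializer as symmetric int16
    # (keys and tensor name are illustrative).
    tensor_quant_overrides={
        "conv1.weight": [{"quant_type": QuantType.QInt16, "symmetric": True}],
    },
    # Settings without a dedicated parameter can still be passed directly;
    # values given here take precedence over the defaults filled in by get_qdq_config().
    extra_options={"QuantizeBias": False},
)

quantize(float_model_path, qdq_model_path, qdq_config)
```
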
### Motivation and Context
Need a version of `get_qnn_qdq_config()` that is not tied to a specific execution provider (EP).
adrianlizarraga authored and ankitm3k committed Dec 11, 2024
1 parent 6e5d9b8 commit 01ecbb0
Showing 2 changed files with 1 addition and 7 deletions.
6 changes: 1 addition & 5 deletions onnxruntime/python/tools/quantization/quantize.py
@@ -231,7 +231,6 @@ def get_qdq_config(
     activation_symmetric: bool = False,
     weight_symmetric: bool | None = None,
     per_channel: bool = False,
-    reduce_range: bool = False,
     keep_removable_activations: bool = False,
     min_real_range: float | None = None,
     tensor_quant_overrides: dict[str, list[dict[str, Any]]] | None = None,
@@ -246,7 +245,7 @@ def get_qdq_config(
         calibration_data_reader: Calibration data reader.
         calibrate_methode: The calibration method. Defaults to MinMax.
         activation_type: The default activation quantization type. Defaults to QUInt8.
-        weight_type: The default weight quantization type. Defaults to QInt8.
+        weight_type: The default weight quantization type. Defaults to QUInt8.
         activation_symmetric: True if activations should be quantized symmetrically (i.e, rmax == -rmin) by default.
             Defaults to false. For int8 and int16, this results in zero-point values of 0. For uint8 and uint16,
             the zero-point values are 127 and 32,767, respectively.
@@ -255,8 +254,6 @@ def get_qdq_config(
         per_channel: Global option that determines if a fixed set of operator types should be quantized per-channel.
             Defaults to false. Alternatively, use the tensor-level `tensor_quant_overrides` to select individual operators
             and their quantization axes.
-        reduce_range: quantize weights with 1 less bit of precision (e.g., 7 bits for QInt8). Defaults to false.
-            May improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode.
         keep_removable_activations: Defaults to false. If true, "removable" activations (e.g., Clip or Relu) will not
             be removed, and will be explicitly represented in the QDQ model. If false, these activations
             are automatically removed if activations are asymmetrically quantized. Keeping these activations
@@ -376,7 +373,6 @@ def get_qdq_config(
         op_types_to_quantize=list(op_types.difference(op_types_to_exclude)),
         nodes_to_exclude=final_nodes_to_exclude,
         per_channel=per_channel,
-        reduce_range=reduce_range,
         use_external_data_format=(model_has_external_data or model.ByteSize() >= MODEL_SIZE_THRESHOLD),
         extra_options=final_extra_options,
     )
2 changes: 0 additions & 2 deletions onnxruntime/test/python/quantization/test_get_qdq_config.py
@@ -93,7 +93,6 @@ def test_basic_args(self):
             activation_type=QuantType.QUInt16,
             weight_type=QuantType.QInt16,
             per_channel=True,
-            reduce_range=True,
             nodes_to_exclude=["Mul"],
             # Other options converted to extra_options:
             min_real_range=0.0001,
@@ -105,7 +104,6 @@ def test_basic_args(self):
         self.assertEqual(qdq_config.activation_type, QuantType.QUInt16)
         self.assertEqual(qdq_config.weight_type, QuantType.QInt16)
         self.assertTrue(qdq_config.per_channel)
-        self.assertTrue(qdq_config.reduce_range)
         self.assertEqual(set(qdq_config.nodes_to_exclude), {"Mul"})
         self.assertEqual(set(qdq_config.op_types_to_quantize), {"Add"})
