
Should scale and bias be required inputs for batchNormalization op? #481

Open
huningxin opened this issue Nov 19, 2023 · 4 comments

@huningxin
Contributor

huningxin commented Nov 19, 2023

(Thanks @wacky6 for raising this issue while reviewing Chromium CL-5034594.)

In the existing batchNormalization definition, the scale and bias operands are optional members of the MLBatchNormalizationOptions dictionary. According to its calculation, if scale is not present, the element-wise multiplication can be eliminated, and if bias is not present, the element-wise addition can be eliminated as well.

// Assume input tensor is 4-D of the "nchw" layout.
const shape = [1, c, 1, 1];
let output = builder.div(
    builder.sub(input, builder.reshape(mean, shape)),
    builder.sqrt(builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon))));
if (options.scale)
    output = builder.mul(builder.reshape(options.scale, shape), output);
if (options.bias)
    output = builder.add(builder.reshape(options.bias, shape), output);
return output;

However, optional scale and bias are not widely supported across frameworks and native ML APIs. This makes the implementation more complex for native ML APIs that don't support optional scale and bias: if scale or bias is not present, the implementation has to create a scale tensor filled with 1s or a bias tensor filled with 0s at graph building time, as in the sketch below.
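A minimal sketch of that workaround, assuming a float32 input whose channel axis has size c (the MLOperandDescriptor field names may differ across spec versions):

// Sketch (assumption, not spec text): synthesize the missing operands for a
// backend that requires both scale and bias.
const weightDesc = {dataType: 'float32', dimensions: [c]};
const scale = options.scale ??
    builder.constant(weightDesc, new Float32Array(c).fill(1));  // identity scale
const bias = options.bias ??
    builder.constant(weightDesc, new Float32Array(c).fill(0));  // zero bias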

Frameworks:

  • TensorFlow's tf.nn.batch_normalization: offset (equivalent to bias) and scale are required parameters.
  • ONNX's BatchNormalization: scale and B (equivalent to bias) are required inputs.
  • PyTorch's BatchNorm: gamma (equivalent to scale) and beta (equivalent to bias) are optional, controlled by the affine parameter.

Native ML APIs:

  • DirectML's DML_BATCH_NORMALIZATION_OPERATOR_DESC: ScaleTensor and BiasTensor are required (non-optional) inputs.

The proposal is to make the two operands required, for example:

dictionary MLBatchNormalizationOptions {
  unsigned long axis = 1;
  float epsilon = 1e-5;
  MLActivation activation;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               MLOperand scale, MLOperand bias,
                               optional MLBatchNormalizationOptions options = {});
};

For models that don't use scale and bias at inference time, e.g., DenseNet 121, frameworks can set the scale values to 1 and the bias values to 0, as sketched below.
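A hedged usage sketch of the proposed signature for such a model, again assuming a float32 input with channel size c (descriptor field names are assumptions and may differ by spec version):

// Sketch (assumption): call the proposed signature with an identity scale and
// a zero bias for a model, such as DenseNet 121, that doesn't use them.
const weightDesc = {dataType: 'float32', dimensions: [c]};
const ones = builder.constant(weightDesc, new Float32Array(c).fill(1));
const zeros = builder.constant(weightDesc, new Float32Array(c).fill(0));
const output = builder.batchNormalization(input, mean, variance, ones, zeros,
                                          {axis: 1, epsilon: 1e-5});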

/cc @wchao1115 @fdwr

@fdwr
Collaborator

fdwr commented Nov 23, 2023

@huningxin : Your analysis is persuasive. It might be convenient for callers to allow scale and bias to be optional, but if underlying backends do not support that (forcing implementations to add dummy 0 and 1 tensors), and frameworks are unlikely to generate such a call anyway, then making them required makes sense to me. (And yes, your reading of DML_BATCH_NORMALIZATION_OPERATOR_DESC is correct.)

@wchao1115
Collaborator

@huningxin If I read this correctly, are you saying that tensor params should never be optional b/c it causes the implementation to have to allocate unnecessary buffer resources for them when dealing with a platform API that already treats them as required?

@huningxin
Contributor Author

@wchao1115

> @huningxin If I read this correctly, are you saying that tensor params should never be optional b/c it causes the implementation to have to allocate unnecessary buffer resources for them when dealing with a platform API that already treats them as required?

The buffer resources are less of a concern, because I suppose frameworks would have to allocate dummy 0 and 1 tensors anyway if a model doesn't need scale and bias, like DenseNet.

My point is that if the majority of frameworks and native ML APIs require scale and bias, WebNN may be worth aligning with them, because this would simplify WebNN's implementation for this uncommon usage.

However, as I mentioned in the last WG call, this may prevent a potential future optimization where a native implementation eliminates the unnecessary element-wise multiplication (for scale) and addition (for bias) when the two are not present. So I am wondering whether there is such a plan for that optimization in native implementations. We may want to make this interface future-proof.
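For illustration only, a purely hypothetical sketch of how a lowering pass could recover that optimization even with required operands, by detecting an identity scale and a zero bias at graph-compile time (all names below are made up, not from any existing backend):

// Hypothetical sketch: skip the scale multiply and the bias add when the
// weights are compile-time constants equal to all 1s and all 0s.
// scaleValues and biasValues stand for the captured constant weight data.
function isFilledWith(values, x) {
  return values.every((v) => v === x);
}

function planBatchNorm(scaleValues, biasValues) {
  return {
    emitScaleMul: !isFilledWith(scaleValues, 1),
    emitBiasAdd: !isFilledWith(biasValues, 0),
  };
}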

@huningxin
Contributor Author

CoreML's mil.ops.defs.iOS15.normalization.batch_norm allows beta (equivalent to bias) and gamma (equivalent to scale) to be optional.
