
Should scale and bias be required inputs for batchNormalization op? #481

Open
huningxin opened this issue Nov 19, 2023 · 4 comments

@huningxin
Contributor

huningxin commented Nov 19, 2023

(Thanks @wacky6 for raising this issue while reviewing Chromium CL-5034594.)

In the existing batchNormalization definition, the scale and bias operands are optional members of the MLBatchNormalizationOptions dictionary. According to its calculation, if scale is not present, the element-wise multiplication can be eliminated, and if bias is not present, the element-wise addition can be eliminated as well.

// Assume input tensor is 4-D of the "nchw" layout.
const shape = [1, c, 1, 1];
let output = builder.div(
    builder.sub(input, builder.reshape(mean, shape)),
    builder.sqrt(builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon))));
if (options.scale)
    output = builder.mul(builder.reshape(options.scale, shape), output);
if (options.bias)
    output = builder.add(builder.reshape(options.bias, shape), output);
return output;

However, optional scale and bias are not widely supported across frameworks and native ML APIs. This makes the implementation more complex for native ML APIs that don't support optional scale and bias: if scale or bias is not present, the implementation has to create a scale tensor filled with 1s or a bias tensor filled with 0s at graph building time, as in the sketch below.
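A minimal sketch of that workaround, assuming a float32 input whose channel axis has size c (the MLOperandDescriptor field names may differ across spec versions):

// Sketch (assumption, not spec text): synthesize the missing operands for a
// backend that requires both scale and bias.
const weightDesc = {dataType: 'float32', dimensions: [c]};
const scale = options.scale ??
    builder.constant(weightDesc, new Float32Array(c).fill(1));  // identity scale
const bias = options.bias ??
    builder.constant(weightDesc, new Float32Array(c).fill(0));  // zero bias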

Frameworks:

  • TensorFlow's tf.nn.batch_normalization: offset (equivalent to bias) and scale are required parameters.
  • ONNX's BatchNormalization: scale and B (equivalent to bias) are required inputs.
  • PyTorch's BatchNorm: gamma (equivalent to scale) and beta (equivalent to bias) are optional, controlled by the affine parameter.

Native ML APIs:

  • DirectML's DML_BATCH_NORMALIZATION_OPERATOR_DESC: ScaleTensor and BiasTensor are required (non-optional) inputs.

The proposal is to make the two operands required, for example:

dictionary MLBatchNormalizationOptions {
  unsigned long axis = 1;
  float epsilon = 1e-5;
  MLActivation activation;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               MLOperand scale, MLOperand bias,
                               optional MLBatchNormalizationOptions options = {});
};

For models that don't use scale and bias at inference time, e.g., DenseNet 121, frameworks can set the scale values to 1 and the bias values to 0, as sketched below.
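A hedged usage sketch of the proposed signature for such a model, again assuming a float32 input with channel size c (descriptor field names are assumptions and may differ by spec version):

// Sketch (assumption): call the proposed signature with an identity scale and
// a zero bias for a model, such as DenseNet 121, that doesn't use them.
const weightDesc = {dataType: 'float32', dimensions: [c]};
const ones = builder.constant(weightDesc, new Float32Array(c).fill(1));
const zeros = builder.constant(weightDesc, new Float32Array(c).fill(0));
const output = builder.batchNormalization(input, mean, variance, ones, zeros,
                                          {axis: 1, epsilon: 1e-5});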

/cc @wchao1115 @fdwr

@fdwr
Collaborator

fdwr commented Nov 23, 2023

@huningxin : Your analysis is persuasive. It might be convenient for callers to allow scale and bias to be optional, but if underlying backends do not support that (forcing implementations to add dummy 0 and 1 tensors), and frameworks are unlikely to generate such a call anyway, then making them required makes sense to me. (And yes, your reading of DML_BATCH_NORMALIZATION_OPERATOR_DESC is correct.)

@wchao1115
Collaborator

@huningxin If I read this correctly, are you saying that tensor params should never be optional b/c it causes the implementation to have to allocate unnecessary buffer resources for them when dealing with a platform API that already treats them as required?

@huningxin
Contributor Author

@wchao1115

> @huningxin If I read this correctly, are you saying that tensor params should never be optional b/c it causes the implementation to have to allocate unnecessary buffer resources for them when dealing with a platform API that already treats them as required?

The buffer resources are less of a concern, because I suppose frameworks would have to allocate dummy 0 and 1 tensors anyway if a model doesn't need scale and bias, like DenseNet.

My point is that if the majority of frameworks and native ML APIs require scale and bias, WebNN may be worth aligning with them, because this would simplify WebNN's implementation for this uncommon usage.

However, as I mentioned in the last WG call, this may prevent a potential future optimization where a native implementation eliminates the unnecessary element-wise multiplication (for scale) and addition (for bias) when the two are not present. So I am wondering whether there is such a plan for that optimization in native implementations. We may want to make this interface future-proof.
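For illustration only, a purely hypothetical sketch of how a lowering pass could recover that optimization even with required operands, by detecting an identity scale and a zero bias at graph-compile time (all names below are made up, not from any existing backend):

// Hypothetical sketch: skip the scale multiply and the bias add when the
// weights are compile-time constants equal to all 1s and all 0s.
// scaleValues and biasValues stand for the captured constant weight data.
function isFilledWith(values, x) {
  return values.every((v) => v === x);
}

function planBatchNorm(scaleValues, biasValues) {
  return {
    emitScaleMul: !isFilledWith(scaleValues, 1),
    emitBiasAdd: !isFilledWith(biasValues, 0),
  };
}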

@huningxin
Contributor Author

CoreML's mil.ops.defs.iOS15.normalization.batch_norm allows beta (equivalent to bias) and gamma (equivalent to scale) to be optional.
