# Should `scale` and `bias` be required inputs for `batchNormalization` op? #481

Comments
@huningxin: Your analysis is persuasive. It might be convenient for callers to allow `scale` and `bias` to be optional, but if the underlying backends do not support it (forcing implementations to add dummy 0 and 1 tensors), and frameworks are unlikely to generate such a call anyway, then making them required makes sense to me. (And yes, your reading of `DML_BATCH_NORMALIZATION_OPERATOR_DESC` is correct.)
@huningxin If I read this correctly, are you saying that tensor parameters should never be optional because that forces the implementation to allocate unnecessary buffer resources for them when dealing with a platform API that already treats them as required?
The buffer resources are less of a concern, because I suppose frameworks would have to allocate dummy 0 and 1 tensors anyway if the models don't need `scale` and `bias`, like DenseNet. My point is that if the majority of frameworks and native ML APIs require `scale` and `bias`, WebNN may be worth aligning with them, because this would simplify WebNN's implementation for these uncommon usages. However, as I mentioned in the last WG call, this may on the other hand prevent a potential future optimization in which a native implementation eliminates the unnecessary element-wise multiplication (for `scale`) and addition (for `bias`) when the two are not present. So I am wondering whether there is such a plan for that optimization in native implementations. We may want to make this interface future-proof.
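The optimization being weighed can be sketched in plain Python (a hypothetical illustration, not any engine's actual code): when `scale`/`bias` are truly absent, the per-element multiply and add can be skipped outright, whereas required dummy `scale = 1` / `bias = 0` tensors force those ops unless the backend constant-folds them.

```python
import math

def batch_norm_elem(x, mean, variance, scale=None, bias=None, epsilon=1e-5):
    """Normalize one element (hypothetical sketch, not WebNN code)."""
    y = (x - mean) / math.sqrt(variance + epsilon)
    # When scale/bias are not present, the element-wise multiply and
    # add are skipped entirely -- the optimization discussed above.
    if scale is not None:
        y *= scale
    if bias is not None:
        y += bias
    return y
```

A backend handed required all-1 scale and all-0 bias tensors would need to detect those constants to recover the same saving, which is the trade-off in question.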
CoreML `mil.ops.defs.iOS15.normalization.batch_norm` allows `beta` (equivalent to `bias`) and `gamma` (equivalent to `scale`) to be optional.
(Thanks @wacky6 for raising this issue while reviewing Chromium CL-5034594.)

Regarding the existing `batchNormalization` definition, the `scale` and `bias` operands are optional members of the `MLBatchNormalizationOptions` dictionary. Regarding its calculation, if `scale` is not present, the element-wise multiplication can be eliminated, and if `bias` is not present, the element-wise addition can be eliminated too.

However, optional `scale` and `bias` are not widely supported across frameworks and native ML APIs. This makes the implementation more complex for those native ML APIs that don't support an optional `scale` and `bias`: e.g., the implementation has to make a bias tensor of all 0s and a scale tensor of all 1s at graph building time if `scale` and `bias` are not present.

Frameworks:
- `tf.nn.batch_normalization`: `offset` (equivalent to `bias`) and `scale` are required parameters.
- `BatchNormalization`: `scale` and `B` (equivalent to `bias`) are required inputs.
- `BatchNorm`: `gamma` (equivalent to `scale`) and `beta` (equivalent to `bias`) are optional inputs, controlled by the `affine` parameter.

Native ML APIs:
- `DML_BATCH_NORMALIZATION_OPERATOR_DESC`: `ScaleTensor` and `BiasTensor` are not annotated with `_Maybenull_`, so they are supposed to be required.
- `MPSCNNBatchNormalizationDataSource`: `beta` (equivalent to `bias`) and `gamma` (equivalent to `scale`) are annotated with `Required`.
- `batch_norm_inference`: `scale` and `offset` (equivalent to `bias`) are required inputs.

The proposal is to make the two operands required.

For models that don't use `scale` and `bias` at inference time, e.g., DenseNet 121, the frameworks can set `scale`'s values to 1 and `bias`'s values to 0.

/cc @wchao1115 @fdwr
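As a sketch of that workaround (plain Python, not WebNN API code; `batch_norm` here is a hypothetical reference computation), passing a `scale` of all 1s and a `bias` of all 0s reproduces exactly the result of omitting the multiply and add:

```python
import math

def batch_norm(x, mean, variance, scale, bias, epsilon=1e-5):
    # Inference-time batch normalization with required scale and bias:
    # y = (x - mean) / sqrt(variance + epsilon) * scale + bias
    return [(xi - mean) / math.sqrt(variance + epsilon) * scale + bias
            for xi in x]

x = [0.5, 1.0, 1.5]
# Dummy scale = 1 and bias = 0 make the multiply and add identities,
# matching the "scale/bias not present" case for models like DenseNet 121.
normalized = [(xi - 1.0) / math.sqrt(0.25 + 1e-5) for xi in x]
assert batch_norm(x, mean=1.0, variance=0.25, scale=1.0, bias=0.0) == normalized
```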