[Op][Spec] RMSNorm Operator Specification #23569
.. {#openvino_docs_ops_normalization_RMS_14}

RMS
===


.. meta::
   :description: Learn about RMS-14 - a normalization operation.

**Versioned name**: *RMS-14*

**Category**: *Normalization*

**Short description**: Calculates Root Mean Square (RMS) normalization of the input tensor.

**Detailed description**

*RMSNorm* operation performs Root Mean Square (RMS) normalization of the input ``data`` over the dimensions specified by the ``axes`` input. Each slice of ``data`` spanned by ``axes`` is normalized independently.
`Reference <https://arxiv.org/abs/1910.07467>`__.

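
.. math::

   \frac{x}{\sqrt{\mathrm{ReduceMean}(x^2, \mathit{axes}) + \epsilon}}

where :math:`\epsilon` is the value of the *epsilon* attribute and the mean is taken over the dimensions listed in ``axes``.

If the optional ``scale`` input is provided, the normalized values are multiplied by it:

.. math::

   \frac{x}{\sqrt{\mathrm{ReduceMean}(x^2, \mathit{axes}) + \epsilon}} \cdot \mathit{scale}
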
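
The formulas above can be read as the following plain-Python reference sketch. It is not the OpenVINO implementation. It assumes that ``data`` is passed as a flat row-major list together with its shape, and that ``scale``, when present, is a flat list whose shape (the hypothetical ``scale_shape`` parameter) is broadcast NumPy-style, aligned to the trailing dimensions of ``data``. The function name ``rms_norm`` is likewise hypothetical.

.. code-block:: python

   import math
   from itertools import product


   def _flat_index(index, shape):
       """Row-major offset of a multi-index within a tensor of the given shape."""
       offset = 0
       for i, dim in zip(index, shape):
           offset = offset * dim + i
       return offset


   def rms_norm(data, shape, axes, eps, scale=None, scale_shape=None):
       """Hypothetical RMS reference on a flat, row-major ``data`` list.

       Computes x / sqrt(ReduceMean(x^2, axes) + eps), optionally multiplied
       by ``scale``. ``scale_shape`` is required whenever ``scale`` is given.
       """
       rank = len(shape)
       # Negative axes count from the back; the order of ``axes`` does not matter.
       reduced = {a + rank if a < 0 else a for a in axes}
       count = math.prod(shape[a] for a in reduced)  # elements per normalization slice

       def slice_key(idx):
           # Indices along the non-reduced dimensions identify one normalization slice.
           return tuple(i for ax, i in enumerate(idx) if ax not in reduced)

       # itertools.product yields multi-indices in row-major order,
       # so the n-th multi-index corresponds to data[n].
       indices = list(product(*(range(d) for d in shape)))

       # First pass: sum of squares per slice.
       sum_sq = {}
       for n, idx in enumerate(indices):
           key = slice_key(idx)
           sum_sq[key] = sum_sq.get(key, 0.0) + data[n] * data[n]

       # Second pass: divide by the root mean square, then apply the optional scale.
       out = []
       for n, idx in enumerate(indices):
           y = data[n] / math.sqrt(sum_sq[slice_key(idx)] / count + eps)
           if scale is not None:
               lead = rank - len(scale_shape)  # scale is aligned to the trailing dims
               s_idx = [0 if d == 1 else idx[lead + k] for k, d in enumerate(scale_shape)]
               y *= scale[_flat_index(s_idx, scale_shape)]
           out.append(y)
       return out


   # Usage: normalize each row of a 2x2 tensor, with and without a per-column scale.
   # eps=0.0 only keeps the printed values exact; the specification requires a positive epsilon.
   print(rms_norm([1.0, 1.0, 2.0, 2.0], [2, 2], [-1], eps=0.0))
   # -> [1.0, 1.0, 1.0, 1.0]
   print(rms_norm([1.0, 1.0, 2.0, 2.0], [2, 2], [1], eps=0.0, scale=[2.0, 3.0], scale_shape=[2]))
   # -> [2.0, 3.0, 2.0, 3.0]

Two passes over the data keep the sketch close to the formula: the first pass computes ``ReduceMean(x^2, axes)`` for every slice, the second divides each element by the root of that mean plus epsilon.
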
**Attributes**

* *epsilon*

  * **Description**: A very small value added to the mean of squares for numerical stability. It ensures that division by zero does not occur for any normalized element.
  * **Range of values**: a positive floating-point number
  * **Type**: ``float``
  * **Required**: *yes*

* *compute_type*

  * **Description**: The precision used for the internal computation, before the optional scaling.
  * **Range of values**: Supported floating-point types: "f32", "f16", ...
  * **Type**: ``string``
  * **Default value**: "undefined" (the same as the input type)
  * **Required**: *no*

..
   Review thread on the compute_type "Range of values":
   Question: any other types except …
   Reply: In the models I've seen, the cast is to f32. In general, any type can be allowed to comply with Convert capabilities, but it may not be a real use case.

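
The effect of both attributes can be sketched for a single normalization slice. This is a hypothetical plain-Python illustration, not the OpenVINO implementation: f16 storage is emulated by rounding through the half-precision ``'e'`` format of Python's ``struct`` module, and Python's double-precision ``float`` stands in for a wider *compute_type* such as "f32". With every intermediate kept in f16, squaring 300 exceeds the f16 maximum of 65504, so the mean of squares becomes infinite and the result collapses to zero; computing in the wider type gives the expected 1.0. A positive *epsilon* keeps an all-zero slice finite instead of producing 0/0.

.. code-block:: python

   import math
   import struct


   def to_f16(x):
       """Round a Python float to the nearest IEEE 754 half value; overflow becomes +/-inf."""
       try:
           return struct.unpack("<e", struct.pack("<e", x))[0]
       except (OverflowError, struct.error):
           return math.copysign(math.inf, x)


   def rms_slice(row, eps, compute_in_f16):
       """Hypothetical RMS over one slice of f16 data.

       compute_in_f16=True rounds every intermediate to f16 (compute type = input type);
       compute_in_f16=False keeps intermediates as Python floats (a wider compute type).
       The result is always rounded back to f16, the type of ``data``.
       """
       rnd = to_f16 if compute_in_f16 else (lambda v: v)
       sum_sq = 0.0
       for x in row:
           sum_sq = rnd(sum_sq + rnd(x * x))
       mean = rnd(sum_sq / len(row))
       denom = rnd(math.sqrt(rnd(mean + eps)))
       return [to_f16(x / denom) for x in row]


   print(rms_slice([300.0, 300.0], 1e-6, compute_in_f16=True))   # x*x overflows f16 -> [0.0, 0.0]
   print(rms_slice([300.0, 300.0], 1e-6, compute_in_f16=False))  # -> [1.0, 1.0]
   print(rms_slice([0.0, 0.0], 1e-6, compute_in_f16=False))      # epsilon avoids 0/0 -> [0.0, 0.0]
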
**Inputs**

* **1**: ``data`` - Input data to be normalized. A tensor of type *T* and arbitrary shape. **Required.**

* **2**: ``axes`` - 1D tensor which specifies indices of dimensions in ``data`` that define normalization slices. Allowed range of axes is ``[-r; r-1]`` where ``r = rank(data)``; the values do not need to be sorted. A negative value means counting dimensions from the back. Type *T_AXES*. **Required.**

* **3**: ``scale`` - A tensor of type *T* containing the scale values applied to the normalized ``data``. The shape should be broadcastable to the shape of the ``data`` tensor. **Optional.**

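
The constraints on ``axes`` and ``scale`` can be checked with hypothetical helpers such as the ones below (they are not part of any OpenVINO API). ``normalize_axes`` maps the allowed range ``[-r; r-1]`` to non-negative indices regardless of their order; collapsing duplicate axes is an assumption, since this specification does not say how repeated axes are treated. ``is_broadcastable`` assumes NumPy-style broadcasting aligned to the trailing dimensions and applied in one direction only, so that the output keeps the shape of ``data``.

.. code-block:: python

   def normalize_axes(axes, rank):
       """Validate axes against [-rank, rank - 1] and return sorted non-negative indices."""
       normalized = set()  # assumption: duplicate axes are collapsed, not rejected
       for axis in axes:
           if not -rank <= axis <= rank - 1:
               raise ValueError(f"axis {axis} is outside [{-rank}, {rank - 1}]")
           normalized.add(axis + rank if axis < 0 else axis)
       return sorted(normalized)


   def is_broadcastable(scale_shape, data_shape):
       """True if scale_shape broadcasts to data_shape without changing data_shape."""
       if len(scale_shape) > len(data_shape):
           return False
       # Compare dimensions from the back; each scale dimension must be 1 or match data.
       return all(s in (1, d) for s, d in zip(reversed(scale_shape), reversed(data_shape)))


   print(normalize_axes([-1, 1], 4))               # -> [1, 3]
   print(is_broadcastable([24], [6, 12, 10, 24]))  # -> True
   print(is_broadcastable([10], [6, 12, 10, 24]))  # -> False
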
**Outputs**

* **1**: Output tensor of the same shape and type as the ``data`` input tensor.

**Types**

* *T*: any floating-point type.
* *T_AXES*: ``int64`` or ``int32``.

**Example**

.. code-block:: xml
   :force:

   <layer ... type="RMS">
       <data epsilon="1e-6"/>
       <input>
           <port id="0">
               <dim>6</dim>
               <dim>12</dim>
               <dim>10</dim>
               <dim>24</dim>
           </port>
           <port id="1">
               <dim>1</dim> <!-- axes holds the single value [-1]: normalization over the last dimension -->
           </port>
       </input>
       <output>
           <port id="2">
               <dim>6</dim>
               <dim>12</dim>
               <dim>10</dim>
               <dim>24</dim>
           </port>
       </output>
   </layer>
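
For the example above, the slice arithmetic can be checked with a small hypothetical helper: with ``axes`` equal to ``[-1]`` on a ``[6, 12, 10, 24]`` input, each of the 6 * 12 * 10 = 720 normalization slices holds 24 elements, and the output keeps the input shape. The helper assumes the axes have already been validated.

.. code-block:: python

   from math import prod


   def slice_layout(data_shape, axes):
       """Return (number of normalization slices, elements per slice) for valid axes."""
       rank = len(data_shape)
       reduced = {axis % rank for axis in axes}  # e.g. -1 -> rank - 1, assuming valid axes
       per_slice = prod(data_shape[a] for a in reduced)
       return prod(data_shape) // per_slice, per_slice


   print(slice_layout([6, 12, 10, 24], [-1]))      # -> (720, 24)
   print(slice_layout([6, 12, 10, 24], [-2, -1]))  # -> (72, 240)
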

..
   Review thread:
   Question: Is the final decision to have the multiplication by ``x`` inside RMSNorm? Why?
   Reply: The discussion I mentioned in the PR description is about having ``scale`` inside or outside the formula, and I proposed keeping it optional for compatibility with the existing GPU RMSNorm op. Could you please clarify whether you see other options for the RMSNorm formula?