
[Op][Spec] RMSNorm Operator Specification #23569

Closed

Conversation

@mitruska (Contributor) commented Mar 20, 2024

Details:

  • RMSNorm Operator Specification

To be discussed:

  • Scale input - optional input vs. a separate multiplication outside the formula - proposed as an optional input to comply with the existing GPU RMSNorm op
  • Axes as input - vector or scalar, input or attribute - proposed as a 1D/scalar axes input
  • compute_type - precision for the internal computation and accumulation (usually f32 for better results on lower precisions); either inside the op, or outside and implemented by Convert - proposed as an attribute to comply with the existing GPU RMSNorm (output_type); see the interface sketch below
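
For illustration, a minimal NumPy sketch of the proposed interface; the function name, defaults, and the axes handling are hypothetical, not part of the spec:

```python
import numpy as np

def rms_norm(x, axes, eps=1e-6, scale=None, compute_type=np.float32):
    # Hypothetical sketch of the proposed interface (not the OpenVINO API):
    #   axes         - scalar or 1D axes to reduce over (proposed as an input)
    #   scale        - optional per-channel scale (proposed as an optional input)
    #   compute_type - precision of the internal computation (proposed as an attribute)
    out_type = x.dtype
    xc = x.astype(compute_type)                    # upcast for internal accumulation
    axes = tuple(np.atleast_1d(axes).tolist())     # accept a scalar or a 1D vector
    rms = np.sqrt(np.mean(np.square(xc), axis=axes, keepdims=True) + eps)
    y = (xc / rms).astype(out_type)                # cast back before scaling
    return y * scale if scale is not None else y

y = rms_norm(np.ones((2, 4, 8), dtype=np.float16), axes=-1)
```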

Related GPU kernel and fusion transformation.

Related PRs:

Tickets:

  • 134914, discussion 129027

@mitruska mitruska requested a review from a team as a code owner March 20, 2024 11:17
@mitruska mitruska requested review from zKulesza and removed request for a team March 20, 2024 11:17
@mitruska mitruska self-assigned this Mar 20, 2024
@github-actions github-actions bot added the category: docs OpenVINO documentation label Mar 20, 2024
@mitruska mitruska added category: Opset OpenVINO Opset and removed category: docs OpenVINO documentation labels Mar 20, 2024
@github-actions github-actions bot added the category: docs OpenVINO documentation label Mar 20, 2024
* *compute_type*

* **Description**: The precision for internal computation, before scaling.
* **Range of values**: Supported floating-point types: "f32", "f16", ...
Contributor:

Any other types besides fp16 and fp32?

@mitruska (Contributor Author):


In the models I've seen, the cast is to f32. In general, any type could be allowed to match the Convert capabilities, but that may not be a real use case.

Comment on lines 23 to 30
.. math::

   (x / Sqrt(ReduceMean(x^2, axes) + eps))

- If the optional ``scale`` input is provided:

.. math::

   (x / Sqrt(ReduceMean(x^2, axes) + eps)) * scale
Contributor:


Is the final decision to have multiplication by x inside RMSNorm? Why?

@mitruska (Contributor Author):


The discussion I mentioned in the PR description is about having the scale inside or outside the formula.
And I proposed to keep it optional for compatibility with existing GPU RMSNorm op.

Could you please clarify: do you see other options for the RMSNorm formula?
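
To make the inside-vs-outside question concrete, a small NumPy check (illustrative only) showing that `scale` as an optional input is numerically equivalent to a scale-less RMSNorm followed by a separate Multiply, which is why the input can stay optional:

```python
import numpy as np

x = np.random.rand(2, 8).astype(np.float32)
scale = np.random.rand(8).astype(np.float32)
eps = 1e-6

rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
inside = (x / rms) * scale     # scale applied inside the RMSNorm formula
outside = (x / rms)            # scale-less RMSNorm ...
outside = outside * scale      # ... followed by a separate Multiply op
assert np.allclose(inside, outside)
```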

@mitruska mitruska requested a review from rkazants March 25, 2024 08:15
@mitruska mitruska added this to the 2024.2 milestone Mar 29, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 15, 2024
### Details:
 - RMSNorm op core class
 - Registration in the opset and the op check (conformance) test will be added in the next PRs

Spec PR: 
- #23569

### Tickets:
 - 136261
alvoron pushed a commit to alvoron/openvino that referenced this pull request Apr 29, 2024


@github-actions github-actions bot added the Stale label May 7, 2024
@mitruska mitruska removed the Stale label May 7, 2024
@mitruska (Contributor Author) commented May 9, 2024

Ongoing discussion:

The "compute_type" attribute supposed to cover F16-->RMS(F32/BF16)-->F16 to enable fusing Converts at the beggining and at the end of the RMS subgraph.
But it doesn't cover patterns when the Convert is not the first op in the graph (current GPU case F32-->RMS(F32)-->F16), so the "output_type" attribute is needed when fusing only the final Convert (mentioned as important from the GPU performance perspective).
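
A minimal NumPy sketch of the two Convert-fusion patterns under discussion; the helper name and shapes are illustrative:

```python
import numpy as np

def rms(x, eps=1e-6):
    # Scale-less RMS normalization, computed in the precision of x
    return x / np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)

# Pattern covered by "compute_type": F16 --> RMS(F32) --> F16
# (Convert ops at both ends of the subgraph are fused into the op)
x16 = np.ones((2, 8), dtype=np.float16)
y1 = rms(x16.astype(np.float32)).astype(np.float16)

# Pattern needing "output_type": F32 --> RMS(F32) --> F16
# (there is no leading Convert; only the final Convert is fused)
x32 = np.ones((2, 8), dtype=np.float32)
y2 = rms(x32).astype(np.float16)
```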

github-merge-queue bot pushed a commit that referenced this pull request May 9, 2024
### Details:
 - RMSNorm reference implementation
 - The `Scale` input is optional
 
Note: The conversion of the input/output type is not included (proposed as `computation_type`); the conversion logic can be handled by Convert operations, or the reference can be extended with the conversion logic separately if agreed in the spec.
 
Related PRs:
- Specification Proposal: #23569

### Tickets:
 - 136262
@mitruska (Contributor Author):

Closing - decided to keep RMS as an internal operator for now (moved from GPU custom).
Based on this work, there is a separate PR with documentation of the existing internal::RMS.

@mitruska mitruska closed this May 17, 2024
Labels: category: docs OpenVINO documentation; category: Opset OpenVINO Opset