Add dtype, fix RMS norm for FP16 #8641
Open · +199 −65
Llama1B generation quality in CoreML is bad due to FP16 arithmetic. Here is a sample of generated text:
The corresponding FP16 eager mode model has much better generated text:
The discrepancy arises because the eager mode model actually computes the RMSNorm in FP32, due to a cast operation (which CoreML appears to ignore):
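For reference, here is a minimal sketch of the common Llama-style RMSNorm pattern with the FP32 upcast. The class and variable names are illustrative and may not match the module in this repo:

```python
import torch
import torch.nn as nn


class RMSNormSketch(nn.Module):
    """Sketch of a Llama-style RMSNorm that upcasts to FP32 for the reduction."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        # This is the cast that keeps the eager-mode math in FP32
        # (and that CoreML appears to ignore).
        x = x.to(torch.float32)
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x.to(input_dtype)
```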
Moreover, the norm computation appears to be numerically unstable in FP16 and gives bad results. We can improve the numeric quality of the norm in FP16 by first dividing x by its maximum absolute value, which keeps the squared values well within FP16 range (see the sketch at the end of this description). Here is the generated text from CoreML in FP16 after this change:
Note that for 4-bit channelwise quantization, the results do not look good even after this change. The ideal solution is to do QAT for Llama1B with 4-bit channelwise quantization + FP16 arithmetic.
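For illustration, here is a minimal sketch of the max-abs rescaling idea described above, written as a standalone function; the function name is hypothetical and this is not the exact diff in this PR:

```python
import torch


def rms_norm_fp16_safe(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """FP16-friendly RMSNorm sketch: rescale by max(|x|) before the reduction."""
    # Dividing by max(|x|) bounds |x| by 1, so x.pow(2) cannot overflow FP16
    # and the mean/rsqrt reduction stays well conditioned.
    max_abs = x.abs().amax(dim=-1, keepdim=True).clamp(min=eps)
    x = x / max_abs
    return weight * x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
```

Because RMSNorm is scale-invariant apart from the eps term, the rescaling leaves the result essentially unchanged in exact arithmetic; it only changes where rounding happens in FP16.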