Add FP8 support for MoE models #2928
Conversation
Nice! Some small comments.
@@ -63,7 +63,7 @@ def normalize_e4m3fn_to_e4m3fnuz(
     weight_scale: torch.Tensor,
     input_scale: Optional[torch.Tensor] = None,
 ) -> Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]:
-    if weight.dtype == torch.float8_e4m3fn:
+    if weight.dtype == torch.float8_e4m3fn and SYSTEM == "rocm":
The function would now not normalize on SYSTEM != "rocm" even if the data type is float8_e4m3fn. I think either the function should be renamed to normalize_e4m3fn_to_native_float8, or this condition should not be there (and the conversion should happen regardless of SYSTEM).
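For readers following the thread: a minimal sketch of what the renamed helper could look like, assuming the usual e4m3fn → e4m3fnuz conversion (reinterpret the raw bits and double the scales, since e4m3fnuz's maximum representable value is half of e4m3fn's). The actual body in the repository may differ; SYSTEM is assumed to be the platform string used elsewhere in this diff.

import torch
from typing import Optional, Tuple

def normalize_e4m3fn_to_native_float8(
    weight: torch.Tensor,
    weight_scale: torch.Tensor,
    input_scale: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]:
    # Only ROCm needs a conversion: its native float8 format is e4m3fnuz.
    if weight.dtype == torch.float8_e4m3fn and SYSTEM == "rocm":
        # The bit pattern 0x80 (-128 as int8) is -0.0 in e4m3fn but NaN in
        # e4m3fnuz, so map it to +0.0 before reinterpreting the raw bits.
        weight_as_int8 = weight.view(torch.int8)
        weight_as_int8[weight_as_int8 == -128] = 0
        weight = weight_as_int8.view(torch.float8_e4m3fnuz)
        # e4m3fnuz's dynamic range is half of e4m3fn's, so double the
        # scales to keep the dequantized values unchanged.
        weight_scale = weight_scale * 2.0
        if input_scale is not None:
            input_scale = input_scale * 2.0
    return weight, weight_scale, input_scale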
Done, renamed the function.
     if (
         isinstance(weights.loader, DefaultWeightsLoader)
         and isinstance(weights.loader.weight_class, UnquantizedWeight)
     ) or isinstance(weights.loader, HybridFP8UnquantLoader):
-        cls = UnquantizedSparseMoELayer
+        if (
+            isinstance(weights.loader, HybridFP8UnquantLoader)
+            and weights.loader.to_fp8
+        ):
+            cls = FP8SparseMoELayer
+        else:
+            cls = UnquantizedSparseMoELayer
I think it would be better to flatten this now. Something like:
if isinstance(weights.loader, DefaultWeightsLoader) and isinstance(weights.loader.weight_class, UnquantizedWeight):
    cls = UnquantizedSparseMoELayer
elif isinstance(weights.loader, HybridFP8UnquantLoader):
    cls = FP8SparseMoELayer
elif # ...
I flattened it, but the condition remains the same, because we always use the HybridFP8UnquantLoader to load the weights.
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/quantization.py#L202
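Putting the suggestion and this reply together, the flattened dispatch presumably reads something like the following. This is a sketch assembled from the snippets in this thread, not necessarily the exact merged code.

if isinstance(weights.loader, DefaultWeightsLoader) and isinstance(
    weights.loader.weight_class, UnquantizedWeight
):
    cls = UnquantizedSparseMoELayer
elif isinstance(weights.loader, HybridFP8UnquantLoader):
    # HybridFP8UnquantLoader loads both fp8 and unquantized checkpoints,
    # so the to_fp8 flag picks the MoE layer implementation.
    cls = FP8SparseMoELayer if weights.loader.to_fp8 else UnquantizedSparseMoELayer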
if weight.weight.dtype in {torch.float8_e4m3fn, torch.float8_e4m3fnuz}:
    all_weight[i], all_weight_scales[i], current_input_scale = (
        normalize_e4m3fn_to_e4m3fnuz(
Ok, I see why the condition was added; more and more in favor of renaming this to normalize_e4m3fn_to_native_float8.
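After the rename, the truncated call site above would presumably become something like this. The keyword names come from the function signature in the first hunk; the weight attributes are assumptions for illustration.

if weight.weight.dtype in {torch.float8_e4m3fn, torch.float8_e4m3fnuz}:
    all_weight[i], all_weight_scales[i], current_input_scale = (
        normalize_e4m3fn_to_native_float8(
            weight=weight.weight,
            weight_scale=weight.weight_scale,  # assumed attribute
            input_scale=weight.input_scale,  # assumed attribute
        )
    )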
Renamed
What does this PR do?
As per title!