Conversation
Summary:

We are standardizing on `Float8Linear` as the only float8 linear object:
1. The stack ending with #300 moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. This PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.

Test Plan:
```
// all tests pass
./test_everything.sh
// also run all benchmarks and verify correctness
```

Reviewers:
Subscribers:
Tasks:
Tags:

ghstack-source-id: 8ab483377124960fec2f133c0e27fbbaab204528
Pull Request resolved: #304
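For readers migrating call sites, here is a minimal sketch of what the swap looks like after this PR. It is an illustration rather than code from the PR: the `float8_linear_utils` import path and the exact `swap_linear_with_float8_linear` signature are assumptions based on the call shapes visible in the diffs below.

```python
import torch.nn as nn

from float8_experimental.float8_linear import Float8Linear
from float8_experimental.float8_linear_utils import swap_linear_with_float8_linear

# A toy model; the sizes are arbitrary.
model = nn.Sequential(nn.Linear(16, 32), nn.Linear(32, 16))

# Previously, callers chose between Float8Linear and Float8DynamicLinear here.
# After this PR, Float8Linear is the only float8 linear, and its defaults use
# dynamic scaling, so no extra config is needed to keep the old
# Float8DynamicLinear behavior.
# Note: the swap returns the converted model rather than mutating in place.
model = swap_linear_with_float8_linear(model, Float8Linear)
```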
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
```diff
@@ -14,7 +14,7 @@
 import torch.multiprocessing as mp
 import torch.nn as nn
 import torch.utils.benchmark as benchmark
-from float8_experimental.float8_linear import Float8Linear
+from float8_experimental.float8_linear import Float8Linear, TensorScalingType
```
How useful is this benchmark in general?
I haven't used it recently
```diff
-        # example: "x:del,w:del,dldy:dyn"
-        return f"x:{self.scaling_type_x.short_str()},w:{self.scaling_type_w.short_str()},dldy:{self.scaling_type_dL_dY.short_str()}"
+        # example: "x_del_w_del_dldy_dyn"
+        return f"x_{self.scaling_type_x.short_str()}_w_{self.scaling_type_w.short_str()}_dldy_{self.scaling_type_dL_dY.short_str()}"
```
Why the change, out of curiosity? I think the prior version might be a little more readable.
I should have reverted this. Will follow up in a future PR, if that's OK, to make landing this PR easier.
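For readers skimming the repr discussion above, here is a self-contained sketch of the pattern being debated. `TensorScalingType` below is a simplified stand-in for the real enum, reduced to the two values and the `short_str()` helper that the f-strings above rely on, and the `scaling_repr` function name is made up for illustration.

```python
import enum


class TensorScalingType(enum.Enum):
    # Simplified stand-in for the real enum used by Float8Linear.
    DELAYED = "delayed"
    DYNAMIC = "dynamic"

    def short_str(self) -> str:
        return "del" if self is TensorScalingType.DELAYED else "dyn"


def scaling_repr(x: TensorScalingType, w: TensorScalingType, dL_dY: TensorScalingType) -> str:
    # Old format: "x:del,w:del,dldy:dyn"; this PR switched to the underscore
    # format below, and the change is undone in the follow-up PR.
    return f"x_{x.short_str()}_w_{w.short_str()}_dldy_{dL_dY.short_str()}"


print(scaling_repr(TensorScalingType.DELAYED, TensorScalingType.DELAYED, TensorScalingType.DYNAMIC))
# prints: x_del_w_del_dldy_dyn
```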
```diff
@@ -48,8 +45,12 @@ def _test_compile_base(
     x = torch.randn(*x_shape, device="cuda", dtype=linear_dtype)
     m_ref = nn.Linear(16, 32, bias=True, device="cuda", dtype=linear_dtype)

-    m_fp8 = get_float8_linear(
-        linear_type, m_ref, emulate, scaling_type_x, scaling_type_w, scaling_type_dL_dY
+    m_fp8 = Float8Linear.from_float(
```
Calling `swap_..` on an nn.Linear module returns a new model out of place. I think it's fine either way.
I agree, we can make the tests use that if we want in a future PR.
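To make the conversion above concrete, here is a sketch of turning a single `nn.Linear` into `Float8Linear` via `from_float`. The keyword names mirror the old `get_float8_linear(...)` arguments shown in the removed lines and are an assumption about `from_float`'s exact signature.

```python
import torch.nn as nn

from float8_experimental.float8_linear import Float8Linear, TensorScalingType

m_ref = nn.Linear(16, 32, bias=True)

# Convert one module directly, in place of the deleted get_float8_linear() helper.
# emulate=True runs the float8 matmuls in emulation, so hardware without native
# float8 support can still exercise the code path; the keyword names here are
# assumptions for illustration.
m_fp8 = Float8Linear.from_float(
    m_ref,
    emulate=True,
    scaling_type_x=TensorScalingType.DYNAMIC,
    scaling_type_w=TensorScalingType.DYNAMIC,
    scaling_type_dL_dY=TensorScalingType.DYNAMIC,
)
```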
"scaling_type_dL_dY": TensorScalingType.DYNAMIC, | ||
} | ||
# For now, just use Float8Linear with dynamic scaling, which is the | ||
# same behavior as Float8Linear. |
Float8Dynamic? But also, it's probably better to just say that this only supports dynamic scaling for all 3 tensors: x, w, dL_dY.
agreed, let me fix in a future PR to speed up landing this, since this is a minor point.
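For concreteness, the reviewer's point is that the config above pins all three tensors to dynamic scaling. A short sketch of the full kwargs dict, of which the diff only shows the last entry (the surrounding test plumbing is assumed, not shown here):

```python
from float8_experimental.float8_linear import TensorScalingType

# Only dynamic scaling is exercised here, for all three tensors: x, w, dL_dY.
# This also matches Float8Linear's defaults after this stack. The dict would be
# passed through to the swap helper / Float8Linear.from_float as **kwargs.
extra_kwargs = {
    "scaling_type_x": TensorScalingType.DYNAMIC,
    "scaling_type_w": TensorScalingType.DYNAMIC,
    "scaling_type_dL_dY": TensorScalingType.DYNAMIC,
}
```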
```diff
@@ -29,8 +27,7 @@ def check_parity_no_mp(
         for param in model.parameters():
             dist.all_reduce(param.grad)
             param.grad.div_(dist.get_world_size())
-        if module_cls is Float8Linear:
-            sync_float8_amax_and_scale_history(model)
+        # TODO(future): add amax syncing once delayed scaling is supported
```
was this just an unused code path?
yes
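For context on the removed branch, a sketch of the data-parallel step it belonged to. `sync_float8_amax_and_scale_history` is the helper named in the removed line; the wrapper function and its import path here are assumptions for illustration.

```python
import torch.distributed as dist
import torch.nn as nn

from float8_experimental.float8_linear_utils import sync_float8_amax_and_scale_history


def average_grads(model: nn.Module, use_delayed_scaling: bool) -> None:
    # Average gradients across data-parallel ranks, as in check_parity_no_mp.
    for param in model.parameters():
        dist.all_reduce(param.grad)
        param.grad.div_(dist.get_world_size())
    # Delayed scaling keeps per-tensor amax/scale history that must be synced
    # across ranks each step; dynamic scaling has no history to sync, which is
    # why the test dropped the branch until delayed scaling is supported there.
    if use_delayed_scaling:
        sync_float8_amax_and_scale_history(model)
```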
```
            return swap_linear_with_float8_linear(module, Float8Linear, **kwargs)
        else:
            return swap_linear_with_float8_linear(module, Float8DynamicLinear, **kwargs)
    def swap_linear_with_dynamic(self, module: nn.Module, **kwargs: Any) -> nn.Module:
```
can we just remove this since this is the default?
agreed in principle, but ideally that would be a separate PR since it's only tangentially related
Burn it with fire!🔥
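For reference, after this PR the helper under discussion reduces to a thin one-line wrapper, which is why removing it entirely is on the table. This is a sketch of that shape, written as a free function rather than the test's method, not the exact code from the test.

```python
from typing import Any

import torch.nn as nn

from float8_experimental.float8_linear import Float8Linear
from float8_experimental.float8_linear_utils import swap_linear_with_float8_linear


def swap_linear_with_dynamic(module: nn.Module, **kwargs: Any) -> nn.Module:
    # With Float8DynamicLinear gone, there is no branch left to take:
    # Float8Linear with its default (dynamic) scaling is the only option.
    return swap_linear_with_float8_linear(module, Float8Linear, **kwargs)
```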
This pull request has been merged in 8e9623a.
Summary:

Addressing a couple of nits that slipped in #304:
* more defaults to dynamic
* undo repr change
* fix comment

Test Plan:
```
./test/test_everything.sh
```

Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
Stack from ghstack (oldest at bottom):
Summary:

We are standardizing on `Float8Linear` as the only float8 linear object:

1. The stack ending with "[9/x]: make dynamic scaling default in Float8Linear" (#300) moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. This PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D59342767