[1/x]: Make Float8Linear support dynamic scaling #290
Conversation
Summary:

At a high level, we need to make dynamic vs delayed scaling configurable separately for activations, weights and gradients. The way I am approaching this is as follows:

* PR 1 (this PR): add basic support for dynamic scaling, configurable by tensor, to `Float8Linear`
* PRs 2..n: one by one, add features implemented in `Float8DynamicLinear` to `Float8Linear`, as necessary
* last PR: delete `Float8DynamicLinear`

Test Plan:

```
./test/test_everything.sh
```

Reviewers:
Subscribers:
Tasks:
Tags:
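To make the per-tensor configuration concrete, here is a minimal usage sketch. The `TensorScalingType` enum and the `scaling_type_x` naming appear in the diff below; the other keyword names (`scaling_type_w`, `scaling_type_dL_dY`), the `from_float` signature, and the import path are assumptions for illustration, not a confirmed API.

```python
# Hypothetical sketch; keyword names and import path are assumptions based on
# the naming pattern in this PR, not a confirmed public API.
import torch
import torch.nn as nn

from float8_experimental.float8_linear import Float8Linear, TensorScalingType  # assumed path

m = nn.Linear(32, 16, bias=False)

# Configure scaling per tensor: dynamic for the activation (x) and the output
# gradient (dL_dY), delayed for the weight (w).
fp8_m = Float8Linear.from_float(
    m,
    emulate=True,
    scaling_type_x=TensorScalingType.DYNAMIC,
    scaling_type_w=TensorScalingType.DELAYED,
    scaling_type_dL_dY=TensorScalingType.DYNAMIC,
)

y = fp8_m(torch.randn(4, 32))
```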
"""Returns whether the given linear_type requires sync before forward.""" | ||
return linear_type in REQUIRES_SYNC | ||
return linear_type is LinearType.DELAYED and any( |
Should this be `or`?
Since `Float8DynamicLinear` does not support `TensorScalingType`, I think `and` is right?
you are right 👍
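To make the `and` vs `or` question concrete, here is a minimal sketch of the resulting predicate. The `scaling_type_x` parameter and its default come from the diff below; the remaining parameter names are assumed to follow the same pattern.

```python
# Sketch of the predicate under discussion; parameter names beyond
# scaling_type_x are assumptions following the same naming pattern.
def linear_requires_sync(
    linear_type: LinearType,
    scaling_type_x: TensorScalingType = TensorScalingType.DELAYED,
    scaling_type_w: TensorScalingType = TensorScalingType.DELAYED,
    scaling_type_dL_dY: TensorScalingType = TensorScalingType.DELAYED,
) -> bool:
    """Returns whether the given linear_type requires sync before forward."""
    # `and` is correct here: only Float8Linear (LinearType.DELAYED) can carry
    # delayed-scaled tensors, and a sync is needed only if at least one of its
    # tensors actually uses delayed scaling.
    return linear_type is LinearType.DELAYED and any(
        t is TensorScalingType.DELAYED
        for t in (scaling_type_x, scaling_type_w, scaling_type_dL_dY)
    )
```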
    emulate=emulate,
)
if linear_type is LinearType.DYNAMIC:
    return Float8DynamicLinear.from_float(
Should we assert that none of the scaling types are True?
Technically yes; I was just lazy and unmotivated, since this stack is trying to delete `Float8DynamicLinear`.
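For completeness, one possible shape for that guard (a hypothetical sketch; the surrounding dispatch signature and the exact invariant to assert are assumptions, since this check was explicitly left out of the PR):

```python
# Hypothetical guard, not part of this PR: Float8DynamicLinear does not consume
# the per-tensor scaling arguments, so one option is to reject any attempt to
# customize them (i.e. change them from the DELAYED default) on this path.
if linear_type is LinearType.DYNAMIC:
    scaling_types = (scaling_type_x, scaling_type_w, scaling_type_dL_dY)
    assert all(t is TensorScalingType.DELAYED for t in scaling_types), (
        "per-tensor scaling types are ignored by Float8DynamicLinear; "
        "configure them on Float8Linear instead"
    )
    return Float8DynamicLinear.from_float(mod, emulate=emulate)  # `mod` name assumed
```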
def linear_requires_sync(linear_type: LinearType):
def linear_requires_sync(
    linear_type: LinearType,
    scaling_type_x: TensorScalingType = TensorScalingType.DELAYED,
Nit: we should probably remove the defaults, right?
hmm, long term probably yes. I'm taking the approach of tackling the "what's the default" recipe separately and not changing default behavior for now.
Seems good to me
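As a usage note, call sites can already pass the scaling types explicitly, which would keep them working unchanged if the defaults are removed later (parameter names beyond `scaling_type_x` are assumptions, as above):

```python
# Passing the scaling types explicitly decouples this call site from the
# current DELAYED defaults.
needs_sync = linear_requires_sync(
    LinearType.DELAYED,
    scaling_type_x=TensorScalingType.DYNAMIC,
    scaling_type_w=TensorScalingType.DELAYED,
    scaling_type_dL_dY=TensorScalingType.DYNAMIC,
)
assert needs_sync  # the delayed-scaled weight still requires a sync before forward
```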
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request has been merged in 3cb42e1.
Stack from ghstack (oldest at bottom):
Summary:
At a high level, we need to make dynamic vs delayed scaling configurable
separately for activations, weights and gradients. The way I am
approaching this is as follows:
* PR 1 (this PR): add basic support for dynamic scaling, configurable by tensor, to `Float8Linear`
* PRs 2..n: one by one, add features implemented in `Float8DynamicLinear` to `Float8Linear`, as necessary
* last PR: delete `Float8DynamicLinear`
Test Plan: `./test/test_everything.sh`
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D59305792