[low-bit optim] Fix load state dict when device is different #1021
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1021
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 2c82fc6 with merge base 9e2a253. This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/prototype/test_low_bit_optim.py (Outdated)

```python
for p1, p2 in zip(model.parameters(), model2.parameters()):
    torch.testing.assert_close(p2, p1)

@pytest.mark.skipif(bnb is None, reason="bitsandbytes is not availablle")
```
small typo, availablle -> available
Suggested change:

```diff
-@pytest.mark.skipif(bnb is None, reason="bitsandbytes is not availablle")
+@pytest.mark.skipif(bnb is None, reason="bitsandbytes is not available")
```
@@ -109,6 +109,7 @@ def step(self, closure=None):

```python
# this will work with any optim state tensor subclass that implements aten.lerp.Scalar and aten.copy_.default
# and param tensor subclass that implements aten.add_.Tensor, and aten.addcdiv_.default
# NOTE: should we cast inputs to FP32 to ensure computations are always in FP32?
```
I override a few methods to cast the input to the weight dtype because the Flux model occasionally upcasts things to FP32 inside layernorm. Are you saying FP32 is more correct than BF16?
For the optimizer step, internal calculations should be done in FP32 to ensure accurate results.
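To make that concrete, here is a hedged sketch (not the PR's actual step implementation; the function name and hyperparameter defaults are illustrative) of an Adam-style single-parameter update written only in terms of those aten ops, with the read-side math upcast to FP32:

```python
import torch


@torch.no_grad()
def adam_step_sketch(p, grad, step, exp_avg, exp_avg_sq,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # state updates go through aten.lerp.Scalar (and aten.copy_.default when a
    # state dict is loaded), so any optim state subclass with those ops works
    exp_avg.lerp_(grad, 1 - beta1)
    exp_avg_sq.lerp_(grad.square(), 1 - beta2)

    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step

    # read the state back in FP32 so the sqrt/division happen at full precision
    # even when the state is stored as 8-bit or FP8
    denom = (exp_avg_sq.float().sqrt() / bias_correction2 ** 0.5).add_(eps)

    # the param subclass only needs aten.addcdiv_.default here
    # (aten.add_.Tensor would be used for weight decay, omitted in this sketch)
    p.addcdiv_(exp_avg.float(), denom, value=-lr / bias_correction1)
```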
```diff
-        return float_data.view(self._shape).to(dtype)
+        if output_dtype is not None:
+            float_data = float_data.to(output_dtype)
+        return float_data.view(self._shape)
```
Will we encounter non-contiguous tensors here? If so, view cannot be used; reshape must be used instead.
No, it's unlikely we will have a non-contiguous tensor here, because this handles the internal data of the tensor subclass, not the output of another layer in the model.
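For context, a minimal hedged sketch of the dequantize pattern under discussion; `_FakeQuantizedState` and its fields are hypothetical stand-ins for the real subclass internals:

```python
import torch


class _FakeQuantizedState:
    """Hypothetical container mimicking the subclass's internal buffers."""

    def __init__(self, codes: torch.Tensor, scale: torch.Tensor, shape):
        self._codes = codes.flatten()  # internal storage is always contiguous
        self._scale = scale            # per-tensor scale in this toy example
        self._shape = shape

    def dequantize(self, output_dtype=None):
        float_data = self._codes.float() * self._scale
        if output_dtype is not None:
            float_data = float_data.to(output_dtype)
        # .view() is safe here because float_data comes from the subclass's own
        # contiguous buffer, not from another layer's (possibly strided) output
        return float_data.view(self._shape)
```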
* fix serialization
* fix pytorch 2.3
* fix typo
* update note
In `optim.load_state_dict(state_dict)`, if the optim dtype != the state_dict dtype, `aten._to_copy.default` is called. This PR simply implements this op and adds appropriate tests.

Update: In PyTorch pre-2.4, calling `.to(device, dtype)` will not dispatch `aten._to_copy.default` when the dtype is the same but the device is different. Thus, I have to manually override the `.to()` method instead. This is only done for PyTorch pre-2.4. FP8 is not affected since FP8 CUDA requires PyTorch 2.4 anyway. We can remove this hack once we drop 2.3 support.
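A hedged sketch of the shape of that workaround, assuming a wrapper tensor subclass; the class name, fields, and version guard are illustrative and not the PR's actual code:

```python
import torch

# assumed version guard; torchao has its own version helpers
_TORCH_PRE_2_4 = tuple(int(x) for x in torch.__version__.split(".")[:2]) < (2, 4)


class QuantizedStateSketch(torch.Tensor):
    """Illustrative wrapper subclass standing in for a low-bit optim state tensor."""

    @staticmethod
    def __new__(cls, codes, scale, shape):
        return torch.Tensor._make_wrapper_subclass(
            cls, shape, dtype=torch.float32, device=codes.device
        )

    def __init__(self, codes, scale, shape):
        self.codes = codes  # quantized payload
        self.scale = scale  # per-tensor scale in this toy example

    def dequantize(self):
        return (self.codes.float() * self.scale).view(self.shape)

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # the real subclass implements aten._to_copy.default, aten.lerp.Scalar,
        # aten.copy_.default, etc. here; omitted in this sketch
        raise NotImplementedError(f"{func} is not handled in this sketch")

    if _TORCH_PRE_2_4:
        # Pre-2.4, Tensor.to(device) with an unchanged dtype never reaches
        # aten._to_copy.default in __torch_dispatch__, so handle the move here.
        def to(self, *args, **kwargs):
            device, dtype, _, _ = torch._C._nn._parse_to(*args, **kwargs)
            if dtype is None or dtype == self.dtype:
                device = device if device is not None else self.device
                return self.__class__(
                    self.codes.to(device), self.scale.to(device), self.shape
                )
            # dtype actually changes: fall back to a plain dequantized tensor
            return self.dequantize().to(*args, **kwargs)
```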