[TPU] Call torch._sync(param) during weight loading #9437

WoosukKwon · 2024-10-17T00:10:28Z

During weight loading, we often do something like:

narrowed_tensor = param.data.narrow(0, offset, len)
narrowed_tensor.copy_(real_weight)

expecting narrowed_tensor and param.data to share the same storage. However, on TPUs, the in-place update to narrowed_tensor propagates only lazily to the base tensor, param.data, which leads to redundant memory usage. This sometimes causes OOM errors during model loading.
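
For reference, the storage-sharing expectation can be checked on CPU, where eager execution makes the in-place write visible through the base tensor immediately. This is an illustrative sketch only: the shapes and values are made up, `length` is used instead of `len` to avoid shadowing the builtin, and it does not reproduce the lazy TPU behavior.

```python
import torch

# Hypothetical stand-ins for a model parameter and a checkpoint shard.
param = torch.nn.Parameter(torch.zeros(4, 2), requires_grad=False)
real_weight = torch.ones(2, 2)
offset, length = 1, 2

# narrow() returns a view over rows [offset, offset + length) of param.data.
narrowed_tensor = param.data.narrow(0, offset, length)
narrowed_tensor.copy_(real_weight)

# On CPU the view and the base tensor are backed by the same storage, so the
# in-place copy is visible through param.data right away.
shares_storage = (narrowed_tensor.untyped_storage().data_ptr()
                  == param.data.untyped_storage().data_ptr())
rows_written = param.data[offset:offset + length].eq(1).all().item()
rows_untouched = (param.data[0].eq(0).all().item()
                  and param.data[3].eq(0).all().item())
```

Here `shares_storage`, `rows_written`, and `rows_untouched` are all true on CPU, which is the behavior the weight loaders rely on.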

This PR addresses this problem by adding a post-hook that calls torch._sync(param) after each param's weight loader runs.
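
The post-hook can be pictured as a thin wrapper around each weight loader. The following is a minimal sketch rather than the code in this PR: `make_synced_weight_loader`, `fake_loader`, and `fake_sync` are hypothetical names, and the sync step is passed in as a plain callable so the example runs without a TPU. On a TPU build, the description above suggests the sync step would be `torch._sync`.

```python
import functools
from typing import Any, Callable


def make_synced_weight_loader(
    weight_loader: Callable[..., Any],
    sync_fn: Callable[[Any], None],
) -> Callable[..., Any]:
    """Wrap weight_loader so that sync_fn(param) runs after every call."""

    @functools.wraps(weight_loader)
    def synced_weight_loader(param: Any, *args: Any, **kwargs: Any) -> Any:
        result = weight_loader(param, *args, **kwargs)
        # Post-hook: ask the backend to materialize pending updates to param.
        sync_fn(param)
        return result

    return synced_weight_loader


# Stand-ins that record the order of operations (assumptions for the demo).
events: list = []


def fake_loader(param: dict, loaded_weight: list) -> None:
    param["data"] = list(loaded_weight)
    events.append("load")


def fake_sync(param: dict) -> None:
    events.append("sync")


param = {"data": None}
loader = make_synced_weight_loader(fake_loader, fake_sync)
loader(param, [1.0, 2.0])
```

Wrapping the loader keeps every existing call site unchanged: callers still invoke the loader with the same arguments, and the sync runs once per parameter load.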

When loading Llama3-8B (bf16) on v5e-8:

Before this PR: 3.4 GB allocated after weight loading
After this PR: 2.0 GB allocated after weight loading

github-actions · 2024-10-17T00:10:40Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

WoosukKwon · 2024-10-17T00:11:21Z

Thanks @JackCaoG for finding the bug and providing the solution.

JackCaoG · 2024-10-17T00:19:46Z

vllm/model_executor/utils.py

@@ -28,4 +29,22 @@ def set_weight_attrs(
    for key, value in weight_attrs.items():
        assert not hasattr(
            weight, key), (f"Overwriting existing tensor attribute: {key}")
+
+        # NOTE(woosuk): For TPU, param.data.copy_(weight) happens lazily,


To be more accurate, this is because in vLLM we do

narrowed_tensor = param.data.narrow(0, offset, len)
narrowed_tensor.copy_(real_weight)

narrowed_tensor and param.data share the same storage. With functionization, the in-place update on narrowed_tensor will lazily propagate to the base tensor, which is param.data.

Thanks for the elaboration. Fixed the comment!

JackCaoG · 2024-10-17T00:19:59Z

lgtm

mgoin

Thanks for referencing the CT issue, LGTM!

WoosukKwon added 2 commits October 16, 2024 23:57

[TPU] Ensure torch._sync(param) is called after param.data.copy_()

bb7c741

yapf

cf842bd

WoosukKwon added the tpu Related to Google TPUs label Oct 17, 2024

JackCaoG reviewed Oct 17, 2024

JackCaoG approved these changes Oct 17, 2024

This was referenced Oct 17, 2024

[Quantization][TPU] compressed-tensors integration for TPU #9301

Open

[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA #9438

Open

WoosukKwon changed the title ~~[TPU] Ensure torch._sync(param) is called after param.data.copy_()~~ [TPU] Call torch._sync(param) during weight loading Oct 17, 2024

Update comment

f5d8d91

mgoin approved these changes Oct 17, 2024

WoosukKwon merged commit 8e1cddc into main Oct 17, 2024
30 checks passed

WoosukKwon deleted the tpu-sync branch October 17, 2024 16:00
