fix `load_state_dict` for xpu and refine xpu safetensor version check #2879

faaany · 2024-06-21T02:54:33Z

What does this PR do?

- Enable `test_load_state_dict` on XPU. The test previously only worked on CUDA, because `torch.device(0)` resolves to a CUDA device by default.
- Add XPU support in `load_state_dict` for the case where all values in `device_map` are the same integer.
- Refine the XPU safetensors version check to make the code cleaner and more compact.
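To illustrate the second point, here is a minimal sketch of how a bare integer in `device_map` could be turned into an explicit XPU target. The helper name `resolve_target_device` and its `backend` parameter are hypothetical and are not taken from `src/accelerate/utils/modeling.py`. The sketch uses plain strings so it runs without PyTorch installed.

```python
def resolve_target_device(device_map, backend="xpu"):
    """Hypothetical helper: pick a single load target for a device_map.

    Returns a device string when every value in device_map is identical,
    otherwise None (meaning there is no single target device).
    """
    devices = set(device_map.values())
    if len(devices) != 1:
        return None
    device = next(iter(devices))
    if isinstance(device, int):
        # A bare index is ambiguous; name the backend explicitly so the
        # tensors do not end up on the default (CUDA) backend.
        return f"{backend}:{device}"
    return str(device)


print(resolve_target_device({"linear1": 0, "linear2": 0}))
print(resolve_target_device({"linear1": "cpu", "linear2": "cpu"}))
print(resolve_target_device({"linear1": 0, "linear2": 1}))
```

The resulting string could then be passed to `torch.device(...)` or used as the `device` argument when loading the checkpoint.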

src/accelerate/utils/modeling.py

faaany · 2024-06-26T06:50:54Z

@SunMarc this PR is ready for review. Could you help review it? Thx a lot!

SunMarc

LGTM ! Thanks for fixing @faaany !

muellerzr

Nice!

HuggingFaceDocBuilderDev · 2024-07-03T08:12:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr · 2024-07-03T08:51:06Z

@faaany small merge conflict (the joys of day-of-release-merging-post-OOO), if you can fix that I'll get this in and it'll be part of the release today 🚀

faaany · 2024-07-03T08:59:31Z

> @faaany small merge conflict (the joys of day-of-release-merging-post-OOO), if you can fix that I'll get this in and it'll be part of the release today 🚀

awesome, thx! conflict resolved.

muellerzr · 2024-07-03T11:17:40Z

tests/test_modeling_utils.py

+                expected_device = (
+                    torch.device(f"{torch_device}:{device}") if isinstance(device, int) else torch.device(device)
+                )
+                assert loaded_state_dict[param].device == expected_device


FYI this breaks CUDA tests because on CUDA we end up with cuda:0:0
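One possible way to avoid the doubled index, sketched under the assumption that `torch_device` may already carry an index such as `cuda:0`. The helper `expected_device_str` is hypothetical and is not necessarily the fix that landed in the follow-up commit. It works on plain strings, so it runs without PyTorch.

```python
def expected_device_str(torch_device, device):
    """Hypothetical helper: build a device string without doubling the index."""
    if not isinstance(device, int):
        # Already a named device such as "cpu" or "xpu:1".
        return str(device)
    # Keep only the device type: "cuda:0" -> "cuda", "xpu" -> "xpu".
    device_type = torch_device.split(":", 1)[0]
    return f"{device_type}:{device}"


print(expected_device_str("cuda:0", 0))
print(expected_device_str("xpu", 1))
print(expected_device_str("cuda:0", "cpu"))
```

In the test, the returned string would be wrapped in `torch.device(...)` before comparing it with `loaded_state_dict[param].device`.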

muellerzr · 2024-07-03T11:18:21Z

Please check out the new failure on main @faaany, thanks! :) (The test is failing because of the issue noted above.)

add fix

56b2f84

yao-matrix reviewed Jun 21, 2024

faaany added 2 commits June 20, 2024 20:24

update warning

bef7760

no and

684c4c6

faaany marked this pull request as ready for review June 21, 2024 06:25

SunMarc approved these changes Jun 26, 2024

muellerzr approved these changes Jul 3, 2024

Merge branch 'main' into state-dict

3ef7154

muellerzr merged commit 92404fb into huggingface:main Jul 3, 2024
25 checks passed

muellerzr reviewed Jul 3, 2024

faaany deleted the state-dict branch November 4, 2024 06:09
