fix load_state_dict for xpu and refine xpu safetensor version check #2879

Merged
merged 4 commits into from
Jul 3, 2024

@faaany faaany commented Jun 21, 2024

What does this PR do?

  • enable test_load_state_dict on xpu; the test previously relied on torch.device(0), which resolves to cuda by default
  • add xpu support in load_state_dict when all of device_map's values are the same integer
  • refine the xpu safetensors version check to make the code cleaner and more compact
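The second bullet can be illustrated with a minimal sketch. It is not the code from this PR: `single_target_device` and its `backend` parameter are hypothetical names chosen for illustration, and the sketch uses plain strings instead of `torch.device` so it runs without PyTorch. It only shows the idea that a bare integer index needs an explicit backend prefix on non-CUDA hardware.

```python
from typing import Dict, Optional, Union

DeviceSpec = Union[int, str]


def single_target_device(
    device_map: Dict[str, DeviceSpec], backend: str = "cuda"
) -> Optional[str]:
    """Return one explicit device string if every entry of device_map
    points at the same device, otherwise None.

    Hypothetical helper: `backend` stands in for whatever accelerator
    is detected at runtime (for example "xpu").
    """
    unique = set(device_map.values())
    if len(unique) != 1:
        return None  # mixed placement, no single target device
    device = next(iter(unique))
    if isinstance(device, int):
        # A bare index is ambiguous, so attach the backend explicitly.
        return f"{backend}:{device}"
    return device


print(single_target_device({"linear1": 0, "linear2": 0}, backend="xpu"))
```

With a device map such as `{"linear1": 0, "linear2": 0}` and `backend="xpu"`, the helper yields `xpu:0` rather than leaving the integer to be interpreted as a CUDA index.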

@faaany faaany marked this pull request as ready for review June 21, 2024 06:25

faaany commented Jun 26, 2024

@SunMarc this PR is ready for review. Could you help review it? Thx a lot!

@SunMarc SunMarc left a comment

LGTM ! Thanks for fixing @faaany !

@muellerzr muellerzr left a comment

Nice!

@muellerzr

@faaany small merge conflict (the joys of day-of-release-merging-post-OOO), if you can fix that I'll get this in and it'll be part of the release today 🚀

faaany commented Jul 3, 2024

awesome, thx! conflict resolved.

@muellerzr muellerzr merged commit 92404fb into huggingface:main Jul 3, 2024
25 checks passed
Comment on lines +751 to +754
expected_device = (
    torch.device(f"{torch_device}:{device}") if isinstance(device, int) else torch.device(device)
)
assert loaded_state_dict[param].device == expected_device

FYI this breaks CUDA tests because on CUDA we end up with cuda:0:0

+1

@muellerzr

Please check out the new failure on main @faaany, thanks! :) (Test is failing bc of the aforementioned note above)
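The `cuda:0:0` failure reported in the review comment arises when the test's device string already carries an index and a second index is appended to it. One possible way to avoid that, sketched here under assumptions, is to strip any existing index before formatting. `expected_device_str` is a hypothetical helper, not the fix that was applied upstream, and it works on plain strings so it runs without PyTorch.

```python
from typing import Union


def expected_device_str(torch_device: str, device: Union[int, str]) -> str:
    """Build the device string a loaded parameter is expected to live on.

    Hypothetical helper: `torch_device` may be a bare backend name
    ("xpu") or already include an index ("cuda:0").
    """
    if not isinstance(device, int):
        # Strings such as "cpu" are already explicit.
        return device
    # Keep only the backend part so an index is never appended twice.
    backend = torch_device.split(":", 1)[0]
    return f"{backend}:{device}"


print(expected_device_str("cuda:0", 0))
print(expected_device_str("xpu", 1))
```

The result could then be wrapped in `torch.device(...)` for the comparison in the test, which keeps the check valid whether the test suite's device string is indexed or not.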

@faaany faaany deleted the state-dict branch November 4, 2024 06:09