-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to use MimiModel
with DeepSpeed ZeRO-3
#34735
Conversation
40e26ae
to
ee825e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! thanks for opening this!
Let's push an empty commit to run slow tests before merging:
`git commit --allow-empty -m "[run-slow] mimi"
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
ee825e1
to
f60096b
Compare
@ylacombe all done! |
@ylacombe 1 test is failing ( |
@ylacombe ping |
d0d5bbf
to
afa9547
Compare
afa9547
to
f180dc3
Compare
Hey @anferico, the failing slow test is unrelated indeed (see #35696). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! thanks
@eustlb yes I can confirm it fixes the issue, meaning I can use DeepSpeed ZeRO-3 in conjunction with MimiModel with the changes introduced in this PR |
@eustlb just updated the PR description too! |
Great! thanks a lot 🤗 |
Reverted because it caused a failing test on |
Found the issue, we need to keep the registered buffer to be a - self.register_buffer("initialized", torch.Tensor([True]))
+ self.register_buffer("initialized", torch.tensor([True], dtype=torch.float32)) ? |
use torch.tensor(), not torch.Tensor() Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…#35755) Revert "Unable to use `MimiModel` with DeepSpeed ZeRO-3 (huggingface#34735)" This reverts commit 54fd7e9.
use torch.tensor(), not torch.Tensor() Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…#35755) Revert "Unable to use `MimiModel` with DeepSpeed ZeRO-3 (huggingface#34735)" This reverts commit 54fd7e9.
What does this PR do?
Allow using
MimiModel
with DeepSpeed ZeRO-3.Fixes the following error:
Explanation: to begin with, the use of
torch.Tensor(...)
is discouraged/deprecated (cf. https://pytorch.org/docs/stable/tensors.html#torch.Tensor) and should be dropped in favor oftorch.tensor(...)
. Secondly, here is an excerpt of DeepSpeed's source code (deepspeed/runtime/zero/partition_parameters.py
) that explains why we get this error:as can be seen, DeepSpeed ZeRO-3 patches
Tensor.__new__
withget_new_tensor_fn_for_dtype(self.dtype)
which, if called with[True]
as an argument, raises an error because such argument is passed directly totorch.Tensor.new_empty
, which expects asize
as its first argument. On the contrary,torch.tensor
is patched with the correct function (zero_wrapper_for_fp_tensor_constructor(_orig_torch_tensor, self.dtype)
) and won't throw any error when passed[True]
as an argument.Reproducing the error
mimi_mre.sh
:mimi_mre.py
:zero3.json
:Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@ylacombe, @eustlb