[REQUEST] How to use deepspeed.checkpointing.non_reentrant_checkpoint() properly with Stage3? #4595

Open
MetaBlues opened this issue Nov 1, 2023 · 1 comment

Comments


MetaBlues commented Nov 1, 2023

See #4332.

Since diffusers 0.17.0, the non-reentrant variant of torch.utils.checkpoint.checkpoint is the default, but it is incompatible with DeepSpeed Stage 3.

I found #4118 and replaced torch.utils.checkpoint.checkpoint with deepspeed.checkpointing.non_reentrant_checkpoint before importing diffusers, but I still hit an error: "RuntimeError: The size of tensor a (0) must match the size of tensor b (1280) at non-singleton dimension 1".

Maybe I am applying non_reentrant_checkpoint improperly; a sketch of the patch I used is below. Any suggestions for making the non-reentrant checkpoint compatible with Stage 3?
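
For reference, here is roughly how I applied the patch (a minimal sketch; that diffusers resolves the function through the torch.utils.checkpoint module attribute, rather than binding its own copy before the patch runs, is my assumption):

```python
import deepspeed
import torch.utils.checkpoint

# Swap in DeepSpeed's non-reentrant checkpoint so that later calls to
# torch.utils.checkpoint.checkpoint(...) resolve to the DeepSpeed version.
torch.utils.checkpoint.checkpoint = deepspeed.checkpointing.non_reentrant_checkpoint

# Import diffusers only after the patch, so that any reference diffusers
# binds at import time already points at the replacement.
import diffusers
```

If a module instead does `from torch.utils.checkpoint import checkpoint` at its own import time, patching before that import (as above) still covers it, but patching afterwards would not.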

cc @hughpu

MetaBlues added the enhancement (New feature or request) label on Nov 1, 2023
@mumianyuxin

I ran into the same error. Does anybody have a solution?
