feat(activation_checkpointing): add non_reentrant_checkpoint
to support inputs require no grad
#4118
Conversation
…af forward tensor refs
* Pass correct node size
* formatting
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* add deepspeed chat arxiv report
* add zeroquant v2 and fp
* add selective enhancement
* add ignore for 'Youn' in spell checker
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
…use and add regression tests
@microsoft-github-policy-service agree

@tjruwase @mrwyattii, may I ask whether there is any chance this PR could get some feedback? It has been pending for 2 weeks without any review.

@hughpu, apologies for the delay. We will review asap.
Thank you @hughpu for this great PR!
hi @tjruwase, it seems the merge was stopped by an HTTP error raised from Hugging Face. Would you mind merging it again?

hi @tjruwase, shall we move forward with merging this PR? Feel free to let me know if there is anything I can do to facilitate this.
@hughpu, apologies for the delay. It is now queued for merging. |
The added function is a union of torch.utils.checkpoint._checkpoint_without_reentrant and CheckpointFunction in the checkpointing module. _checkpoint_without_reentrant has already been implemented in PyTorch for a while, and the approach is stable in most cases, except for JIT script modules. deepspeed.runtime.pipe.module.PipelineModule._is_checkpointable determines which layers are wrapped by activation_checkpoint_func when checkpointable_layers is not specified.
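To illustrate the behavior this PR targets, here is a minimal sketch using PyTorch's public non-reentrant checkpoint path (torch.utils.checkpoint.checkpoint with use_reentrant=False), which is built on the same _checkpoint_without_reentrant machinery mentioned above. This is not the DeepSpeed implementation itself, just a demonstration that the non-reentrant path still propagates gradients to module parameters when the checkpointed inputs themselves do not require grad:

```python
import torch
from torch.utils.checkpoint import checkpoint

lin = torch.nn.Linear(4, 4)

# Input deliberately has requires_grad=False, mimicking e.g. token ids or
# masks fed into a checkpointed layer.
x = torch.randn(2, 4)

# Non-reentrant checkpointing: recomputes the forward during backward,
# without the autograd.Function-based reentrant path.
y = checkpoint(lambda t: lin(t).sum(), x, use_reentrant=False)
y.backward()

# Gradients reach the parameters even though the input requires no grad.
assert lin.weight.grad is not None
assert x.grad is None
```

With the reentrant path, an input that requires no grad can prevent gradients from flowing through the checkpointed region at all, which is the failure mode the added non_reentrant_checkpoint is meant to avoid.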