[FSDP] Wrapping model again in FSDP doesn't contain root parameters #648
Thanks, @SeanNaren! What's the motivation to wrap it again for inference? Can you use the same (wrapped) model for it? I must be missing something obvious.
Thanks @min-xu-ai! Some Lightning context will help decide whether this is something we should look into fixing on the Lightning side. Currently we treat each Trainer stage (e.g. `trainer.fit` and `trainer.test`) as independent. Using the same wrapped model would break this independence; we'd have to keep a reference to the wrapped model and ensure all stages use it, which may have lots of unintended side effects in Lightning! However, if it is deemed difficult to provide an un-flattened version that can be wrapped with FSDP again, keeping such a reference might be a required change in Lightning to support this case.
I see. I think the decoupling makes sense. It is very interesting that in this case DDP wraps the model twice:
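(The snippet quoted here isn't preserved in this copy; the following is a hypothetical minimal sketch of what wrapping the same module in DDP twice looks like, assuming a default process group has already been initialized via `torch.distributed.init_process_group`.)

```python
# Hypothetical sketch only, not the snippet quoted in the original comment.
# Assumes torch.distributed.init_process_group("gloo", rank=0, world_size=1)
# has been called beforehand.
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

model = nn.Linear(4, 4)

ddp_for_fit = DDP(model)   # first wrap, used during trainer.fit
ddp_for_test = DDP(model)  # second wrap of the same module, used during trainer.test

# Unlike fairscale FSDP, DDP leaves the original parameters on `model`,
# so both wrappers still see the full parameter set.
assert len(list(ddp_for_test.parameters())) == len(list(model.parameters()))
```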
I am surprised that this doesn't cause problems. Maybe it is because your second wrap is for inference only and DDP is basically a no-op in that case? I need to think more about what the proper thing to do here is, TBH.
@SeanNaren, maybe this is related to: #649? What if we have the following:
but
Would the above be a good solution for your use cases?
Thanks @min-xu-ai! Unfortunately this would still be an issue for Lightning, as we reference the model that is stored within the FSDP wrapper. If we had access to the model that was initially wrapped with FSDP, there would be no issues! I'm starting to suspect this is less something FairScale should try to support (as you mentioned for DDP) and more a limitation in Lightning that we need to address. Our current workaround is to only support user wrapping for now, rather than Lightning automatically wrapping the module for you. @shuyingsunshine21 is working on this, which may serve as a stopgap while we discuss solutions here (but it is blocked by #658 at the moment).
In case you have any updates on your side, please share. @myleott, @QuentinDuval, check out comment #648 (comment) if you haven't. Do you think it makes sense for FSDP to wrap another FSDP instance?
Hey, any update on this? With PyTorch FSDP it's now possible to check whether the raw model has already been wrapped so that we can avoid this, but with fairscale FSDP it's not.
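A minimal sketch of the kind of check that's possible with PyTorch FSDP (the helper name is just for illustration, and the exact check used in pytorch-lightning may differ; this version only relies on `isinstance` over the module tree):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def already_fsdp_wrapped(module: nn.Module) -> bool:
    # True if the module itself, or any of its submodules, is a PyTorch FSDP instance.
    # (nn.Module.modules() yields the module itself first, then its children.)
    return any(isinstance(m, FSDP) for m in module.modules())

# Sketch of how it could be used (assumes a process group is initialized):
# if not already_fsdp_wrapped(model):
#     model = FSDP(model)
```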
Any reason not to use PyTorch FSDP? fairscale FSDP is in maintenance mode and is unlikely to change much.
It's just that in pytorch-lightning, we support both.
So I am wondering: will new/experimental features be added to fairscale and then upstreamed to PyTorch, or will they be added to PyTorch itself?
New/experimental features will be added to fairscale if the PyTorch team is not sure whether they will end up being useful for the general user base. If they think a new feature will be useful, I don't see why it wouldn't be added to PyTorch directly.
Then we might need to keep the fairscale FSDP support in pytorch-lightning, so is it possible to resolve this issue?
🐛 Bug
Related: Lightning-AI/pytorch-lightning#6152
When wrapping the module twice in FSDP, the second wrapping does not contain the parameters for the model, because the first wrap introduces a `FlattenParamsWrapper` that holds all of them. Being able to wrap again is required for Lightning, where we wrap the `LightningModule` during training and return that `LightningModule` back to the user, who may call `trainer.test(module)` independently.

This could potentially be solved by removing `FlattenParamsWrapper` and permanently returning the sharded weights to their original references after training. Is this doable? I'm not 100% sure of a solution here (even from the Lightning side), so if you have ideas please let me know!
To Reproduce
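The original reproduction snippet isn't preserved in this copy; below is a hypothetical minimal sketch of the failure described above, assuming fairscale is installed and a single-process group has been initialized:

```python
# Hypothetical reproduction sketch (not the original snippet). Assumes
# torch.distributed.init_process_group("gloo", rank=0, world_size=1)
# has already been called.
import torch.nn as nn
from fairscale.nn import FullyShardedDataParallel as FSDP

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))

# First wrap: with flatten_parameters=True (the default), FlattenParamsWrapper
# takes ownership of the original parameters and replaces them with one flat parameter.
first_wrap = FSDP(model)
print(sum(p.numel() for p in first_wrap.parameters()))  # > 0

# Second wrap of the same inner module: `model` no longer exposes its own
# parameters, so the new root FSDP instance has no root parameters to shard.
second_wrap = FSDP(model)
print(sum(p.numel() for p in second_wrap.parameters()))  # 0: the reported issue
```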
Expected behavior
Being able to wrap the model in an FSDP wrapper again after the model is trained.
cc @ananthsub @min-xu-ai @shuyingsunshine21