Currently, we have two codepaths:

- `diffusers/src/diffusers/models/modeling_utils.py`, line 855 in 047bf49
- `diffusers/src/diffusers/models/modeling_utils.py`, line 886 in 047bf49
And for (bnb) quantized checkpoints, we merge the sharded checkpoint:

- `diffusers/src/diffusers/models/modeling_utils.py`, line 775 in 047bf49
Essentially, we shouldn't have to merge sharded checkpoints even if it's quantized.
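To illustrate what "not merging" could look like, here is a minimal sketch that walks a sharded checkpoint one shard at a time. It assumes the usual Hugging Face index layout, where `weight_map` maps each parameter name to the shard file that stores it. `group_params_by_shard`, `load_sharded`, `read_shard` and `load_into_model` are hypothetical names for illustration, not diffusers APIs.

```python
import json
from collections import defaultdict


def group_params_by_shard(index_json: str) -> dict[str, list[str]]:
    """Map each shard filename to the parameter names stored in it.

    Assumes the Hugging Face sharded-checkpoint index layout, where
    ``weight_map`` maps a parameter name to the shard file that holds it.
    """
    index = json.loads(index_json)
    shards: dict[str, list[str]] = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    # Sort shards and names so iteration order is deterministic.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def load_sharded(index_json, read_shard, load_into_model):
    """Load one shard at a time instead of merging all shards first.

    ``read_shard`` (hypothetical) returns the state dict of one shard file;
    ``load_into_model`` (hypothetical) copies that state dict into the model.
    """
    for shard_file in group_params_by_shard(index_json):
        state_dict = read_shard(shard_file)
        load_into_model(state_dict)
        # Drop the reference before reading the next shard, so that only
        # one shard's tensors are held at a time.
        del state_dict
```

Compared with merging, peak host memory is then roughly bounded by the largest shard rather than by the whole checkpoint.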
This will also allow us to use `keep_module_in_fp32` more generally for sharded checkpoints. Currently, we have this logic for casting a model (which is thoroughly tested):

- `diffusers/src/diffusers/models/modeling_utils.py`, line 997 in 43534a8
When using `load_model_dict_into_meta()`, we do consider `keep_module_in_fp32`:

- `diffusers/src/diffusers/models/model_loading_utils.py`, line 177 in 43534a8
But for sharded checkpoints we use `load_checkpoint_and_dispatch()`, and there is no way to pass `keep_module_in_fp32` to it: https://huggingface.co/docs/accelerate/main/en/package_reference/big_modeling#accelerate.load_checkpoint_and_dispatch
As discussed with @SunMarc, it's better to unify this so that we don't have to maintain two different codepaths and can rely entirely on `load_model_dict_into_meta()`. Marc has kindly agreed to open a PR to attempt this (it could be done in a series of PRs if needed). I will join if any help is needed.
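A rough sketch of the single-codepath idea: a non-sharded checkpoint is treated as a list with one shard, so single-file and sharded checkpoints go through the same per-tensor loop, and the fp32 keep list is honored in both cases. Everything here is hypothetical: `load_checkpoint` stands in for a `load_model_dict_into_meta()`-style loader, the "model" is a plain dict, and dtypes are strings. It is not the planned diffusers implementation.

```python
from typing import Iterable, Optional


def load_checkpoint(
    shards: Iterable[dict],
    model: dict,
    target_dtype: str,
    keep_in_fp32_modules: Optional[list] = None,
) -> dict:
    """Load every shard through one per-tensor loop (hypothetical sketch).

    ``model`` is a dict stand-in mapping parameter names to
    ``(value, dtype)`` pairs.
    """
    for state_dict in shards:
        for name, value in state_dict.items():
            # Same rule for every shard: keep matching modules in float32.
            keep = bool(keep_in_fp32_modules) and any(
                module in name.split(".") for module in keep_in_fp32_modules
            )
            model[name] = (value, "float32" if keep else target_dtype)
    return model


# A single-file checkpoint is simply a one-element list of shards.
single = load_checkpoint([{"a.weight": 1, "norm.weight": 2}], {}, "float16", ["norm"])
sharded = load_checkpoint([{"a.weight": 1}, {"norm.weight": 2}], {}, "float16", ["norm"])
```

Because both calls run the same loop, `single` and `sharded` end up identical, which is the property a unified codepath is meant to guarantee.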