[Core] refactor model loading #10013

sayakpaul · 2024-11-25T10:44:58Z

Currently, we have two codepaths:

For non-sharded checkpoints we do:

diffusers/src/diffusers/models/modeling_utils.py

Line 855 in 047bf49

unexpected_keys = load_model_dict_into_meta(
For sharded checkpoints we do:

diffusers/src/diffusers/models/modeling_utils.py

Line 886 in 047bf49

accelerate.load_checkpoint_and_dispatch(

And then for the (bnb) quantized checkpoints, we merge a sharded checkpoint:

diffusers/src/diffusers/models/modeling_utils.py

Line 775 in 047bf49

    
model_file = _merge_sharded_checkpoints(sharded_ckpt_cached_folder, sharded_metadata)

Essentially, we shouldn't have to merge sharded checkpoints, even when they are quantized.
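To make the "no merging" idea concrete, here is a minimal sketch of loading a sharded checkpoint one shard at a time. It assumes the usual sharded index layout, where a `weight_map` maps each parameter name to the shard file that stores it. The names `group_keys_by_shard`, `load_sharded`, `read_shard`, and `load_state_dict` are hypothetical and are not part of diffusers or accelerate; `load_state_dict` only stands in for a dict-based loader that returns unexpected keys.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Mapping


def group_keys_by_shard(weight_map: Mapping[str, str]) -> Dict[str, List[str]]:
    """Invert a {param_name: shard_file} index into {shard_file: [param_names]}."""
    shards: Dict[str, List[str]] = defaultdict(list)
    for param_name, shard_file in weight_map.items():
        shards[shard_file].append(param_name)
    return dict(shards)


def load_sharded(
    weight_map: Mapping[str, str],
    read_shard: Callable[[str], Dict[str, Any]],             # hypothetical: shard file -> state dict
    load_state_dict: Callable[[Dict[str, Any]], List[str]],  # hypothetical dict-based loader
) -> List[str]:
    """Load every shard in turn and collect the unexpected keys the loader reports."""
    unexpected_keys: List[str] = []
    for shard_file in sorted(group_keys_by_shard(weight_map)):
        state_dict = read_shard(shard_file)
        unexpected_keys.extend(load_state_dict(state_dict))
        # Drop the shard before reading the next one, so only one shard's
        # tensors are held at a time and no merged file is ever written.
        del state_dict
    return unexpected_keys
```

With this shape, a non-sharded checkpoint is just the special case of a `weight_map` that points every parameter at a single file, so the same loop could serve both cases.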

This will also allow us to use keep_in_fp32_modules more generally for sharded checkpoints. Currently, we have this logic for casting a model (which is tested thoroughly):

diffusers/src/diffusers/models/modeling_utils.py

Line 997 in 43534a8

    
elif torch_dtype is not None and hf_quantizer is None and not use_keep_in_fp32_modules:

When using load_model_dict_into_meta(), we do consider keep_in_fp32_modules:

diffusers/src/diffusers/models/model_loading_utils.py

Line 177 in 43534a8

keep_in_fp32_modules=None,

But since we use load_checkpoint_and_dispatch() for sharded checkpoints, there is no way to pass keep_in_fp32_modules:
https://huggingface.co/docs/accelerate/main/en/package_reference/big_modeling#accelerate.load_checkpoint_and_dispatch
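As an illustration of the per-parameter decision that a dict-based loader can make (and that a whole-checkpoint dispatch call leaves no hook for), here is a small sketch. The function `resolve_param_dtype` is hypothetical, dtypes are plain strings rather than `torch.dtype` objects to keep the example dependency-free, and matching module names against the dot-separated parts of the parameter name is an assumption, not a statement about how diffusers implements it.

```python
from typing import Optional, Sequence


def resolve_param_dtype(
    param_name: str,
    target_dtype: str,
    keep_in_fp32_modules: Optional[Sequence[str]] = None,
) -> str:
    """Pick the dtype a parameter should be loaded in.

    Parameters that live under one of the `keep_in_fp32_modules` stay in
    float32; everything else is cast to `target_dtype`.
    """
    if keep_in_fp32_modules:
        # Assumption: compare against whole dotted components, so that
        # "proj_out" does not accidentally match "my_proj_out".
        name_parts = param_name.split(".")
        if any(module in name_parts for module in keep_in_fp32_modules):
            return "float32"
    return target_dtype
```

If every shard went through a loader that applies a rule like this to each key, sharded and non-sharded checkpoints would get the same float32 handling.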

As discussed with @SunMarc, it's better to unify this so that we don't have to maintain two different codepaths and can rely completely on load_model_dict_into_meta(). Marc has kindly agreed to open a PR to attempt this (it could be done in a series of PRs if needed), and I will join if any help is needed.


sayakpaul · 2025-01-03T02:27:27Z

@huggingface/diffusers Marc has started working on this 🥳

sayakpaul assigned sayakpaul and SunMarc Nov 25, 2024

sayakpaul added the refactor label Nov 25, 2024

This was referenced Nov 25, 2024

FLUX error when loading with low_cpu_mem_usage=False and ignore_mismatched_sizes=True #9343

Open

[core] TorchAO Quantizer #10009

Merged

sayakpaul added the roadmap Add to current release roadmap label Dec 11, 2024

github-project-automation bot added this to Diffusers Roadmap 0.33 Dec 11, 2024

DN6 moved this to Future Release in Diffusers Roadmap 0.33 Dec 12, 2024

a-r-r-o-w mentioned this issue Dec 17, 2024

Add support for sharded models when TorchAO quantization is enabled #10256

Merged

a-r-r-o-w mentioned this issue Dec 24, 2024

Fix TorchAO related bugs; revert device_map changes #10371

Merged

sayakpaul mentioned this issue Dec 27, 2024

[FEAT] DDUF format #10037

Merged

