Inappropriate check when wrapping layers for FSDP #2947

Closed · 2 of 4 tasks
fc-jian opened this issue Jul 19, 2024 · 3 comments

Comments

fc-jian commented Jul 19, 2024

System Info

- `Accelerate` version: 0.32.1
- Platform: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17
- `accelerate` bash location: /home/user/miniforge3/envs/torch231/bin/accelerate
- Python version: 3.12.4
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.1 (False)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 186.87 GB
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: FSDP
        - mixed_precision: bf16
        - use_cpu: False
        - debug: False
        - num_processes: 8
        - machine_rank: 0
        - num_machines: 1
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: False
        - fsdp_config: {'fsdp_activation_checkpointing': False, 'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch': 'BACKWARD_PRE', 'fsdp_cpu_ram_efficient_loading': True, 'fsdp_forward_prefetch': False, 'fsdp_offload_params': False, 'fsdp_sharding_strategy': 'HYBRID_SHARD_ZERO2', 'fsdp_state_dict_type': 'SHARDED_STATE_DICT', 'fsdp_sync_module_states': True, 'fsdp_use_orig_params': True}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

In the following code from utils/dataclasses.py:

    def set_auto_wrap_policy(self, model):
        from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy, transformer_auto_wrap_policy

        default_transformer_cls_names_to_wrap = (
            ",".join(model._no_split_modules) if getattr(model, "_no_split_modules", None) is not None else ""
        )
        if self.auto_wrap_policy is None:
            auto_wrap_policy = os.environ.get("FSDP_AUTO_WRAP_POLICY", "NO_WRAP")
            if auto_wrap_policy == FSDP_AUTO_WRAP_POLICY[0]:
                transformer_cls_names_to_wrap = os.environ.get(
                    "FSDP_TRANSFORMER_CLS_TO_WRAP", default_transformer_cls_names_to_wrap
                ).split(",")
                transformer_cls_to_wrap = set()
                for layer_class in transformer_cls_names_to_wrap:
                    transformer_cls = get_module_class_from_name(model, layer_class)
                    if transformer_cls is None:
                        raise Exception("Could not find the transformer layer class to wrap in the model.")
        ...

This requires that every layer class listed in model._no_split_modules actually appear in the model. However, several transformers models have variants that do not contain all of the layer types listed in _no_split_modules (which is usually defined on the XXXPreTrainedModel class). As a result, the Exception("Could not find the transformer layer class to wrap in the model.") is raised even when the model contains some, but not all, of the layers listed in model._no_split_modules.

I ran into this while working with EsmModel in transformers: EsmPreTrainedModel lists EsmFoldTriangularSelfAttentionBlock in _no_split_modules, but not all ESM models contain that layer.
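
To make the failure mode concrete, here is a minimal, self-contained sketch (the class names are illustrative, not the real ESM modules; the import path follows the workaround below): when _no_split_modules lists a class name that never occurs in the instantiated model, get_module_class_from_name returns None and the check above raises.

    import torch.nn as nn
    from accelerate.utils.dataclasses import get_module_class_from_name

    class BlockA(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(4, 4)

    class ToyVariant(nn.Module):
        # Mirrors a _no_split_modules definition on a shared base class:
        # "MissingBlock" is declared but never instantiated in this variant.
        _no_split_modules = ["BlockA", "MissingBlock"]

        def __init__(self):
            super().__init__()
            self.layers = nn.ModuleList([BlockA() for _ in range(2)])

    model = ToyVariant()
    print(get_module_class_from_name(model, "BlockA"))        # <class '__main__.BlockA'>
    print(get_module_class_from_name(model, "MissingBlock"))  # None -> triggers the exception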

Expected behavior

I added the following code before constructing the Trainer to drop the entries in _no_split_modules that are not present in the model, and everything works:

    from accelerate.utils.dataclasses import get_module_class_from_name

    _update_wrap_layers = []
    for layer in model._no_split_modules:
        if get_module_class_from_name(model, layer) is not None:
            _update_wrap_layers.append(layer)
    model._no_split_modules = _update_wrap_layers

I am not sure whether this should be fixed in accelerate or in the transformers models, so I did not open a PR directly.
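
For reference, one possible relaxation of the check in set_auto_wrap_policy, sketched here purely as an illustration (the helper name and placement are my own, and this is not necessarily what any existing PR implements): skip the names that do not resolve in the model and raise only if none of them do.

    from accelerate.utils.dataclasses import get_module_class_from_name

    def resolve_transformer_cls_to_wrap(model, transformer_cls_names_to_wrap):
        # Hypothetical helper: keep only the classes that actually occur in this model variant.
        transformer_cls_to_wrap = set()
        for layer_class in transformer_cls_names_to_wrap:
            transformer_cls = get_module_class_from_name(model, layer_class)
            if transformer_cls is not None:
                transformer_cls_to_wrap.add(transformer_cls)
        if not transformer_cls_to_wrap:
            raise ValueError("Could not find any transformer layer class to wrap in the model.")
        return transformer_cls_to_wrap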

@iAaronLau

Thanks @fc-jian,

I also found that FSDP_TRANSFORMER_CLS_TO_WRAP must not be defined in the environment (e.g. via ~/.cache/huggingface/accelerate/default_config.yaml), or the workaround above will not take effect.

This is because transformer_cls_names_to_wrap is overridden by FSDP_TRANSFORMER_CLS_TO_WRAP at:

  transformer_cls_names_to_wrap = os.environ.get(
      "FSDP_TRANSFORMER_CLS_TO_WRAP", default_transformer_cls_names_to_wrap
  ).split(",")

where default_transformer_cls_names_to_wrap only serves as a fallback, so it is ignored whenever FSDP_TRANSFORMER_CLS_TO_WRAP is set.
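
In that situation, one way to let the model-level workaround apply (my own suggestion, assuming the launcher has exported the config value into the process environment) is to drop the override early in the training script:

    import os

    # Remove the env override so os.environ.get(...) falls back to the
    # default derived from model._no_split_modules.
    os.environ.pop("FSDP_TRANSFORMER_CLS_TO_WRAP", None)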

@muellerzr
Collaborator

Hi all, I think #2998 might assist with this, if I’m understanding right? :)

github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Oct 7, 2024