Tasks
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
In the following code from utils/dataclasses.py
def set_auto_wrap_policy(self, model):
    from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy, transformer_auto_wrap_policy

    default_transformer_cls_names_to_wrap = (
        ",".join(model._no_split_modules) if getattr(model, "_no_split_modules", None) is not None else ""
    )
    if self.auto_wrap_policy is None:
        auto_wrap_policy = os.environ.get("FSDP_AUTO_WRAP_POLICY", "NO_WRAP")
        if auto_wrap_policy == FSDP_AUTO_WRAP_POLICY[0]:
            transformer_cls_names_to_wrap = os.environ.get(
                "FSDP_TRANSFORMER_CLS_TO_WRAP", default_transformer_cls_names_to_wrap
            ).split(",")
            transformer_cls_to_wrap = set()
            for layer_class in transformer_cls_names_to_wrap:
                transformer_cls = get_module_class_from_name(model, layer_class)
                if transformer_cls is None:
                    raise Exception("Could not find the transformer layer class to wrap in the model.")
                ...
This requires that every layer class listed in model._no_split_modules actually appears in the model. However, several transformers models have variants that do not contain all of the layer types listed in _no_split_modules (which is usually defined on the XXXPretrainedModel class). As a result, Exception("Could not find the transformer layer class to wrap in the model.") is raised even when the model contains some (but not all) of the layers listed in model._no_split_modules.
I ran into this while playing with EsmModel in transformers: EsmPretrainedModel lists a layer named EsmFoldTriangularSelfAttentionBlock in _no_split_modules, but not all ESM models contain that layer.
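For illustration, a minimal sketch of the failure mode: when a class named in _no_split_modules never occurs among the model's submodules, the lookup returns None and set_auto_wrap_policy raises the Exception quoted above. The helper below is a simplified stand-in for accelerate's get_module_class_from_name, and the checkpoint name is only an example of a plain ESM-2 model without the folding head.

from transformers import AutoModel

def find_module_class_by_name(module, name):
    # Recursively search the submodules for a class whose name matches
    # (simplified stand-in for the get_module_class_from_name helper quoted above).
    if module.__class__.__name__ == name:
        return module.__class__
    for child in module.children():
        found = find_module_class_by_name(child, name)
        if found is not None:
            return found
    return None

model = AutoModel.from_pretrained("facebook/esm2_t6_8M_UR50D")  # assumption: a small ESM-2 checkpoint
for layer_class in model._no_split_modules:
    cls = find_module_class_by_name(model, layer_class)
    print(layer_class, "->", cls)
    # "EsmFoldTriangularSelfAttentionBlock -> None" here is exactly the case in which
    # set_auto_wrap_policy raises the Exception above.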
Expected behavior
I added the following code before constructing the Trainer to remove the entries of _no_split_modules that are not present in the model, and everything works well. I have no idea whether this should be fixed in accelerate or in the transformers models, so I did not directly start a PR.
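The exact snippet was not preserved in this copy of the issue, so here is a minimal sketch of such a filter, assuming it is enough to shadow _no_split_modules on the model instance; the helper name prune_missing_no_split_modules is made up for illustration.

def prune_missing_no_split_modules(model):
    # Keep only the _no_split_modules entries whose class actually occurs
    # among the model's submodules.
    present = {m.__class__.__name__ for m in model.modules()}
    model._no_split_modules = [
        name for name in (model._no_split_modules or []) if name in present
    ]
    return model

model = prune_missing_no_split_modules(model)
# trainer = Trainer(model=model, ...)  # construct the Trainer afterwards as usual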
Here I also found that FSDP_TRANSFORMER_CLS_TO_WRAP should not be defined in the environment (e.g. via ~/.cache/huggingface/accelerate/default_config.yaml), or this workaround will not take effect, because transformer_cls_names_to_wrap is overwritten by FSDP_TRANSFORMER_CLS_TO_WRAP in the os.environ.get("FSDP_TRANSFORMER_CLS_TO_WRAP", ...) call quoted above.
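As a toy illustration of that precedence (the class names below are only examples, and the environment variable is assumed to be the one exported by accelerate launch from the FSDP config):

import os

# e.g. the model-derived default after pruning _no_split_modules
default_transformer_cls_names_to_wrap = "EsmLayer"

# assumption: this is what ends up in the environment when the FSDP config names layer classes
os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = "EsmFoldTriangularSelfAttentionBlock"

# Same lookup pattern as in set_auto_wrap_policy: the environment variable, when set,
# wins over the model-derived default, so the workaround above is bypassed.
chosen = os.environ.get("FSDP_TRANSFORMER_CLS_TO_WRAP", default_transformer_cls_names_to_wrap)
print(chosen)  # EsmFoldTriangularSelfAttentionBlock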