
Arde/fsdp activation checkpointing #25771

Merged

Conversation

@arde171 (Contributor) commented Aug 25, 2023

What does this PR do?

Currently, the HF Trainer does not support FSDP activation checkpointing; this PR adds that support.
Please see the details about FSDP activation checkpointing here.
I saw an improvement in training performance for large LLMs (e.g., LLaMA 70B) with FSDP activation checkpointing compared to the existing gradient_checkpointing option. FSDP activation_checkpointing is also easy to enable:

Just add "activation_checkpointing": "True" to the FSDP config, as shown in the example fsdp_config.json file below.

fsdp_config.json

{
  "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
  ...
  "activation_checkpointing": "True"
}
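
For context, here is a minimal, hypothetical sketch of how such a config file could be passed to the Trainer. The model, dataset, and output path below are placeholders and are not part of this PR:

from transformers import Trainer, TrainingArguments

# Hypothetical usage sketch: enable FSDP and point the Trainer at the
# fsdp_config.json shown above. `model` and `train_dataset` are placeholders.
training_args = TrainingArguments(
    output_dir="./llama-fsdp",        # placeholder output path
    fsdp="full_shard auto_wrap",      # enable FSDP sharding with auto wrapping
    fsdp_config="fsdp_config.json",   # config containing activation_checkpointing
    gradient_checkpointing=False,     # leave off; FSDP handles checkpointing here
    per_device_train_batch_size=1,
    bf16=True,
)

trainer = Trainer(
    model=model,                  # placeholder: a causal LM such as LLaMA
    args=training_args,
    train_dataset=train_dataset,  # placeholder dataset
)
trainer.train()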

Please see the following PR in the accelerate repo for more details about FSDP activation checkpointing:
PR: huggingface/accelerate#1891

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ArthurZucker (Collaborator) commented:

Cc @pacman100 if you think this is relevant 🤗

@pacman100 (Contributor) left a comment:

Thank you @arde171 for adding this. Please raise an error if both activation_checkpointing in FSDP config and training arg gradient_checkpointing are set to True. The error should mention that both can't be set to True and to use FSDP's checkpointing logic when using FSDP.
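
A minimal sketch of the kind of check being requested (attribute names and error wording are illustrative, not the code that was actually merged):

# Illustrative sketch of the requested validation, not the merged implementation.
# Assumes `args` is a TrainingArguments instance whose `fsdp_config` is a dict.
if args.fsdp_config.get("activation_checkpointing", False) and args.gradient_checkpointing:
    raise ValueError(
        "The combination of FSDP `activation_checkpointing` and "
        "`gradient_checkpointing` is not supported. Please use FSDP's "
        "activation checkpointing logic when using FSDP."
    )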

src/transformers/trainer.py (review thread, outdated, resolved)
@pacman100 (Contributor) left a comment:

Thank you @arde171!

@pacman100 merged commit 738ecd1 into huggingface:main on Aug 29, 2023
21 checks passed
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
* add FSDP config option to enable activation-checkpointing

* update docs

* add checks and remove redundant code

* fix formatting error
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023 (same commit message as above)
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 18, 2023 (same commit message as above)