
Fix beam search when using model parallel #24969

Merged

Conversation

Contributor

@pfldy2850 pfldy2850 commented Jul 21, 2023

What does this PR do?

This PR fixes a crash when running beam search on multiple GPUs. A similar issue was observed and fixed for T5 (#11717) and LLaMA (#24224).
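
For context, the crash comes from beam search reordering the cached key/value states with a beam_idx tensor that can sit on a different GPU than a given layer's cache once the model is sharded across devices. Below is a minimal sketch of the per-model pattern, following the earlier T5 and LLaMA fixes; the function body is illustrative, not the exact merged diff for every model:

```python
# Illustrative sketch of the fix pattern (mirrors the T5 #11717 and
# LLaMA #24224 changes); not the exact merged diff for every model.
def _reorder_cache(past_key_values, beam_idx):
    reordered_past = ()
    for layer_past in past_key_values:
        # Under model parallelism, each layer's cache may live on a
        # different GPU, so move beam_idx to that device before indexing.
        reordered_past += (
            tuple(
                past_state.index_select(0, beam_idx.to(past_state.device))
                for past_state in layer_past
            ),
        )
    return reordered_past
```

The key change is beam_idx.to(past_state.device): previously the index tensor stayed on a single device, which raised a device-mismatch error during multi-GPU beam search.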

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @younesbelkada

ArthurZucker
ArthurZucker previously approved these changes Jul 21, 2023
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks for the fix!
This seems to apply to a lot of models; it would be good to fix them all in one go!

@ArthurZucker ArthurZucker requested a review from amyeroberts July 21, 2023 06:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@pfldy2850
Contributor Author

I agree with you!
Which do you think is better: fixing the rest of the models in this PR, or creating a new PR for them?

I am willing to do further work based on your comments.

@ArthurZucker
Collaborator

I think it will be easier to have everything in one PR, given how small and repetitive the change is!

@pfldy2850 pfldy2850 changed the title Fix GPTNeoX beam search when using parallelize Fix beam search when using model parallel Jul 21, 2023
@pfldy2850
Contributor Author

@ArthurZucker

I have added fix commits for every model that required correction, and I have also updated the cookiecutter template.

I have also updated the PR title and description to match the broader scope.

Since there were numerous models to correct, some spots may have been overlooked, so I would appreciate another review of the changes. Thank you!

@ArthurZucker ArthurZucker dismissed their stale review July 21, 2023 10:00

Code updated

@ArthurZucker
Collaborator

There is no device in TensorFlow; let's limit the changes to PyTorch!

@pfldy2850 pfldy2850 force-pushed the fix-gpt-neox-parallelize-beam branch from 68a41f5 to 9e978c4 Compare July 21, 2023 10:08
@pfldy2850
Contributor Author

Oh! Thank you for catching the mistake.
As you suggested, I have dropped the modifications to the TF base model.

Collaborator

@amyeroberts amyeroberts left a comment

LGTM - thanks for fixing this!

cc @gante for reference

@pfldy2850
Contributor Author

pfldy2850 commented Aug 1, 2023

@ArthurZucker

Is there anything else you'd like me to fix?
I'd like to work from the main branch once this is merged.

@ArthurZucker
Collaborator

Not at all, sorry! Let me have a final look and I'll merge this!

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM! My last comment: let's add a slow test for our nightly multi-GPU CI to make sure this is covered (beam search on multiple GPUs)! It would go in test_modeling_common.py!

@pfldy2850
Contributor Author

Hmm, by the way, it seems there's already a test in the file you mentioned that covers beam search on multiple GPUs.

https://github.com/pfldy2850/transformers/blob/e07126aac6840568b0db0b369d199f3a0cefa28f/tests/test_modeling_common.py#L2468-L2494

Why wasn't this test run before this issue came up?

@ArthurZucker
Collaborator

You are right! It might be test_model_parallel; it is set to False by default.

@pfldy2850
Contributor Author

@ArthurZucker

What do you think about setting test_model_parallel=True in the existing modeling test files instead of creating a new test?
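
For illustration, opting a model in would be a one-line change on its tester class; the class name and import path below are hypothetical, just to show the shape of the change:

```python
import unittest

# Import path depends on the test-suite layout; hypothetical here.
from tests.test_modeling_common import ModelTesterMixin


class GPTNeoXModelTest(ModelTesterMixin, unittest.TestCase):
    # ModelTesterMixin defaults this flag to False, which is why the
    # multi-GPU beam search path was never exercised.
    test_model_parallel = True
```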

@ArthurZucker
Collaborator

Great idea! The problem is that this might also trigger other tests, and there might be a reason we don't run them (maybe too slow, or the model isn't used enough to need these tests). Pinging @amyeroberts for a final answer 🤗

@amyeroberts
Collaborator

OK, I think this is hitting on some areas of our testing suite which are non-obvious and/or need updating.

AFAICT, there are only two models which have test_model_parallel=True: GPT2 and T5. The tests which check this flag both use a deprecated method, model.parallelize, so this flag controls testing of backwards-compatibility features.

We have another test for parallelism which doesn't check test_model_parallel: test_model_parallelism, which is an accelerate test and checks whether the model has _no_split_modules implemented.
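
For reference, _no_split_modules is a class attribute on each pretrained-model class listing the module classes that accelerate must keep on a single device when sharding with device_map="auto". An illustrative example (check the exact value against the model's source):

```python
from transformers import PreTrainedModel


class GPTNeoXPreTrainedModel(PreTrainedModel):
    # Modules named here are never split across devices by accelerate's
    # device_map="auto"; a value of None means auto-sharding is unsupported.
    _no_split_modules = ["GPTNeoXLayer"]
```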

In addition, generate-specific tests should be added to GenerationTesterMixin rather than ModelTesterMixin, as only models with .generate methods should be tested.

What I would suggest (a rough sketch follows this list):

  • Move test_model_parallel_beam_search to GenerationTesterMixin.
  • Update the modeling tests so that each of the models updated in this PR has all_generative_model_classes added as an attribute to its model tester.
  • Update test_model_parallel_beam_search to check whether the model class has _no_split_modules implemented. If not, skip; if it does, load using device_map="auto" instead of checking test_model_parallel.
  • Make sure to mark the test as requiring multiple GPUs.
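
A rough sketch of what the resulting test could look like under these suggestions; helper names such as _get_input_ids_and_config follow the mixin's existing conventions and are assumptions here, not the exact merged code:

```python
import tempfile

from transformers.testing_utils import require_torch_multi_gpu


class GenerationTesterMixin:
    # ... existing generation tests ...

    @require_torch_multi_gpu
    def test_model_parallel_beam_search(self):
        for model_class in self.all_generative_model_classes:
            # Skip models accelerate cannot shard across devices.
            if model_class._no_split_modules is None:
                continue

            config, input_ids, attention_mask, *_ = self._get_input_ids_and_config()
            model = model_class(config).eval()

            # Save, then reload sharded across all visible GPUs.
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
                new_model = model_class.from_pretrained(tmp_dir, device_map="auto")

                # num_beams > 1 triggers the cache reordering that used
                # to crash with a device-mismatch error.
                new_model.generate(
                    input_ids,
                    attention_mask=attention_mask,
                    max_new_tokens=10,
                    num_beams=2,
                )
```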

@pfldy2850 pfldy2850 force-pushed the fix-gpt-neox-parallelize-beam branch from 6d0a7d7 to d6c50e2 Compare August 23, 2023 17:11
@pfldy2850
Contributor Author

pfldy2850 commented Aug 23, 2023

@amyeroberts @ArthurZucker

Sorry for the delay; I've been busy with work.

I've made the fixes as you suggested.

  1. The test_model_parallel_beam_search function has been moved from ModelTesterMixin to GenerationTesterMixin.
  2. Skip logic has been added based on whether _no_split_modules is implemented.
  3. Models are parallelized across multiple devices using device_map="auto".
  4. Errors surfaced by the new test have been fixed.

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for iterating and working on updating & improving the docs!

Changes look good to me; just one comment on the default values in the test. Once that's done and @ArthurZucker has approved, we can merge :)

@dev-cotyledon

dev-cotyledon commented Sep 4, 2023

Is there any update?
I need this feature for my academic project.

@amyeroberts
Collaborator

Friendly ping @ArthurZucker :)

@amyeroberts
Collaborator

@dev-cotyledon If you need this for your project, it's possible to work from this branch until it's been merged into main:

  • Clone the repo: git clone git@github.com:huggingface/transformers.git
  • Create a local environment, e.g. python -m venv my-env
  • Install the package from source in dev mode: cd transformers && pip install -e .
  • Add the fork as a remote: git remote add pfldy2850 https://github.com/pfldy2850/transformers.git
  • Fetch this branch: git fetch pfldy2850 fix-gpt-neox-parallelize-beam
  • Check out this branch: git checkout fix-gpt-neox-parallelize-beam

Your environment will now be running the version of transformers from this branch.

@dev-cotyledon

How kind of you! Thanks a lot sir 🙏

@pfldy2850 pfldy2850 force-pushed the fix-gpt-neox-parallelize-beam branch from d98a286 to d9f7c55 Compare September 7, 2023 05:36
@pfldy2850 pfldy2850 force-pushed the fix-gpt-neox-parallelize-beam branch from d9f7c55 to 88089af Compare September 7, 2023 05:41
@pfldy2850
Contributor Author

I had to change the commit author, so I amended the commits and force-pushed.

@amyeroberts
Collaborator

@ArthurZucker Are you happy for us to merge?

@ArthurZucker
Collaborator

Yep! Sorry, I must have missed the ping 😢

Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks for the contribution! (I'm just running the slow tests to make sure we don't have a bad surprise.)

@ArthurZucker ArthurZucker merged commit 8881f38 into huggingface:main Sep 14, 2023
@ArthurZucker
Collaborator

Thanks for the fix @pfldy2850, tests are all green so merged the PR 😉

parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
* Fix GPTNeoX beam search when using parallelize

* Fix beam search idx device when using model parallel

* remove onnx related stuff

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix: move test_beam_search_on_multi_gpu to GenerationTesterMixin

* fix: add right item to _no_split_modules of MegaPreTrainedModel

* fix: add num_beams within parallelized beam_search test

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>