Fix beam search when using model parallel #24969
Conversation
Thanks for the fix!
This seems to apply to a lot of models, would be good to fix them all in one go!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I agree with you! I am willing to do further work based on your comments.
I think it will be easier to have everything in one PR, given how small and repetitive a change it is!
I have added fix commits for every model that required correction, and I have also updated the cookiecutter template. I have updated the PR title and description to match the broader scope. Since numerous models needed correction, some parts might have been overlooked, so I would appreciate it if you could review the changes again. Thank you!
There is no
Force-pushed from 68a41f5 to 9e978c4.
Oh! Thank you for correcting the mistake.
LGTM - thanks for fixing this!
cc @gante for reference
Is there anything else you'd like me to fix?
Not at all, sorry! Let me have a final look and I'll merge this!
LGTM! My last comment is: let's add a slow test for our nightly CI, which has multiple GPUs, to make sure this is tested (beam search on multiple GPUs)! It would go in test_modeling_common.py!
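For reference, here is a CPU-runnable sketch of what such a test could check. The decorator, mixin placement, and test name in the real suite are assumptions based on this thread; the multi-GPU part is reduced to the core invariant so the sketch runs anywhere.

```python
import unittest

import torch


class BeamSearchReorderTest(unittest.TestCase):
    # Hedged sketch: in the real suite this would be a slow, multi-GPU test
    # (e.g. guarded by something like @require_torch_multi_gpu) that loads a
    # parallelized model and calls generate() with num_beams > 1. Those
    # details are assumptions; here we only check the device-safe reorder
    # step: index_select succeeds once beam_idx is moved to the cache
    # tensor's device.
    def test_reorder_with_moved_beam_idx(self):
        past_state = torch.arange(8.0).reshape(4, 2)
        beam_idx = torch.tensor([3, 1, 0, 2])
        out = past_state.index_select(0, beam_idx.to(past_state.device))
        self.assertTrue(torch.equal(out[0], past_state[3]))


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(BeamSearchReorderTest)
)
```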
Resolved review thread on examples/research_projects/onnx/summarization/bart_onnx/generation_onnx.py (now outdated).
Hmm, by the way: why was there no existing test that caught this issue beforehand?
You are right! Might be the
What do you think about setting the
Great idea. The problem is that this might also trigger other tests, and there might be a reason why we don't test them (maybe too slow / the model doesn't need these tests as it is not used that much). Pinging @amyeroberts for a final answer 🤗
OK, I think this is hitting on some areas of our testing suite which are non-obvious and/or need updating. AFAICT, there are only two models which have
We have another test which tests for parallelism which doesn't check
In addition, generate-specific tests should be added to GenerationTesterMixin, rather than ModelTesterMixin, as only models with
What I would suggest is:
Force-pushed from 6d0a7d7 to d6c50e2.
Sorry for the delay, I've been busy with work. I've made the fix as you suggested.
Thanks for iterating and working on updating & improving the docs!
Changes look good to me - just a comment on the default values in the test. Once that's done and @ArthurZucker has approved, we can merge :)
Is there any update?
Friendly ping @ArthurZucker :)
@dev-cotyledon If you need this for your project, it's possible to work from this branch until it's been merged into main:
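One common pattern for doing that is a VCS install with pip. USER and BRANCH below are placeholders, not the actual fork and branch name of this PR:

```shell
# Install transformers directly from a PR branch; replace USER and BRANCH
# with the contributor's fork and branch name (placeholders here).
pip install --upgrade "git+https://github.com/USER/transformers.git@BRANCH"
```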
Your environment will then be running the version of transformers from this branch.
How kind of you! Thanks a lot sir 🙏
Force-pushed from d98a286 to d9f7c55.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Force-pushed from d9f7c55 to 88089af.
I had to change the commit author, so I amended and then force-pushed.
@ArthurZucker Are you happy for us to merge? |
Yep! Sorry must have missed the ping 😢 |
Thanks for the contribution! (I'm just running the slow tests to make sure we don't have a bad surprise.)
Thanks for the fix @pfldy2850, tests are all green so I merged the PR 😉
* Fix GPTNeoX beam search when using parallelize
* Fix beam search idx device when using model parallel
* remove onnx related stuff

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix: move test_beam_search_on_multi_gpu to GenerationTesterMixin
* fix: add right item to _no_split_modules of MegaPreTrainedModel
* fix: add num_beams within parallelized beam_search test

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
What does this PR do?
This PR fixes a crash when running beam search on multiple GPUs. A similar issue was also observed and fixed for T5 (#11717) and LLaMA (#24224).
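The crash and the fix pattern can be sketched as follows. This is a hypothetical, CPU-only illustration (the function name and cache layout below are simplified, not the exact merged code): under model parallelism, the cached key/value states of different layers can live on different GPUs while `beam_idx` stays on one device, so `index_select` raises a device-mismatch error unless `beam_idx` is first moved to each state's device.

```python
import torch


def reorder_cache(past_key_values, beam_idx):
    # The fix pattern: move beam_idx onto each past_state's device before
    # index_select, so it works even when layers live on different GPUs.
    return tuple(
        tuple(
            past_state.index_select(0, beam_idx.to(past_state.device))
            for past_state in layer_past
        )
        for layer_past in past_key_values
    )


# CPU demonstration: reorder a fake one-layer cache holding 3 beams.
past = ((torch.arange(12.0).reshape(3, 2, 2),),)
reordered = reorder_cache(past, torch.tensor([2, 0, 1]))
```

On a single device the extra `.to(...)` is a no-op, so the change is safe for the non-parallel case as well.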
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @younesbelkada