-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nail in edge case of torch dtype being overriden permantly in the case of an error #35845
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this bug. I had a few points of discussion, please LMK what you think.
@@ -245,6 +246,25 @@ def set_zero3_state(): | |||
_is_ds_init_called = False | |||
|
|||
|
|||
def restore_default_torch_dtype(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach should generally work. It would fail if there was ever a case where calling from_pretrained
should purposefully change the default dtype for the rest of the process. I'm not sure if such a use case exist, just wanted to highlight it.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. I noticed two more potential issues, please check.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
…e of an error (#35845) * Nail in edge case of torch dtype * Rm unused func * Apply suggestions from code review Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * Refactor tests to only mock what we need, don't introduce injection functions * SetUp/TearDown * Do super --------- Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
…e of an error (huggingface#35845) * Nail in edge case of torch dtype * Rm unused func * Apply suggestions from code review Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * Refactor tests to only mock what we need, don't introduce injection functions * SetUp/TearDown * Do super --------- Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
…e of an error (huggingface#35845) * Nail in edge case of torch dtype * Rm unused func * Apply suggestions from code review Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * Refactor tests to only mock what we need, don't introduce injection functions * SetUp/TearDown * Do super --------- Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
What does this PR do?
@BenjaminBossan and I have been trying to hunt down an issue causing some headache over the last few days, as to why some tests were failing if done in specific instances.
What we cornered it to, is if a user:
from_pretrained
or_from_config
), with adtype
other than the torch default dtypeWhat will occur is the torch dtype will permantly change, which is why some users may "discover" this issue and it goes away after a full restart (a likely scenario that could happen). What we currently do doesn't catch if we have an exception, this does, and is used as an additive
This PR introduces a safety decorator called
restore_default_torch_dtype
which will restore whatever torch's original dtype was prior to calling the function (since we restore it anyways at the end).This PR also adds a new
test_injection
function. The aim is for this function to be mocked at critical points where we need testing, and we cannot easily get to it (e.g. in this case, there was only one function we could mock infrom_pretrained
and zero functions we could mock in_from_config
to make sure this fix worked!)Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@SunMarc @BenjaminBossan @ArthurZucker