fix accelerator prepare during eval only mode #24014
Conversation
Unless I'm missing something, this changes the whole logic of the evaluation in the Trainer and should not be done.
src/transformers/trainer.py (outdated):

    model = self._wrap_model(self.model, training=False, dataloader=dataloader)

    if len(self.accelerator._models) == 0 and model is self.model:
        model = self.accelerator.prepare(model)
No, we only want to do this for DeepSpeed, not all the time. Putting a model in DistributedDataParallel just for evaluation will waste some memory.
I do agree on the DDP case, and hence I didn't update it earlier, but as mentioned below we would be missing mixed-precision coverage for eval-only mode.
The thing is that mixed-precision application for eval-only mode won't work unless we prepare the model.
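The mixed-precision point can be sketched with plain PyTorch autocast, which is roughly what `accelerator.prepare` installs around the model's `forward`; the `Probe` module below is a hypothetical stand-in used only to observe the compute dtype:

```python
import torch

class Probe(torch.nn.Module):
    """Records the dtype its linear layer actually computes in."""
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 2)
        self.inner_dtype = None

    def forward(self, x):
        y = self.lin(x)
        self.inner_dtype = y.dtype  # dtype under autocast, if any
        return y

probe = Probe().eval()
x = torch.randn(1, 4)

# Unprepared model: no autocast context, so eval runs entirely in fp32.
with torch.no_grad():
    probe(x)
assert probe.inner_dtype == torch.float32

# What prepare() effectively adds for mixed precision: an autocast
# context around forward (bf16 on CPU here, for portability).
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    probe(x)
assert probe.inner_dtype == torch.bfloat16
```

Without the autocast wrapper that `prepare` installs, an eval-only run silently computes in full precision even when the user requested fp16/bf16.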
* fix mixed precision prep during eval only mode
* update to address comments
* update to reflect the changes in accelerate
What does this PR do?
Currently, the accelerator `prepare` method is called only during the training loop. If the user directly calls `evaluate`/`predict` without running the training loop first, the model isn't prepared, leading to wrong behaviour. This PR is aimed at fixing it, using the `evaluation_mode` argument from accelerate#1540.