Deepcopy on Accelerator to return self #1694
Merged
Summary of Slack thread:
A skorch user ran into issues when using accelerate. To cut a long story short, the setup is multi-GPU training on a single node. During evaluation, when `accelerator.gather_for_metrics` is called, the excess samples are not being truncated. Digging through the code, I found the check here:
accelerate/src/accelerate/accelerator.py, lines 1843 to 1856 (at ecd1288)
The problem is that `accelerator.gradient_state` never reports `end_of_dataloader` as `True` and its `remainder` is always -1, so naturally the condition is never triggered. Strangely, when I inspect the `gradient_state` of the DataLoader instance, it correctly determines `end_of_dataloader` and `remainder`. If I do a hacky `accelerator.gradient_state = dataloader.gradient_state`, the issue goes away. I suspect that the two gradient states should not diverge; probably the prepare step is somehow incorrect, leading to this. Any idea whether my suspicion is correct, and do I need to pay attention to anything specific during the prepare step?
Post discussion:
I found a potentially simple solution to the problem, but I wanted to check first whether you believe it works correctly. The solution is to add a `__deepcopy__` method to the `Accelerator` class; a sketch is shown below. I'm not sure what deepcopy does by default that's different, but this solves the problem. Do you think this is safe to use, and would it be possible to add a `__deepcopy__` method to `Accelerator`?
I'm attaching a minimal example to reproduce the error and the fix (it still contains sklearn stuff, but skorch is completely removed, so it should be mostly straightforward). Without the fix in line 120, I get the error described above, with the gradient states diverging.
Solution:
The proposed solution (which works) is to have `Accelerator` implement a `__deepcopy__` method that returns `self`. Considering we only want to initialize an accelerator object once (except under very extreme circumstances), returning `self` here is what we want: after deepcopying, the ID of everything is exactly the same, so the "copy" is just another reference to the same object with everything intact. Even if this somehow breaks what downstream users are doing, it's good to have this here, because users shouldn't really be deepcopying the `Accelerator` object anyway, as that adds headaches with clearing memory, stored references, etc.
cc @BenjaminBossan
Alternative:
An alternative approach, much more isolated than touching `deepcopy`, is to implement `__sklearn_clone__`, which is specific to sklearn, does the same thing, and shouldn't break existing code. Another option is for skorch to ship its own `Accelerator` subclass that adds this, which removes the issue from `accelerate` directly.
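For reference, a sketch of that alternative (assuming scikit-learn >= 1.3, where `sklearn.base.clone` dispatches to a `__sklearn_clone__` method when one is defined):

```python
class Accelerator:
    ...

    def __sklearn_clone__(self):
        # Only affects sklearn.base.clone; deepcopy behaviour is untouched,
        # and the cloned estimator keeps sharing this Accelerator instance.
        return self
```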