Deepcopy on Accelerator to return self #1694
Merged
Summary of Slack thread:
A skorch user ran into issues when using accelerate. To cut a long story short, the setup is multi-GPU training on a single node. During evaluation, when `accelerator.gather_for_metrics` is called, the excess samples are not being truncated. Digging through the code, I found the check here:
accelerate/src/accelerate/accelerator.py, lines 1843 to 1856 (at ecd1288)
The problem is that `accelerator.gradient_state` never reports `end_of_dataloader` as `True` and its `remainder` is always -1, so naturally the condition is never triggered. Strangely, when I inspect the `gradient_state` of the DataLoader instance, it correctly determines `end_of_dataloader` and `remainder`. If I do a hacky `accelerator.gradient_state = dataloader.gradient_state`, the issue goes away. I suspect that the two gradient states should not diverge; probably the prepare step is somehow incorrect, leading to this. Any idea whether my suspicion is correct, and do I need to pay attention to anything specific during the prepare step?
Post discussion:
I found a potentially simple solution to the problem, but I wanted to check first whether you believe it works correctly. The solution is to add a `__deepcopy__` method to the `Accelerator` class; a sketch is shown below. I'm not sure what deepcopy does by default that's different, but this solves the problem. Do you think this is safe to use, and would it be possible to add a `__deepcopy__` method to `Accelerator`?
I'm attaching a minimal example to reproduce the error and the fix (it still contains sklearn stuff, but skorch is completely removed, so it should be mostly straightforward). Without the fix in line 120, I get the error described above, with the gradient states diverging.
Solution:
The proposed solution (which works) is to have `Accelerator` implement a `__deepcopy__` method that returns `self`. Considering we only want to initialize an accelerator object once (except under very extreme circumstances), returning `self` here is what we want: after deepcopying, the ID of everything is exactly the same, so the "copy" is just another reference to the same object with everything intact. Even if this somehow breaks what downstream users are doing, it's good to have this here, because users shouldn't really be deepcopying the `Accelerator` object anyway, as that adds headaches with clearing memory, stored references, etc.
cc @BenjaminBossan
Alternative:
An alternative approach, much more isolated than touching `deepcopy`, is to implement `__sklearn_clone__`, which is specific to sklearn, does the same thing, and shouldn't break existing code. Another option is for skorch to ship its own `Accelerator` subclass that adds this, which removes the issue from `accelerate` directly.
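For reference, a sketch of that alternative (assuming scikit-learn >= 1.3, where `sklearn.base.clone` dispatches to a `__sklearn_clone__` method when one is defined):

```python
class Accelerator:
    ...

    def __sklearn_clone__(self):
        # Only affects sklearn.base.clone; deepcopy behaviour is untouched,
        # and the cloned estimator keeps sharing this Accelerator instance.
        return self
```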