
Consider top-level buffers when computing infer_auto_device_map #792

Merged
5 commits merged into huggingface:main on Oct 27, 2022

Conversation

@younesbelkada (Contributor) commented Oct 26, 2022

What does this PR do?

This PR adds list(model._buffers) inside modules_to_treat when computing the auto_device_map. I ran into this scenario when trying to add accelerate support for BART-like models, where final_logits_bias is registered as a buffer and is not of a uint type. It seems that we need to assign a device to this buffer.

The other solution would be to "force-ignore" the buffer in check_device_map, since the tensors that are in model._buffers are stored in the state_dict.
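As a rough sketch of that first approach (hypothetical helper name, not the actual accelerate code), the buffers registered directly on the model are appended to the list of objects that still need a device:

import torch.nn as nn

# Hypothetical sketch of the initial proposal: along with top-level parameters
# and child modules, also collect the buffers registered directly on the model
# (model._buffers), so that infer_auto_device_map can assign them a device too.
def collect_top_level_entries(model: nn.Module):
    return (
        list(model.named_parameters(recurse=False))
        + list(model.named_children())
        + list(model._buffers.items())
    )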

cc @sgugger @muellerzr

Slow tests from tests/test_bigmodeling.py pass!

@HuggingFaceDocBuilderDev commented Oct 26, 2022

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada changed the title from "Add buffers support when computing infer_auto_device_map" to "Add model._buffers support when computing infer_auto_device_map" on Oct 27, 2022
@sgugger (Collaborator) left a comment

Can you explain how some buffers don't end up in model._buffers? I don't fully understand that part.

@younesbelkada (Contributor, Author) commented Oct 27, 2022

So if I understood it correctly, if your model contains modules such as nn.BatchNorm (as in the accelerate CI test), their buffers running_mean and running_var will not be stored inside model._buffers but will show up in model.named_buffers(). That is why I had to "filter out" the buffers by considering only the ones that are both in model._buffers and in model.named_buffers().

Here is an example that I have quickly tried:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 1), nn.BatchNorm1d(1), nn.Embedding(1, 1), nn.LayerNorm(1))
print(list(model.named_buffers()))
>>>[('1.running_mean', tensor([0.])), ('1.running_var', tensor([1.])), ('1.num_batches_tracked', tensor(0))]
print(list(model._buffers))
>>> []
model.register_buffer("position_bias", torch.ones(1))
print(list(model._buffers))
>>> ['position_bias']
print(list(model.named_buffers()))
>>> [('position_bias', tensor([1.])), ('1.running_mean', tensor([0.])), ('1.running_var', tensor([1.])), ('1.num_batches_tracked', tensor(0))]

@sgugger (Collaborator) commented Oct 27, 2022

I think in this case it's just the difference between named_buffers(recurse=True) and named_buffers(recurse=False). I'm not convinced this is the right fix, so I would like to learn more about what is failing.
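For illustration (plain PyTorch, with an arbitrary buffer name), recurse=False only returns the buffers registered on the module itself, while the default recurse=True also walks the submodules:

import torch
import torch.nn as nn

m = nn.Sequential(nn.BatchNorm1d(4))
m.register_buffer("top_level_bias", torch.zeros(4))

# recurse=True (the default) also visits submodules
print([name for name, _ in m.named_buffers(recurse=True)])
# ['top_level_bias', '0.running_mean', '0.running_var', '0.num_batches_tracked']

# recurse=False only returns buffers registered directly on `m`
print([name for name, _ in m.named_buffers(recurse=False)])
# ['top_level_bias']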

@younesbelkada (Contributor, Author)

Ah yes, I see, you're probably right here! Let me dig a bit more and get back to you.

@younesbelkada (Contributor, Author) commented Oct 27, 2022

@sgugger I might have more of a clue about what is failing.
I think the problem comes from the fact that infer_auto_device_map only takes modules and submodules into account. I have made a script below to better illustrate the problem:

import torch.nn as nn
import torch
from accelerate.utils import infer_auto_device_map
from accelerate.big_modeling import dispatch_model

class SubModule(nn.Module):
    def __init__(self):
        super().__init__()

        self.register_buffer("position_bias", torch.ones(1, 1000))

class Model(nn.Module):
    def __init__(self, wrap_module=True):
        super().__init__()
        self.l1 = nn.Linear(1000, 1000)
        self.l2 = nn.Linear(1000, 1000)
        self.l3 = nn.Linear(1000, 1000)

        self.bn1 = nn.BatchNorm1d(1000)
        self.bn2 = nn.BatchNorm1d(1000)

        if wrap_module:
            self.position_bias = SubModule()
        else:
            self.register_buffer("position_bias", torch.ones(1, 1000))

# Test 1: wrapping with a module - this will pass
model = Model()
device_map = infer_auto_device_map(model, {0:"10MB", "cpu":"100MB"})
model = dispatch_model(model, device_map)

# Test 2: below will fail
model = Model(wrap_module=False)
device_map = infer_auto_device_map(model, {0:"10MB", "cpu":"100MB"})
model = dispatch_model(model, device_map)
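To make the failure mode concrete (a sketch assuming the pre-fix behavior of infer_auto_device_map, reusing the Model class defined above): the returned device map only contains (sub)module names, so nothing covers the top-level position_bias buffer, and dispatch_model then errors out in check_device_map.

# Sketch (pre-fix behavior assumed): inspect the device map for the failing case.
model = Model(wrap_module=False)
device_map = infer_auto_device_map(model, {0: "10MB", "cpu": "100MB"})
print(device_map)
# Only (sub)module names such as l1, l2, bn1, ... appear as keys; no entry covers
# the top-level "position_bias" buffer, hence the error raised from check_device_map.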

Let me know what you think!

I guess this failed for BartPreTrainedModel since its final_logits_bias buffer is registered on the parent module itself, just like position_bias in the second test above.

@sgugger (Collaborator) commented Oct 27, 2022

Ah, in this case it looks very much like the problem #747 fixed for top-level parameters, so the fix should be pretty similar here too!

Commit: use `model.named_buffers(recurse=False)` instead
Co-authored-by: Sylvain Gugger <sgugger@users.noreply.github.com>
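The gist of the final change, again as a rough sketch rather than the exact diff: the only difference from the earlier sketch is that top-level buffers are now collected with named_buffers(recurse=False), mirroring how top-level parameters are handled.

import torch.nn as nn

# Rough sketch of the final approach: buffers registered directly on the model
# are gathered with named_buffers(recurse=False), just like top-level parameters
# are gathered with named_parameters(recurse=False).
def collect_top_level_entries(model: nn.Module):
    return (
        list(model.named_parameters(recurse=False))
        + list(model.named_buffers(recurse=False))
        + list(model.named_children())
    )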
@younesbelkada changed the title from "Add model._buffers support when computing infer_auto_device_map" to "Consider top-level buffer when computing infer_auto_device_map" on Oct 27, 2022
@younesbelkada changed the title from "Consider top-level buffer when computing infer_auto_device_map" to "Consider top-level buffers when computing infer_auto_device_map" on Oct 27, 2022
@sgugger (Collaborator) left a comment

Perfect, thanks!

@younesbelkada (Contributor, Author) commented Oct 27, 2022

The whole testing suite (including slow tests) is green! 🟢 Merging!

@younesbelkada younesbelkada merged commit 415b738 into huggingface:main Oct 27, 2022