
Consider top-level buffers when computing infer_auto_device_map #792

Merged
5 commits merged into huggingface:main on Oct 27, 2022

Conversation

@younesbelkada (Contributor) commented Oct 26, 2022

What does this PR do?

This PR adds list(model._buffers) inside modules_to_treat when computing the auto_device_map. I ran into this scenario when trying to add accelerate support for BART-like models, where final_logits_bias is registered as a buffer and is not of a uint type. It seems that we need to assign a device to this buffer.

The other solution would be to "force-ignore" the buffer in check_device_map, since the tensors that are in model._buffers are stored in the state_dict.
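As a rough sketch of that first approach (hypothetical helper name, not the actual accelerate code), the buffers registered directly on the model are appended to the list of objects that still need a device:

import torch.nn as nn

# Hypothetical sketch of the initial proposal: along with top-level parameters
# and child modules, also collect the buffers registered directly on the model
# (model._buffers), so that infer_auto_device_map can assign them a device too.
def collect_top_level_entries(model: nn.Module):
    return (
        list(model.named_parameters(recurse=False))
        + list(model.named_children())
        + list(model._buffers.items())
    )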

cc @sgugger @muellerzr

Slow tests from tests/test_bigmodeling.py pass!

@HuggingFaceDocBuilderDev commented Oct 26, 2022

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada changed the title from "Add buffers support when computing infer_auto_device_map" to "Add model._buffers support when computing infer_auto_device_map" on Oct 27, 2022
@sgugger (Collaborator) left a comment

Can you explain how some buffers don't end up in model._buffers? I don't fully understand that part.

@younesbelkada (Contributor, Author) commented Oct 27, 2022

So if I understood it correctly, if your model contains modules such as nn.BatchNorm (as in the accelerate CI test), their buffers running_mean and running_var will not be stored inside model._buffers but will show up in model.named_buffers(). That is why I had to "filter out" the buffers by considering only the ones that are both in model._buffers and in model.named_buffers().

Here is an example that I have quickly tried:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 1), nn.BatchNorm1d(1), nn.Embedding(1, 1), nn.LayerNorm(1))
print(list(model.named_buffers()))
>>>[('1.running_mean', tensor([0.])), ('1.running_var', tensor([1.])), ('1.num_batches_tracked', tensor(0))]
print(list(model._buffers))
>>> []
model.register_buffer("position_bias", torch.ones(1))
print(list(model._buffers))
>>> ['position_bias']
print(list(model.named_buffers()))
>>> [('position_bias', tensor([1.])), ('1.running_mean', tensor([0.])), ('1.running_var', tensor([1.])), ('1.num_batches_tracked', tensor(0))]

@sgugger (Collaborator) commented Oct 27, 2022

I think in this case it's just the difference between named_buffers(recurse=True) and named_buffers(recurse=False). I'm not convinced this is the right fix, so I would like to learn more about what is failing.
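For illustration (plain PyTorch, with an arbitrary buffer name), recurse=False only returns the buffers registered on the module itself, while the default recurse=True also walks the submodules:

import torch
import torch.nn as nn

m = nn.Sequential(nn.BatchNorm1d(4))
m.register_buffer("top_level_bias", torch.zeros(4))

# recurse=True (the default) also visits submodules
print([name for name, _ in m.named_buffers(recurse=True)])
# ['top_level_bias', '0.running_mean', '0.running_var', '0.num_batches_tracked']

# recurse=False only returns buffers registered directly on `m`
print([name for name, _ in m.named_buffers(recurse=False)])
# ['top_level_bias']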

@younesbelkada (Contributor, Author)

Ah yes, I see, you're probably right here! Let me dig a bit more and get back to you.

@younesbelkada (Contributor, Author) commented Oct 27, 2022

@sgugger I might have more of a clue about what is failing.
I think the problem comes from the fact that infer_auto_device_map only takes modules and submodules into account. I have made a script below to better illustrate the problem:

import torch.nn as nn
import torch
from accelerate.utils import infer_auto_device_map
from accelerate.big_modeling import dispatch_model

class SubModule(nn.Module):
    def __init__(self):
        super().__init__()

        self.register_buffer("position_bias", torch.ones(1, 1000))

class Model(nn.Module):
    def __init__(self, wrap_module=True):
        super().__init__()
        self.l1 = nn.Linear(1000, 1000)
        self.l2 = nn.Linear(1000, 1000)
        self.l3 = nn.Linear(1000, 1000)

        self.bn1 = nn.BatchNorm1d(1000)
        self.bn2 = nn.BatchNorm1d(1000)

        if wrap_module:
            self.position_bias = SubModule()
        else:
            self.register_buffer("position_bias", torch.ones(1, 1000))

# Test 1: wrapping with a module - this will pass
model = Model()
device_map = infer_auto_device_map(model, {0:"10MB", "cpu":"100MB"})
model = dispatch_model(model, device_map)

# Test 2: below will fail
model = Model(wrap_module=False)
device_map = infer_auto_device_map(model, {0:"10MB", "cpu":"100MB"})
model = dispatch_model(model, device_map)
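To make the failure mode concrete (a sketch assuming the pre-fix behavior of infer_auto_device_map, reusing the Model class defined above): the returned device map only contains (sub)module names, so nothing covers the top-level position_bias buffer, and dispatch_model then errors out in check_device_map.

# Sketch (pre-fix behavior assumed): inspect the device map for the failing case.
model = Model(wrap_module=False)
device_map = infer_auto_device_map(model, {0: "10MB", "cpu": "100MB"})
print(device_map)
# Only (sub)module names such as l1, l2, bn1, ... appear as keys; no entry covers
# the top-level "position_bias" buffer, hence the error raised from check_device_map.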

Let me know what you think!

I guess this failed for BartPreTrainedModel since its final_logits_bias buffer is registered on the parent module itself, just like position_bias in the second test above.

@sgugger (Collaborator) commented Oct 27, 2022

Ah, in this case it looks very much like the problem #747 fixed for top-level parameters, so the fix should be pretty similar here too!

Commit: use `model.named_buffers(recurse=False)` instead
Co-authored-by: Sylvain Gugger <sgugger@users.noreply.github.com>
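The gist of the final change, again as a rough sketch rather than the exact diff: the only difference from the earlier sketch is that top-level buffers are now collected with named_buffers(recurse=False), mirroring how top-level parameters are handled.

import torch.nn as nn

# Rough sketch of the final approach: buffers registered directly on the model
# are gathered with named_buffers(recurse=False), just like top-level parameters
# are gathered with named_parameters(recurse=False).
def collect_top_level_entries(model: nn.Module):
    return (
        list(model.named_parameters(recurse=False))
        + list(model.named_buffers(recurse=False))
        + list(model.named_children())
    )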
@younesbelkada changed the title from "Add model._buffers support when computing infer_auto_device_map" to "Consider top-level buffer when computing infer_auto_device_map" on Oct 27, 2022
@younesbelkada changed the title from "Consider top-level buffer when computing infer_auto_device_map" to "Consider top-level buffers when computing infer_auto_device_map" on Oct 27, 2022
@sgugger (Collaborator) left a comment

Perfect, thanks!

@younesbelkada (Contributor, Author) commented Oct 27, 2022

The whole testing suite (including slow tests) is green! 🟢 Merging!

@younesbelkada younesbelkada merged commit 415b738 into huggingface:main Oct 27, 2022